Compiler Benchmarks Of GCC, LLVM-GCC, DragonEgg, Clang
-
I suspect those tests where O2 outperformed O3 aren't very realistic. They probably have very small code bases that happen to fit into the L1 cache with O2 and get enlarged just enough by the O3 optimizations that they only fit in the L2 cache, or something like that. I imagine that is mostly true of microbenchmarks rather than real applications.
Anyway, I think Michael isn't actually setting anything at all. If I remember correctly from the last compiler benchmarks he did, he's just running make without changing any of the default compiler settings from upstream.
-
Ok here are the results:
Test system: GCC 4.5.1, Arch Linux 2.6.35 64bit, Core i5
Program: Mame 1.40
mame commandline options: -noautoframeskip -frameskip 0 -skip_gameinfo -effect none -nowaitvsync -nothrottle -nosleep -window -mt -str 60
-O2 -march=native -mfpmath=sse -msse4.2 -ffast-math
cyber commando 209.14%
cyber sled 123.52%
radikal bikers 169.88%
star gladiator 396.43%
virtua fighter kids 185.24%
-O3 -march=native -mfpmath=sse -msse4.2 -ffast-math
cyber commando 213.44%
cyber sled 124.71%
radikal bikers 172.49%
star gladiator 384.40%
virtua fighter kids 187.20%
Same as above (-O3 etc) but with PGO which automatically enables -fbranch-probabilities, -fvpt, -funroll-loops, -fpeel-loops, -ftracer.
cyber commando 218.23%
cyber sled 151.83%
radikal bikers 186.45%
star gladiator 406.21%
virtua fighter kids 221.93%
As much as I hate to admit it, your (yotambien's) comment does have some credibility in these results: although -O2 only won one test (which on its own would look like an anomaly), that was the test with the biggest difference between -O2 and -O3.
Other than that, PGO (profile-guided optimization) shows that it can increase performance very nicely; I hope LLVM gets this optimization as well soon. Next time I do a MAME benchmark I will also run a PGO test with -O2 to see what the results are (particularly for star gladiator). I will also use a larger test case, which may show other instances where -O2 beats -O3.
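To put numbers on the comparison above, here is a minimal Python sketch that computes the relative change between the three builds. The figures are copied from the results in this post; the labels "O2", "O3" and "O3+PGO" and the helper names are made up for illustration, and the percentages are simply treated as higher-is-better scores, which is an assumption.

```python
# Scores copied from the results above. Treating them as higher-is-better
# is an assumption; the labels "O2", "O3", "O3+PGO" are illustrative only.
results = {
    "cyber commando":      {"O2": 209.14, "O3": 213.44, "O3+PGO": 218.23},
    "cyber sled":          {"O2": 123.52, "O3": 124.71, "O3+PGO": 151.83},
    "radikal bikers":      {"O2": 169.88, "O3": 172.49, "O3+PGO": 186.45},
    "star gladiator":      {"O2": 396.43, "O3": 384.40, "O3+PGO": 406.21},
    "virtua fighter kids": {"O2": 185.24, "O3": 187.20, "O3+PGO": 221.93},
}


def relative_change(baseline, candidate):
    """Percent change of candidate relative to baseline."""
    return (candidate - baseline) / baseline * 100.0


def compare(table, baseline, candidate):
    """Per-game percent change between two build configurations."""
    return {
        game: relative_change(scores[baseline], scores[candidate])
        for game, scores in table.items()
    }


o3_vs_o2 = compare(results, "O2", "O3")
pgo_vs_o3 = compare(results, "O3", "O3+PGO")

for game in results:
    print(f"{game:20s} O3 vs O2: {o3_vs_o2[game]:+6.2f}%   "
          f"PGO vs O3: {pgo_vs_o3[game]:+6.2f}%")

# Game with the largest gap between -O2 and -O3, in either direction.
largest_gap = max(o3_vs_o2, key=lambda game: abs(o3_vs_o2[game]))
print("largest O2/O3 gap:", largest_gap)
```

With only one run per configuration there is no way to estimate run-to-run noise from these numbers, so differences of a percent or two should be read with caution.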
-
Originally posted by yotambien
That's interesting. What are the percentages? I mean, I suppose higher is better, but what are they? :D
On the other hand, the PGO thingy looks like it actually makes a nice difference...
-
Originally posted by Ex-Cyber
The Cyber Sled results are impressive; System 21 is a beast. Which Core i5 model is that, and how are you clocking it?
-
Great article. IMHO, more important than the benchmark results are the rather frequent cases where Clang/LLVM failed to compile something. There's a lot of talk out there about how Clang/LLVM is supposedly better than GCC. Rather than theoretical talk, this article brings some hard facts to the table: Clang/LLVM still fails miserably at what it's supposed to do, and where it does succeed, the resulting binaries are often slower than those produced by GCC.
-
Originally posted by smitty3268
I suspect those tests where O2 outperformed O3 aren't very realistic. They probably have very small code bases that happen to fit into the L1 cache with O2 and get enlarged just enough by the O3 optimizations that they only fit in the L2 cache, or something like that. I imagine that is mostly true of microbenchmarks rather than real applications.