Compiler Benchmarks Of GCC, LLVM-GCC, DragonEgg, Clang


  • XorEaxEax
    replied
    Ok here are the results:

    Test system: GCC 4.5.1, Arch Linux (kernel 2.6.35, 64-bit), Core i5
    Program: Mame 1.40
    Mame command-line options: -noautoframeskip -frameskip 0 -skip_gameinfo -effect none -nowaitvsync -nothrottle -nosleep -window -mt -str 60

    -O2 -march=native -mfpmath=sse -msse4.2 -ffast-math
    cyber commando 209.14%
    cyber sled 123.52%
    radikal bikers 169.88%
    star gladiator 396.43%
    virtua fighter kids 185.24%

    -O3 -march=native -mfpmath=sse -msse4.2 -ffast-math
    cyber commando 213.44%
    cyber sled 124.71%
    radikal bikers 172.49%
    star gladiator 384.40%
    virtua fighter kids 187.20%

    Same as above (-O3 etc.) but with PGO, which automatically enables -fbranch-probabilities, -fvpt, -funroll-loops, -fpeel-loops and -ftracer.
    cyber commando 218.23%
    cyber sled 151.83%
    radikal bikers 186.45%
    star gladiator 406.21%
    virtua fighter kids 221.93%

    As much as I hate to admit it, your (yotambien's) comment does have some credibility given these results: even though -O2 only won in one test (so it may be an anomaly), that was also the test with the biggest difference between -O2 and -O3.

    Other than that, PGO (profile-guided optimization) shows that it can increase performance very nicely; I hope LLVM gets this optimization as well soon. Next time I do a Mame benchmark I will do a PGO test with -O2 as well to see what the results are (particularly star gladiator). I will also use a larger test case, which may show other instances where -O2 beats -O3.
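
    For reference, the PGO build is a two-pass affair, roughly along these lines (the CFLAGS handling and the training run below are just placeholders, not my exact setup):

    # pass 1: build instrumented, then run a representative workload
    make CFLAGS="-O3 -march=native -mfpmath=sse -msse4.2 -ffast-math -fprofile-generate"
    ./mame <training run>
    # pass 2: rebuild using the collected profile data
    make clean
    make CFLAGS="-O3 -march=native -mfpmath=sse -msse4.2 -ffast-math -fprofile-use"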



  • smitty3268
    replied
    I think the compilers should be bootstrapped for the compile-time benchmarks. It's not very realistic to compile everything with the GCC 4.4 system compiler; on a real system it would be a self-built version doing the compiling, which might (or might not) be able to compile programs faster.
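
    Roughly, that would mean bootstrapping the compiler first and then timing the benchmark builds with the self-built binary, something like this (a sketch only; the version and install prefix are just examples):

    ../gcc-4.5.1/configure --prefix=$HOME/toolchain --enable-bootstrap
    make && make install
    time make CC=$HOME/toolchain/bin/gcc   # build the benchmark with the self-built compiler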



  • smitty3268
    replied
    I suspect those tests where -O2 outperformed -O3 aren't very realistic. They probably have very small code bases that happen to fit into the L1 cache with -O2 and get enlarged just enough to only fit in the L2 cache with -O3 optimizations, or something like that. Something I imagine is mostly only true for microbenchmarks rather than a real application.
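
    A quick way to sanity-check the code-size angle would be to compare the text segment of the same code built both ways, for example (the file name is just a placeholder):

    gcc -O2 -c hotloop.c -o hotloop-O2.o
    gcc -O3 -c hotloop.c -o hotloop-O3.o
    size hotloop-O2.o hotloop-O3.o   # compare the "text" column; -O3 is typically larger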

    Anyway, I think Michael isn't actually setting anything at all. If I remember correctly from the last compiler benchmarks he did, he's just running make without changing any of the default compiler settings from upstream.



  • XorEaxEax
    replied
    Originally posted by yotambien View Post
    Right. As I said, those benchmarks simply disagree with the idea that -O3 optimisations will always be at least as fast as -O2. To me, those numbers show that a) differences between -O2 and -O3 are minor; b) -O3 does not consistently produce the fastest binary. Of course, your experience is your experience, which is as valid as those tests.

    What sort of differences do you get with Mame and Handbrake (I guess you mean x264 in this case)?
    I don't know if I've kept any benchmark numbers for -O2 vs -O3 (I'll have to check), since I'm more interested in the differences between -O3 with and without explicit (non -Ox) optimizations like LTO and PGO. But since I was going to do some benchmarking on Mame soon anyway, I'll make some -O2/-O3 comparisons on it later this evening and post the results here. Just making dinner as we speak.
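
    For what it's worth, the LTO part on GCC 4.5 is basically just adding -flto to both the compile and the link flags, roughly like this (the exact variables depend on the package's build system):

    make CFLAGS="-O3 -march=native -flto" LDFLAGS="-O3 -flto"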



  • ChrisXY
    replied
    Originally posted by nanonyme View Post
    -mtune=native is redundant if you're using -march=native.
    Ok.

    Originally posted by nanonyme View Post
    -fomit-frame-pointer breaks debuggability in x86.
    Who the hell needs debugging functionality in benchmarks? When we want to see which compiler produces the fastest code, what's the point in not generating the fastest code?

    Originally posted by nanonyme View Post
    -O3 has bugs and might slow down run-time in many cases.
    More than a few cases? How many cases exactly? Statistics?



  • yotambien
    replied
    Originally posted by XorEaxEax View Post
    Well, in some tests -O3 loses to -O2, but very slightly. But this is a test from a year ago and I can't even find which version of GCC was used, nor can I see if it was done on 32-bit or 64-bit. I test a lot of packages routinely (Blender, p7zip, Handbrake, Dosbox, Mame etc.) with -O2 and -O3, and -O3 comes out on top.
    Right. As I said, those benchmarks simply disagree with the idea that -O3 optimisations will always be at least as fast as -O2. To me, those numbers show that a) differences between -O2 and -O3 are minor; b) -O3 does not consistently produce the fastest binary. Of course, your experience is your experience, which is as valid as those tests.

    What sort of differences do you get with Mame and Handbrake (I guess you mean x264 in this case)?



  • XorEaxEax
    replied
    Originally posted by energyman View Post
    It would be more interesting to compare that hand-written asm with GCC-generated code. Unless that is done, there is no reason to turn off assembly just to create a test case that is completely detached from reality.
    The point here is to compare the generated code of compilers (GCC vs LLVM), not hand-optimized assembly vs compiler-generated code.

    While it certainly would also be interesting to see how much better hand-optimized assembly does against the code generated by these compilers, it's not part of THIS benchmark. So yes, there's every reason to disable hand-optimized assembly here.



  • markus_b
    replied
    I quite like the horizontal graphs; they look good.

    But I'd have appreciated a better choice of colors: something of a similar tint for the GCC variants and separate tints for the others. I found myself scrolling up and down a couple of times to check which color is which compiler.



  • XorEaxEax
    replied
    Well, regardless of whether -O2 beats -O3 in some tests or not, -O3 IS the optimization level that is supposed to optimize the most, so it's obviously the one to use in a benchmark (unless you are benchmarking across all -O levels). As for -O3 being buggy, it isn't in my experience, nor is it supposed to be anything but stable.

    Optimizations that are not considered fully working are introduced as separate flags, not folded into one of the -O levels. If/when they are considered stable (as in actually improving code and not introducing bugs) they are often added to certain -O levels. Some optimizations, like -funroll-loops, have been around for ages but are not part of any -O level simply because it's very difficult for the compiler to estimate when unrolling pays off, so this optimization can bring great gains as well as great regressions. (It is turned on by default under PGO, though, in which case the compiler has gathered enough data to make good judgement calls.)

    For the absolute best results, though, you'd most likely need to run something like Acovea (http://www.coyotegulch.com/products/...o5p4gcc40.html), which skips the -O levels and searches through flag combinations, but it's not very practical.
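
    If you just want to see which optimizations a given -O level actually enables (and confirm that something like -funroll-loops isn't part of any of them), GCC can list them directly:

    gcc -O2 -Q --help=optimizers > o2.txt
    gcc -O3 -Q --help=optimizers > o3.txt
    diff o2.txt o3.txt    # shows what -O3 turns on beyond -O2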



  • energyman
    replied
    It would be more interesting to compare that hand-written asm with GCC-generated code. Unless that is done, there is no reason to turn off assembly just to create a test case that is completely detached from reality.

