Benchmarks Of GCC 4.2 Through GCC 4.7 Compilers


  • Ansla
    replied
    Managed to submit the previous reply before finishing it and editing is broken again so ignore the previous post.

    Originally posted by XorEaxEax View Post
    I see, well I wonder how many of the bugs from the list you showed are very old (as in obsolete) and/or less supported architectures. Again I can only go by the experience I have and those from work and -O3 has been very stable, although not always the fastest.
    It's relatively easy to determine which filters are for less common architectures, as a simple `grep -r "replace-flags -O3" /usr/portage` on a Gentoo machine will show lines like `use ppc && replace-flags -O3 -O2`.
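
    For anyone who wants more than raw grep output, here is a rough Python sketch of the same search. It only looks at `.ebuild` files, and it assumes the `use <flag> && replace-flags -O3 ...` line shape shown above; the names `scan_text` and `scan_tree` are made up for this illustration and are not part of Portage.

```python
import os
import re

# Matches ebuild lines such as: use ppc && replace-flags -O3 -O2
REPLACE_O3 = re.compile(r"replace-flags\s+-O3\b")
# Captures the USE flag guarding the line (often an arch), if there is one.
USE_GUARD = re.compile(r"\buse\s+([\w-]+)\s*&&\s*replace-flags")


def scan_text(text):
    """Return (line_number, guard_or_None, stripped_line) for each -O3 replacement."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if REPLACE_O3.search(line):
            guard = USE_GUARD.search(line)
            hits.append((lineno, guard.group(1) if guard else None, line.strip()))
    return hits


def scan_tree(root):
    """Walk a Portage tree and yield (path, line_number, guard, line) per hit."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".ebuild"):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for lineno, guard, line in scan_text(handle.read()):
                        yield path, lineno, guard, line


if __name__ == "__main__":
    # /usr/portage is the tree location used in the post; adjust as needed.
    for path, lineno, guard, line in scan_tree("/usr/portage"):
        print(f"{path}:{lineno}: [{guard or 'unconditional'}] {line}")
```

    Lines reported as unconditional apply on every architecture, so those are the ones worth a closer look on amd64 or x86.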

    Originally posted by XorEaxEax View Post
    A question: this bug (unless I misread) is with an old compiler, 4.3.2. The 4.3 branch (which, while maintained, is quite old and probably not getting a lot of love) is at 4.3.5 by now IIRC, which makes me wonder what the default compiler is in Gentoo now. Also, is there any way to filter these bug reports according to compiler version? If I am on 4.6, I'm probably not very interested in bug reports filed against version 4.3.
    The latest stable gcc at that time was 4.4.4, and the emerge --info attached to the bug shows it was installed, but for some reason the user had 4.3.2 as the active compiler. At this moment the latest stable or even testing version is 4.5.3 as 4.6 is still masked for various problems. BTW, Gentoo still has gcc-3.4.6 marked as stable so users are free to use that if they need to.



  • Ansla
    replied
    Originally posted by XorEaxEax View Post
    I see, well I wonder how many of the bugs from the list you showed are very old (as in obsolete) and/or less supported architectures. Again I can only go by the experience I have and those from work and -O3 has been very stable, although not always the fastest.
    It's relatively easy to determine which filters are for less common architectures as a simple `grep "replace-flags -O3" /us

    Originally posted by XorEaxEax View Post
    OK, that might explain why I didn't encounter any problems, since this bug is 3 years old and, according to the bug report, was also marked as fixed 3 years ago.

    Originally posted by XorEaxEax View Post
    A question: this bug (unless I misread) is with an old compiler, 4.3.2. The 4.3 branch (which, while maintained, is quite old and probably not getting a lot of love) is at 4.3.5 by now IIRC, which makes me wonder what the default compiler is in Gentoo now. Also, is there any way to filter these bug reports according to compiler version? If I am on 4.6, I'm probably not very interested in bug reports filed against version 4.3.

    Originally posted by XorEaxEax View Post
    Yes it's certainly something to be used with care, I doubt it's much used in scientific programs but for games, encoding (ffmpeg, x264) etc it can provide a nice speed boost. I've never come across it being slower with -ffast-math as in the link you provided but it's also not always faster. As for ffmpeg, x264 I doubt disabling -ffast-math would make much difference either way since these encoders use hand-optimized assembly for the performance critical parts. One could of course disable the asm and compare the performance of standard vs -ffast-math. Maybe I'll do that.
    Originally posted by XorEaxEax View Post
    Interesting though that ICC which defaults to a fast and less accurate floating point model was more correct than gcc's -ffast-math, I recall ICC had several floating point modes but I don't know which one is the default.



  • XorEaxEax
    replied
    Originally posted by Ansla View Post
    I didn't notice your reply at the time you posted it, only now because of the spammer, so sorry for the late reply.
    No problem, I was surprised to see this one float up again

    Originally posted by Ansla View Post
    Some of these filters stay for a very long time. Users usually don't file bugs saying "hey, that old problem I had with O3 no longer happens", so unless a developer has access to a setup similar to the original bug reporter's and is in the mood for some cleanup, they will stay indefinitely.
    I see, well I wonder how many of the bugs from the list you showed are very old (as in obsolete) and/or less supported architectures. Again I can only go by the experience I have and those from work and -O3 has been very stable, although not always the fastest.

    Originally posted by Ansla View Post
    In the case of VisualboyAdvance it seems it wasn't a case of bad code generation but of memory usage while compiling,
    OK, that might explain why I didn't encounter any problems, since this bug is 3 years old and, according to the bug report, was also marked as fixed 3 years ago.

    Originally posted by Ansla View Post
    Still, the risk is always present, here is an example of bad code generation on amd64: https://bugs.gentoo.org/show_bug.cgi?id=356087
    A question: this bug (unless I misread) is with an old compiler, 4.3.2. The 4.3 branch (which, while maintained, is quite old and probably not getting a lot of love) is at 4.3.5 by now IIRC, which makes me wonder what the default compiler is in Gentoo now. Also, is there any way to filter these bug reports according to compiler version? If I am on 4.6, I'm probably not very interested in bug reports filed against version 4.3.

    Originally posted by Ansla View Post
    I never experimented with -ffast-math as I prefer correctness over speed, but here's an interesting blog post about this option: http://programerror.com/2009/09/when...ast-math-isnt/ It might be worth checking whether programs that enable it by default (like mesa and ffmpeg, if I remember correctly) would benefit from removing -ffast-math.
    Yes it's certainly something to be used with care, I doubt it's much used in scientific programs but for games, encoding (ffmpeg, x264) etc it can provide a nice speed boost. I've never come across it being slower with -ffast-math as in the link you provided but it's also not always faster. As for ffmpeg, x264 I doubt disabling -ffast-math would make much difference either way since these encoders use hand-optimized assembly for the performance critical parts. One could of course disable the asm and compare the performance of standard vs -ffast-math. Maybe I'll do that.

    Interesting though that ICC which defaults to a fast and less accurate floating point model was more correct than gcc's -ffast-math, I recall ICC had several floating point modes but I don't know which one is the default.



  • Ansla
    replied
    I didn't notice your reply at the time you posted it, only now because of the spammer, so sorry for the late reply.

    Originally posted by XorEaxEax View Post
    Are these filters removed once someone validates them against a newer compiler or will they remain in the list indefinitely? Would be interesting to know how old some of these flag filters are which you mentioned.
    Some of these filters stay for a very long time. Users usually don't file bugs saying "hey, that old problem I had with O3 no longer happens", so unless a developer has access to a setup similar to the original bug reporter's and is in the mood for some cleanup, they will stay indefinitely.

    Originally posted by XorEaxEax View Post
    Any chance you could point me to the bug report for VisualboyAdvance? I did extensive benchmarking tests with this against Mess's gba implementation not that long ago and encountered no problems at all using -O3.
    In the case of VisualboyAdvance it seems it wasn't a case of bad code generation but of memory usage while compiling; see https://bugs.gentoo.org/show_bug.cgi?id=64670 for the bug. Also, VisualboyAdvance is only keyworded for the major architectures (amd64, ppc and x86); bad code generation is most likely to happen on less popular architectures like sparc or sh.

    Originally posted by XorEaxEax View Post
    Yes every compiler has bugs, that's inescapable given the complexity of what they are doing. However I can't say I've encountered many bugs due to -O3 when compiling packages, I have encountered many instances where -O3 generates slower code than -O2 though.
    Still, the risk is always present, here is an example of bad code generation on amd64: https://bugs.gentoo.org/show_bug.cgi?id=356087

    Originally posted by XorEaxEax View Post
    Actually, I've used -ffast-math for things like Blender and it generated identical output to that of not using it (although I was merely rendering using BI, not using any of the simulation engines); it depends on the precision required by the application. I'd say far fewer programs out there will break or generate faulty output due to the loss of precision than will keep working fine.
    I never experimented with -ffast-math as I prefer correctness over speed, but here's an interesting blog post about this option: http://programerror.com/2009/09/when...ast-math-isnt/ It might be worth checking whether programs that enable it by default (like mesa and ffmpeg, if I remember correctly) would benefit from removing -ffast-math.



  • XorEaxEax
    replied
    Originally posted by Ansla View Post
    This does not necessarily mean that if you try to build any of these packages with the latest gcc and -O3 the resulting binary will be completely broken; the breakage might only occur with older gcc versions, only on some arches, or even only in some corner use cases.
    Are these filters removed once someone validates them against a newer compiler or will they remain in the list indefinitely? Would be interesting to know how old some of these flag filters are which you mentioned.

    Originally posted by Ansla View Post
    The thing is, in order for a flag to be filtered, someone must report a bug about the resulting binary not working properly while it works when compiled with -O2.
    Any chance you could point me to the bug report for VisualboyAdvance? I did extensive benchmarking tests with this against Mess's gba implementation not that long ago and encountered no problems at all using -O3.

    Originally posted by Ansla View Post
    Long story short, optimization should not theoretically affect what the code does, but every gcc branch has known bugs, and O3 being not officially recommended and not so widely used as O2 will contain even more undiscovered bugs.
    Yes every compiler has bugs, that's inescapable given the complexity of what they are doing. However I can't say I've encountered many bugs due to -O3 when compiling packages, I have encountered many instances where -O3 generates slower code than -O2 though.

    Originally posted by Ansla View Post
    P.S. It might be that locovaca was referring to the effects of -ffast-math when saying O3 code will compute 2+2=5, but that's not enabled by O3. Only -Ofast enables -ffast-math, which will break almost any program that is not a game or multimedia codec.
    Actually, I've used -ffast-math for things like Blender and it generated identical output to that of not using it (although I was merely rendering using BI, not using any of the simulation engines); it depends on the precision required by the application. I'd say far fewer programs out there will break or generate faulty output due to the loss of precision than will keep working fine.
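
    To make the precision point concrete, here is a small Python sketch (Python floats are IEEE 754 doubles). Reassociating floating-point sums is one of the transformations -ffast-math permits; the reversed order below is just one regrouping picked for illustration, not what GCC would necessarily emit, and the function names are made up.

```python
import math


def left_to_right(values):
    """Sum in source order, as strict IEEE 754 evaluation requires."""
    total = 0.0
    for value in values:
        total += value
    return total


def reassociated(values):
    """Sum in reverse order: one regrouping a reassociating compiler could pick."""
    total = 0.0
    for value in reversed(values):
        total += value
    return total


# Large cancelling terms: the grouping decides whether the 1.0 survives.
sensitive = [1e17, -1e17, 1.0]
print(left_to_right(sensitive), reassociated(sensitive), math.fsum(sensitive))
# prints: 1.0 0.0 1.0

# Terms of similar magnitude: both groupings agree exactly.
benign = [0.5, 0.25, 0.125]
print(left_to_right(benign) == reassociated(benign))
# prints: True
```

    The second case is roughly the situation where output stays identical; the first is the kind of input where regrouping silently changes the answer.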



  • XorEaxEax
    replied
    Originally posted by popper View Post
    Indeed it is, when it works for your case. It might be a bit of fun for you and others reading to git pull the current x264, turn off the assembly flag, then look at the code generated from the fallback C routines and compare it to the real, tried and trusted, fully benchmarked assembly routines...
    Certainly hand-optimized assembly made by an expert both on assembly and on the subject at hand (as they certainly are when it comes to video encoding) will beat any compiler. However for us mere mortals, GCC and others likely generate far better vectorization than we could do ourselves.

    Originally posted by popper View Post
    And I'm sure Loren Merritt, Dark Shikari, Ronald S. Bultje, and many other assembly devs over on x264-dev IRC can give you lots of real-life broken routine cases, if anyone cares to fix them in a given compiler; Ansla's partial list is also interesting.
    Perhaps, but I remember the compiler smackdowns over at 'breaking eggs and making omelettes' for ffmpeg, which were all done with assembly optimization turned off, and the only compiler I recall generating broken code was LLVM, which back then was a lot less mature than it is now.



  • popper
    replied
    Originally posted by curaga View Post
    Re autovectorizing, I recently looked at gcc's generated asm and it did better than what I would've done by hand (I had targeted SSE, and gcc used SSE2 though). Very nice to have the compiler do that successfully.
    Indeed it is, when it works for your case. It might be a bit of fun for you and others reading to git pull the current x264, turn off the assembly flag, then look at the code generated from the fallback C routines and compare it to the real, tried and trusted, fully benchmarked assembly routines...

    And I'm sure Loren Merritt, Dark Shikari, Ronald S. Bultje, and many other assembly devs over on x264-dev IRC can give you lots of real-life broken routine cases, if anyone cares to fix them in a given compiler; Ansla's partial list is also interesting.



  • Ansla
    replied
    Originally posted by XorEaxEax View Post
    The fact that -O3 doesn't always generate faster code than -O2 does not mean the code is broken, it just means that the heuristics governing the use of more aggressive optimizations enabled at -O3 sometimes fail to correctly estimate if an optimization will generate faster code and instead actually generates slower code.

    This is not a new thing. There are many options, for instance global loop unrolling, whose benefit is terribly hard to estimate without runtime data and which are therefore not enabled by default at any compiler optimization level; the same goes for loop vectorization, although that is turned on at -O3 in GCC. Compilers have improved a lot in this area, but the only way to really know the compiler will make the right choice is to either manually unroll (like in the good ole days) or provide the compiler with runtime data (profile/feedback optimization).

    As for -O3 generating faulty code, I haven't come across that in a long time, do you have any fresh examples? And yes, any benchmark which doesn't validate that the results are correct is indeed worthless.
    This is an (incomplete) list of Gentoo ebuilds that filter O3 because of invalid code generation:
    app-arch/ppmd
    app-emulation/xen
    dev-ada/asis-gcc
    dev-ada/asis-gpl
    dev-lang/python
    dev-scheme/guile
    dev-util/valgrind
    app-editors/vim
    games-emulation/visualboyadvance
    games-fps/duke3d
    games-strategy/asc
    kde-base/kdm
    media-libs/ming
    media-sound/gnomad
    media-video/kaffeine
    sci-electronics/alliance
    sci-visualization/opendx
    sys-freebsd/freebsd-sbin
    sys-fs/evms
    sys-libs/libsmbios
    sys-process/procps
    www-plugins/nspluginwrapper
    x11-libs/gtk+

    This does not necessarily mean that if you try to build any of these packages with the latest gcc and -O3 the resulting binary will be completely broken; the breakage might only occur with older gcc versions, only on some arches, or even only in some corner use cases. The thing is, in order for a flag to be filtered, someone must report a bug about the resulting binary not working properly while it works when compiled with -O2.

    That doesn't mean -O2 is perfectly safe; there are plenty of packages filtering -O2 or even -O1 on some arches, but usually when an optimization level generates bad code all the higher levels will generate bad code as well.

    Long story short, optimization should not theoretically affect what the code does, but every gcc branch has known bugs, and O3 being not officially recommended and not so widely used as O2 will contain even more undiscovered bugs.

    P.S. It might be that locovaca was referring to the effects of -ffast-math when saying O3 code will compute 2+2=5, but that's not enabled by O3. Only -Ofast enables -ffast-math, which will break almost any program that is not a game or multimedia codec.
    Last edited by Ansla; 04 December 2011, 12:56 PM.



  • curaga
    replied
    Re autovectorizing, I recently looked at gcc's generated asm and it did better than what I would've done by hand (I had targeted SSE, and gcc used SSE2 though). Very nice to have the compiler do that successfully.



  • XorEaxEax
    replied
    Originally posted by locovaca View Post
    A benchmark that tests code that produces possibly faulty results is worthless. If you ask O2 compiled code what 2+2 is and it says 4 in a half second while the O3 code says 5 in a quarter second, which is better?
    The fact that -O3 doesn't always generate faster code than -O2 does not mean the code is broken, it just means that the heuristics governing the use of more aggressive optimizations enabled at -O3 sometimes fail to correctly estimate if an optimization will generate faster code and instead actually generates slower code.

    This is not a new thing. There are many options, for instance global loop unrolling, whose benefit is terribly hard to estimate without runtime data and which are therefore not enabled by default at any compiler optimization level; the same goes for loop vectorization, although that is turned on at -O3 in GCC. Compilers have improved a lot in this area, but the only way to really know the compiler will make the right choice is to either manually unroll (like in the good ole days) or provide the compiler with runtime data (profile/feedback optimization).

    As for -O3 generating faulty code, I haven't come across that in a long time, do you have any fresh examples? And yes, any benchmark which doesn't validate that the results are correct is indeed worthless.
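
    As a sketch of what "validate before you time" could look like, here is a minimal Python harness. The candidates are plain Python functions standing in for differently compiled builds, and `validated_benchmark`, `add_correct` and `add_broken` are names invented for this example; a real comparison would run the actual binaries and diff their output.

```python
import time


def validated_benchmark(candidates, inputs, expected, repeats=5):
    """Time each candidate only if it reproduces the expected outputs.

    Returns {name: best_seconds or None}; None marks a candidate whose
    results were wrong, so no timing is reported for it at all.
    """
    results = {}
    for name, func in candidates.items():
        outputs = [func(item) for item in inputs]
        if outputs != expected:
            results[name] = None  # wrong answers: speed is irrelevant
            continue
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            for item in inputs:
                func(item)
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


def add_correct(pair):
    return pair[0] + pair[1]


def add_broken(pair):
    # Stands in for the hypothetical miscompiled "2+2=5" build.
    return pair[0] + pair[1] + 1


if __name__ == "__main__":
    report = validated_benchmark(
        {"correct": add_correct, "broken": add_broken},
        inputs=[(2, 2), (1, 3)],
        expected=[4, 4],
    )
    for name, seconds in report.items():
        print(name, "INVALID" if seconds is None else f"{seconds:.6f}s")
```

    The point of returning None instead of a time is that a fast but wrong candidate never shows up in the ranking.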
