Benchmarks Of GCC 4.2 Through GCC 4.7 Compilers

Ansla replied

09 December 2011, 10:22 AM
Managed to submit the previous reply before finishing it and editing is broken again so ignore the previous post.

Originally posted by XorEaxEax View Post

I see, well I wonder how many of the bugs from the list you showed are very old (as in obsolete) and/or less supported architectures. Again I can only go by the experience I have and those from work and -O3 has been very stable, although not always the fastest.

It's relatively easy to determine which filters are for less common architectures, as a simple `grep -r "replace-flags -O3" /usr/portage` on a Gentoo machine will show lines like `use ppc && replace-flags -O3 -O2`

Originally posted by XorEaxEax View Post

A question: unless I misread, this bug is with an old compiler, 4.3.2. The 4.3 branch (which, while maintained, is quite old and probably not getting a lot of love) is at 4.3.5 by now iirc, which makes me wonder what the default compiler is in Gentoo now. Also, is there any way to filter these bug reports by compiler version? If I am on 4.6 I'm probably not very interested in bug reports filed against version 4.3.

The latest stable gcc at that time was 4.4.4, and the emerge --info attached to the bug shows it was installed, but for some reason the user had 4.3.2 as the active compiler. At this moment the latest stable or even testing version is 4.5.3 as 4.6 is still masked for various problems. BTW, Gentoo still has gcc-3.4.6 marked as stable so users are free to use that if they need to.
Ansla replied

09 December 2011, 10:05 AM
Originally posted by XorEaxEax View Post

I see, well I wonder how many of the bugs from the list you showed are very old (as in obsolete) and/or less supported architectures. Again I can only go by the experience I have and those from work and -O3 has been very stable, although not always the fastest.

It's relatively easy to determine which filters are for less common architectures as a simple `grep "replace-flags -O3" /us

Originally posted by XorEaxEax View Post

OK, that might explain why I didn't encounter any problems, since this bug was 3 years old and according to the bug report was also marked as fixed 3 years ago.

Originally posted by XorEaxEax View Post

A question: unless I misread, this bug is with an old compiler, 4.3.2. The 4.3 branch (which, while maintained, is quite old and probably not getting a lot of love) is at 4.3.5 by now iirc, which makes me wonder what the default compiler is in Gentoo now. Also, is there any way to filter these bug reports by compiler version? If I am on 4.6 I'm probably not very interested in bug reports filed against version 4.3.

Originally posted by XorEaxEax View Post

Yes it's certainly something to be used with care, I doubt it's much used in scientific programs but for games, encoding (ffmpeg, x264) etc it can provide a nice speed boost. I've never come across it being slower with -ffast-math as in the link you provided but it's also not always faster. As for ffmpeg, x264 I doubt disabling -ffast-math would make much difference either way since these encoders use hand-optimized assembly for the performance critical parts. One could of course disable the asm and compare the performance of standard vs -ffast-math. Maybe I'll do that.

Originally posted by XorEaxEax View Post

Interesting though that ICC which defaults to a fast and less accurate floating point model was more correct than gcc's -ffast-math, I recall ICC had several floating point modes but I don't know which one is the default.
XorEaxEax replied

09 December 2011, 09:13 AM
Originally posted by Ansla View Post

I didn't notice your reply at the time you posted it, only now because of the spammer, so sorry for the late reply.

No problem, I was surprised to see this one float up again

Originally posted by Ansla View Post

Some of these filters stay for a very long time; users usually don't log bugs saying "hey, that old problem I had with O3 no longer happens", so unless a developer has access to a setup similar to the original bug reporter's and is in the mood for some cleanup, they will stay indefinitely.

I see, well I wonder how many of the bugs from the list you showed are very old (as in obsolete) and/or less supported architectures. Again I can only go by the experience I have and those from work and -O3 has been very stable, although not always the fastest.

Originally posted by Ansla View Post

In the case of VisualboyAdvance it seems it wasn't a case of bad code generation but of memory usage while compiling,

OK, that might explain why I didn't encounter any problems, since this bug was 3 years old and according to the bug report was also marked as fixed 3 years ago.

Originally posted by Ansla View Post

Still, the risk is always present, here is an example of bad code generation on amd64: https://bugs.gentoo.org/show_bug.cgi?id=356087

A question: unless I misread, this bug is with an old compiler, 4.3.2. The 4.3 branch (which, while maintained, is quite old and probably not getting a lot of love) is at 4.3.5 by now iirc, which makes me wonder what the default compiler is in Gentoo now. Also, is there any way to filter these bug reports by compiler version? If I am on 4.6 I'm probably not very interested in bug reports filed against version 4.3.

Originally posted by Ansla View Post

I never experimented with -ffast-math as I prefer correctness over speed, but here's an interesting blog post about this option: http://programerror.com/2009/09/when...ast-math-isnt/ It might be worth checking whether programs that enable it by default (like mesa and ffmpeg if I remember correctly) would benefit from removing -ffast-math.

Yes it's certainly something to be used with care, I doubt it's much used in scientific programs but for games, encoding (ffmpeg, x264) etc it can provide a nice speed boost. I've never come across it being slower with -ffast-math as in the link you provided but it's also not always faster. As for ffmpeg, x264 I doubt disabling -ffast-math would make much difference either way since these encoders use hand-optimized assembly for the performance critical parts. One could of course disable the asm and compare the performance of standard vs -ffast-math. Maybe I'll do that.

Interesting though that ICC which defaults to a fast and less accurate floating point model was more correct than gcc's -ffast-math, I recall ICC had several floating point modes but I don't know which one is the default.
Ansla replied

09 December 2011, 06:09 AM
I didn't notice your reply at the time you posted it, only now because of the spammer, so sorry for the late reply.

Originally posted by XorEaxEax View Post

Are these filters removed once someone validates them against a newer compiler or will they remain in the list indefinitely? Would be interesting to know how old some of these flag filters are which you mentioned.

Some of these filters stay for a very long time; users usually don't log bugs saying "hey, that old problem I had with O3 no longer happens", so unless a developer has access to a setup similar to the original bug reporter's and is in the mood for some cleanup, they will stay indefinitely.

Originally posted by XorEaxEax View Post

Any chance you could point me to the bug report for VisualboyAdvance? I did extensive benchmarking tests with this against Mess's gba implementation not that long ago and encountered no problems at all using -O3.

In the case of VisualboyAdvance it seems it wasn't a case of bad code generation but of memory usage while compiling; the bug is https://bugs.gentoo.org/show_bug.cgi?id=64670 Also, VisualboyAdvance is only keyworded for the major architectures (amd64, ppc and x86), and bad code generation is most likely to happen on less popular architectures like sparc or sh.

Originally posted by XorEaxEax View Post

Yes every compiler has bugs, that's inescapable given the complexity of what they are doing. However I can't say I've encountered many bugs due to -O3 when compiling packages, I have encountered many instances where -O3 generates slower code than -O2 though.

Still, the risk is always present, here is an example of bad code generation on amd64: https://bugs.gentoo.org/show_bug.cgi?id=356087

Originally posted by XorEaxEax View Post

Actually, I've used -ffast-math for things like Blender and it generated identical output to that of not using it (although I was merely rendering using BI, not using any of the simulation engines); it depends on the precision required by the application. I'd say there are by far fewer programs out there that will break or generate faulty output due to loss of precision than programs that won't.

I never experimented with -ffast-math as I prefer correctness over speed, but here's an interesting blog post about this option: http://programerror.com/2009/09/when...ast-math-isnt/ It might be worth checking whether programs that enable it by default (like mesa and ffmpeg if I remember correctly) would benefit from removing -ffast-math.
XorEaxEax replied

04 December 2011, 08:13 PM
Originally posted by Ansla View Post

This does not necessarily mean that if you try to build any of these packages with the latest gcc and -O3 the resulting binary will be completely broken; the breakage might only occur with older gcc versions, only on some arches, or even only in some corner use cases.

Are these filters removed once someone validates them against a newer compiler or will they remain in the list indefinitely? Would be interesting to know how old some of these flag filters are which you mentioned.

Originally posted by Ansla View Post

The thing is, for a flag to be filtered someone must report a bug where the resulting binary does not work properly but does work when compiled with -O2.

Any chance you could point me to the bug report for VisualboyAdvance? I did extensive benchmarking tests with this against Mess's gba implementation not that long ago and encountered no problems at all using -O3.

Originally posted by Ansla View Post

Long story short, optimization should not theoretically affect what the code does, but every gcc branch has known bugs, and O3 being not officially recommended and not so widely used as O2 will contain even more undiscovered bugs.

Yes every compiler has bugs, that's inescapable given the complexity of what they are doing. However I can't say I've encountered many bugs due to -O3 when compiling packages, I have encountered many instances where -O3 generates slower code than -O2 though.

Originally posted by Ansla View Post

P.S. It might be that locovaca was referring to the effects of -ffast-math when saying O3 code will compute 2+2=5, but that's not enabled by O3. Only -Ofast enables -ffast-math, which will break almost any program that is not a game or multimedia codec.

Actually, I've used -ffast-math for things like Blender and it generated identical output to that of not using it (although I was merely rendering using BI, not using any of the simulation engines); it depends on the precision required by the application. I'd say there are by far fewer programs out there that will break or generate faulty output due to loss of precision than programs that won't.
XorEaxEax replied

04 December 2011, 07:38 PM
Originally posted by popper View Post

Indeed it is, when it works for your case. It might be a bit of fun for you and others reading to git pull the current x264, turn off the assembly flag, then look at the code generated from the fallback C routines and compare it to the real, tried and trusted, fully benchmarked assembly routines.....

Certainly hand-optimized assembly made by an expert both on assembly and on the subject at hand (as they certainly are when it comes to video encoding) will beat any compiler. However for us mere mortals, GCC and others likely generate far better vectorization than we could do ourselves.

Originally posted by popper View Post

And I'm sure Loren Merritt, Dark Shikari, Ronald S. Bultje, and many other assembly devs over on x264-dev IRC can give you lots of real-life broken routine cases, if anyone cares to fix them in a given compiler. Ansla's partial list is also interesting.

Perhaps, but I remember the compiler smackdowns over at 'breaking eggs and making omelettes' for ffmpeg, which were all done with assembly optimization turned off, and the only compiler I recall generating broken code was LLVM, which back then was a lot less mature than it is now.
popper replied

04 December 2011, 02:10 PM
Originally posted by curaga View Post

Re autovectorizing, I recently looked at gcc's generated asm and it did better than what I would've done by hand (I had targeted SSE, and gcc used SSE2 though). Very nice to have the compiler do that successfully.

Indeed it is, when it works for your case. It might be a bit of fun for you and others reading to git pull the current x264, turn off the assembly flag, then look at the code generated from the fallback C routines and compare it to the real, tried and trusted, fully benchmarked assembly routines.....

And I'm sure Loren Merritt, Dark Shikari, Ronald S. Bultje, and many other assembly devs over on x264-dev IRC can give you lots of real-life broken routine cases, if anyone cares to fix them in a given compiler. Ansla's partial list is also interesting.
Ansla replied

04 December 2011, 12:49 PM
Originally posted by XorEaxEax View Post

The fact that -O3 doesn't always generate faster code than -O2 does not mean the code is broken, it just means that the heuristics governing the use of more aggressive optimizations enabled at -O3 sometimes fail to correctly estimate if an optimization will generate faster code and instead actually generates slower code.

This is not a new thing; there are many options, for instance global loop unrolling, whose benefit is terribly hard to estimate without runtime data and which are therefore not enabled by default in compiler optimization levels. The same goes for loop vectorization, although that is turned on at -O3 in GCC. Compilers have improved a lot in this area, but the only way to really know the compiler will make the right choice is to either manually unroll (like in the good ole days) or provide the compiler with runtime data (profile/feedback optimization).

As for -O3 generating faulty code, I haven't come across that in a long time, do you have any fresh examples? And yes, any benchmark which doesn't validate that the results are correct is indeed worthless.

This is a (incomplete) list of Gentoo ebuilds that filter O3 because of invalid code generation:
app-arch/ppmd
app-emulation/xen
dev-ada/asis-gcc
dev-ada/asis-gpl
dev-lang/python
dev-scheme/guile
dev-util/valgrind
app-editors/vim
games-emulation/visualboyadvance
games-fps/duke3d
games-strategy/asc
kde-base/kdm
media-libs/ming
media-sound/gnomad
media-video/kaffeine
sci-electronics/alliance
sci-visualization/opendx
sys-freebsd/freebsd-sbin
sys-fs/evms
sys-libs/libsmbios
sys-process/procps
www-plugins/nspluginwrapper
x11-libs/gtk+

This does not necessarily mean that if you try to build any of these packages with the latest gcc and -O3 the resulting binary will be completely broken; the breakage might only occur with older gcc versions, only on some arches, or even only in some corner use cases. The thing is, for a flag to be filtered someone must report a bug where the resulting binary does not work properly but does work when compiled with -O2.

That doesn't mean -O2 is perfectly safe; there are plenty of packages filtering -O2 or even -O1 on some arches, but usually when an optimization level generates bad code all the higher levels will generate bad code as well.

Long story short, optimization should not theoretically affect what the code does, but every gcc branch has known bugs, and O3 being not officially recommended and not so widely used as O2 will contain even more undiscovered bugs.

P.S. It might be that locovaca was referring to the effects of -ffast-math when saying O3 code will compute 2+2=5, but that's not enabled by O3. Only -Ofast enables -ffast-math, which will break almost any program that is not a game or multimedia codec.

Last edited by Ansla; 04 December 2011, 12:56 PM.
curaga replied

04 December 2011, 08:09 AM
Re autovectorizing, I recently looked at gcc's generated asm and it did better than what I would've done by hand (I had targeted SSE, and gcc used SSE2 though). Very nice to have the compiler do that successfully.
XorEaxEax replied

04 December 2011, 07:39 AM
Originally posted by locovaca View Post

A benchmark that tests code that produces possibly faulty results is worthless. If you ask O2 compiled code what 2+2 is and it says 4 in a half second while the O3 code says 5 in a quarter second, which is better?

The fact that -O3 doesn't always generate faster code than -O2 does not mean the code is broken, it just means that the heuristics governing the use of more aggressive optimizations enabled at -O3 sometimes fail to correctly estimate if an optimization will generate faster code and instead actually generates slower code.

This is not a new thing; there are many options, for instance global loop unrolling, whose benefit is terribly hard to estimate without runtime data and which are therefore not enabled by default in compiler optimization levels. The same goes for loop vectorization, although that is turned on at -O3 in GCC. Compilers have improved a lot in this area, but the only way to really know the compiler will make the right choice is to either manually unroll (like in the good ole days) or provide the compiler with runtime data (profile/feedback optimization).

As for -O3 generating faulty code, I haven't come across that in a long time, do you have any fresh examples? And yes, any benchmark which doesn't validate that the results are correct is indeed worthless.