The Performance Between GCC Optimization Levels


  • elanthis
    replied
    Originally posted by bobwya:
    I doubt a combination of carefully written C and handcrafted assembler is going to benefit very much from additional pseudo-smart compiler heuristics...

    Bob
    The idea that there is some universally correct hand-crafted C or assembler is just ridiculous today. Even different families of Intel's own CPUs need very different optimization strategies in order to maximize performance on that particular CPU. Then take into account the fact that Linux runs on dozens of different CPU architectures. Then remember that Linux is millions of lines of code and human beings are not capable of seeing or thinking about more than a tiny localized fraction of a large codebase at any given moment in time.
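    As a rough illustration of that point (hot_loop.c is just a hypothetical test file, not something from the thread), compiling the same source tuned for two different Intel targets and diffing the assembly shows how much the generated code changes:
    Code:
    gcc -O2 -march=atom   -S -o hot_loop_atom.s   hot_loop.c
    gcc -O2 -march=corei7 -S -o hot_loop_corei7.s hot_loop.c
    diff hot_loop_atom.s hot_loop_corei7.s   # instruction selection and scheduling differ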



  • ryao
    replied
    Originally posted by [Knuckles]:
    Thanks for the article.

    +1 to also providing the final binary sizes; that would be interesting.

    As for testing with -march=native, of course it should provide better results, but these benchmarks are also very interesting as a baseline for what you get if you plan to distribute binaries.
    That is why I suggested that these tests should have been run both with and without -march=native.
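    For reference, testing both cases is just a matter of building the benchmark twice (bench.c is a placeholder name):
    Code:
    gcc -O2               -o bench_generic bench.c   # what you would ship as a distributable binary
    gcc -O2 -march=native -o bench_native  bench.c   # tuned for the build machine only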



  • Walldorf2000
    replied
    What is used by distributions, e.g. Ubuntu?

    Do they differentiate between the different packages?

    I hope this is not a silly question.
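    For what it's worth, on Debian/Ubuntu the default package build flags can be inspected with dpkg-buildflags; the usual answer is -g -O2 (plus hardening options), although individual packages are free to override it:
    Code:
    dpkg-buildflags --get CFLAGS   # prints the distribution's default C compiler flags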



  • [Knuckles]
    replied
    Thanks for the article.

    +1 to also providing the final binary sizes; that would be interesting.

    As for testing with -march=native, of course it should provide better results, but these benchmarks are also very interesting as a baseline for what you get if you plan to distribute binaries.



  • smitty3268
    replied
    Originally posted by DaemonFC:
    When you compile Mozilla software with -O3, you get a much larger binary, which can actually make it take longer to load and take up more space in RAM. I think Mozilla recommends -O2, but I've seen some distributions use -Os, which doesn't make the binaries much smaller but can hurt Firefox's score on things like SunSpider or Google's V8 benchmark. (-O3 doesn't help it enough to be worth the cost in load times and additional RAM usage.)
    Firefox used to use -Os, but when they updated their builds to use a modern GCC they also switched to -O3. They do limit the total amount of code inlining that can be done, though, which keeps the binary size from getting too large. And they turned on PGO as well.
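    Capping inline growth while keeping -O3 can be done with stock GCC options; a minimal sketch (the parameter values here are illustrative, not Mozilla's actual build settings):
    Code:
    gcc -O3 --param max-inline-insns-auto=40 --param inline-unit-growth=20 -c foo.c   # foo.c is a placeholder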



  • ssam
    replied
    If you use -march=native then the compiler will know things like the cache sizes. This means -O3 can make better decisions about the speed/size trade-off.
    Code:
    gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
    I remember (but can't find) an article saying that the large and clever caches in modern CPUs make -Os less useful.

    Also, with -Ofast did you check the correctness of the programs? It will do things like turn 'x/100' into 'x*0.01'; sometimes this is harmless, but some algorithms are very sensitive to this. ( http://gcc.godbolt.org/ is quite good for seeing what an optimisation will actually do )
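    That particular rewrite is easy to see locally as well; a minimal sketch (div.c is a throwaway test file, output described for x86-64):
    Code:
    printf 'double scale(double x) { return x / 100.0; }\n' > div.c
    gcc -O2    -S -o - div.c | grep -E 'divsd|mulsd'   # -O2 keeps the division (divsd)
    gcc -Ofast -S -o - div.c | grep -E 'divsd|mulsd'   # -Ofast multiplies by the reciprocal (mulsd)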



  • ryao
    replied
    Originally posted by 4d4c47:
    scriptkernel-x.x.x.sh = BFS + BFQ + CFLAGS -march=native -Ofast

    http://sourceforge.net/projects/scriptkernel/files/

    scriptgcc-4.7.2_UBUNTU12_64BITS.sh = a script that automatically compiles GCC 4.7.2 from source on Ubuntu 12.04+

    http://sourceforge.net/projects/scri...TS.sh/download


    ...
    -Ofast is the equivalent of -O3 -ffast-math, which is the equivalent of -O3 when compiling software that lacks floating point arithmetic. The kernel doesn't use floating point arithmetic, so there is no point to enabling that flag.

    Note that there might be some rare instances in which it does floating point arithmetic, but the kernel developers are quite adamant about avoiding it. Using it would have performance penalties. Furthermore, if it does use floating point arithmetic in those instances, -ffast-math could be a great way to break that code, possibly causing kernel panics.
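    (One way to double-check that equivalence is to list the optimization flags GCC enables at each level and diff them; the differences should be the fast-math-related options. The file names here are arbitrary.)
    Code:
    gcc -O3    -Q --help=optimizers > O3.txt
    gcc -Ofast -Q --help=optimizers > Ofast.txt
    diff O3.txt Ofast.txt   # expect only the -ffast-math family of flags to differ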

    By the way, if you want a faster computer, I suggest using ZFS. I am running Gentoo Linux on a ZFS rootfs on my desktop and it is virtually lag free. ZFS has its own I/O elevator, so there is no need for BFQ. Furthermore, I am using CFS with autogroups; I have found no need for BFS.

    Last edited by ryao; 14 October 2012, 09:32 AM.



  • curaga
    replied
    There are also kernel-specific LTO things that could be done, as the person who posted his paper in the kernel LTO thread pointed out.

    His thesis was on LTOing a 2.4 kernel, but it did more than just remove dead code: for example, it moved code that is executed only once into the .init section, saving runtime RAM.



  • XorEaxEax
    replied
    Originally posted by mayankleoboy1:
    The GCC 4.7 optimisation guide specifically says that using -O3 is not recommended over -O2, and that -O3 was faster 'in the past' but is now not faster than -O2.

    Is it OK to use -O3 to build the Linux kernel?
    Yes, but in Linux's case I doubt it will make for a 'perceivable' difference. Part of it is that the kernel is of course extremely 'low latency' by design; another is that the devs make use of compiler extensions which allow them better control over the generated code, overriding the optimization heuristics of the compiler. Also, computationally intense things like hashing algorithms have hand-written assembly versions, which obviously can't be optimized by the compiler.
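    If someone does want to try it, the usual way to experiment without patching the Makefile is to append flags via KCFLAGS; because the last -O option on the compiler command line wins, this effectively overrides the kernel's default -O2 (whether it is worthwhile is exactly the question above):
    Code:
    make KCFLAGS=-O3 -j"$(nproc)"   # build the kernel with -O3 appended to its CFLAGS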

    We are seeing work being done on using LTO (link-time optimization) when compiling the kernel, which could potentially yield slightly better performance. That is mainly because code tends to become quite a bit smaller with this optimization, which could decrease cache thrashing, but also because it allows the compiler to view the entire source code as 'one entity', which likely opens up possibilities for optimizations like code reorganization/reuse and, of course, dead code removal.
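    Outside of those kernel-specific patches, plain GCC LTO looks like this; the whole-program optimization happens at the final link step (a.c and b.c are placeholder files):
    Code:
    gcc -O2 -flto -c a.c
    gcc -O2 -flto -c b.c
    gcc -O2 -flto -o prog a.o b.o   # the compiler re-optimizes across both translation units here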



  • XorEaxEax
    replied
    Originally posted by ryao:
    It would be useful if we could see the effect that cache has on optimization levels. Small caches are generally thought to favor lower optimization levels. In particular, -Os and -O2.
    Only when the compiler heuristics fail; there's nothing that says -O3 'has' to use every available optimization and thus bloat the code, resulting in cache thrashing and a possible net performance loss.

    Originally posted by ryao:
    -O2 -march=native is generally considered to be optimal outside of special cases.
    Not 'optimal', rather the 'safe' choice: some of the more aggressive optimizations enabled at -O3 and above, which can yield great performance increases, can also backfire due to the difficulty of gauging their effectiveness in relation to their cost at compile time.

    However, there is a solution to this problem: profile-guided optimization (PGO). Of all the tests I've done over the past two years, I can't recall a single situation where -O3 with PGO did not outperform, or in the worst case match, the lower optimization levels.

    Obviously this is because the profile data gives the compiler runtime information (hot/cold code paths, cache usage, loop iteration counts, etc.) from which to determine when and where to apply optimizations, which is a huge benefit compared to making 'educated guesses' at compile time.
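    For completeness, the PGO cycle being described is a simple build, run, rebuild sequence with GCC (bench.c and the workload are placeholders):
    Code:
    gcc -O3 -fprofile-generate -o bench bench.c
    ./bench typical-workload                        # run something representative to record a profile
    gcc -O3 -fprofile-use      -o bench bench.c     # rebuild using the recorded profile data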

