Announcement

**birdie** · 12 January 2019, 10:18 AM

The results show that you cannot blindly throw the same GCC compilation options at all your applications; you have to choose optimizations on a per-app basis, which is still a huge nuisance for Gentoo lovers. And then we have -flto and PGO to rub salt into the wound.

**birdie** · 12 January 2019, 11:46 AM

Originally posted by tichun

What about -Os?

On x86 desktop CPUs with plenty of cache, code generated with -Os is almost always significantly slower than code compiled with -O2. Back in the Pentium 3 days and earlier, I used -Os when RAM was limited and expensive. Nowadays this option makes sense only for embedded or memory-constrained devices.

Here are some recent -O2 vs -Os results: https://rv8.io/bench

And here's the GCC developers' attitude towards using -Os:

Code:

First let me put into some perspective on -Os usage and some history:
1) -Os is not useful for non-embedded users
2) the embedded folks really need the smallest code possible and
usually will be willing to afford the performance hit
3) -Os was a mistake for Apple to use in the first place; they used it
and then GCC got better for PowerPC to use the string instructions
which is why -Oz was added :)
4) -Os is used heavily by the arm/thumb2 folks in bare metal applications.

Michael ran -Os tests not so long ago.

**CochainComplex** · 12 January 2019, 11:51 AM

This shows how good -O3 plus -march=native can be... Great potential that most distributions still don't use.

**birdie** · 12 January 2019, 12:08 PM

Originally posted by CochainComplex

This shows how good -O3 plus -march=native can be... Great potential that most distributions still don't use.

-O3 makes the resulting code bloated and often makes it run slower. You just cannot use this flag for all applications.

For some applications the choice of optimization flags doesn't even matter; see the x264 example in the article.

It looks like there are too many theorists on Phoronix who've never compiled anything.

**wagaf** · 12 January 2019, 12:11 PM

Originally posted by CochainComplex

This shows how good -O3 plus -march=native can be... Great potential that most distributions still don't use.

Distributions can't use -march=native, since it enables instructions that not every chip supports (like AVX2 or AVX-512).

It tells the compiler to use the instruction set extensions and tuning of the local build machine.
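To make the point concrete: a distro binary built for the baseline target can still use AVX2 where it exists by checking the CPU at run time instead of at build time. This is a minimal sketch, assuming x86-64 and GCC (or Clang), which provide `__builtin_cpu_supports` and the `target` function attribute; the `sum_ints*` function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical example: baseline version, compiled for whatever -march
   the distribution targets (e.g. plain x86-64). */
static int sum_ints_generic(const int *v, size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Same source, but GCC is allowed to emit AVX2 instructions for this one
   function only, regardless of the global -march setting. Whether the loop
   is actually vectorized still depends on the optimization level. */
__attribute__((target("avx2")))
static int sum_ints_avx2(const int *v, size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Dispatcher: calls the AVX2 version only if the running CPU supports it,
   so the same binary still works on older chips. */
int sum_ints(const int *v, size_t n) {
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_ints_avx2(v, n);
    return sum_ints_generic(v, n);
}
```

Both versions compute the same result; only the instructions the compiler may use differ.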

**Michael** · 12 January 2019, 12:23 PM

Originally posted by wagaf

Distributions can't use -march=native, since it enables instructions that not every chip supports (like AVX2 or AVX-512).

It tells the compiler to use the instruction set extensions and tuning of the local build machine.

Indeed, though distros could use function multi-versioning (FMV) like Clear Linux does to select the optimal code path for the CPU at run time.
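GCC's `target_clones` attribute is one way to get FMV: the compiler builds one copy of the function per listed target plus a resolver that picks a copy when the program loads. A minimal sketch, assuming GCC 6 or later on x86-64 with glibc (the resolver relies on ifunc support); the function name `dot_ints` and the choice of `avx2` as the extra target are illustrative, not Clear Linux's actual configuration.

```c
#include <stddef.h>

/* GCC emits an "avx2" clone, a "default" (baseline) clone, and a resolver
   that selects the best clone for the running CPU at load time. */
__attribute__((target_clones("avx2", "default")))
int dot_ints(const int *a, const int *b, size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}
```

Callers invoke `dot_ints` as an ordinary function; no explicit CPU check is needed in the source.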

**wagaf** · 12 January 2019, 12:29 PM

Originally posted by birdie

-O3 makes the resulting code bloated and often makes it run slower. You just cannot use this flag for all applications.

I haven't recently encountered a case where -O3 ran significantly slower than -O2, or caused any issue that wasn't a bug in the program (note there are no regressions with -O3 in these benchmarks, only performance improvements).

Looking at the generated machine code, I've noticed that GCC has improved over the last few years and is now closer to Clang at generating "clean" code.

I now compile all my code with -O3 for release builds.
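Much of the -O2 vs -O3 difference in GCC 9 comes from auto-vectorization: -O3 turns on `-ftree-loop-vectorize` and `-ftree-slp-vectorize`, which -O2 does not enable in this release. A loop like the hypothetical `saxpy` below, with independent iterations and non-overlapping arrays, is the typical candidate; building it with `-O3 -fopt-info-vec` should make GCC report whether the loop was vectorized. Vectorized loops also carry extra prologue/epilogue code for leftover elements, which is part of why -O3 binaries tend to grow.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i]. Each iteration is independent, and `restrict`
   promises that x and y don't overlap, so the vectorizer can process several
   elements per instruction when it is enabled (e.g. at -O3 in GCC 9). */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```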

Originally posted by birdie

For some applications the choice of optimization flags doesn't even matter; see the x264 example in the article.

Mostly because x264 relies heavily on hand-written assembly. But that's not a reason not to use -O3, right?

**AsuMagic** · 12 January 2019, 12:35 PM

Originally posted by CochainComplex

This shows how good -O3 plus -march=native can be... Great potential that most distributions still don't use.

... and to add to what others said, `-march=native` implies `-mtune=native`, which tunes the generated code to run faster on your CPU (tuning on its own does not make the compiler use newer CPU instructions). As such, a binary compiled with `-march=native` will not only be incompatible with any CPU that lacks one of the instruction set extensions your CPU supports, *but* it will also be less optimized for other machines.

**birdie** · 12 January 2019, 01:21 PM

Originally posted by wagaf

I haven't recently encountered a case where -O3 ran significantly slower than -O2, or caused any issue that wasn't a bug in the program (note there are no regressions with -O3 in these benchmarks, only performance improvements).

Looking at the generated machine code, I've noticed that GCC has improved over the last few years and is now closer to Clang at generating "clean" code.

I now compile all my code with -O3 for release builds.

I'm pretty sure you haven't actually tested more than a couple of applications with -O3 vs -O2, which renders your statement kinda superficial and overly optimistic.

GCC 9 Compiler Tuning Benchmarks At Various Optimization Levels, Vectorize Options
