Announcement

**anarki2** · 07 January 2020, 05:58 PM

Originally posted by Michael

Ubuntu Mainline PPA most often, aside from when I am bisecting or need to patch my kernels.

Thanks! Any noticeable improvements? Is it a must-have perf upgrade for the 3960X, or am I fine with 5.3?

**bug77** · 07 January 2020, 06:12 PM

So basically, it can sway significantly one way or another, but most of the time the difference is insignificant. And how many lines of code were added to gcc for this "feat"?

**archsway** · 07 January 2020, 06:26 PM

Mesa seems to benefit quite a bit from LTO.

Add this to the meson command:

Code:

-Db_lto=true
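
For anyone who wants the full invocation rather than just the option, here is a minimal sketch. Only `-Db_lto=true` comes from the post above; the build directory name `build-lto` and `-Dbuildtype=release` are assumptions, and `run` is a hypothetical dry-run wrapper that prints each command and only executes it when `DRY_RUN=0` is set.

```shell
# Hypothetical dry-run wrapper: prints the command, runs it only if DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

BUILD_DIR=build-lto   # assumed name; any unused directory works

# Configure from the top of the Mesa source tree with LTO enabled, then build.
run meson setup "$BUILD_DIR" -Dbuildtype=release -Db_lto=true
run ninja -C "$BUILD_DIR"
```

Run it once as-is to check the printed commands, then again with `DRY_RUN=0` inside a Mesa checkout.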

**nuetzel** · 07 January 2020, 06:42 PM

Originally posted by archsway

Mesa seems to benefit quite a bit from LTO.

Add this to the meson command:

Code:

-Db_lto=true

Yes, but mostly with reduced size. No (dramatic) speed increase.
`-fwhole-program` isn't worth it, it seems.

**zerodefect** · 07 January 2020, 06:42 PM

From the gcc docs:

-fwhole-program
Assume that the current compilation unit represents the whole program being compiled. All public functions and variables with the exception of main and those merged by attribute externally_visible become static functions and in effect are optimized more aggressively by interprocedural optimizers.

This option should not be used in combination with -flto. Instead relying on a linker plugin should provide safer and more precise information.

**Crusader: No Refunds** · 07 January 2020, 09:28 PM

As far as I know, -fwhole-program works best when the entire program fits into a single source file/compilation unit. In theory, a benchmark like Himeno should benefit from it, but the results obtained here show otherwise. Odd.

Instead of using -flto in conjunction with -fwhole-program, I would suggest replacing -fwhole-program with -flto-partition=none (possible values are none/one/1to1/balanced/max), which would disable WHOPR/partitioned LTO and switch to full LTO.

I suppose one could also accelerate the benchmarks measuring compilation time by using -flto=n, which would parallelize the linking process when using WHOPR.
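
To make the suggestion concrete, here is a sketch of the two flag sets being compared. The `-O2` level and the job count of 4 are placeholders, not values taken from the benchmark article.

```shell
# Partitioned (WHOPR) LTO: the default mode; -flto=N runs N link-time jobs in parallel.
WHOPR_FLAGS="-O2 -flto=4"

# Full LTO: partitioning disabled, so link-time optimization runs as a single unit.
# This is the suggested replacement for adding -fwhole-program on top of -flto.
FULL_LTO_FLAGS="-O2 -flto -flto-partition=none"

# The same flags are needed on both the compile and the link command lines.
echo "WHOPR:    CFLAGS=\"${WHOPR_FLAGS}\" LDFLAGS=\"${WHOPR_FLAGS}\""
echo "Full LTO: CFLAGS=\"${FULL_LTO_FLAGS}\" LDFLAGS=\"${FULL_LTO_FLAGS}\""
```

Note that `-flto=N` only buys anything in the partitioned case, since full LTO has a single partition to work on.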

**Paradigm Shifter** · 08 January 2020, 01:12 AM

Since mostly what I care about for absolute roaring speed is FFTW on 512-2048 FFT sizes... this doesn't look like it's going to help me much. It might at 2048 I guess... I need to run some benchmarks of my own.

Thanks for the tests.

**carewolf** · 08 January 2020, 04:22 AM

For compile time you need -flto=$CPUCOUNT (or -flto=jobserver and CC=-gcc)

Fat objects should also already be disabled. So the main difference is the compiling in parallel. It shouldn't be that much slower than normal building, it just uses a metric shit-ton more memory.
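
A small sketch of the first variant, assuming `nproc` from coreutils is available to supply the CPU count (`$CPUCOUNT` is not a standard variable, so it is set by hand here). The `-O2` level is a placeholder.

```shell
# Number of online CPUs; falls back to 1 if nproc is not installed.
CPUCOUNT=$(nproc 2>/dev/null || echo 1)

# One link-time optimization job per CPU.
CFLAGS="-O2 -flto=${CPUCOUNT}"
LDFLAGS="${CFLAGS}"
export CFLAGS LDFLAGS

echo "CFLAGS=${CFLAGS}"
```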

**carewolf** · 08 January 2020, 04:28 AM

Originally posted by set135

This is what I have been using on Gentoo for many years, for all but a few packages:
CFLAGS="-march=native -O2 -pipe -fno-stack-protector -flto=4 -fuse-linker-plugin"
CXXFLAGS="${CFLAGS}"
LDFLAGS="-Wl,-flto=4 ${CFLAGS}"

My goal was primarily to reduce executable size, and just to see how it works, so it is interesting to see some benchmarks.

Why pass -flto to the linker? Just link with gcc/g++ and let it deal with the command line. Also, -fuse-linker-plugin is redundant. But yes, that will improve binary size greatly, though you would need -O3 to get the performance benefits of -flto.
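
One reading of that advice, as a sketch: the quoted flags with the `-Wl,-flto=4` and `-fuse-linker-plugin` parts dropped. This assumes the build links through the gcc/g++ driver, so that whatever is in `LDFLAGS` reaches the driver and not `ld` directly.

```shell
# The quoted flags with the two changes suggested in the reply:
#   - no -Wl,-flto=4 (the gcc/g++ driver handles LTO at link time)
#   - no -fuse-linker-plugin (described above as redundant)
CFLAGS="-march=native -O2 -pipe -fno-stack-protector -flto=4"
CXXFLAGS="${CFLAGS}"
LDFLAGS="${CFLAGS}"
export CFLAGS CXXFLAGS LDFLAGS

echo "LDFLAGS=${LDFLAGS}"
```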

**Grinch** · 08 January 2020, 05:49 AM

Originally posted by archsway

Mesa seems to benefit quite a bit from LTO.

Add this to the meson command:

Code:

-Db_lto=true

Yes, that would be an interesting benchmark. There's also Mesa support for PGO (profile-guided optimization), which in my experience is typically a more impactful optimization. The option is -Db_pgo= and its values are off/generate/use. Perhaps something for Michael to try out when he does a new PGO benchmark.
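
For reference, PGO with Meson is a two-phase build, roughly as sketched below. The build directory name `build-pgo` is an assumption, and `run` is a hypothetical dry-run wrapper that prints each command and only executes it when `DRY_RUN=0` is set. Which workload to profile with is left open, since the thread does not name one.

```shell
# Hypothetical dry-run wrapper: prints the command, runs it only if DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

BUILD_DIR=build-pgo   # assumed name

# 1. Instrumented build that records profile data when it is run.
run meson setup "$BUILD_DIR" -Db_pgo=generate
run ninja -C "$BUILD_DIR"

# 2. Exercise the instrumented build with a representative workload here.

# 3. Switch the same build directory to use the collected profile and rebuild.
run meson configure "$BUILD_DIR" -Db_pgo=use
run ninja -C "$BUILD_DIR"
```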

GCC 10 Link-Time Optimization Benchmarks On AMD Threadripper
