GCC 10 Link-Time Optimization Benchmarks On AMD Threadripper
Following the recent news that Fedora 32 may build its packages with LTO by default for better performance, and with the Link-Time Optimization performance of the in-development GCC 10 not yet having been checked, here is a fresh look at the possible performance gains from using link-time optimizations to generate faster binaries. This round of testing was done on the AMD Ryzen Threadripper 3960X and is complementary to the recent Profile Guided Optimization benchmarks.
The Threadripper 3960X system was running Ubuntu 19.10 with the Linux 5.4 kernel. The compiler was a GCC 10.0 snapshot from December, the newest available at the time of testing, built in release mode.
GCC 10 was used to build a variety of C/C++ software packages with the Phoronix Test Suite. The baseline run used "-O3 -march=native", the link-time optimization run used "-O3 -march=native -flto", and the final run used "-O3 -march=native -flto -fwhole-program". Per the GCC documentation on the whole-program option, "Assume that the current compilation unit represents the whole program being compiled. All public functions and variables with the exception of main and those merged by attribute externally_visible become static functions and in effect are optimized more aggressively by interprocedural optimizers."
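To make the three configurations concrete, here is a minimal Python sketch that assembles the corresponding GCC command lines. The flag sets come from the text above; the helper name build_command, the configuration labels, and the source file names main.c and util.c are hypothetical and only for illustration. It is not how the Phoronix Test Suite itself applies the flags.

```python
import shlex

# Flag sets taken from the three test configurations described above.
BASE_FLAGS = ["-O3", "-march=native"]
CONFIGS = {
    "baseline": BASE_FLAGS,
    "lto": BASE_FLAGS + ["-flto"],
    "lto-whole-program": BASE_FLAGS + ["-flto", "-fwhole-program"],
}


def build_command(config, sources, output, compiler="gcc"):
    """Return the argv list for one compile-and-link invocation (hypothetical helper)."""
    return [compiler, *CONFIGS[config], *sources, "-o", output]


if __name__ == "__main__":
    # Hypothetical source files, used only for illustration.
    sources = ["main.c", "util.c"]
    for name in CONFIGS:
        print(shlex.join(build_command(name, sources, f"demo-{name}")))
```

Each command compiles and links in a single step, so -flto is seen at both stages, which GCC requires for link-time optimization to take effect. In a build that compiles and links separately, the same flags would need to be passed to both steps.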