GCC 12 Profile Guided Optimization Benchmarks With The AMD Threadripper 3990X

Written by Michael Larabel in GNU on 2 August 2022 at 02:00 PM EDT.
Last month I ran a number of GCC 12 compiler optimization benchmarks for this latest-stable compiler atop an AMD Ryzen Threadripper 3990X workstation. Those tests included various optimization levels as well as link-time optimizations (LTO). Some Phoronix Premium supporters also requested fresh GCC 12 Profile Guided Optimization (PGO) benchmarks, so this article presents those PGO benchmark results.

Profile Guided Optimization (PGO) works in two steps: first a profile of the code-base is collected while the program runs, and then that run-time usage data is fed back into the compiler on a subsequent build so it can make better optimization decisions. PGO can be quite beneficial, assuming the collected profile accurately reflects real-world use of the software and the code paths taken do not vary too much between runs.

For these benchmarks, the tests were first run with "-O3 -march=native -flto" for the baseline numbers. All of the benchmarks were then rebuilt with PGO profile generation enabled and the tests repeated, with the results from those instrumented runs discarded. Lastly, all of the benchmarks were rebuilt using the collected PGO profile for each benchmark and run again to see the performance benefit of PGO on GCC 12.

Aside from switching out the PGO-related CFLAGS/CXXFLAGS, "-O3 -march=native -flto" was used across the board as the compiler flags, for a look at aggressive compiler optimizations on the Ryzen Threadripper 3990X test system with GCC 12 on Fedora Workstation 36.
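
As a rough illustration of the three build phases described above, here is a minimal Python sketch that assembles the compiler flags for each phase. The article does not state the exact PGO flags that were used, so this assumes GCC's standard -fprofile-generate and -fprofile-use options. The profile directory name, source file name, and helper function names are hypothetical.

```python
# Sketch only: the article does not list the exact PGO flags that were used.
# -fprofile-generate and -fprofile-use are standard GCC options; the profile
# directory, file names, and helper names below are hypothetical.

BASE_FLAGS = ["-O3", "-march=native", "-flto"]  # flags quoted in the article


def pgo_flags(phase, profile_dir="pgo-data"):
    """Return the CFLAGS/CXXFLAGS list for one of the three build phases."""
    if phase == "baseline":
        return list(BASE_FLAGS)
    if phase == "generate":
        # Instrumented build: running the binary writes profile data files.
        return BASE_FLAGS + [f"-fprofile-generate={profile_dir}"]
    if phase == "use":
        # Final build: GCC reads the collected profile to guide optimization.
        return BASE_FLAGS + [f"-fprofile-use={profile_dir}"]
    raise ValueError(f"unknown phase: {phase!r}")


def compile_command(phase, source="bench.c", output="bench"):
    """Build a gcc command line (as an argument list) for the given phase."""
    return ["gcc", *pgo_flags(phase), source, "-o", f"{output}-{phase}"]


if __name__ == "__main__":
    for phase in ("baseline", "generate", "use"):
        print(" ".join(compile_command(phase)))
```

Each argument list could be handed to subprocess.run. Between the "generate" and "use" builds, the instrumented binary has to be run on a representative workload so that there is a profile to read back.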

As in past PGO compiler benchmarks on Phoronix, profile guided optimizations can further enhance the performance of the software under test, assuming you are able to collect a sufficiently representative profile.

The advantages of PGO can vary greatly, though, depending upon the profile and the particular code-base, so it is not worthwhile in every case, especially given the additional time needed to generate a profile and rebuild.

Only in a few cases were the PGO-optimized binaries slower.

So if you are after squeezing out extra performance and are already using higher optimization levels, -march=native, and LTO, PGO is another avenue worth pursuing that will help some performance-sensitive workloads, if you don't mind the more involved build/profiling process.