A Fresh Look At The PGO Performance With GCC 8
It's been a while since we last ran GCC PGO benchmarks. Profile-guided optimization (PGO), also known as feedback-directed optimization, uses profiling data collected at run-time to improve the performance of re-compiled binaries. Here are some fresh benchmarks of the GCC PGO impact on a Xeon Scalable server using the newly-released GCC 8.2 release candidate.
With it being a while since our last round of GCC PGO benchmarking, and with a reader recently inquiring about PTS PGO testing, I ran some new tests. For those not familiar with PGO, it involves first compiling the code with the relevant PGO/profiling flags, running the workload under test to generate the profiling data, and then re-compiling the software while feeding that profiling data into the compiler. With that run-time data in hand, the compiler can make wiser code generation choices than it could from static analysis alone. Firefox, Chrome, and other popular software packages have been relying upon PGO-optimized release binaries for a while to offer greater performance.
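For readers who have not done this by hand, the three-step cycle described above can be sketched as follows. This is only an illustration, not how any particular project builds its releases: the helper name `pgo_build_steps`, the file names, and the `pgo-data` directory are made up for the example, and it only prints the GCC command lines rather than running them. The `-fprofile-generate`, `-fprofile-use`, and `-fprofile-dir` options are standard GCC flags.

```python
import shlex


def pgo_build_steps(source, binary, profile_dir, base_flags=("-O3", "-march=native")):
    """Return the three command lines of a manual GCC PGO cycle.

    All names here are hypothetical; only the GCC flags themselves are real.
    """
    # Flags shared by both builds; -fprofile-dir tells GCC where the
    # profile data is written (step 1/2) and later read back (step 3).
    common = ["gcc", *base_flags, f"-fprofile-dir={profile_dir}"]

    # Step 1: instrumented build that records profile data when executed.
    instrumented = [*common, "-fprofile-generate", source, "-o", binary]

    # Step 2: run the workload once so the profile data gets written.
    training_run = [f"./{binary}"]

    # Step 3: rebuild, letting the compiler consume the collected profile.
    optimized = [*common, "-fprofile-use", source, "-o", binary]

    return [instrumented, training_run, optimized]


if __name__ == "__main__":
    for step in pgo_build_steps("workload.c", "workload", "pgo-data"):
        print(shlex.join(step))
```

The training run in the middle should resemble the real workload as closely as possible, since the final build is optimized for whatever code paths that run exercised.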
Following the inquiry around PGO testing with the Phoronix Test Suite, I got around to upstreaming a PGO module I had written previously that makes it effortless to conduct this testing. Thanks to the highly structured manner of these automated benchmarks and how the test profiles and framework are designed, it's quite simple for a module to automate the PGO process. This PGO module is currently in Git and will be part of the upcoming Phoronix Test Suite 8.2.0 release. This small code addition to the Phoronix Test Suite carries out the PGO benchmarking process via:
- First doing a clean build of all the desired software under test with whatever (if any) CFLAGS/CXXFLAGS/PATH are set by the user, in order to collect the results without any PGO optimizations.
- Following that it proceeds to re-compile all of the desired test profiles while adding in the relevant -fprofile-generate/-fprofile-dir options to the compiler flags. It also forces any test profiles using concurrent make jobs to instead rely upon a single job, as it's reported to work out better for PGO'ing.
- After rebuilding the tests with the PGO generation flags, it runs each of the tests one time while having a unique directory per-test-profile for the collection of the profile data. Running each test profile once is enough for the profiling, whereas test profiles commonly run 3+ times when collecting benchmark results to ensure statistical accuracy.
- Lastly, it re-compiles the test profiles in PGO mode, making use of each test profile's respective profile cache. Following that, it does a standard PTS run of all the contained benchmarks to ultimately produce a side-by-side showing of the PGO impact with the given compiler.
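To make the sequence above more concrete, here is a rough sketch of how those stages could be laid out per test profile. It is not the actual module code: the `Stage` class, the `plan_pgo_benchmark` function, and the `/tmp/pgo-data` location are all hypothetical, and the use of `-fprofile-use` for the final rebuild is an assumption, since only the generation flags are named above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stage:
    name: str
    extra_flags: List[str]      # appended to the user's CFLAGS/CXXFLAGS
    single_make_job: bool       # force a single make job for this build
    runs: Optional[int]         # None = the test profile's usual run count


def plan_pgo_benchmark(test_profile, profile_root="/tmp/pgo-data"):
    """Lay out the build/run stages for one test profile (hypothetical sketch)."""
    # One unique profile-data directory per test profile.
    profile_dir = f"{profile_root}/{test_profile.replace('/', '_')}"
    dir_flag = f"-fprofile-dir={profile_dir}"

    return [
        # Clean build with only the user's flags; normal benchmark run.
        Stage("baseline", [], False, None),
        # Instrumented rebuild with a single make job, then one training run.
        Stage("profile-generate", ["-fprofile-generate", dir_flag], True, 1),
        # Rebuild against the collected profile (flag choice is an assumption),
        # followed by a standard benchmark run for the comparison.
        Stage("pgo", ["-fprofile-use", dir_flag], False, None),
    ]


if __name__ == "__main__":
    for stage in plan_pgo_benchmark("pts/compress-7zip"):
        print(stage)
```

The baseline and PGO stages keep the usual run count so their results are directly comparable, while the middle stage only exists to produce profile data.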
This new module works with any of the hundreds of test profiles, assuming there is a PGO-supported compiler on the system, the software under test honors CFLAGS/CXXFLAGS, the software doesn't have any PGO issues, etc. The above process is carried out by just running "phoronix-test-suite pgo.benchmark [the desired tests/suites/result-files]"; using the pgo.benchmark sub-command is all that's needed to activate the module's behavior.
With that said, for those curious about the current PGO performance impact with GCC 8.2 RC1 I ran some tests on Ubuntu 18.04 LTS with a dual Intel Xeon Gold 6138 platform in a Tyan 1U server. The CFLAGS/CXXFLAGS were set to "-O3 -march=native" for the entire duration of testing. All of these tests were carried out using the Phoronix Test Suite 8.2 Git code.