Six Years After Launch, AMD Piledriver CPU Tuning Gets Reworked In LLVM Clang
Piledriver cores ended up in a range of CPUs, from the FX-8300 series through the FX-9590, many APUs including the A10-6800K, more than a dozen mobile parts, and also some Opteron CPUs. As a reminder, Piledriver was built on a 32nm SOI process, offered better IPC than the original Bulldozer microarchitecture, bumped the clock speeds, and brought other incremental improvements.
This 220 Watt beast might now run faster on Linux...
Roman Lebedev, a Darktable software developer, took to optimizing LLVM Clang for Piledriver CPUs, with a focus on speeding up the open-source RAW photography software's image decoding. By tweaking the Piledriver/bdver2 scheduler model, he got the generated code to perform about 1% better, while in the most significant test cases it was up to 7% faster.
The revised Piledriver scheduler model is now in LLVM master for the next release, LLVM/Clang 8.0, due in early 2019.
For all the older AMD systems out there with Piledriver cores, the newer LLVM Clang compiler may generate more optimal code when targeting "bdver2".
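As a rough illustration of what "bdver2" targeting means in practice, here is a minimal C sketch. It assumes the code is built with a flag such as -march=bdver2, in which case GCC and Clang predefine the __bdver2__ macro. The function names and the loop itself are hypothetical and are not taken from Darktable or from the LLVM patch; they merely stand in for the kind of hot loop whose instruction ordering a scheduler model can influence.

```c
#include <stddef.h>

/* Returns 1 if this translation unit was compiled for Piledriver
 * (for example with -march=bdver2), and 0 otherwise. GCC and Clang
 * predefine __bdver2__ when that target CPU is selected. */
int built_for_bdver2(void) {
#if defined(__bdver2__)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical hot loop: a plain dot product. The source is the same
 * for every target; only the instructions the compiler selects and
 * the order it schedules them in depend on the chosen CPU model. */
float dot_product(const float *a, const float *b, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}
```

Assuming the file is saved as kernel.c (a made-up name), something like "clang -O2 -march=bdver2 -c kernel.c" would build it for Piledriver. Any speed-up from the reworked scheduler model would depend on the workload and would need to be measured rather than assumed.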
What also makes this scheduling work interesting is that he was able to optimize the model by making use of new LLVM tooling like llvm-exegesis, which helps profile the host machine's instruction characteristics using the system's performance counters. This new means of profiling/benchmarking is an alternative to traditionally relying upon the data provided by the CPU vendor about instruction characteristics or profiling by hand. Hopefully this approach will help optimize other CPU scheduler models for LLVM/Clang moving forward, especially as LLVM's vast collection of tooling continues to mature.