More AMD Zen 4 Compiler Tuning Work Lands In GCC 13
Since December, longtime GNU Compiler Collection developer Jan Hubicka has been volleying various Zen 4 tuning patches for GCC 13. They make up for the rather basic Zen 4 compiler support contributed by AMD in October, which essentially carried over the Zen 3 target and enabled the new CPU ISA extensions found with Zen 4.
Over the past few weeks we've seen several rounds of Zen 4 tuning by Hubicka, aimed at squeezing into what will be the GCC 13.1 stable release. On Monday the latest patch was merged:
this patch adds more tunes for zen4:
- new tunes for avx512 scatter instructions. In micro benchmarks these seem to be a consistent loss compared to open-coded code
- disable use of gather for zen4. While these are a win for micro benchmarks (based on TSVC), enabling gather is a loss for parest. So for now it seems safe to keep it off.
- disable pass to avoid FMA chains for znver4 since fmadd was optimized and does not seem to cause regressions.
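To make the three tuning items concrete, here is a minimal sketch in C of the loop shapes they concern: an indexed load (a gather candidate), an indexed store (a scatter candidate), and a multiply-accumulate reduction (an FMA chain). The function names are made up for illustration and are not taken from the patch, TSVC, or parest. Whether GCC actually emits gather, scatter, or FMA instructions for them depends on the compiler version and the tuning in effect.

```c
#include <stddef.h>

/* Gather candidate: each iteration loads src[] through an index array.
 * A vectorizer may use an AVX-512 gather here or open-code the loads. */
void gather_add(float *restrict dst, const float *restrict src,
                const int *restrict idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[idx[i]];
}

/* Scatter candidate: each iteration stores to dst[] through an index array.
 * A vectorizer may use an AVX-512 scatter here or open-code the stores. */
void scatter_store(float *restrict dst, const float *restrict src,
                   const int *restrict idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[idx[i]] = src[i];
}

/* FMA chain: every multiply-add depends on the previous accumulator value,
 * so the latency of the fused multiply-add sits on the critical path. */
float dot(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}
```

Saving this as a file such as kernels.c (a hypothetical name) and building with `gcc -O3 -march=znver4 -S kernels.c` produces assembly that can be inspected to see which instruction forms the compiler chose.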
Once GCC 13 is primed for release it will be interesting to see how this AMD Zen 4 tuning compares to AMD's AOCC 4.0 compiler as a downstream of LLVM/Clang. AMD did upstream their initial Zen 4 enablement to LLVM/Clang that landed in early December but as of writing there have been no follow-up tuning patches there yet.
With the next round of compiler releases, -march=znver4 can be used to tailor the compiler's instruction selection and optimizations to the AMD Ryzen 7000 series and AMD EPYC 9004 series processors.
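As a rough way to confirm that a build really targets Zen 4, the sketch below checks predefined preprocessor macros. It assumes GCC defines `__znver4__` when -march=znver4 is in effect, as it does for earlier znver targets, and uses the standard `__AVX512F__` feature macro. The function name `target_summary` is hypothetical.

```c
#include <stdio.h>

/* Returns a short description of the target the file was compiled for.
 * Assumption: __znver4__ is predefined by GCC under -march=znver4. */
const char *target_summary(void)
{
#if defined(__znver4__)
    return "znver4 target";
#elif defined(__AVX512F__)
    return "AVX-512 enabled, but not a znver4 target";
#else
    return "generic target without AVX-512";
#endif
}

/* Prints the summary so it can be checked from a build script. */
void print_target_summary(void)
{
    printf("%s\n", target_summary());
}
```

Calling `print_target_summary()` from a program built with and without -march=znver4 should show different strings if the assumption about the macro holds.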