Date: 2024-11

Merged today for the upcoming GCC 15 stable release is a new "X86_TUNE_AVX512_TWO_EPILOGUES" tuning optimization that is enabled by default for AMD Zen 4 and Zen 5 processors.

SUSE compiler engineer Richard Biener wrote the patch adding this tuning and enabling it by default when targeting either AMD Zen 4 or AMD Zen 5 processors. Biener explains in the now-committed patch:

"The following adds X86_TUNE_AVX512_TWO_EPILOGUES tuning and directs the vectorizer to produce both a vector AVX2 and SSE epilogue for AVX512 vectorized loops when set. The tuning is enabled by default for Zen4 and Zen5 where I benchmarked it to be overall positive on SPEC CPU 2017 both in performance and overall code size. In particular it speeds up 525.x264_r which with only an AVX2 epilogue ends up in unvectorized code at the moment."

No firm numbers from SPEC CPU 2017 or any other benchmarks were shared to help quantify the actual performance impact of this additional AMD Zen 5/4 tuning.
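To illustrate the idea behind the two epilogues, here is a minimal sketch in C that models how a loop's iterations could be divided between a 512-bit main loop, a 256-bit epilogue, an optional 128-bit epilogue, and a scalar remainder. It is not GCC code and does not reflect how the vectorizer is implemented: the `stage_split` struct, the `split_trip_count` function, and the `two_epilogues` parameter are hypothetical names invented for this example, and the model simply assumes each stage takes as many full vectors as fit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not GCC internals: iterations covered by each stage. */
struct stage_split {
    size_t main512; /* iterations run by the 512-bit (AVX-512) main loop */
    size_t epi256;  /* iterations run by the 256-bit (AVX2) epilogue */
    size_t epi128;  /* iterations run by the 128-bit (SSE) epilogue */
    size_t scalar;  /* iterations left over for scalar code */
};

/*
 * Split a trip count of n elements, each elem_size bytes wide.
 * two_epilogues != 0 models the extra SSE epilogue; 0 models an
 * AVX2 epilogue only.
 */
static struct stage_split split_trip_count(size_t n, size_t elem_size,
                                           int two_epilogues)
{
    assert(elem_size > 0 && elem_size <= 16);

    /* Elements per vector for 64-, 32- and 16-byte registers. */
    size_t vf512 = 64 / elem_size;
    size_t vf256 = 32 / elem_size;
    size_t vf128 = 16 / elem_size;

    struct stage_split s;

    s.main512 = n / vf512 * vf512; /* full 512-bit vectors */
    n -= s.main512;

    s.epi256 = n / vf256 * vf256;  /* full 256-bit vectors */
    n -= s.epi256;

    s.epi128 = two_epilogues ? n / vf128 * vf128 : 0;
    n -= s.epi128;

    s.scalar = n;                  /* whatever is left runs unvectorized */
    return s;
}
```

Under this model, a 16-iteration loop over single-byte elements is too short for both the 512-bit main loop and the 256-bit epilogue, so `split_trip_count(16, 1, 0)` leaves all 16 iterations to scalar code, while `split_trip_count(16, 1, 1)` covers them with the 128-bit epilogue. Whether this matches the specific loops in 525.x264_r is an assumption; the patch description only states that the benchmark ends up in unvectorized code with an AVX2 epilogue alone.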

With the patch now in Git, it will be part of the upcoming GCC 15.1 stable release due out in the early months of 2025.