Zen 3 GCC Tuning Continues With More Correct Latencies Rather Than "Random Numbers"
On Monday, the AMD EPYC 7003 "Milan" launch day, we finally got to see some serious tuning begin for the Zen 3 "Znver3" CPU target in the GCC compiler after that initial code landed at the end of last year. Yesterday a second Zen 3 tuning patch was published and then today a third tuning patch has made it out.
This third Znver3 tuning patch is again from SUSE's Jan Hubicka. He sent it out on the mailing list and right away merged it as a "fix" for the GCC 11 compiler release, which will debut as stable in the next month or so as GCC 11.1.
This third patch is a small change, but it updates the costs of integer divides to match the actual latencies found with Zen 3 processors.
While this change may lead to disabling vectorization in some cases, Hubicka noted with the patch, "in general it is better to have actual latencies than random numbers."
The idiv costs were lowered from the existing values of 16/22/30/45/45 to 9/10/12/17/17, so the compiler is able to make better decisions about the generated instructions with regard to instruction cost (latency). Those previous costs aren't entirely "random numbers": they were carried forward from the Znver2 cost table for Zen 2 but are not accurate for AMD's latest microarchitecture. Whether the Znver2 costs are accurate for Zen 2 is another matter.
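To put those numbers side by side, here is a minimal Python sketch that just tabulates the old and new values and the relative drop for each slot. It is not GCC code; the names are made up for illustration, and the mapping of the five slots to 8/16/32/64-bit operands plus an "other" entry is an assumption based on how GCC's x86 cost tables are laid out rather than something the patch note quoted here spells out.

```python
# Illustrative only: the five idiv cost values quoted above, old vs. new.
# ASSUMPTION: the slots are 8/16/32/64-bit operands plus an "other" entry,
# following the layout of GCC's x86 cost tables; names here are hypothetical.
WIDTHS = ("8-bit", "16-bit", "32-bit", "64-bit", "other")
OLD_IDIV_COSTS = (16, 22, 30, 45, 45)  # carried forward from the Znver2 table
NEW_IDIV_COSTS = (9, 10, 12, 17, 17)   # values from the third Znver3 patch


def idiv_cost_changes():
    """Return (width, old, new, percent_reduction) for each cost slot."""
    return [
        (width, old, new, (old - new) * 100 / old)
        for width, old, new in zip(WIDTHS, OLD_IDIV_COSTS, NEW_IDIV_COSTS)
    ]


if __name__ == "__main__":
    for width, old, new, pct in idiv_cost_changes():
        print(f"{width:>6}: {old:2d} -> {new:2d}  ({pct:.0f}% lower)")
```

Run as a script, it prints one line per slot. The point is simply that every entry drops by roughly half or more, which is why the compiler's cost-based choices around division can shift.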
It will be interesting to see how much more AMD Zen 3 tuning happens in time for the GCC 11.1 stable release. But even with GCC 11.1 being just a month or so away from release, this updated GNU compiler won't be found out-of-the-box on Ubuntu until 21.10 this autumn, while one of the earliest users of it will be Fedora 34. Thus my desire for seeing more timely compiler support out of AMD for the open-source GCC and LLVM/Clang compilers continues, as this is one of the areas where Intel tends to be months or years ahead of schedule. Had this Znver3 tuning been all squared away ahead of time, Linux enthusiasts (especially those on the likes of Arch and Gentoo) could have been banging on the code for a while already with Ryzen 5000 series hardware to help ensure it's fit and optimal ahead of hitting AMD HPC customers and other prominent deployments - the same would apply had AOCC 3.0 been available, or offered as a public beta, closer to the Ryzen 5000 series launch.