LLVM Clang 15 Enables Faster Square Root Instructions For AMD Zen
As part of an effort to update LLVM Clang's "-mtune" handling to cater to newer processors, LLVM/Clang 15 later this year will give AMD Zen processors faster and more accurate square root calculations by tuning code generation to use the SQRTSS/SQRTPS instructions.
Merged today to mainline for LLVM/Clang 15 (not to be confused with the branched LLVM/Clang 14 releasing next month) is fast SQRTSS (Compute Square Root of Scalar Single-Precision Value) / SQRTPS (Square Root of Single-Precision Floating-Point Values) tuning for AMD Zen processor cores. On Zen 1 and newer, those instructions were found to be fast enough to be preferable to the existing code path while also being more accurate.
On the Intel side, TuningFastScalarFSQRT is already enabled going back to Sandy Bridge, and TuningFastVectorFSQRT has been in place since Skylake. While this LLVM tuning change affects all Zen CPUs going back to Zen 1, it is only happening now in 2022.
This square root instruction tuning for AMD Zen came up as part of a broader discussion for improving the -mtune generic behavior for more modern CPUs, similar to GCC's -mtune default catering to Haswell. As noted in that discussion, "znver1/znver2 schedule models are, well, leave a lot to be desired." Sadly, there isn't as much aggressive AMD compiler tuning by LLVM (and GCC) as there is on the Intel side.
Zen 1 is already a half-decade old, while this change for LLVM/Clang 15 will be out as stable around September 2022. Sadly, this change is just another example of AMD software optimizations coming in late (and oftentimes left up to independent parties / the open-source community), especially on the compiler side. Intel, by contrast, generally targets its new CPU families very early and ensures they are well optimized with accurate cost tables, able to make use of new instructions, etc.
This SQRTSS/SQRTPS tuning for Zen is the first AMD Zen specific activity for LLVM since last September. Hopefully we'll see more AMD open-source compiler tuning happen this year. We still haven't seen znver4 introduced, while Intel started their Alder Lake and Sapphire Rapids compiler patchwork back in mid-2020.