Intel's Glibc Non-Temporal Stores Memset Optimization Extended To AMD CPUs
Merged last month into the GNU C Library (glibc) Git tree was a new tunable controlling non-temporal stores in memset. The optimization was initially limited to Intel processors because at the time it had only been tested and benchmarked on Intel CPUs, but it has now proven useful on AMD processors as well.
Intel toolchain engineer Noah Goldstein last month introduced the "glibc.cpu.x86_memset_non_temporal_threshold" tunable, which sets the threshold for using non-temporal stores in memset. The x86_memset_non_temporal_threshold documentation explains:
"The glibc.cpu.x86_memset_non_temporal_threshold tunable allows the user to set threshold in bytes for non temporal store in memset. Non temporal stores give a hint to the hardware to move data directly to memory without displacing other data from the cache. This tunable is used by some platforms to determine when to use non temporal stores memset."
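For readers who want to try the tunable, glibc tunables are passed through the GLIBC_TUNABLES environment variable as a colon-separated list of name=value pairs. The following is a minimal sketch of setting it for a child process. The helper name with_memset_threshold and the 1 MiB value are illustrative assumptions rather than recommended settings, and a glibc build without this tunable should simply ignore it.

```python
import os
import subprocess
import sys

# Tunable name as given in the glibc documentation quoted above.
TUNABLE = "glibc.cpu.x86_memset_non_temporal_threshold"


def with_memset_threshold(threshold_bytes, base_env=None):
    """Return a copy of the environment with the memset tunable set.

    Hypothetical helper for illustration. Any other entries already in
    GLIBC_TUNABLES are kept; an earlier value of this tunable is replaced.
    """
    if threshold_bytes <= 0:
        raise ValueError("threshold must be a positive number of bytes")
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("GLIBC_TUNABLES", "")
    # Drop empty entries and any previous setting of the same tunable.
    kept = [
        entry
        for entry in existing.split(":")
        if entry and not entry.startswith(TUNABLE + "=")
    ]
    kept.append(f"{TUNABLE}={threshold_bytes}")
    env["GLIBC_TUNABLES"] = ":".join(kept)
    return env


if __name__ == "__main__":
    # 1 MiB is an arbitrary example value, not a tuned recommendation.
    env = with_memset_threshold(1 << 20)
    # The child only echoes the variable it received; substitute the
    # program whose memset behavior should be adjusted.
    child = "import os; print(os.environ['GLIBC_TUNABLES'])"
    subprocess.run([sys.executable, "-c", child], env=env, check=True)
```

The tunable is read when the process starts, so it has to be in the environment before the target program is launched; changing it from inside an already running process has no effect on that process.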
The non-temporal memset tunable was artificially limited to Intel processors, since that is where it was tested and found to be of performance benefit. After all, it was an Intel engineer leading the change.
Merged on Monday to Glibc Git, though, is a change extending this tunable to AMD processors. Fastly's Joe Damato did the testing and found the non-temporal memset to be beneficial in tests carried out across Zen 2, Zen 3, and Zen 4 hardware. The data, for those interested, can be found via this Google Docs spreadsheet covering the various AMD Zen CPUs as well as the Intel numbers.
This commit, now in Glibc, allows the tunable to work on AMD platforms.
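For a rough look at memset throughput on a given system, something like the sketch below can be run with and without the tunable set. It is not the benchmark used for the published numbers: it relies on Python's ctypes.memset, which should resolve to the C library's memset, and the function name time_memset and the buffer sizes are assumptions chosen for illustration. Interpreter overhead will skew the smallest sizes.

```python
import ctypes
import time


def time_memset(size_bytes, repeats=5):
    """Return the best wall-clock time in seconds for one memset call.

    Hypothetical helper for illustration; the buffer is allocated once
    and filled `repeats` times, keeping the fastest run.
    """
    buf = ctypes.create_string_buffer(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        ctypes.memset(buf, 0xAB, size_bytes)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Sizes from 64 KiB to 64 MiB, picked arbitrarily to span typical
    # cache sizes; adjust for the machine under test.
    for size in (1 << 16, 1 << 20, 1 << 24, 1 << 26):
        seconds = max(time_memset(size), 1e-9)  # guard against a zero reading
        print(f"{size:>10} bytes: {size / seconds / 1e9:.2f} GB/s")
```

Running the script once normally and once with GLIBC_TUNABLES set to a different glibc.cpu.x86_memset_non_temporal_threshold value gives a crude before and after comparison on a glibc build that includes the tunable.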