GNU C Library Tuning For AArch64 Helps Memset Performance By ~24%
A patch merged yesterday to the GNU C Library (glibc) codebase can improve the memset() function's performance by 24%, as measured by a random memset benchmark on an Arm Neoverse-N1 core.
Wilco Dijkstra of Arm has landed a memset optimization for the AArch64 code within the GNU C Library. Wilco explains in the patch, which adjusts the hand-tuned Assembly code:
"Improve small memsets by avoiding branches and use overlapping stores. Use DC ZVA for copies over 128 bytes. Remove unnecessary code for ZVA sizes other than 64 and 128. Performance of random memset benchmark improves by 24% on Neoverse N1."
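The patch itself is hand-written AArch64 assembly. The following is only a hypothetical C sketch of the general "overlapping stores" idea for small sizes, not the glibc code. The names `small_memset`, `store64` and `store32` are illustrative and assume a fill length of at most 16 bytes. Each size class is covered by two stores that may overlap, so no per-byte loop is needed. The large-size DC ZVA path is not shown. DC ZVA is an AArch64 instruction that zeroes a whole cache-line-sized block, so it only applies when the fill value is zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: a fixed-size memcpy typically compiles to a single
 * unaligned store on AArch64, so these avoid any alignment requirement. */
static inline void store64(unsigned char *p, uint64_t v) { memcpy(p, &v, 8); }
static inline void store32(unsigned char *p, uint32_t v) { memcpy(p, &v, 4); }

/* Illustrative sketch (not glibc's implementation): set n bytes, n <= 16,
 * using at most two possibly overlapping stores per size class. */
void *small_memset(void *dst, int c, size_t n)
{
    unsigned char *d = dst;
    /* Replicate the fill byte into all 8 bytes of a 64-bit word. */
    uint64_t v = (unsigned char)c * 0x0101010101010101ULL;

    assert(n <= 16); /* this sketch only handles small sizes */

    if (n >= 8) {
        /* 8..16 bytes: [0, 8) and [n - 8, n) together cover [0, n). */
        store64(d, v);
        store64(d + n - 8, v);
    } else if (n >= 4) {
        /* 4..7 bytes: [0, 4) and [n - 4, n) together cover [0, n). */
        store32(d, (uint32_t)v);
        store32(d + n - 4, (uint32_t)v);
    } else if (n > 0) {
        /* 1..3 bytes: first, middle and last byte cover every position. */
        d[0] = (unsigned char)c;
        d[n / 2] = (unsigned char)c;
        d[n - 1] = (unsigned char)c;
    }
    return dst;
}
```

The trade-off is that some bytes may be written twice. In exchange, every length in a size class is handled by the same short, fixed sequence of stores.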
It will be interesting to see the memset performance impact of this optimization on other Arm cores as well.
The Neoverse-N1 is found in the Ampere Altra / Ampere Altra Max servers, among other SoCs, so it will be nice to see this optimization roll out in the next Glibc release. That next release will be Glibc 2.41 and should be out around February.