Glibc Adds Arm SVE-Optimized Memory Copy - Can "Significantly" Help Performance
Longtime Arm engineer Wilco Dijkstra has landed the SVE-optimized memcpy implementation for Glibc. Wilco explained, "Add an initial SVE memcpy implementation. Copies up to 32 bytes use SVE vectors which improves the random memcpy benchmark significantly."
Arm SVE (the Scalable Vector Extension), now joined by the Scalable Matrix Extension (SME), is Arm's next-generation SIMD instruction set with capabilities beyond Neon. SVE is aimed at better HPC and machine learning performance on AArch64. Compared to Neon, it adds scalable vector lengths, speculative vectorization, gather-load and scatter-store, and other capabilities.
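To give a rough idea of why SVE suits small copies, here is a minimal sketch using the ACLE intrinsics from <arm_sve.h>. It is not the Glibc code: the helper name small_copy_sketch is hypothetical, and the loop structure is an assumption chosen for clarity. A predicate masks off the bytes past the end of the buffer, so a short copy needs no per-size branching, and the same code works for any hardware vector length. When built without SVE support it simply falls back to the standard memcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical helper for illustration only, not the Glibc implementation.
   Copies n bytes from src to dst; the buffers must not overlap. */
static void small_copy_sketch(void *dst, const void *src, size_t n)
{
#if defined(__ARM_FEATURE_SVE)
    uint8_t *d = dst;
    const uint8_t *s = src;
    uint64_t vl = svcntb();   /* bytes per SVE vector on this CPU */

    for (uint64_t i = 0; i < n; i += vl) {
        /* Predicate lanes are active only while i + lane < n,
           so the tail is handled without reading or writing past n. */
        svbool_t pg = svwhilelt_b8_u64(i, (uint64_t)n);
        svuint8_t v = svld1_u8(pg, s + i);   /* predicated load */
        svst1_u8(pg, d + i, v);              /* predicated store */
    }
#else
    /* Assumption: without SVE, defer to the library memcpy. */
    memcpy(dst, src, n);
#endif
}
```

Building the SVE path requires a compiler flag along the lines of -march=armv8-a+sve with GCC or Clang; on a 256-bit SVE machine such as Neoverse-V1, a copy of up to 32 bytes completes in a single loop iteration.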
Arm Neoverse-V1 with SVE.
The Neoverse N2 and V1 are among the first Arm CPU cores with SVE, including the recently launched Amazon Graviton3 processors built on Neoverse-V1 cores. Coincidentally, I'll have some Arm SVE compiler benchmarks up on Phoronix in a few days.
Outside of the Arm SVE space, some minor x86/x86_64 optimizations were also merged into the GNU C Library this week. See the Glibc Git for the latest changes to this widely-used C library.