GNU C Library Dropping Various SSSE3 Optimized Code Paths

Written by Michael Larabel in GNU on 16 April 2022 at 06:03 AM EDT.
The latest GNU C Library (Glibc) development code this week has begun dropping various SSSE3 optimized code paths.

Supplemental Streaming SIMD Extensions 3 (SSSE3) is an iteration of SSE that dates back more than a decade, to the Intel Xeon 5100 / Core 2 days and, on the AMD side, Bobcat/Bulldozer. But with Glibc also carrying optimized code paths for the older SSE2 and for SSE4.1, which arrived around the same time as SSSE3, plus AVX2 and EVEX code paths for newer Intel/AMD CPUs, the SSSE3 code paths are being phased out.

Glibc developers determined it is no longer worth shipping SSSE3 optimized code paths: with SSE2 / SSE4.1 / AVX2 / EVEX code paths also in existence, few Intel/AMD CPUs are left that would actually take the SSSE3 route. The code size cost of carrying SSSE3 is measurable, and as of this week the developers have begun removing it.
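To illustrate why so few CPUs end up on an SSSE3 path, here is a minimal sketch of feature-based selection in C. It is not Glibc's code: Glibc picks its variants through IFUNC resolvers with per-function tuning. The names `choose_variant` and `detect_variant` are hypothetical, the priority order is simplified, and treating "EVEX" as AVX-512VL plus AVX-512BW is an assumption. The only real API used is the GCC/Clang builtin pair `__builtin_cpu_init` and `__builtin_cpu_supports`.

```c
/* Illustrative priority order only. Glibc's real selection uses IFUNC
   resolvers and per-function tuning, not this hypothetical helper. */
static const char *choose_variant(int has_evex, int has_avx2, int has_sse4_1)
{
    if (has_evex)
        return "evex";
    if (has_avx2)
        return "avx2";
    if (has_sse4_1)
        return "sse4.1";
    return "sse2";   /* SSE2 is the baseline on x86-64 */
}

/* Query the running CPU with the GCC/Clang builtins and pick a variant. */
static const char *detect_variant(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    /* Assumption: "EVEX" here means AVX-512VL and AVX-512BW are present. */
    int has_evex = __builtin_cpu_supports("avx512vl")
                   && __builtin_cpu_supports("avx512bw");
    int has_avx2 = __builtin_cpu_supports("avx2");
    int has_sse4_1 = __builtin_cpu_supports("sse4.1");
    return choose_variant(has_evex, has_avx2, has_sse4_1);
#else
    return "generic"; /* non-x86 or unsupported compiler */
#endif
}
```

With a ladder like this, an SSSE3 variant would only be reached by a CPU that has SSSE3 but none of SSE4.1, AVX2, or EVEX, which is the narrow group the developers are weighing against the code size cost.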


The Xeon 5100 series introduced SSSE3 support. Picture from my 5100 "Woodcrest" Linux testing back in 2006.


Among the SSSE3 removals are dropping mem{move|cpy}-ssse3-back, str{p}{n}cpy-ssse3, str{n}cat-ssse3, str{n}{case}cmp-ssse3, and {w}memcmp-ssse3 code paths.


SSSE3 was useful in the Core 2 days, but for CPUs from the past several years the Glibc AVX2 code paths are more beneficial.


There is also a reduction in the SSSE3 code around memmove/mempcpy/memcpy. The commit explains:
The goal is to remove most SSSE3 function as SSE4, AVX2, and EVEX are generally preferable. memcpy/memmove is one exception where avoiding unaligned loads with `palignr` is important for some targets.

This commit replaces memmove-ssse3 with a better optimized and lower code footprint version. As well it aliases memcpy to memmove.

Aside from this function all other SSSE3 functions should be safe to remove.

The performance is not changed drastically, although it shows overall improvements without any major regressions or gains.

bench-memcpy geometric_mean(N=50) New / Original: 0.957

bench-memcpy-random geometric_mean(N=50) New / Original: 0.912

bench-memcpy-large geometric_mean(N=50) New / Original: 0.892

Benchmarks were run on Zhaoxin KX-6840@2000MHz

This phasing out of SSSE3 code paths where relevant is happening for the Glibc 2.36 release.
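
The commit message's remark about `palignr` can be made concrete with a short C sketch. It is only an illustration of the instruction's idea, not Glibc's implementation, which is hand-written assembly. The function `load_shifted` and the constant `SHIFT` are hypothetical; `_mm_alignr_epi8` is the real SSSE3 intrinsic from `<tmmintrin.h>` that maps to `palignr`. The sketch assumes an x86 target, a GCC or Clang compiler, and a source pointer aligned to 16 bytes with at least 32 readable bytes.

```c
#include <stdint.h>
#include <tmmintrin.h>  /* SSSE3 intrinsics, including _mm_alignr_epi8 */

/* Byte offset of the wanted data inside the aligned block. palignr takes
   an immediate, so the offset must be a compile-time constant. */
#define SHIFT 5

/* Hypothetical helper: read the 16 bytes starting at aligned_src + SHIFT
   using only aligned loads, then write them to dst. */
__attribute__((target("ssse3")))
static void load_shifted(const uint8_t *aligned_src, uint8_t *dst)
{
    /* Two aligned 16-byte loads that together cover the wanted bytes. */
    __m128i lo = _mm_load_si128((const __m128i *)aligned_src);
    __m128i hi = _mm_load_si128((const __m128i *)(aligned_src + 16));

    /* Concatenate hi:lo, shift right by SHIFT bytes, keep the low 16. */
    __m128i v = _mm_alignr_epi8(hi, lo, SHIFT);

    /* dst may be unaligned, so use an unaligned store here. */
    _mm_storeu_si128((__m128i *)dst, v);
}
```

Because the shift amount is an immediate, a real routine needs a separate code sequence per possible offset, which is plausibly part of why such SSSE3 implementations carry a noticeable code footprint.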