Qualcomm Begins Optimizing Glibc For Their Oryon CPU Core
Qualcomm has begun landing performance optimizations in the GNU C Library (glibc) to benefit its new Oryon-1 CPU cores, found in the Snapdragon X Elite/Plus SoCs.
Now that the first Snapdragon X Elite ARM64 laptops are shipping and basic Linux support is surfacing, Qualcomm engineers have begun further work on performance tuning and other improvements across the open-source landscape.
The first bits of Oryon-1 CPU tuning were merged into the GNU C Library this weekend. There is now a new memcpy implementation for memory copies in this all-important library. The commit noted:
"Qualcomm's new core (oryon-1) has a different performance characteristic than other cores. For memcpy, it is faster to use the GPRs to do the copy for large sizes (2x faster). For even larger sizes, it is better to use the nontemporal load/store instructions so we don't pollute the L1/L2 caches.
For smaller sizes, the characteristic are very similar to other cores. I used the thunderx memcpy as a starting point and expanded from there."
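To make the size tiers in that commit message concrete, here is a minimal portable C sketch of the dispatch idea. It is not the glibc code, which is hand-written AArch64 assembly. The names (choose_copy_path, copy_words, tiered_memcpy) and both thresholds are hypothetical, picked only for illustration, and portable C cannot issue nontemporal stores, so the largest tier simply reuses the 64-bit word loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoffs for illustration; not the values used in glibc. */
#define LARGE_MIN ((size_t)128)
#define HUGE_MIN  ((size_t)32 * 1024)

typedef enum { PATH_SMALL, PATH_GPR, PATH_NONTEMPORAL } copy_path;

/* Pick a copy strategy from the byte count alone. */
static copy_path choose_copy_path(size_t n)
{
    if (n < LARGE_MIN)
        return PATH_SMALL;
    if (n < HUGE_MIN)
        return PATH_GPR;
    return PATH_NONTEMPORAL;
}

/* Copy in 64-bit chunks, the width of one AArch64 general-purpose register. */
static void copy_words(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, src, sizeof w);  /* alias-safe 8-byte load */
        memcpy(dst, &w, sizeof w);  /* alias-safe 8-byte store */
        src += sizeof w;
        dst += sizeof w;
        n -= sizeof w;
    }
    while (n > 0) {                 /* leftover tail bytes */
        *dst++ = *src++;
        n--;
    }
}

/* Non-overlapping copy that dispatches on size, mirroring the three tiers. */
void *tiered_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);

    switch (choose_copy_path(n)) {
    case PATH_SMALL:
        while (n > 0) {
            *d++ = *s++;
            n--;
        }
        break;
    case PATH_GPR:
        copy_words(d, s, n);
        break;
    case PATH_NONTEMPORAL:
        /* Real code would use nontemporal instructions (for example the
           AArch64 LDNP/STNP pair) here; this sketch falls back to words. */
        copy_words(d, s, n);
        break;
    }
    return dst;
}
```

The point of the sketch is only the shape of the decision: one routine, three paths, selected by length. Where the real cutoffs sit is a tuning question answered by the assembly in the commit.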
A new memset implementation for the Oryon-1 CPU cores was also merged into Glibc:
"Qualcom's new core, oryon-1, has a different characteristics for memset than the current versions of memset. For non-zero, larger sizes, using GPRs rather than the SIMD stores is ~30% faster. For even larger sizes, using the nontemporal stores is needed not to polute the L1/L2 caches.
For zero values, using `dc zva` should be used. Since we know the size will always be 64 bytes, we don't need to figure out the size there.
I started with the emag memset and added back the `dc zva` code."
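The zero versus non-zero split described in that commit message can be sketched in portable C as follows. This is an illustration under stated assumptions, not the merged assembly: splat_byte, zero_block and sketch_memset are hypothetical names, zero_block merely stands in for the `dc zva` instruction, and the size tiers (including the nontemporal path) are left out. The 64-byte block size is the one the commit message states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ZVA_BLOCK ((size_t)64)  /* block size stated in the commit message */

/* Replicate one byte into all eight bytes of a 64-bit word. */
static uint64_t splat_byte(unsigned char c)
{
    return (uint64_t)c * UINT64_C(0x0101010101010101);
}

/* Stand-in for `dc zva`: zero one 64-byte-aligned block. */
static void zero_block(unsigned char *p)
{
    assert(((uintptr_t)p % ZVA_BLOCK) == 0);
    memset(p, 0, ZVA_BLOCK);
}

void *sketch_memset(void *dst, int value, size_t n)
{
    unsigned char *p = dst;
    unsigned char c = (unsigned char)value;

    if (c == 0) {
        /* Byte stores until the pointer reaches a 64-byte boundary. */
        while (n > 0 && ((uintptr_t)p % ZVA_BLOCK) != 0) {
            *p++ = 0;
            n--;
        }
        /* Then clear whole blocks, as `dc zva` would. */
        while (n >= ZVA_BLOCK) {
            zero_block(p);
            p += ZVA_BLOCK;
            n -= ZVA_BLOCK;
        }
    } else {
        /* Non-zero fill: 64-bit stores of the replicated byte. */
        uint64_t w = splat_byte(c);
        while (n >= sizeof w) {
            memcpy(p, &w, sizeof w);  /* alias-safe 8-byte store */
            p += sizeof w;
            n -= sizeof w;
        }
    }
    while (n > 0) {                   /* leftover tail bytes */
        *p++ = c;
        n--;
    }
    return dst;
}
```

Because the block size is treated as a known constant, the sketch never queries it at run time, which is the simplification the commit message describes.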
These changes will be in Glibc 2.40, which should be out next month.