GCC 11 Lands A Last Minute Optimization For Intel Skylake
While Skylake was introduced half a decade ago, Intel's open-source engineers aren't done optimizing for it and the 14nm processors that followed. Landing in the GCC 11 open-source compiler today was an optimization targeting Skylake through the likes of Cascade Lake that may bring some performance benefits.
Intel open-source compiler expert H.J. Lu merged a patch he had posted to the GCC mailing list a few days earlier. The patch, which landed this morning, updates the memcpy and memset inline strategies for Skylake-family CPUs.
The updated memory copy and memory set inline strategies aim to avoid branches on Skylake-era processors.
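For readers unfamiliar with what an "inline strategy" covers, here is a minimal C sketch of the kind of code affected. It is not taken from the patch: the `record` struct and the function names are hypothetical, chosen only for illustration. When the compiler sees a memcpy or memset call, it can either emit a call into the C library or expand the operation inline, and the per-CPU tuning tables influence which expansion it picks for a given size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 32-byte record, used only for illustration. */
struct record {
    unsigned char bytes[32];
};

/* The size is a compile-time constant, so the compiler is free to
 * expand this memcpy inline instead of calling the library routine. */
void copy_record(struct record *dst, const struct record *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Same idea for memset with a constant size. */
void clear_record(struct record *r)
{
    memset(r, 0, sizeof *r);
}

/* The size is only known at run time, so the compiler has to choose
 * between a library call and an inline sequence, which may need
 * branches or loops to handle the different possible sizes. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}
```

Compiling this with something like `gcc -O2 -march=skylake -S` and reading the generated assembly shows which expansion a given GCC version chose. The exact output depends on the compiler version and flags, so treat any comparison between GCC 10 and GCC 11 as something to check locally rather than assume.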
H.J. Lu's testing on a Cascade Lake processor yielded sub-1% differences for the likes of SPEC CPU 2017, but with the EEMBC benchmark test cases he found this tuning to be more significant. In the EEMBC benchmarks there were "significant impacts", with at least two tests showing a 9% to 29% difference.
So while this patch alone is not likely to bring any broad performance wins, it was at least worthwhile for H.J. Lu to pursue all these years later. It's one of many improvements that have accumulated over the past year for the imminent GCC 11 release, which includes general improvements and optimizations as well as many Intel/AMD/ARM/POWER-specific optimizations. There will be many GCC 10 vs. 11 compiler benchmarks coming up on Phoronix in the weeks ahead, with the GCC 11.1 stable release right around the corner.