GCC 8 Hasn't Been Performing As Fast As It Should For Skylake With "-march=native"
It turns out that if you have been using GCC 8 since April (or GCC 9 development code) on Intel Skylake (or newer architectures like the yet-to-be-released Cannonlake or Icelake) and compiling your code with the "-march=native" flag, which should tune for your CPU microarchitecture's full capabilities, that hasn't entirely been the case. A fix is en route that can improve performance by as much as 60%.
H.J. Lu of Intel posted a patch today for properly tuning Skylake, Cannonlake, and Icelake targets when using the "-march=native" option. He explained in the just-posted patch:
r259399, which added PROCESSOR_SKYLAKE, disabled many x86 optimizations which are enabled by PROCESSOR_HASWELL. As the result, -mtune=skylake generates slower codes on Skylake than before. The same also applies to Cannonlake and [Icelake] tuning.
This patch changes -mtune={skylake|cannonlake|icelake} to tune like -mtune=haswell for until their tuning is properly adjusted. It also enables -mprefer-vector-width=256 for -mtune=haswell, which has no impact on codegen when AVX512 isn't enabled.
...
This patch improves -march=native performance on Skylake up to 60% and leaves -march=native performance unchanged on Haswell.
The revision that caused -march=native to not be exploited to its full potential on Skylake and newer was made back in April, so it is part of the current stable GCC 8.1 release. If you don't use the "-march=native" flag or are still on GCC 7, performance should be unaffected. Since "-march=native" is often used for benchmark comparisons, it will be interesting to re-check GCC 8's performance on Skylake+ once this patch is merged, presumably in the very near future.
Update: Fun fact: it turns out this issue was uncovered using the Phoronix Test Suite.