Optimized Compiler Builds Are Well Worth It For Intel Tiger Lake
Using "-march=tigerlake" to build optimized binaries for Intel's latest-generation processors is well worth it with the likes of GCC 11. The new instruction set extensions on Tiger Lake deliver more uplift than we have seen out of recent Intel generations, and comparing the different "-march=" targets shows significant performance benefits if you don't mind compiling your own software from source.
Compared to icelake-client, "-march=tigerlake" with the GCC/Clang compilers additionally enables the PCONFIG, WBNOINVD, MOVDIRI, MOVDIR64B and AVX512VP2INTERSECT instructions. (As noted by the GCC documentation, the complete list of capabilities for Tiger Lake includes MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, PKU, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC, XSAVES, AVX512F, AVX512VL, AVX512BW, AVX512DQ, AVX512CD, AVX512VBMI, AVX512IFMA, SHA, CLWB, UMIP, RDPID, GFNI, AVX512VBMI2, AVX512VPOPCNTDQ, AVX512BITALG, AVX512VNNI, VPCLMULQDQ, VAES, PCONFIG, WBNOINVD, MOVDIRI, MOVDIR64B and AVX512VP2INTERSECT.)
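Before building with "-march=tigerlake", it can be useful to confirm that the machine that will run the binaries actually reports the five additions over icelake-client. The following is a minimal sketch, assuming a Linux system and assuming the kernel spells these features in /proc/cpuinfo as pconfig, wbnoinvd, movdiri, movdir64b and avx512_vp2intersect; the helper name missing_tigerlake_flags is hypothetical and not part of any tool used in this article.

```python
from pathlib import Path

# ASSUMPTION: these are the /proc/cpuinfo spellings of the five features
# that -march=tigerlake adds over icelake-client. Verify on your kernel.
TIGERLAKE_ADDITIONS = {
    "pconfig",
    "wbnoinvd",
    "movdiri",
    "movdir64b",
    "avx512_vp2intersect",
}


def missing_tigerlake_flags(cpuinfo_text):
    """Return the sorted assumed flag names absent from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            present = set(value.split())
            return sorted(TIGERLAKE_ADDITIONS - present)
    # No 'flags' line found: treat everything as missing.
    return sorted(TIGERLAKE_ADDITIONS)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("missing:", missing_tigerlake_flags(cpuinfo.read_text()))
```

An empty result would suggest the CPU advertises all five features; a non-empty one only means the kernel did not list them under these names, so treat it as a hint rather than proof.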
For this testing, the Dell XPS 9310 Intel EVO laptop with the Core i7 1165G7 processor was used. The Tiger Lake notebook was running Ubuntu 20.10 with the Linux 5.10 Git kernel, and the GCC 11.0 compiler was used for benchmarking as of its 25 October development state. With GCC 11.0, the benchmarks under test were built with the CFLAGS/CXXFLAGS of "-O3 -march=XXX". The -march= values tested were generic x86-64, sandybridge, haswell, skylake, icelake-client, and tigerlake, for looking at the impact of the additional instructions exposed and the other optimizations / scheduling models catering to the newer CPU generations.
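The build matrix described above can be sketched as a small script that produces the CFLAGS/CXXFLAGS pair for each -march= target. This is only an illustration of the "-O3 -march=XXX" pattern; the names MARCH_TARGETS and build_env are hypothetical, and how the environment is handed to a build system or to the Phoronix Test Suite is left out.

```python
# The six -march= values compared in this article, oldest target first.
MARCH_TARGETS = [
    "x86-64",
    "sandybridge",
    "haswell",
    "skylake",
    "icelake-client",
    "tigerlake",
]


def build_env(march, opt_level="-O3"):
    """Return the CFLAGS/CXXFLAGS environment entries for one target."""
    flags = f"{opt_level} -march={march}"
    return {"CFLAGS": flags, "CXXFLAGS": flags}


if __name__ == "__main__":
    # Print one shell-style assignment line per target.
    for march in MARCH_TARGETS:
        env = build_env(march)
        print(f'CFLAGS="{env["CFLAGS"]}" CXXFLAGS="{env["CXXFLAGS"]}"')
```

Keeping -O3 fixed and varying only the -march= value means any difference between runs comes from the target selection rather than the optimization level.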
There were 64 different tests run for this Tiger Lake compiler testing on the Core i7 1165G7 with the Phoronix Test Suite. Above is the geometric mean of those 64 test results... Going from the icelake-client to the tigerlake level yielded an additional 4% in overall performance in these open-source C/C++ system benchmarks. That was the greatest improvement from one -march= level to the next of the various configurations tried. Going from generic -march=x86-64 to -march=tigerlake (with -O3 throughout) lifted the performance in these particular tests by over 9%. Now let's look at some of the individual results.
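For readers unfamiliar with how a geometric mean summarizes many benchmarks, here is a minimal sketch of the calculation and of the percentage uplift between two configurations. The scores below are HYPOTHETICAL placeholders, not results from this testing, and the sketch assumes every score is a higher-is-better number; the function names are illustrative only.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def percent_uplift(baseline, candidate):
    """Relative improvement of candidate over baseline, in percent."""
    return (candidate / baseline - 1.0) * 100.0


if __name__ == "__main__":
    # HYPOTHETICAL higher-is-better scores for two -march= builds.
    baseline_scores = [100.0, 200.0, 50.0]
    candidate_scores = [110.0, 210.0, 56.0]

    base = geometric_mean(baseline_scores)
    cand = geometric_mean(candidate_scores)
    print(f"uplift: {percent_uplift(base, cand):.1f}%")
```

The geometric mean is used rather than the arithmetic mean so that a test with large absolute numbers does not dominate the summary; each test contributes its ratio equally.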