AVX / AVX2 / AVX-512 Performance + Power On Intel Rocket Lake
Here is a look at AVX / AVX2 / AVX-512 performance on the Intel Core i9 11900K "Rocket Lake". A set of relevant open-source benchmarks was built with the compiler limited to AVX, AVX2, and AVX-512 in turn, while the CPU package power consumption was monitored during each test to look at the performance-per-Watt. The results provide some fresh reference metrics for AVX-512 on Linux with the latest Intel "Rocket Lake" processors.
Rocket Lake is interesting for being the first desktop CPU (non-HEDT) to feature AVX-512. AVX-512 does drive up power consumption and in some cases can be detrimental to performance due to the lower clock speeds when it is engaged, but for some workloads it does pay off if you care only about raw performance.
Today's article looks at that AVX-512 performance and then limits the same benchmarks to the AVX2 and AVX levels, as well as building them with AVX disabled entirely. A set of C/C++ open-source benchmarks that can make use of AVX-512 was tested. The following configurations were used when building the benchmarks under test:
No AVX: Setting the CFLAGS/CXXFLAGS to "-O3 -march=native -mno-avx" for disabling any Advanced Vector Extensions from being used in the generated binaries.
AVX: Building the tests with "-O3 -march=native -mno-avx2" for disabling AVX2 (and in turn AVX-512).
AVX2: Building the tests with "-O3 -march=native -mno-avx512f" for disabling AVX-512. Since the other AVX-512 extensions build on the AVX-512 Foundation (AVX512F) instructions, "-mno-avx512f" disables all AVX-512 usage for the generated programs.
AVX-512: Building the tests with "-O3 -march=native -mprefer-vector-width=512" for targeting the full capabilities of the processor. The -mprefer-vector-width=512 option is used since the latest GCC and Clang compilers try to avoid 512-bit vector instructions due to the frequency/performance drop from the wider vectors. The default -mprefer-vector-width=256 behavior keeps the generated code at 256-bit vectors unless there are 512-bit operations in the original source.
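As a rough illustration of how the four configurations above could be driven from a script, here is a minimal Python sketch that maps each configuration name to the flags quoted in this article and produces a build environment with CFLAGS/CXXFLAGS set. The BUILD_FLAGS and build_environment names are hypothetical helpers made up for this example and are not part of the Phoronix Test Suite or of the actual setup used for these tests.

```python
# Flag strings are copied from the four configurations described in the article.
# BUILD_FLAGS and build_environment are hypothetical names for illustration only.
BUILD_FLAGS = {
    "No AVX": "-O3 -march=native -mno-avx",
    "AVX": "-O3 -march=native -mno-avx2",
    "AVX2": "-O3 -march=native -mno-avx512f",
    "AVX-512": "-O3 -march=native -mprefer-vector-width=512",
}


def build_environment(config, base_env=None):
    """Return a copy of base_env with CFLAGS and CXXFLAGS set for one configuration."""
    env = dict(base_env or {})  # copy so the caller's mapping is left untouched
    flags = BUILD_FLAGS[config]
    env["CFLAGS"] = flags
    env["CXXFLAGS"] = flags
    return env


if __name__ == "__main__":
    # Print the flags each configuration would export before rebuilding the tests.
    for name in BUILD_FLAGS:
        print(f"{name}: CFLAGS={build_environment(name)['CFLAGS']}")
```

The returned dictionary could be passed as the env argument of Python's subprocess.run when launching a build, so that every rebuild only differs by the vector-extension flags.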
These benchmarks were carried out mainly for reference purposes. The Core i9 11900K was running at stock speeds on an Ubuntu 21.04 snapshot with the GCC 10.2 compiler. The tests were rebuilt for each configuration via the Phoronix Test Suite, which also monitored the CPU package power consumption via Intel RAPL on a per-test basis. The "MONITOR=cpu.temp,cpu.power PERFORMANCE_PER_SENSOR=cpu.power" environment variables were set to enable that automated sensor collection during testing.
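For readers who want to reason about the performance-per-Watt figures, the sketch below shows the sensor-related environment variables quoted above being added to an environment, plus the basic ratio of a benchmark score to average package power. It assumes a higher-is-better score and a simple score divided by average Watts; the exact calculation done by the Phoronix Test Suite is not shown here, and the function names and sample numbers are purely hypothetical rather than measured results.

```python
import os


def monitored_environment(base_env=None):
    """Return an environment copy with the sensor variables quoted in the article."""
    env = dict(os.environ if base_env is None else base_env)
    env["MONITOR"] = "cpu.temp,cpu.power"
    env["PERFORMANCE_PER_SENSOR"] = "cpu.power"
    return env


def performance_per_watt(score, average_watts):
    """Ratio of a higher-is-better score to average package power (an assumed formula)."""
    if average_watts <= 0:
        raise ValueError("average_watts must be positive")
    return score / average_watts


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not results from this article.
    wide = performance_per_watt(score=1200.0, average_watts=150.0)
    narrow = performance_per_watt(score=1000.0, average_watts=100.0)
    print(f"wider vectors: {wide:.2f} per Watt, narrower vectors: {narrow:.2f} per Watt")
```

With made-up inputs like these, the build with the higher raw score can still come out behind on efficiency, which is the trade-off the power monitoring is meant to expose.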