Optimized Binaries Provide Great Benefits For Intel Haswell
Utilizing the core-avx2 CPU optimizations offered by the GCC 4.8 compiler can provide real benefits for the Intel Core i7 4770K processor and other new "Haswell" CPUs. For some computational workloads, the new Haswell instruction set extensions can offer tremendous speed-ups compared to what's offered by the previous-generation Ivy Bridge CPUs.
With our source-based benchmarks of Haswell to date, we have been using the -march=native compiler flag, which on this hardware is effectively -march=core-avx2, but we haven't looked specifically at the benefits of Haswell CPUs introducing support for AVX2, FMA, BMI, and BMI2. This is particularly interesting because on the Windows side, most of the benchmarking at other review sites is done with generic pre-compiled binaries rather than by building from source with optimizations for a given architecture.
For those unfamiliar with the GCC x86/x86_64 optimization options, read the online GCC documentation. The new core-avx2 option means "Intel Core CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2 and F16C instruction set support."
The other tested options were nocona (the old 64-bit Xeons), core2 (the Intel Core 2 CPUs, with support up through SSSE3), corei7 (Nehalem, adding SSE4.1 and SSE4.2), corei7-avx (Sandy Bridge), and core-avx-i (Ivy Bridge). Again, the GCC documentation explains what instruction sets are offered by each of these different Intel CPU models.
Results in full are on OpenBenchmarking.org in 1306150-PTS-INTELHAS05. The Core i7 4770K Haswell system was running Ubuntu 13.04 with the Linux 3.10 development kernel, and GCC 4.8.1 was built from source. The various -march options were set via the CFLAGS and CXXFLAGS environment variables, together with -O3 for the most aggressive compiler optimizations.