GCC 4.6 Compiler Performance With AVX On Sandy Bridge

Written by Michael Larabel in Processors on 7 February 2011 at 09:40 AM EST.

While we are still battling issues with the Intel Linux graphics driver in getting that running properly with Intel's new Sandy Bridge CPUs (at least Intel's Jesse Barnes is now able to reproduce the most serious problem we've been facing, but we'll save the new graphics information for another article), the CPU performance continues to be very compelling. Two weeks ago we published the Intel Core i5 2500K Linux benchmarks that showed just how well this quad-core CPU that costs a little more than $200 USD is able to truly outperform previous generations of Intel hardware. That was just with running the standard open-source benchmarks and other Linux software, which has not been optimized for Intel's latest micro-architecture. Version 4.6 of the GNU Compiler Collection (GCC) though is gearing up for release and it will bring support for the AVX extensions. In this article, we are benchmarking GCC 4.6 on a Sandy Bridge system to see what benefits there are to enabling the Core i7 AVX optimizations.

The Advanced Vector Extensions (AVX) are the newest instruction set extensions and were jointly agreed upon by Intel and AMD as the successor to SSE4. Key points of the AVX ISA are the expansion of the vector data width to 256 bits, a new SIMD instruction format, and new data manipulation and arithmetic compute primitives. Simply put, AVX is meant to be another step forward in increasing the processor's performance and efficiency. AVX has been talked about for years, but with Intel's Sandy Bridge CPUs the Advanced Vector Extensions support is finally in place. AMD will launch its first AVX-capable CPUs later this calendar year.
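To make the 256-bit vector width concrete, here is a minimal sketch in C that adds two float arrays eight elements at a time when AVX is enabled at compile time and falls back to a plain scalar loop otherwise. The function name add_floats is a hypothetical helper for illustration and is not part of any benchmark in this article; the intrinsics come from the standard immintrin.h header.

```c
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>  /* AVX intrinsics; only used when AVX is enabled */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for n single-precision floats. */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#ifdef __AVX__
    /* One 256-bit YMM register holds eight 32-bit floats. */
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);  /* unaligned 256-bit load */
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
#endif
    /* Scalar loop: handles the tail, or the whole array without AVX. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

In practice a compiler can often vectorize the scalar loop on its own when AVX code generation is enabled, so hand-written intrinsics like these are shown only to illustrate the register width.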

Fortunately, Intel has been working on Linux AVX support going back to at least early 2009. In April of 2009 the main bits of Intel AVX support landed in the mainline Linux 2.6.30 kernel. This kernel-level AVX work was for enabling YMM state management for the 256-bit vector processing. In order to run an Intel Core i5/i7 Sandy Bridge CPU under Linux with one of the new Intel chipsets, you need to be using a Linux distribution from the second half of 2010 (circa mainline Linux 2.6.35), so AVX support is in place at the kernel level regardless. In this regard, the AVX support is actually in a better position than on Windows, which requires Microsoft Windows 7 with the brand new Service Pack 1.
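Because the operating system has to save and restore the YMM registers, user-space code generally checks both the CPU and the OS before using AVX. The following is a rough sketch of that check, assuming GCC or Clang on x86 with the cpuid.h header; avx_usable and HAVE_X86_CPUID are hypothetical names chosen for this example.

```c
#include <stdint.h>
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>  /* GCC/Clang wrapper around the CPUID instruction */
#define HAVE_X86_CPUID 1
#endif

/* Hypothetical helper: returns 1 if both the CPU and the OS support AVX. */
int avx_usable(void)
{
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    /* CPUID leaf 1, ECX: bit 27 = OSXSAVE, bit 28 = AVX. */
    if (!(ecx & (1u << 27)) || !(ecx & (1u << 28)))
        return 0;
    /* XGETBV with ECX=0 reads XCR0; bits 1 and 2 mean the OS manages
       XMM and YMM state. Encoded as bytes for older assemblers. */
    uint32_t lo, hi;
    __asm__ volatile(".byte 0x0f, 0x01, 0xd0" : "=a"(lo), "=d"(hi) : "c"(0));
    (void)hi;
    return (lo & 0x6u) == 0x6u;
#else
    return 0;  /* not x86, or no cpuid.h available */
#endif
}
```

On a kernel without YMM state management the OSXSAVE or XCR0 check would fail even on AVX-capable hardware, which is why the kernel-side work described above matters.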

When it comes to compiler support for AVX, the GNU Compiler Collection developers have been working on that for some time as well. There are traces of Advanced Vector Extensions support in this leading open-source compiler going back to GCC 4.4. However, it was not until this past December, in the run-up to GCC 4.6, that -mtune/-march/--with-cpu options designed for AVX and Intel's newest CPUs became available. In early December an Intel engineer submitted the patches for Core i7 AVX CPUs. The option is named "corei7-avx" and is designed for use with Core i7 CPUs that carry AVX support. The GCC documentation describes the corei7-avx option as for "Intel Core i7 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AES and PCLMUL instruction set support." As mentioned recently, this will all appear in GCC 4.6.0 when it is released in the coming months. GCC 4.6 can also be built with the --with-fpmath=avx flag, which allows the GNU compiler to use AVX floating-point arithmetic.

On the LLVM side, the Low-Level Virtual Machine appears to have only partial support for the Advanced Vector Extensions and lacks Core i7 "Sandy Bridge" tuning, but the LLVM/Clang benchmarks under Sandy Bridge will be saved for another article. In this article, we are looking at the AVX performance under GCC on Intel's newest CPUs. To do this comparison we first built GCC 4.3.5, GCC 4.4.5, GCC 4.5.2 and GCC 4.6.0 from source without specifying any tuning options during the build process or when building out our library of benchmarks. This was then followed by building out our test library (after self-hosting the same version of GCC with the same test options) with the core2, corei7, and corei7-avx options. Lastly, we tested GCC 4.6.0 when built with the AVX floating-point math support and using the corei7-avx flags. The GCC 4.6.0 build we were using was the 2011-01-29 snapshot. We did the GCC 4.3/4.4 testing to see how this open-source compiler would react when running on this more modern CPU. The only other argument used when building the GCC releases was --enable-lto for enabling link-time optimization; besides that, it was a stock build.

The core2 GCC option is designed for CPUs with just MMX, SSE, SSE2, SSE3, and SSSE3 instruction set support. The vanilla corei7 option adds SSE4.1 and SSE4.2 support to the mix, while the corei7-avx option adds AVX instruction set support plus AES and PCLMUL ISA support.
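One way to see which of these instruction sets a given option actually enables is to look at the macros GCC predefines for each extension. The sketch below assumes GCC's usual feature macros (__SSE2__ through __AVX__, __AES__, __PCLMUL__); the two function names are hypothetical helpers made up for this example.

```c
/* Hypothetical helper: name the newest SIMD extension enabled at
   compile time, based on GCC's predefined feature macros. */
const char *newest_simd_extension(void)
{
#if defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE4_1__)
    return "SSE4.1";
#elif defined(__SSSE3__)
    return "SSSE3";
#elif defined(__SSE3__)
    return "SSE3";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none of the above";
#endif
}

/* Hypothetical helper: 1 if both AES and PCLMUL code generation
   are enabled for this compilation, otherwise 0. */
int aes_and_pclmul_enabled(void)
{
#if defined(__AES__) && defined(__PCLMUL__)
    return 1;
#else
    return 0;
#endif
}
```

Going by the GCC documentation for these options, a build with -march=corei7 should report SSE4.2 here, and a build with -march=corei7-avx should report AVX with AES and PCLMUL enabled. The result depends only on the flags used at compile time, not on the CPU the program later runs on.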

Via Phoronix Test Suite 3.0 "Iveland" and OpenBenchmarking.org the arsenal of Linux compiler tests this time around included PostgreSQL, Apache web-server, Apache build process, John The Ripper, C-Ray, POV-Ray, Himeno, MAFFT, MrBayes, HMMer, FLAC, GraphicsMagick, and Gcrypt.

It is not the Core i5 2500K setup we are using this time around but rather a Sandy Bridge notebook we just received from System76. It's a very nice but expensive (circa $2500 USD) System76 Serval Professional Notebook that has an Intel Core i7 2720QM CPU, 8GB of system memory, an 80GB Intel SSDSA2M080 solid-state drive, and NVIDIA GeForce GTX 485M 2GB graphics. It ships with the Ubuntu 10.10 x86_64 release, and besides testing out the different compilers, we also upgraded its kernel to the mainline Linux 2.6.38-rc2 release. The Intel Core i7 2720QM is a quad-core part with Hyper-Threading that is clocked at 2.2GHz, with a Turbo Boost frequency of 3.3GHz and 6MB of L3 cache.
