LLVM 3.3 To Introduce SLP Vectorizer
One of the prominent features to be introduced with the LLVM 3.3 release this summer is the SLP Vectorizer. LLVM 3.2 introduced the Loop Vectorizer for vectorizing loops, while the new SLP Vectorizer optimizes straight-line code by merging multiple scalars into vectors.
The LLVM Loop Vectorizer has been benchmarked already on Phoronix, while the SLP Vectorizer is new to LLVM 3.3. The SLP ("Superword-Level Parallelism") Vectorizer works by combining similar independent instructions into vector instructions. It is able to vectorize memory accesses, arithmetic operations, comparison operations, and some math functions.
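As a rough illustration of the kind of straight-line code an SLP pass looks for, here is a minimal C sketch. The function name add4 is hypothetical and not taken from LLVM or any benchmark; whether a given compiler build actually merges these four scalar additions into one vector addition depends on the target and the optimization settings.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Four independent, similar scalar operations on adjacent memory.
 * There is no loop here, so a loop vectorizer has nothing to work on;
 * an SLP-style pass may instead combine the four loads, adds, and
 * stores into single vector instructions.
 * (add4 is a hypothetical name used only for this sketch.)
 */
void add4(const float *restrict a, const float *restrict b,
          float *restrict out)
{
    assert(a != NULL && b != NULL && out != NULL);

    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
    out[3] = a[3] + b[3];
}
```

The restrict qualifiers tell the compiler the three arrays do not overlap, which makes it easier to prove that the four statements really are independent.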
With LLVM 3.3, the Loop Vectorizer is now enabled by default at the "-O3" optimization level, while the SLP Vectorizer is still experimental and must be manually enabled. The SLP Vectorizer can be enabled with the "-fslp-vectorize" compiler switch. There's also a second basic-block vectorization phase (using the LLVM BB Vectorizer) that can be applied using the "-fslp-vectorize-aggressive" switch. More details on the available LLVM vectorizers can be found via the LLVM documentation.
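To make the switches concrete, here is a small C sketch with one loop and one piece of straight-line code, with the three flag combinations from this article shown as build lines in the leading comment. The file name vec_demo.c and the names scale_add, vec4, and vec4_scale are hypothetical, and this only shows where each vectorizer could apply, not what any particular Clang build will emit.

```c
/*
 * Possible build lines (vec_demo.c is a hypothetical file name):
 *   clang -O3 -march=native -c vec_demo.c
 *   clang -O3 -march=native -fslp-vectorize -c vec_demo.c
 *   clang -O3 -march=native -fslp-vectorize-aggressive -c vec_demo.c
 */
#include <assert.h>
#include <stddef.h>

/* A simple loop: a candidate for the Loop Vectorizer at -O3. */
void scale_add(float *restrict y, const float *restrict x,
               float k, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));

    for (size_t i = 0; i < n; ++i)
        y[i] = k * x[i] + y[i];
}

/* Hypothetical four-component vector type for this sketch. */
struct vec4 {
    float x, y, z, w;
};

/*
 * Straight-line code with four similar, independent multiplies on
 * adjacent struct fields: a candidate for the SLP Vectorizer when
 * it is enabled.
 */
void vec4_scale(struct vec4 *v, float k)
{
    assert(v != NULL);

    v->x *= k;
    v->y *= k;
    v->z *= k;
    v->w *= k;
}
```

Comparing the generated assembly (for example by replacing -c with -S) across the three build lines is one way to see whether the extra switches changed anything for a given function.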
While the SLP Vectorizer in LLVM is still in early stages of development, LLVM developers have reported that it can already be used to accelerate many of their test programs. Curious about the performance, I carried out some new C/C++ benchmarks with LLVM/Clang 3.3 SVN on a few of our well-known test programs through the Phoronix Test Suite.
The LLVM, Compiler-RT, and Clang source code was obtained via SVN on the morning of 2 May 2013. Benchmarking was done on a Lenovo ThinkPad laptop with an Intel Core i7 720QM processor running Ubuntu 13.04 x86_64 with the Linux 3.8 kernel. The baseline results use just "-O3 -march=native" as the CFLAGS/CXXFLAGS; additional runs append "-fslp-vectorize" and then "-fslp-vectorize-aggressive" to those flags.