Benchmarking The SLP Vectorizer On LLVM Clang 3.4
The SLP Vectorizer can vectorize memory accesses, arithmetic operations, comparison operations, and some other select operations. It works by combining similar, independent scalar instructions in straight-line code into vector instructions. When it first became ready in LLVM Clang 3.3 I ran some early benchmarks and explained it in more detail. There's also the LLVM auto-vectorizer documentation.
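As a rough illustration of the kind of code this pass targets, here is a minimal sketch modeled on the example in the LLVM auto-vectorizer documentation. The function name `combine4` and the file name `kernel.c` are made up for illustration, and whether a given Clang build actually vectorizes it depends on the target and the compiler's cost model.

```c
/* Four independent statements with the same shape, storing to adjacent
 * array elements. This is the pattern an SLP vectorizer looks for: the
 * scalar adds, multiplies, and stores can potentially be merged into
 * vector operations, with no loop involved.
 *
 * To compare the generated assembly with and without the pass, one could
 * try something like (kernel.c is a hypothetical file name):
 *   clang -O3 -march=native -fslp-vectorize    -S kernel.c -o slp.s
 *   clang -O3 -march=native -fno-slp-vectorize -S kernel.c -o noslp.s
 */
void combine4(int a1, int a2, int b1, int b2, int *out)
{
    out[0] = a1 * (a1 + b1);
    out[1] = a2 * (a2 + b2);
    out[2] = a1 * (a1 + b1);
    out[3] = a2 * (a2 + b2);
}
```

Because nothing here is a loop, the Loop Vectorizer has nothing to work with; any vectorization of this function would come from the SLP pass.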
With LLVM Clang 3.4 SVN it looks like the superword-level parallelism (SLP) vectorizer will be enabled at least for the -O3 optimization level, if not for other optimization levels too. With this change upcoming, I took the LLVM/Clang Subversion code as of this weekend and ran a range of C/C++ benchmarks, comparing the results with and without the -fslp-vectorize compiler switch. The -O3 -march=native compiler switches were set the entire time.
These test results can be found on OpenBenchmarking.org under the result ID 1307291-SO-FSLPVECTO83.
For most of our real-world workload tests on Linux with LLVM/Clang 3.4 SVN, there was little change in performance from the basic SLP Vectorizer. However, as this past weekend's benchmarks showed, for certain operations and micro-benchmarks there are worthwhile improvements to be found with this straight-line code vectorizer. At least there are no regressions, even though it isn't quite as useful as the Loop Vectorizer.