With LLVM/Clang 3.4 SVN, it looks like the superword-level parallelism (SLP) vectorizer will be enabled at least for the -O3 optimization level, if not for other optimization levels too. With this change upcoming, I ran some benchmarks this weekend on the latest LLVM/Clang Subversion code, comparing a range of C/C++ benchmarks with and without the -fslp-vectorize compiler switch. The -O3 -march=native compiler switches were set the entire time.
For most of our real-world workload tests on Linux with LLVM/Clang 3.4 SVN, there was little change in performance from the basic SLP Vectorizer. However, as this past weekend's benchmarks showed, certain operations and micro-benchmarks do see worthwhile improvements from this straight-line code vectorizer. There are at least no regressions, even though the SLP Vectorizer isn't quite as useful as the Loop Vectorizer.
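For readers unfamiliar with what a straight-line code vectorizer looks for, here is a minimal sketch in C. It is not taken from any of the benchmarks; the function name `add4` and its parameters are made up for illustration. The function has no loop, only four independent additions on adjacent array elements, which is the kind of pattern an SLP pass can in principle merge into a single vector operation. Whether it actually does so depends on the target CPU and the compiler revision.

```c
/* Hypothetical illustration, not from the benchmarks in this article.
 * There is no loop here for the Loop Vectorizer to work on: just four
 * independent, identically shaped additions on adjacent elements.
 * The restrict qualifiers promise the compiler that the three arrays
 * do not overlap, so the four statements can be treated as independent. */
void add4(float *restrict out, const float *restrict a, const float *restrict b)
{
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
    out[3] = a[3] + b[3];
}
```

Assuming the code is saved as slp_example.c (a made-up file name), compiling it once with `clang -O3 -march=native -fslp-vectorize -S slp_example.c` and once with `-fno-slp-vectorize` in place of `-fslp-vectorize`, then comparing the generated assembly, should show whether the four scalar additions were combined on a given system.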