LLVM's Loop Vectorizer, which can automatically vectorize loops for performance gains in many scenarios, may be enabled by default at more optimization levels in future LLVM releases.
LLVM's automatic loop vectorizer was merged for the LLVM 3.2 release, and benchmarking the loop vectorizer showed it to provide performance benefits in many scenarios. In the LLVM 3.2 release it wasn't enabled by default, but for LLVM 3.3 it's now enabled when using the -O3 optimization level.
Besides enabling it by default for this highest optimization level, LLVM 3.3 also provided improvements to the loop vectorizer. The LLVM loop vectorizer is now in good standing and so it might also be enabled by default for -O2.
There are still some differing views on whether the vectorizer should be turned on for -O2, which is the middle optimization level just below -O3, but the feeling is that it should be turned on for at least -Os. The -Os level optimizes generated binaries for size. The loop vectorizer has the potential to increase the binary size for some loops, but LLVM is able to weigh that information and decide whether or not to vectorize.
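For readers unfamiliar with what the loop vectorizer targets, here is a minimal sketch of the kind of loop it can typically transform. The function name saxpy and its parameters are purely illustrative and do not come from the mailing list discussion. The restrict qualifiers are an assumption of this example: they tell the compiler the two arrays don't overlap, which makes the loop an easier candidate.

```c
#include <stddef.h>

/* Illustrative example only: computes y[i] = a * x[i] + y[i] for n elements.
 * Each iteration is independent of the others, so a vectorizer may process
 * several elements per instruction using SIMD registers. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

With LLVM 3.3, building a file like this with clang at -O3 should be enough for the vectorizer to consider the loop. Clang also has -fvectorize and -fno-vectorize switches to turn the pass on or off explicitly at other levels, though whether a given loop actually gets vectorized depends on the target and on LLVM's cost model.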
Apple LLVM developers have now been discussing on the LLVM mailing list whether to expand the loop vectorizer's usage by default. The performance wins provided by this vectorizer seem to be worth it to many people, even at the potential cost of a slightly longer compile time or a slightly larger binary. This is a change that won't come for LLVM 3.3 but would land in LLVM 3.4 or later; we'll see what happens and post the decision on Phoronix.
LLVM 3.3 also has the interesting SLP vectorizer to optimize straight-line code, but the current discussion concerns only the loop vectorizer.