GCC 12 Looking At Enabling Its Vectorizer For "-O2" Optimization Level
Code built with GCC at the widely used "-O2" optimization level is likely to be slightly faster with next year's GCC 12 release, as the developers are looking at enabling the vectorizer options by default at that level.
GCC currently doesn't enable its loop and SLP vectorizers by default until hitting the "-O3" optimization level but there is talk about flipping on the vectorize options at -O2, the optimization level commonly used by many Linux distributions and other software packages.
Stemming from discussions in August, GCC developers are taking a serious look at enabling their vectorization options at the -O2 level. This work concerns GCC's tree-level loop vectorization.
There are still a few regressions to be sorted out, but overall having this enabled at -O2 should be a win for the performance of code built with the GNU Compiler Collection.
The discussion has been leaning towards enabling the vectorize option with the "very cheap" cost model (-fvect-cost-model=very-cheap). The very cheap model only allows vectorization when the vector code can entirely replace the scalar code, for example when each vector iteration handles four scalar iterations and the scalar iteration count is known to be a multiple of four. It is the "cheapest" of the cost models. Meanwhile, the default cost model for vectorization at -O3 is "dynamic", which adds a runtime check so the vectorized code path is only taken when it is likely to be faster. More details for those interested are available via the GCC documentation.
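To make the "multiple of four" condition concrete, here is a minimal sketch in C. The function names, the array size N, and the file name vec.c are hypothetical and chosen only for illustration. The first loop has a compile-time trip count that is a multiple of four, so it is the kind of loop the very cheap model may accept; the second has a runtime trip count, so a vectorized version would likely need scalar fallback code and may be left alone. Whether either loop is actually vectorized depends on the GCC version and the target CPU.

```c
#include <stddef.h>

#define N 1024 /* hypothetical size: known at compile time, a multiple of four */

/* Trip count is the constant N, so vector code could replace the
 * scalar loop entirely with no leftover iterations. */
void add_fixed(float *restrict dst, const float *restrict a,
               const float *restrict b)
{
    for (size_t i = 0; i < N; i++)
        dst[i] = a[i] + b[i];
}

/* Trip count n is only known at run time, so a vectorized version
 * would generally need a scalar loop for any leftover iterations. */
void add_runtime(float *restrict dst, const float *restrict a,
                 const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

One way to see what a given compiler decides is to build with something like "gcc -O2 -ftree-vectorize -fvect-cost-model=very-cheap -fopt-info-vec -c vec.c" on a GCC release that supports the very-cheap model, and read the vectorizer's report.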
Enabling the vectorizer by default for -O2 in next year's GCC 12 release is now under discussion, and so far there is interest in seeing this happen from developers working on multiple CPU targets.