GCC Eyeing -O2 Vectorization For Boosting Intel Core / AMD Zen Performance
Longtime GNU Compiler Collection (GCC) developer Jan Hubicka of SUSE is looking at enabling vectorization as part of the -O2 optimization level for Intel Core, AMD Zen, and generic x86_64 CPU targets.
Particularly for recent Intel Core and AMD Zen processors, using "-ftree-vectorize -ftree-slp-vectorize" together with the common -O2 optimization level yields 5~9% performance boosts in some benchmarks. Granted, in some tests the gains are smaller, and there are a few known performance regressions at this point.
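For readers who want to see what those two flags act on, here is a minimal sketch in C. The function names (saxpy, add4) and the file name in the comment are made up for illustration and do not come from Hubicka's benchmarks. The first function is a simple loop of the kind the loop vectorizer (-ftree-vectorize) targets, and the second is straight-line code of the kind the SLP vectorizer (-ftree-slp-vectorize) targets. Whether GCC actually vectorizes either one depends on the compiler version, the target CPU, and the cost model.

```c
#include <stddef.h>

/*
 * Hypothetical build line for inspecting the result (file name assumed):
 *   gcc -O2 -ftree-vectorize -ftree-slp-vectorize -fopt-info-vec -c vec_demo.c
 * -fopt-info-vec asks GCC to report which loops or blocks it vectorized.
 */

/* Loop candidate: y[i] = a * x[i] + y[i] for each element.
 * 'restrict' promises that x and y do not overlap, which removes the
 * need for a runtime aliasing check before a vectorized loop. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* SLP candidate: four independent, adjacent additions written as
 * straight-line code, which may be combined into one vector add. */
void add4(float *restrict out, const float *restrict a, const float *restrict b)
{
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
    out[3] = a[3] + b[3];
}
```

Compiling the same file with plain -O2 and comparing the -fopt-info-vec output (or the generated assembly) is one way to see the difference the flags make on a given GCC version.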
Hubicka commented, "I am surprised how many improvements vectorization at -O2 can do - clearly more parallel CPUs depends on it. In my experience from analyzing regressions of gcc -O2 compared to clang -O2 builds, vectorization is one of most common reasons. Having gcc -O2 producing lower SPEC scores and comparably large binaries to clang -O2 does not feel OK and I think the problem is not limited just to artificial benchmarks."
More details in this mailing list post.
Hubicka is even interested in enabling this vectorization at -O2 for the upcoming GCC 9 release. However, as it's late in the cycle, fellow GCC developer Richard Biener commented that he is against it given the timing and impact of the change. So this may have to wait until GCC 10, but at least GCC's -O2 optimization level will end up becoming more aggressive in the near future.