Intel Contributes AVX-512 Optimizations To Numpy, Yields Massive Speedups
Intel has contributed AVX-512 optimizations to upstream Numpy. For those using Numpy, the leading Python library for numerical computing, newer Intel CPUs with AVX-512 capabilities can enjoy major speed-ups, averaging roughly 14x to 32x depending on precision.
This summer Intel posted their initial AVX-512 code for Numpy, and this week it was finally merged upstream. The code originates from the Intel Short Vector Math Library (SVML), which Intel has open-sourced. As a separate improvement, Intel has also been working on allowing Numpy to be built against SVML.
The initial AVX-512 implementation provided optimized versions of 44 math functions, covering pretty much all the major math functions in both single and double precision. (Update: it looks like the merged version of the work has AVX-512 optimized versions of 18 math functions, down from the original 44. I'm looking into the difference and whether there are other outstanding merge requests.)
Intel engineers found that even on older Intel Skylake X processors, Numpy ran up to 55x faster in select functions. The average speed-up was 14x for double precision and 32x for single precision.
This exciting addition to Numpy can be found via this commit ahead of its next release. I'll have up some fresh Numpy benchmarks of our own soon on Phoronix.
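In the meantime, for those wanting a rough before-and-after comparison on their own hardware, a minimal timing sketch could look like the following. The helper name `time_ufunc`, the array size, and the choice of functions are my own assumptions for illustration and are not taken from Intel's patches or their benchmark setup; the functions picked here may or may not be among those covered by the merged code. Run it against a Numpy build with and without the new code and compare the numbers.

```python
import timeit

import numpy as np


def time_ufunc(func, dtype, size=1_000_000, repeats=5):
    """Return the best wall-clock time (seconds) for one call of func.

    Hypothetical helper for illustration; size and repeats are arbitrary.
    """
    rng = np.random.default_rng(0)
    # Positive inputs so that log() stays in its valid domain.
    data = rng.uniform(0.1, 10.0, size).astype(dtype)
    # Take the minimum of several runs to reduce noise from other processes.
    return min(timeit.repeat(lambda: func(data), number=1, repeat=repeats))


if __name__ == "__main__":
    print("Numpy version:", np.__version__)
    # Assumed selection of common math functions, not the merged list.
    for func in (np.exp, np.log, np.sin, np.cos):
        for dtype in (np.float32, np.float64):
            seconds = time_ufunc(func, dtype)
            print(f"{func.__name__:>4} {np.dtype(dtype).name}: {seconds * 1e3:.2f} ms")
```

Results will vary with CPU, clock behavior under AVX-512 load, and how Numpy was built, so treat the output as a sanity check rather than a proper benchmark.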