Intel Contributes AVX-512 Optimizations To Numpy, Yields Massive Speedups

Written by Michael Larabel in Intel on 12 October 2021 at 02:50 PM EDT.
Intel has contributed AVX-512 optimizations to upstream Numpy. For those using this leading Python library for numerical computing, newer Intel CPUs with AVX-512 capabilities can enjoy major speed-ups in the range of 14x to 32x.

This summer Intel volleyed their initial AVX-512 code for Numpy, and this week the code was finally merged upstream. The AVX-512 code originates from the Intel Short Vector Math Library (SVML), which Intel has open-sourced. Intel has also been working on allowing Numpy to be built against SVML as a separate improvement.

The initial AVX-512 implementation provides optimized versions of 44 math functions, covering pretty much all the major math functions in both single and double precision. (Update: it looks like the merged version of the work has AVX-512 optimized versions of 18 math functions, down from the original 44. I am looking into the difference and whether there are other outstanding merge requests.)

Intel engineers found that even on older Intel Skylake X processors, Numpy was running up to 55x faster in select functions. The average speed-up was 14x for double precision and 32x for single precision.
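For readers who want a rough feel for these numbers on their own hardware, here is a minimal timing sketch. It is not Intel's benchmark: the choice of functions (exp, log, sin, cos), the array size, and the input range are all assumptions made for illustration, and these functions are not confirmed to be among the ones that were merged.

```python
import timeit

import numpy as np


def time_ufunc(func, dtype, n=1_000_000, repeat=5):
    """Return the best-of-`repeat` wall time in seconds for one call of func on n elements."""
    rng = np.random.default_rng(0)
    # Inputs in [0.1, 10) keep log, exp and the trig functions well defined.
    x = rng.uniform(0.1, 10.0, n).astype(dtype)
    return min(timeit.repeat(lambda: func(x), number=1, repeat=repeat))


def benchmark(funcs=(np.exp, np.log, np.sin, np.cos),
              dtypes=(np.float32, np.float64)):
    """Time every (function, dtype) pair; the function list is an illustrative assumption."""
    results = {}
    for func in funcs:
        for dtype in dtypes:
            key = (func.__name__, np.dtype(dtype).name)
            results[key] = time_ufunc(func, dtype)
    return results


if __name__ == "__main__":
    print("NumPy", np.__version__)
    for (name, dtype), seconds in benchmark().items():
        print(f"{name:>4} {dtype}: {seconds * 1e3:.2f} ms")
```

Running the same script under a Numpy build from before the merge and one from after it, on the same AVX-512 capable machine, and dividing the two timings per function would give a speed-up ratio comparable in spirit to the figures above.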

This exciting addition to Numpy can be found via this commit ahead of its next release. I'll have up some fresh Numpy benchmarks of our own soon on Phoronix.