Intel Publishes Blazing Fast AVX-512 Sorting Library, NumPy Switching To It For 10~17x Faster Sorts

Written by Michael Larabel in Intel on 15 February 2023 at 04:00 PM EST.
Intel recently published an open-source C++ header file library for high-performance SIMD-based sorting, which is initially focused on providing a lightning-fast AVX-512 quicksort implementation. As of today that code has been merged into NumPy and is providing speed-ups of some 10~17x.

Toward the end of last year Intel quietly made x86-simd-sort available via their GitHub account. It's a C++ header file library for high-performance SIMD sorting, though in its current form it is focused solely on an AVX-512 quicksort implementation.

There hasn't been much coverage of the x86-simd-sort project, and the GitHub page itself doesn't do much to talk up the crazy-fast performance potential of AVX-512 for sorting... But now, by way of the widely-used open-source NumPy project, it is seeing prominent use and achieving some staggering results.

Merged today into NumPy was PR 22315 to vectorize the quicksort for 16-bit and 64-bit data types using AVX-512. On an Intel Tigerlake system this sped up 16-bit int sorting by 17x and 64-bit float sorting by nearly 10x for random arrays, while sorts of 32-bit data types were 12~13x faster. This NumPy change was made by Intel engineer Raghuveer Devulapalli and leverages the x86-simd-sort code.
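
For readers who want to see how sorting performs on their own hardware, here is a minimal timing sketch. It is not the benchmark from the pull request: the helper name `time_sort`, the array size, and the repeat count are illustrative assumptions. Whether the AVX-512 code path is actually used depends on running a NumPy build that includes this change on a CPU with AVX-512, so the numbers it prints may not show the speed-ups quoted above.

```python
import time

import numpy as np


def time_sort(dtype, size=1_000_000, repeats=5, seed=0):
    """Return the best wall-clock time (seconds) for np.sort on a random array.

    Hypothetical helper for illustration; not part of NumPy or x86-simd-sort.
    """
    rng = np.random.default_rng(seed)
    if np.issubdtype(dtype, np.integer):
        # Random integers spanning the full range of the integer type.
        info = np.iinfo(dtype)
        data = rng.integers(info.min, info.max, size=size, dtype=dtype,
                            endpoint=True)
    else:
        # Random floats in [0, 1), converted to the requested float type.
        data = rng.random(size).astype(dtype)

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        result = np.sort(data)  # default kind is "quicksort"
        best = min(best, time.perf_counter() - start)

    # Sanity check: the output must be in non-decreasing order.
    assert np.all(result[:-1] <= result[1:])
    return best


if __name__ == "__main__":
    # The 16-bit, 32-bit, and 64-bit types mentioned in the article.
    for dtype in (np.int16, np.float32, np.float64):
        seconds = time_sort(dtype)
        print(f"{np.dtype(dtype).name:>8}: {seconds * 1e3:.2f} ms")
```

Running the same script under two NumPy versions on the same AVX-512 machine is one way to compare before and after.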

A speed-up worth celebrating... From multi-vendor support to more efficient AVX-512 implementations on newer processors to more robust software use, there is a lot to enjoy around AVX-512 these days.


A 10~17x speed-up for sorting with AVX-512 is pretty astonishing, especially when factoring in the better AVX-512 efficiency of recent Intel CPU generations. With the latest Xeon Scalable processors, the thermal and power impact of AVX-512 is no longer severe, nor does it cause the significant CPU down-clocking it was panned for in the past; it is in rather good shape. See my recent Intel Xeon "Sapphire Rapids" AVX-512 benchmarks, which include power efficiency data. It's too bad, though, that the latest Intel Core client processors no longer offer AVX-512. Meanwhile, on the AMD side, Zen 4 processors from the Ryzen 7000 series through the 4th Gen EPYC server processors (finally) support AVX-512.
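
Since AVX-512 availability varies so much between processors, it can be handy to check what a given machine reports. The sketch below assumes a Linux system, where `/proc/cpuinfo` lists CPU feature flags and `avx512f` denotes AVX-512 Foundation; the helper names `cpu_flags` and `has_avx512f` are hypothetical. It only shows what the kernel reports, not whether any particular library will use those instructions.

```python
def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags found in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # On x86 Linux, each logical CPU has a "flags : ..." line.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx512f(cpuinfo_text):
    """Return True if the AVX-512 Foundation flag is present."""
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Not Linux, or the file is unavailable: treat as "not reported".
        text = ""
    print("AVX-512 Foundation reported:", has_avx512f(text))
```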

It will be interesting to see what other software projects decide to make use of x86-simd-sort for speedy AVX-512 sorting. It's another notable win for Advanced Vector Extensions 512, similar to how simdjson tapped AVX-512 last year for very fast JSON parsing, a workload one would not normally think of as a great use-case for AVX-512.
