Show Your Support: This site is primarily supported by advertisements. Ads are what have allowed this site to be maintained on a daily basis for the past 18+ years. We do our best to ensure only clean, relevant ads are shown, when any nasty ads are detected, we work to remove them ASAP. If you would like to view the site without ads while still supporting our work, please consider our ad-free Phoronix Premium.
Intel Publishes Blazing Fast AVX-512 Sorting Library, Numpy Switching To It For 10~17x Faster Sorts
Toward the end of last year Intel quietly made available x86-simd-sort via their GitHub account. It's a C++ header file library for high performance SIMD sorting though in its current form is just focused on an AVX-512 quicksort implementation.
There hasn't been much coverage of this x86-simd-sort project and the GitHub page itself doesn't do much to talk up the crazy fast performance potential of AVX-512 for sorting... But now by way of the widely-used Numpy open-source project, there is prominent use of it and achieving some staggering results.
Merged today into Numpy was PR 22315 to vectorize the quicksort for 16-bit and 64-bit data types using AVX-512. On an Intel Tigerlake system this sped-up 16-bit int sorting by 17x while float 64-bit sorting by nearly 10x for random arrays and 32-bit data types were 12~13x faster sorts. This Numpy change was made by Intel engineer Raghuveer Devulapalli and is leveraging the x86-simd-sort code.
A speed-up worth celebrating... From multi-vendor support to more efficient AVX-512 implementations on newer processors to more robust software use, there is a lot to enjoy around AVX-512 these days.
A 10~17x speed-up for sorting with AVX-512 is pretty astonishing, especially when factoring in the better AVX-512 efficiencies with recent generations of Intel CPUs. With the latest Xeon Scalable processors the thermal and power impact of AVX-512 is no longer too great or causing significant CPU down-clocking as it was panned for in the past, but is in rather good shape. See my recent Intel Xeon "Sapphire Rapids" AVX-512 benchmarks that includes the power efficiency It's too bad though that the latest Intel Core client processors no longer are offering AVX-512. Meanwhile over on the AMD side with their Zen 4 processors from the Ryzen 7000 series through the 4th Gen EPYC server processors is (finally) AVX-512 support.
It will be interesting to see what other software projects decide to make use of this x86-simd-sort for speedy AVX-512 sorting. It's another notable win for Advanced Vector Extensions 512 similar to how last year simdjson tapped AVX-512 for very fast JSON parsing as something one would normally not think of immediately as a great use-case for AVX-512.