Intel's x86-simd-sort 4.0 Delivers A 2x Boost For AVX-512 Performance, Adds AVX2 Code
Earlier this year Intel published x86-simd-sort, a very fast sorting library that initially leveraged AVX-512 instructions for 10x to 17x faster sorts. NumPy was one of the first major projects to adopt x86-simd-sort, and OpenJDK adopted it more recently. Since the initial release, more features and performance optimizations have been added. Today marks the release of x86-simd-sort 4.0, which delivers even greater performance while also adding an AVX2 code path for processors without AVX-512.
With x86-simd-sort 4.0, Intel has managed a 2x speed-up for sorting 32-bit data. Not bad for already blazing-fast sort speeds... 64-bit data, meanwhile, sees around a 1.5x speed-up, while 16-bit data sees around a 1.25x speed-up.
Besides making x86-simd-sort even faster, the v4.0 release is notable for introducing AVX2 code paths for 32-bit and 64-bit data types. With these AVX2-optimized code paths, Intel found its implementation to be 12x faster than std::sort for 32-bit data and around 7x faster for 64-bit data. This matters because the latest Intel Core CPUs lack AVX-512, so they too can now use x86-simd-sort via AVX2.
With this new release, x86-simd-sort 4.0 can also be built as a shared library with run-time dispatching, automatically picking the fastest available version (AVX-512, AVX2, or scalar) for the processor it runs on.
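For readers curious how run-time dispatching of this kind generally works, here is a minimal C++ sketch of the pattern, not x86-simd-sort's actual code or API. It assumes GCC or Clang on x86 (for `__builtin_cpu_supports`), and the kernel names `sort_avx512`, `sort_avx2`, `sort_scalar`, `sort_i32` and `selected_kernel_name` are hypothetical. The "avx512f"/"avx2" checks are a simplification; a real library may require additional AVX-512 subsets. All three kernels defer to std::sort so the sketch runs anywhere.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical kernels. In a real library each would be compiled with its own
// target flags and contain vectorized code; here they all defer to std::sort
// so the sketch builds and runs on any machine.
static void sort_avx512(int32_t* a, std::size_t n) { std::sort(a, a + n); }
static void sort_avx2(int32_t* a, std::size_t n) { std::sort(a, a + n); }
static void sort_scalar(int32_t* a, std::size_t n) { std::sort(a, a + n); }

struct Kernel {
    void (*fn)(int32_t*, std::size_t);
    const char* name;
};

// Query the CPU and pick the widest supported kernel (simplified checks).
static Kernel select_kernel() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return {sort_avx512, "avx512"};
    if (__builtin_cpu_supports("avx2")) return {sort_avx2, "avx2"};
#endif
    return {sort_scalar, "scalar"};
}

// Function-local static: resolved once on first use, thread-safe since C++11.
static const Kernel& kernel() {
    static const Kernel k = select_kernel();
    return k;
}

// Public entry point: callers never see which code path was chosen.
void sort_i32(int32_t* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    kernel().fn(a, n);
}

// Reports which hypothetical kernel the dispatcher selected.
std::string selected_kernel_name() { return kernel().name; }
```

Resolving the choice once and caching a function pointer keeps the per-call cost to a single indirect call, which is negligible next to the sort itself.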
Downloads and more details on the x86-simd-sort 4.0 release via GitHub.