OpenJDK Merges Intel's x86-simd-sort For Speeding Up Data Sorting 7~15x
Earlier this year Intel posted x86-simd-sort as a blazing fast sorting library that makes use of AVX-512. When the popular NumPy began using it, its developers found sorts to be up to 10x to 17x faster for 16-bit to 64-bit data types. Today Intel software engineers released x86-simd-sort 3.0, and the release comes minutes after OpenJDK merged a modified version of this speedy sorting code into the reference JDK codebase.
x86-simd-sort 3.0 adds a new "avx512_argselect" method that computes the arg nth_element: it returns an array of indices that would partition the data array. The x86-simd-sort 3.0 release also has improvements to its benchmarks, now uses __builtin_cpu_supports rather than querying cpuinfo, and includes various other changes.
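To make "an array of indices that would partition the data array" concrete, here is a minimal pure-Python sketch of that contract. It is not the library's code and uses no SIMD; the names argselect_reference and is_arg_partitioned are hypothetical and exist only for this illustration.

```python
import heapq


def argselect_reference(data, k):
    """Return a permutation idx of range(len(data)) such that data[idx[k]]
    is the value a full sort would place at position k, every value before
    position k is <= it, and every value after position k is >= it.

    Hypothetical reference helper: it only illustrates the result an
    arg-select style routine produces, not how x86-simd-sort computes it.
    """
    n = len(data)
    if not 0 <= k < n:
        raise IndexError("k out of range")
    # Indices of the k+1 smallest values, ordered by ascending value,
    # so the last of them points at the k-th smallest element.
    smallest = heapq.nsmallest(k + 1, range(n), key=data.__getitem__)
    chosen = set(smallest)
    # Every remaining index refers to a value >= the k-th smallest.
    rest = [i for i in range(n) if i not in chosen]
    return smallest + rest


def is_arg_partitioned(data, idx, k):
    """Check the partition property without requiring a full sort order."""
    pivot = data[idx[k]]
    return (
        sorted(idx) == list(range(len(data)))
        and all(data[i] <= pivot for i in idx[:k])
        and all(data[i] >= pivot for i in idx[k + 1:])
    )


data = [50, 10, 40, 20, 30]
idx = argselect_reference(data, 2)
print(idx, data[idx[2]], is_arg_partitioned(data, idx, 2))
```

The data array itself is left untouched and only the index array is produced. A real arg-select is free to leave the two sides in any order; this sketch happens to return the left side sorted.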
With x86-simd-sort 3.0 in NumPy, np.partition is seeing speed-ups of up to 25x for 16-bit, 17x for 32-bit, and 8x for 64-bit data types. NumPy's np.argpartition is up to 6.5x faster with the new avx512_argselect method.
Meanwhile, a slightly modified version of x86-simd-sort was merged into OpenJDK this afternoon. With this sorting code merged, sorting is up to 15x faster for 32-bit data and around 7x faster for 64-bit data.
More details on x86-simd-sort 3.0 for speedy AVX-512 sorting via GitHub.