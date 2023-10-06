
OpenJDK Merges Intel's x86-simd-sort For Speeding Up Data Sorting 7~15x
x86-simd-sort 3.0 adds a new "avx512_argselect" method, an arg variant of nth_element that returns an array of indices that would partition the data array. The x86-simd-sort 3.0 release also improves its benchmarks, now uses __builtin_cpu_supports rather than querying cpuinfo, and includes various other changes.
With x86-simd-sort 3.0 in NumPy, "np.partition" is seeing speed-ups of up to 25x for 16-bit, 17x for 32-bit, and 8x for 64-bit data types. NumPy's np.argpartition is up to 6.5x faster with the new avx512_argselect method.
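For those unfamiliar with the two NumPy calls mentioned above, here is a minimal sketch of what they compute. The array contents, the seed, and the choice of k are made up for illustration, and whether the AVX-512 code path is actually taken depends on the CPU and on how NumPy was built; the results are the same either way.

```python
import numpy as np

# Illustrative data only: 20 random 32-bit integers (seed and size are arbitrary).
rng = np.random.default_rng(0)
data = rng.integers(0, 1000, size=20).astype(np.int32)
k = 5  # hypothetical pivot position chosen for this example

# np.partition places the k-th smallest value at index k, with everything
# before it <= that value and everything after it >= that value.
# The two sides are not themselves sorted.
part = np.partition(data, k)

# np.argpartition returns the indices that would produce such a partition,
# which is the operation the article says avx512_argselect accelerates.
idx = np.argpartition(data, k)
part_via_idx = data[idx]

print("k-th smallest:", part[k])
print("same value through indices:", part_via_idx[k])
```

This is handy when only the smallest (or largest) k elements are needed, since a partition does less work than a full sort.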
Meanwhile merged this afternoon is a slightly modified version of x86-simd-sort within OpenJDK. With this sorting code merged, 32-bit data sorting is up to 15x faster and around 7x faster for 64-bit data.
More details on x86-simd-sort 3.0 for speedy AVX-512 sorting via GitHub.