
Intel Publishes Blazing Fast AVX-512 Sorting Library, Numpy Switching To It For 10~17x Faster Sorts


    Phoronix: Intel Publishes Blazing Fast AVX-512 Sorting Library, Numpy Switching To It For 10~17x Faster Sorts

    Intel recently published an open-source C++ header file library for high performance SIMD-based sorting, which initially is focused on providing a lightning fast AVX-512 quicksort implementation. As of today that code has been merged to Numpy and is providing some 10~17x speed-ups...
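    The article does not show the library's code. As a rough illustration only, here is a pure-Python emulation of the kind of mask-based partition step that AVX-512 compare and compress instructions make possible in a vectorized quicksort. VECTOR_WIDTH (16 lanes, i.e. 16 x 32-bit values per 512-bit register), the function names, and the three-way split are assumptions made for this sketch; they are not x86-simd-sort's API or its exact algorithm.

```python
# Pure-Python emulation of a SIMD-style (mask + compress) quicksort partition.
# Hypothetical illustration, not Intel's x86-simd-sort code.
VECTOR_WIDTH = 16  # assumption: 16 x 32-bit lanes in one 512-bit register


def partition_block(block, pivot):
    """Split one 'vector' of elements using comparison masks, the way a
    compress-store packs the selected lanes contiguously."""
    lt_mask = [x < pivot for x in block]  # one vector compare -> bitmask
    gt_mask = [x > pivot for x in block]  # second compare for a three-way split
    less = [x for x, m in zip(block, lt_mask) if m]       # compress lanes < pivot
    greater = [x for x, m in zip(block, gt_mask) if m]    # compress lanes > pivot
    equal = [x for x, lt, gt in zip(block, lt_mask, gt_mask) if not (lt or gt)]
    return less, equal, greater


def vector_partition(values, pivot):
    """Partition `values` around `pivot`, one VECTOR_WIDTH-sized block at a time."""
    less, equal, greater = [], [], []
    for start in range(0, len(values), VECTOR_WIDTH):
        lo, eq, hi = partition_block(values[start:start + VECTOR_WIDTH], pivot)
        less.extend(lo)
        equal.extend(eq)
        greater.extend(hi)
    return less, equal, greater


def simd_style_quicksort(values):
    """Quicksort whose partition step works block-by-block on masks."""
    if len(values) <= VECTOR_WIDTH:
        # Vectorized sorts typically finish small inputs with a sorting
        # network; the built-in sorted() stands in for that here.
        return sorted(values)
    pivot = values[len(values) // 2]
    less, equal, greater = vector_partition(values, pivot)
    # `equal` always contains the pivot, so both recursive calls shrink.
    return simd_style_quicksort(less) + equal + simd_style_quicksort(greater)
```

    In real hardware each mask compare and each compress happens in a single instruction across all 16 lanes with no per-element branches, which is where the speed-up comes from; Python only mirrors the data flow, not the performance.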

  • #2
    Nice, someone please post an AVX-512 Zen 4 vs. Raptor Lake comparison of AVX-512 sorting.

    • #3
      Quick, post test results with AMD CPUs to see if it's business as usual with Intel.

      • #4
        You would think we had reached maximum sorting speed by now, but just this week I had the pleasure of reading about two new sorting algorithms:
        A performance analysis of glidesort and ipn_stable

        • #5
          Makes one wonder how many other low-hanging fruit there are that provide order-of-magnitude speedups. The discussion of the kernel patch for parallelizing bootup is another example, and I have to believe there are others as well. Exciting.

          • #6
            Originally posted by igxqrrl
            Makes one wonder how many other low-hanging fruit there are that provide order-of-magnitude speedups. The discussion of the kernel patch for parallelizing bootup is another example, and I have to believe there are others as well. Exciting.
            The general process for that is that you collect performance data (likely via profiling, but things like runtime and throughput numbers can work too), convince yourself that there is a way for software to produce better performance data from the hardware and then implement it.
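            A minimal sketch of that loop using only the Python standard library: check that a candidate produces the same output as the existing code, then time both and compare. insertion_sort (a deliberately slow stand-in for "existing code") and measure (a stand-in for a profiling harness) are hypothetical names chosen for this example.

```python
import random
import timeit


def insertion_sort(values):
    """Deliberately simple O(n^2) baseline to measure against."""
    out = list(values)
    for i in range(1, len(out)):
        key = out[i]
        j = i - 1
        while j >= 0 and out[j] > key:
            out[j + 1] = out[j]  # shift larger elements one slot right
            j -= 1
        out[j + 1] = key
    return out


def measure(func, data, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` single runs."""
    return min(timeit.repeat(lambda: func(data), number=1, repeat=repeats))


data = [random.random() for _ in range(1000)]

# Step 1: make sure the candidate is correct before comparing speed.
assert insertion_sort(data) == sorted(data)

# Step 2: collect performance data for both versions.
baseline = measure(insertion_sort, data)
candidate = measure(sorted, data)
print(f"baseline {baseline:.4f}s, candidate {candidate:.4f}s, "
      f"speed-up {baseline / candidate:.1f}x")
```

            Taking the minimum of several runs reduces noise from other processes; a real investigation would usually start from a profiler to find which function is worth replacing in the first place.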

            • #7
              Originally posted by BillBroadley
              Nice, someone please post an AVX-512 Zen 4 vs. Raptor Lake comparison of AVX-512 sorting.
              Add the muggle muck M2 CPU to the dance battle.

              • #8
                That's good news. Do these new SPR-WS chips have dual AVX-512 units per core?

                • #9
                  Now I'm curious how well-optimized (or not) NumPy's old sorting code was. I'll bet you could speed it up at least a couple of times over before even involving vector instructions.

                  It'd be more interesting if x86-simd-sort, itself, claimed to be an order of magnitude faster than the fastest known non-vectorized implementation.
                  Last edited by coder; 15 February 2023, 10:06 PM.

                  • #10
                    Originally posted by jayN
                    That's good news. Do these new SPR-WS chips have dual AVX-512 units per core?
                    Uh, you mean dual-FMA? I thought Intel's policy since Ice Lake SP was to have dual-FMA per core in all SKUs. In that case, I'd expect all Sapphire Rapids models (both Scalable & Xeon W) to have it.
                    Last edited by coder; 16 February 2023, 07:37 AM.
