Intel Demonstrates Up To 48% Improvement For AVX-512 Optimized PostgreSQL


  • Intel Demonstrates Up To 48% Improvement For AVX-512 Optimized PostgreSQL

    Phoronix: Intel Demonstrates Up To 48% Improvement For AVX-512 Optimized PostgreSQL

    With the upcoming PostgreSQL 17 database server release there are some initial AVX-512 optimizations that are looking quite nice according to Intel's findings...
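    For context, the initial AVX-512 work here targets PostgreSQL's internal popcount helper, which is what the bit_count() benchmark in the article exercises. A minimal sketch of the idea with the VPOPCNTDQ intrinsics, using illustrative names rather than the actual pg_popcount() code, and assuming a CPU with AVX512F + AVX512VPOPCNTDQ:

    #include <immintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Sketch: count set bits in a buffer, eight 64-bit words per iteration.
     * Illustrative only, not the PostgreSQL implementation. Build with
     * -mavx512f -mavx512vpopcntdq (GCC/Clang). */
    static uint64_t
    popcount_avx512(const uint64_t *buf, size_t nwords)
    {
        __m512i acc = _mm512_setzero_si512();
        size_t  i = 0;

        for (; i + 8 <= nwords; i += 8)
        {
            __m512i v = _mm512_loadu_si512((const void *) (buf + i));
            acc = _mm512_add_epi64(acc, _mm512_popcnt_epi64(v));
        }

        uint64_t total = _mm512_reduce_add_epi64(acc);

        /* Scalar tail for any leftover words. */
        for (; i < nwords; i++)
            total += (uint64_t) __builtin_popcountll(buf[i]);

        return total;
    }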


  • #2
    Amazing. More of this optimization!



    • #3
      Sounds like another reason to be glad I bought a Zen 4 CPU this January. (Seriously, Intel. You act like you're proud of how dysfunctional you are.)



      • #4
        To this day (although it got suspiciously quiet when AMD adopted AVX-512) you'll hear people scream that AVX-512 and other vector extensions are useless because GPUs exist.

        Well the answer to that is: no. There has been major work in applying AVX512 to everything from XML parsing to numeric sorting and now to DB indexing, and none of these tasks are magically better on a GPU. It is a vital part of improving performance and power efficiency, and it's nice to see more software support coming online.

        BTW, this includes Torvalds himself, who is great at managing the kernel but is not a deep hardware guy.
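
        On the parsing point above, the trick those libraries lean on is that a single AVX-512 byte compare classifies 64 characters at once and hands back a bitmask to work through, which is hard to beat with a GPU round-trip for latency-sensitive work. A rough sketch, assuming AVX-512BW support, with names made up for illustration:

        #include <immintrin.h>
        #include <stdint.h>

        /* Sketch: one AVX-512BW compare marks every '<' in a 64-byte chunk.
         * Bit i of the returned mask is set when chunk[i] == '<'. */
        static uint64_t
        find_open_tags(const char *chunk /* at least 64 bytes */)
        {
            __m512i data = _mm512_loadu_si512((const void *) chunk);
            __m512i open = _mm512_set1_epi8('<');
            return _mm512_cmpeq_epi8_mask(data, open);
        }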



        • #5
          Mmmh, looking at the y-axis of the graph (MiB/µs) and at the original article, which states this is the throughput of bit_count() only, it looks like a micro-optimization that will lead to minimal-to-no time reduction in general queries.

          In practice, this is nothing to be really excited about, but it surely serves as a nice showcase for the bold claim that the optimization "can greatly enhance the performance of PostgreSQL on Intel platforms". Meh



          • #6
            Originally posted by chuckula
            To this day (although it got suspiciously quiet when AMD adopted AVX-512) you'll hear people scream that AVX-512 and other vector extensions are useless because GPUs exist.

            Well the answer to that is: no. There has been major work in applying AVX512 to everything from XML parsing to numeric sorting and now to DB indexing, and none of these tasks are magically better on a GPU. It is a vital part of improving performance and power efficiency, and it's nice to see more software support coming online.

            BTW, this includes Torvalds himself, who is great at managing the kernel but is not a deep hardware guy.
            Linus Torvalds was criticizing AVX-512 because it takes a lot of silicon space for very niche use cases. We also saw that Intel's implementations were particularly "unoptimized" in the past: when in use, AVX-512 drew so much uncontrolled power that the processor had to lower its clock frequency to keep things under control.
            Intel made quite a mess with AVX-512, both with the implementation and with product-segment availability.
            AMD, on the other hand, did much better with both the implementation and the product segmentation for Zen 4 and Zen 5.



            • #7
              I wonder if AVX instructions end up like ECC with Intel, where Intel splits features up by laptop/desktop vs. server. I can think of good reasons this may be done, but I hope it's not done simply to justify server CPU markups or something.
              And IIRC, AVX is somewhat common in desktop use cases, is it not? Encryption, compression, some apps, emulation?



              • #8
                Great results. Interesting that they didn't include a comparison to AMD, but the reason for that is pretty obvious.



                • #9
                  I wonder what would happen if they tried this with SSE4 and the POPCNT extension. I assume AVX-512 also suffers from the performance penalty when switching between AVX/AVX2 code and SSE code. Unless you either guarantee that SSE isn't used (seems unlikely) or the entire binary plus frequently used dependencies are AVX-enabled (again seems unlikely), there are bound to be some performance hits in here. Last time I played around with SSE and AVX intrinsics, my AVX2 code path had worse performance than my SSE3 code path unless I specifically compiled the entire binary with AVX2 enabled.
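
                  FWIW, the usual way around compiling the whole binary with AVX2 is to confine the AVX code to a few functions (separate translation units or GCC/Clang target attributes, which the compiler typically closes out with VZEROUPPER) and pick the path at runtime. Whether PostgreSQL 17 does exactly this I can't say; the following is just a rough sketch with made-up names:

                  #include <immintrin.h>
                  #include <stddef.h>

                  /* Sketch: only this function is built for AVX2 via the target
                   * attribute; the rest of the file stays baseline. GCC/Clang
                   * normally emit VZEROUPPER on return, which is what avoids the
                   * SSE<->AVX transition penalty. */
                  __attribute__((target("avx2")))
                  static void sum_avx2(float *dst, const float *a, const float *b, size_t n)
                  {
                      size_t i = 0;
                      for (; i + 8 <= n; i += 8)
                          _mm256_storeu_ps(dst + i,
                              _mm256_add_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i)));
                      for (; i < n; i++)
                          dst[i] = a[i] + b[i];
                  }

                  static void sum_scalar(float *dst, const float *a, const float *b, size_t n)
                  {
                      for (size_t i = 0; i < n; i++)
                          dst[i] = a[i] + b[i];
                  }

                  /* Dispatch once, based on what the CPU actually supports. */
                  void sum(float *dst, const float *a, const float *b, size_t n)
                  {
                      if (__builtin_cpu_supports("avx2"))
                          sum_avx2(dst, a, b, n);
                      else
                          sum_scalar(dst, a, b, n);
                  }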



                  • #10
                    Originally posted by chuckula
                    To this day (although it got suspiciously quiet when AMD adopted AVX-512) you'll hear people scream that AVX-512 and other vector extensions are useless because GPUs exist.
                    Eh, as somebody else mentioned, AVX-512 takes up a lot of space and power for the performance it gives. In addition, many of the tasks would be better offloaded to the GPU in a proper setup, but that "proper setup" would require large projects like Postgres to add a lot of stuff to their code/build system. So in theory, those people are correct. In reality, most developers of mid-to-large codebases aren't going to want to put in the effort to add proper GPGPU support to their code, so AVX-512 is more likely to get implemented.

                    AVX instructions are designed to speed up exactly the same kind of data math that GPUs are designed to do fast. The first few extensions were good when you just needed a few vectors worked on, but at what point are you just adding a second (maybe third) GPU to the system, one that is useless in most cases?

