Intel QAT Adapted For Zstd To Provide Big Performance/Efficiency Wins


  • Intel QAT Adapted For Zstd To Provide Big Performance/Efficiency Wins

    Phoronix: Intel QAT Adapted For Zstd To Provide Big Performance/Efficiency Wins

    While Intel has maintained the QATzip open-source compression library for demonstrating data compression using QuickAssist Technology (QAT) with DEFLATE/LZ4/LZ4s, Intel has also been working on QAT'ed Zstd for achieving some sizable victories in performance and power efficiency...
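
    For background, recent zstd releases (1.5.4+) expose an experimental "external sequence producer" hook, and Intel's QAT-ZSTD-Plugin builds on it: the accelerator does the match-finding and returns sequences that zstd then entropy-codes. A minimal sketch of that wiring is below, assuming zstd's experimental API plus Intel's plugin; the QZSTD_*/qatSequenceProducer names are taken from that plugin and may differ by version.

    ```c
    /* Sketch only: assumes zstd >= 1.5.4 with the experimental (static-linking)
     * API and Intel's QAT-ZSTD-Plugin; the QZSTD_* and qatSequenceProducer
     * names are assumptions based on that plugin's headers. */
    #define ZSTD_STATIC_LINKING_ONLY   /* ZSTD_registerSequenceProducer is experimental */
    #include <zstd.h>
    #include "qatseqprod.h"            /* assumed plugin header */

    size_t qat_zstd_compress(void* dst, size_t dstCap,
                             const void* src, size_t srcSize, int level)
    {
        ZSTD_CCtx* cctx = ZSTD_createCCtx();

        QZSTD_startQatDevice();                          /* assumed: bring up the QAT endpoint */
        void* seqProdState = QZSTD_createSeqProdState(); /* assumed: per-context plugin state */

        /* Hand match-finding off to the hardware via zstd's sequence producer hook. */
        ZSTD_registerSequenceProducer(cctx, seqProdState, qatSequenceProducer);

        /* Fall back to software match-finding if the hardware path fails. */
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableSeqProducerFallback, 1);
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, level);

        size_t const written = ZSTD_compress2(cctx, dst, dstCap, src, srcSize);

        QZSTD_freeSeqProdState(seqProdState);            /* assumed cleanup calls */
        QZSTD_stopQatDevice();
        ZSTD_freeCCtx(cctx);
        return written;
    }
    ```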

  • #2
    Looking forward to some I/O benchmarks on this, Michael!

    • #3
      I wish Intel had benchmarked more compression settings with QAT, in addition to including the default level of 3 as a baseline for both the CPU and QAT. Testing only level 9 with QAT and levels 4 and 5 with the CPU was odd, aside from the one fixed-throughput slide showing QAT level 9 landing between CPU levels 4 and 5.

      What does 18 or 22 on a QAT compare with on the CPU?

      A single power-to-performance data point doesn't show how power usage scales with the compression level used. Was that the best result for power/performance? Does it get better or worse as the level changes? Was that best result cherry-picked from a more in-depth benchmark?

      Interesting, but not very informative.
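
      For the CPU side, that level sweep is easy to reproduce with plain libzstd (or with the CLI's built-in benchmark mode, e.g. `zstd -b1 -e19 somefile`). A rough sketch along those lines, using a synthetic buffer in place of Intel's (unspecified) test corpus:

      ```c
      /* Sketch: sweep software zstd levels over one input buffer and print the
       * ratio and wall-clock time -- roughly the CPU-side data the slides omit
       * above level 5. */
      #include <zstd.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <time.h>

      int main(void)
      {
          /* Synthetic, compressible input; swap in a real corpus for meaningful numbers. */
          size_t const srcSize = 16 * 1024 * 1024;
          char* src = malloc(srcSize);
          size_t const dstCap = ZSTD_compressBound(srcSize);
          void* dst = malloc(dstCap);
          if (!src || !dst) return 1;
          for (size_t i = 0; i < srcSize; i++) src[i] = (char)(i % 251);

          for (int level = 1; level <= ZSTD_maxCLevel(); level++) {
              clock_t const t0 = clock();
              size_t const written = ZSTD_compress(dst, dstCap, src, srcSize, level);
              double const secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
              if (ZSTD_isError(written)) { fprintf(stderr, "level %d failed\n", level); continue; }
              printf("level %2d  ratio %.2f  %.3f s\n",
                     level, (double)srcSize / (double)written, secs);
          }
          free(dst);
          free(src);
          return 0;
      }
      ```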

      • #4
        > Intel also shared this slide, among others, but do note the differing Zstd compression levels in their benchmarks:

        The compression ratios are actually about the same, just reached at different level numbers. QAT Zstd level 9 achieves a compression ratio of 2.76, while Zstd 1.5.5 has ratios of 2.74 and 2.77, respectively.

        • #5
          Originally posted by skeevy420 View Post
          Testing only level 9 with QAT and levels 4 and 5 with the CPU was odd, aside from the one fixed-throughput slide showing QAT level 9 landing between CPU levels 4 and 5.
          It's not "just 9"; the linked article clearly states:
          The following measurements were taken using a block size of 16KB with Intel QAT HW configured for its best compression ratio.
          Trying to implement anything more complex (a higher compression level) would be detrimental to throughput and power efficiency, while using more silicon area. The highest HW compression ratio is already slightly better than the default SW compression ratio (zstd SW defaults to level 3).

          If you need higher compression, you have to use the SW implementation. You also have to live with a significantly higher compression time (orders of magnitude slower) for only a little extra compression.

          The HW compression is meant for one-off compression of e.g. data transmitted over the network (hence the importance of latency mentioned), or short- to medium-term storage on disk.
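
          For that kind of per-message, latency-sensitive path, the usual software pattern is a long-lived compression context reused across buffers with a low level pinned; a minimal sketch of that pattern (plain SW zstd, with a hypothetical wrapper type just for illustration):

          ```c
          /* Sketch: one-shot compression of per-message buffers with a reused context,
           * the kind of low-latency path the HW offload targets (SW path shown here). */
          #include <zstd.h>
          #include <stddef.h>

          typedef struct {
              ZSTD_CCtx* cctx;
          } MsgCompressor;   /* hypothetical wrapper, not a zstd type */

          int msg_compressor_init(MsgCompressor* mc, int level)
          {
              mc->cctx = ZSTD_createCCtx();
              if (mc->cctx == NULL) return -1;
              /* Keep the level low (e.g. 1-3): latency matters more here than the
               * last few percent of ratio for data compressed once and sent. */
              ZSTD_CCtx_setParameter(mc->cctx, ZSTD_c_compressionLevel, level);
              return 0;
          }

          size_t msg_compress(MsgCompressor* mc, void* dst, size_t dstCap,
                              const void* src, size_t srcSize)
          {
              /* ZSTD_compress2 resets the session per call but keeps the context's
               * internal buffers, avoiding reallocation on a hot network path. */
              return ZSTD_compress2(mc->cctx, dst, dstCap, src, srcSize);
          }

          void msg_compressor_free(MsgCompressor* mc)
          {
              ZSTD_freeCCtx(mc->cctx);
              mc->cctx = NULL;
          }
          ```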

          • #6
            Great, it's nice to see an actually useful innovation.
            Now if only the web people could get their shit together: zstd has been out for 7 years and browsers still don't support zstd compression for HTTP traffic...

            • #7
              Originally posted by StefanBruens View Post

              Its not "just 9", the linked article clearly states:

              Trying to implement anything more complex (higher compresssion level) would be detrimental to throughput and power efficiency, while using more silicon area. The highest HW compression ratio is already slightly better than the default SW compression ratio (zstd SW defaults to level 3).

              I know, I'm still curious how QAT would have scaled instead of just getting the best results cherry picked.

              ​If you need higher compression, you have to use the SW implementation. You also have to live with a significantly higher compression time (orders of magnitudes slower), gaining just a little bit of compression.

              The HW compression is meant for one of compression for e.g. data transmitted over network (hence the importance of latency mentioned), or short to medium time storage on disk.
              Honestly, when QATs go for ~$250 and up used, I just expected better than SW at 4 or 5 performance...because neither of those are useful for long-term data archival. That only seems like its useful for on-the-fly compression...which means it'd have been interesting to see how it compared to LZ4 and other supported QAT CODECs.

              Also, all things considered, since accelerator cards cost hundreds to thousands of dollars, 90w of savings isn't all that much and can easily be offset and then some with solar and wind technologies...home and business. A $1000 accelerator card to save 90w vs $1000 in solar to gain 500w. Makes ya think.

              • #8
                Originally posted by skeevy420 View Post
                What does 18 or 22 on a QAT compare with on the CPU?
                They only implemented levels 1-12, and those are not directly comparable to zstd's levels...

                I wonder what QAT actually is in hardware. Is it some kind of FPGA?

                • #9
                  It sucks, though, that Intel has disabled QAT4 on their workstation Sapphire Rapids processors (aka WS).

                  I wish Intel would stop with this market segmentation already (see AVX-512, ECC, VROC/VMD/RST, Optane PMem, etc.).

                  • #10
                    Originally posted by bezirg View Post
                    It sucks, though, that Intel has disabled QAT4 on their workstation Sapphire Rapids processors (aka WS).

                    I wish Intel would stop with this market segmentation already (see AVX-512, ECC, VROC/VMD/RST, Optane PMem, etc.).
                    There are several Xeon Scalable SKUs that may meet your requirements if you don't need the high Turbo Boost frequencies. See e.g. https://www.intel.com/content/www/us...,232390,233418

                    AVX segmentation will be significantly reduced when AVX10 becomes available. ECC segmentation has already been reduced; many Raptor Lake SKUs (e.g. the i5-13600) already support ECC.
