Intel QAT Adapted For Zstd To Provide Big Performance/Efficiency Wins


  • Intel QAT Adapted For Zstd To Provide Big Performance/Efficiency Wins

    Phoronix: Intel QAT Adapted For Zstd To Provide Big Performance/Efficiency Wins

    While Intel has maintained the QATzip open-source compression library for demonstrating data compression using QuickAssist Technology (QAT) with DEFLATE/LZ4/LZ4s, Intel has also been working on QAT'ed Zstd for achieving some sizable victories in performance and power efficiency...
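
    For background, recent zstd releases (1.5.4+) expose an experimental "external sequence producer" hook, and Intel's QAT-ZSTD-Plugin builds on it: the accelerator does the match-finding and returns sequences that zstd then entropy-codes. A minimal sketch of that wiring is below, assuming zstd's experimental API plus Intel's plugin; the QZSTD_*/qatSequenceProducer names are taken from that plugin and may differ by version.

    ```c
    /* Sketch only: assumes zstd >= 1.5.4 with the experimental (static-linking)
     * API and Intel's QAT-ZSTD-Plugin; the QZSTD_* and qatSequenceProducer
     * names are assumptions based on that plugin's headers. */
    #define ZSTD_STATIC_LINKING_ONLY   /* ZSTD_registerSequenceProducer is experimental */
    #include <zstd.h>
    #include "qatseqprod.h"            /* assumed plugin header */

    size_t qat_zstd_compress(void* dst, size_t dstCap,
                             const void* src, size_t srcSize, int level)
    {
        ZSTD_CCtx* cctx = ZSTD_createCCtx();

        QZSTD_startQatDevice();                          /* assumed: bring up the QAT endpoint */
        void* seqProdState = QZSTD_createSeqProdState(); /* assumed: per-context plugin state */

        /* Hand match-finding off to the hardware via zstd's sequence producer hook. */
        ZSTD_registerSequenceProducer(cctx, seqProdState, qatSequenceProducer);

        /* Fall back to software match-finding if the hardware path fails. */
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableSeqProducerFallback, 1);
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, level);

        size_t const written = ZSTD_compress2(cctx, dst, dstCap, src, srcSize);

        QZSTD_freeSeqProdState(seqProdState);            /* assumed cleanup calls */
        QZSTD_stopQatDevice();
        ZSTD_freeCCtx(cctx);
        return written;
    }
    ```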

  • #2
    Looking forward to some I/O benchmarks on this, Michael!

    • #3
      I wish Intel had benchmarked more compression settings with QAT, in addition to including the default level of 3 as a baseline for both the CPU and QAT. Testing only level 9 with QAT and levels 4 and 5 with the CPU was odd, aside from the one fixed-throughput slide showing QAT level 9 landing between CPU levels 4 and 5.

      What does 18 or 22 on a QAT compare with on the CPU?

      A single power-to-performance data point doesn't show how power usage scales with the compression level used. Was that the best result for power/performance? Does it get better or worse as the level changes? Was that best result cherry-picked from a more in-depth benchmark?

      Interesting, but not very informative.
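
      For the CPU side, that level sweep is easy to reproduce with plain libzstd (or with the CLI's built-in benchmark mode, e.g. `zstd -b1 -e19 somefile`). A rough sketch along those lines, using a synthetic buffer in place of Intel's (unspecified) test corpus:

      ```c
      /* Sketch: sweep software zstd levels over one input buffer and print the
       * ratio and wall-clock time -- roughly the CPU-side data the slides omit
       * above level 5. */
      #include <zstd.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <time.h>

      int main(void)
      {
          /* Synthetic, compressible input; swap in a real corpus for meaningful numbers. */
          size_t const srcSize = 16 * 1024 * 1024;
          char* src = malloc(srcSize);
          size_t const dstCap = ZSTD_compressBound(srcSize);
          void* dst = malloc(dstCap);
          if (!src || !dst) return 1;
          for (size_t i = 0; i < srcSize; i++) src[i] = (char)(i % 251);

          for (int level = 1; level <= ZSTD_maxCLevel(); level++) {
              clock_t const t0 = clock();
              size_t const written = ZSTD_compress(dst, dstCap, src, srcSize, level);
              double const secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
              if (ZSTD_isError(written)) { fprintf(stderr, "level %d failed\n", level); continue; }
              printf("level %2d  ratio %.2f  %.3f s\n",
                     level, (double)srcSize / (double)written, secs);
          }
          free(dst);
          free(src);
          return 0;
      }
      ```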

      • #4
        > Intel also shared this slide, among others, but do note the differing Zstd compression levels in their benchmarks:

        The compression ratios are actually about the same, just reached at different level numbers. QAT Zstd level 9 achieves a compression ratio of 2.76, while Zstd 1.5.5 has ratios of 2.74 and 2.77, respectively.

        • #5
          Originally posted by skeevy420 View Post
          Testing only level 9 with QAT and levels 4 and 5 with the CPU was odd, aside from the one fixed-throughput slide showing QAT level 9 landing between CPU levels 4 and 5.
          It's not "just 9"; the linked article clearly states:
          The following measurements were taken using a block size of 16KB with Intel QAT HW configured for its best compression ratio.
          Trying to implement anything more complex (a higher compression level) would be detrimental to throughput and power efficiency, while using more silicon area. The highest HW compression ratio is already slightly better than the default SW compression ratio (zstd SW defaults to level 3).

          If you need higher compression, you have to use the SW implementation. You also have to live with a significantly higher compression time (orders of magnitude slower) for only a little extra compression.

          The HW compression is meant for one-off compression of e.g. data transmitted over the network (hence the importance of latency mentioned), or short- to medium-term storage on disk.
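
          For that kind of per-message, latency-sensitive path, the usual software pattern is a long-lived compression context reused across buffers with a low level pinned; a minimal sketch of that pattern (plain SW zstd, with a hypothetical wrapper type just for illustration):

          ```c
          /* Sketch: one-shot compression of per-message buffers with a reused context,
           * the kind of low-latency path the HW offload targets (SW path shown here). */
          #include <zstd.h>
          #include <stddef.h>

          typedef struct {
              ZSTD_CCtx* cctx;
          } MsgCompressor;   /* hypothetical wrapper, not a zstd type */

          int msg_compressor_init(MsgCompressor* mc, int level)
          {
              mc->cctx = ZSTD_createCCtx();
              if (mc->cctx == NULL) return -1;
              /* Keep the level low (e.g. 1-3): latency matters more here than the
               * last few percent of ratio for data compressed once and sent. */
              ZSTD_CCtx_setParameter(mc->cctx, ZSTD_c_compressionLevel, level);
              return 0;
          }

          size_t msg_compress(MsgCompressor* mc, void* dst, size_t dstCap,
                              const void* src, size_t srcSize)
          {
              /* ZSTD_compress2 resets the session per call but keeps the context's
               * internal buffers, avoiding reallocation on a hot network path. */
              return ZSTD_compress2(mc->cctx, dst, dstCap, src, srcSize);
          }

          void msg_compressor_free(MsgCompressor* mc)
          {
              ZSTD_freeCCtx(mc->cctx);
              mc->cctx = NULL;
          }
          ```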

          • #6
            Great, it's nice to see an actually useful innovation.
            Now if only the web people could get their shit together: zstd has been out for 7 years and browsers still don't support zstd compression for HTTP traffic...

            • #7
              Originally posted by StefanBruens View Post

              Its not "just 9", the linked article clearly states:

              Trying to implement anything more complex (higher compresssion level) would be detrimental to throughput and power efficiency, while using more silicon area. The highest HW compression ratio is already slightly better than the default SW compression ratio (zstd SW defaults to level 3).

              I know, I'm still curious how QAT would have scaled instead of just getting the best results cherry picked.

              ​If you need higher compression, you have to use the SW implementation. You also have to live with a significantly higher compression time (orders of magnitudes slower), gaining just a little bit of compression.

              The HW compression is meant for one of compression for e.g. data transmitted over network (hence the importance of latency mentioned), or short to medium time storage on disk.
              Honestly, when QATs go for ~$250 and up used, I just expected better than SW at 4 or 5 performance...because neither of those are useful for long-term data archival. That only seems like its useful for on-the-fly compression...which means it'd have been interesting to see how it compared to LZ4 and other supported QAT CODECs.

              Also, all things considered, since accelerator cards cost hundreds to thousands of dollars, 90w of savings isn't all that much and can easily be offset and then some with solar and wind technologies...home and business. A $1000 accelerator card to save 90w vs $1000 in solar to gain 500w. Makes ya think.

              • #8
                Originally posted by skeevy420 View Post
                What does 18 or 22 on a QAT compare with on the CPU?
                They only implemented levels 1-12, and those are not directly comparable to zstd's levels...

                I wonder what QAT actually is in hardware. Is it some kind of FPGA?

                • #9
                  It sucks, though, that Intel has disabled QAT4 on their workstation Sapphire Rapids processors (aka WS).

                  I wish Intel would stop with this market segmentation already (see AVX-512, ECC, VROC/VMD/RST, Optane PMem, etc.).

                  • #10
                    Originally posted by bezirg View Post
                    It sucks, though, that Intel has disabled QAT4 on their workstation Sapphire Rapids processors (aka WS).

                    I wish Intel would stop with this market segmentation already (see AVX-512, ECC, VROC/VMD/RST, Optane PMem, etc.).
                    There are several Xeon Scalable SKUs that may meet your requirements if you don't need the high Turbo Boost frequencies. See e.g. https://www.intel.com/content/www/us...,232390,233418

                    AVX segmentation will be significantly reduced when AVX10 becomes available. ECC segmentation has already been reduced; many Raptor Lake SKUs (e.g. the i5-13600) already support ECC.
