
Modernized Zstd Merged Into Linux 5.16 For Much Greater Performance


  • #11
    Given the usefulness of Zstd, is there an argument for incorporating hardware acceleration for Zstd on CPU dies, in much the same way as the hardware acceleration for encryption?

    TUDelft: FPGA Acceleration of Zstd Compression Algorithm



    • #12
      Originally posted by Old Grouch View Post
      Given the usefulness of Zstd, is there an argument for incorporating hardware acceleration for Zstd on CPU dies, in much the same way as the hardware acceleration for encryption?
Isn't the usefulness of Zstd that it is very efficient on modern CPUs due to hardware-specific optimisations in the code? It already runs very fast.



      • #13
        Originally posted by peterdk View Post

Isn't the usefulness of Zstd that it is very efficient on modern CPUs due to hardware-specific optimisations in the code? It already runs very fast.
Wouldn't it be nice if you could, pulling numbers out of thin air, compress at Zstd:15 with the speed of Zstd:3? That's where hardware acceleration would come into play.

After a certain level, which varies by drive type, Zstd loses its effectiveness and becomes the bottleneck. It's around 2-5 for older SSDs, 7-12 for HDDs, and you may have to dip into the Zstd-fast levels for NVMe. Hardware acceleration of Zstd would allow higher levels before the bottleneck occurs.
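Zstd's CLI has a built-in benchmark mode that makes it easy to find that crossover point on your own hardware (a sketch; assumes the `zstd` tool is installed, and the sample file is a hypothetical stand-in for your real data):

```shell
# Compressible stand-in for your workload's data.
seq 1 100000 > sample.txt

# Benchmark levels 1 through 12 on it (-i1 keeps each timing run short).
# Compare the reported compression MB/s against your drive's sustained
# write speed: the level where compression falls below the drive's speed
# is where Zstd becomes the bottleneck.
zstd -b1 -e12 -i1 sample.txt

# The "Zstd-fast" range is the negative levels, selected with --fast=N
# when compressing.
```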



        • #14
          Patching the kernel like Oprah Winfrey: you get a win ... and you get a win ... and you get a win ... Everyone gets a win!



          • #15
            Originally posted by Old Grouch View Post
            Given the usefulness of Zstd, is there an argument for incorporating hardware acceleration for Zstd on CPU dies, in much the same way as the hardware acceleration for encryption?

            TUDelft: FPGA Acceleration of Zstd Compression Algorithm
It's not standardized/popular enough, I don't think. At least not yet.

            But that approach works quite well in the PS5/Xbox.



            • #16
Why is Zstd preferred over LZ4? Genuine question.
LZ4's performance still seems unmatched, and its compression ratio isn't that much lower than Zstd's.
Also, the Oodle algorithms seem even better than all of the above; I wish Epic would free the code at some point.

Anyway, it's quite disappointing to discover that such an important shared kernel API was so badly out of date.



              • #17
                Originally posted by rmfx View Post
Why is Zstd preferred over LZ4? Genuine question.
LZ4's performance still seems unmatched, and its compression ratio isn't that much lower than Zstd's.
AFAIK LZ4 is only unmatched in absolute decompression speed. It offers much less flexibility than Zstd does, and I'd call it a "fast and weak" compressor even at its highest setting (looking through some benchmarks, LZ4 reaches a compression ratio of about 2.0, Zstd up to 4.0; that is a huge difference). There's LZ4HC, which is slower and compresses better, but probably not as well as Zstd's equivalent presets. Zstd's range starts at LZ4's performance (with worse decompression speed, yes) and reaches up into the area of ridiculously slow compression but very good ratio, with relatively unchanged decompression speed. There are something like 22+5 compression presets, then there's --long and --ultra, dictionary support, multithreading support, an extensive API and more. I noticed a mention of dictionaries in LZ4's help; the rest I'm not so sure about.

TL;DR: both are by the same author; Zstd is newer and offers a much wider speed/ratio range. LZ4 probably stays relevant in the highest-bandwidth niche.
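That speed/ratio range is easy to see from the command line (a sketch; `data.txt` is a hypothetical compressible file, and the flags are from recent zstd CLI releases):

```shell
# Hypothetical sample data; substitute something from your own workload.
seq 1 200000 > data.txt

# Same tool, four very different points on the speed/ratio curve:
zstd -qf --fast=3 data.txt -o data.fast.zst    # negative levels: LZ4-ish speed
zstd -qf -3 data.txt -o data.l3.zst            # around the default
zstd -qf -19 data.txt -o data.l19.zst          # slow, strong
zstd -qf --ultra -22 --long=27 -T0 data.txt -o data.l22.zst  # far end, multithreaded

ls -l data.*.zst   # higher levels generally produce smaller files
```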



                • #18
                  Originally posted by rmfx View Post
Why is Zstd preferred over LZ4? Genuine question.
                  BTW, both are from the same author, Yann Collet (CYann).

LZ4 is, as its name indicates, only a single Lempel-Ziv dictionary stage. It doesn't do anything beyond dictionary searches.
The only tweaks are that it relies only on 4-bit nibbles (so no complicated bit-twiddling to decode the compressed stream) and clever coding to suit modern hardware (avoiding conditional jumps that would stall the CPU's pipeline).
It has two different compression search methods (which both generate compatible output; there's only a single decompressor): either a fast one (normal LZ4) or a more exhaustive one (LZ4HC) that is guaranteed to find the best match, increasing compression at the cost of CPU time during compression (but with slightly faster decompression, given that even less compressed stream needs to be read and decompressed for the same data output).

Zstd takes the same general concepts as LZ4, but adds an entropy-coding stage on top. Instead of going for an old technique such as the Huffman tree used in most old-school algorithms (like the Deflate used in gzip, zip, etc.), it goes for the much more modern ANS (Asymmetric Numeral Systems, by Jarek Duda) (basically, if you squint at it, a sort of range encoder but with reversed bit order), more precisely a table-based variant (tANS) that can be represented extremely efficiently as a finite-state encoder, which for each input yields an output plus a remainder to loop back together with the next input. (CYann's blog has a lot of detail on how he developed this.) The end result is much better than Huffman coding (which only approaches the Shannon limit), with compression performance similar to range encoders and arithmetic encoders (at the Shannon limit) but a much faster implementation (after all, an arithmetic encoder needs to be run for every single bit of the input stream, whereas FSE boils down to a single look-up per symbol).
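For reference, the "Shannon limit" being discussed can be stated with the standard textbook formulas (definitions not from the post): for a source with symbol probabilities \(p_i\),

```latex
% Shannon entropy: the lower bound on average code length (bits per symbol)
H = -\sum_i p_i \log_2 p_i

% Huffman assigns whole-bit code lengths, so it can waste up to ~1 bit/symbol:
H \le L_{\mathrm{Huffman}} < H + 1

% Arithmetic coding and ANS/FSE effectively use fractional bits,
% so they approach the limit itself:
L_{\mathrm{ANS}} \approx H
```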

Zstd is also a strict superset of LZ4: some of the lower compression modes eschew the entropy coding and thus boil down to exactly what LZ4 does. (If I remember correctly, even the bitstream is compatible: you can actually decode chunks of LZ4 packets in those modes.)

                  Originally posted by rmfx View Post
                  lz4 performance still seems unmatched and the compression ratio isn’t that low compared to zstd.
That isn't strictly true.

Zstd being a strict superset of LZ4 (it began its life as an attempt at doing LZ4+FSE), you can simply use whichever compression mode of Zstd maps exactly to LZ4 and obtain quite similar results (save for the stream format into which the compressed LZ4 packets are packaged).

But in general, yes: if you're considering extremely fast storage (RAM, or an ultra-fast NVMe), going for either LZ4 or the equivalent Zstd mode will be unmatched, simply because you're skipping a whole stage (the entropy coding that is specific to the higher modes of Zstd), while at the same time the medium is fast enough that the larger/less-compressed bitstream isn't that noticeable.

Things start to get different if you consider slower storage devices (e.g. still a fast CPU, but slower eMMC storage: there the extra entropy coding won't take that much CPU time, but the much smaller/more-compressed bitstream will transfer much faster over the limited I/O bandwidth). Even more so in a written-seldom/read-frequently scenario, such as an initrd or a kernel: you only need to perform the slow "max level" Zstd compression once, but then each read benefits from having a relatively compact image to decompress. Depending on the exact balance between CPU speed and medium I/O bandwidth, it might be the best choice.
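The write-seldom/read-often trade-off is easy to sketch with the CLI (hypothetical file names; `initrd.img` is a stand-in for a real initramfs):

```shell
# Stand-in for an initramfs image.
seq 1 300000 > initrd.img

# Pay the expensive "max level" compression exactly once, at build time...
zstd -qf -19 initrd.img -o initrd.img.zst

# ...then every boot pays only the decompression, whose speed is largely
# independent of the level that was used to compress:
zstd -dcq initrd.img.zst > /dev/null
```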

Of course, if you're going to boot a Linux kernel on an AMD Threadripper CPU off a floppy drive or over a serial PPP network, LZMA might be your best bet.

                  Originally posted by rmfx View Post
                  Also, the oodle algos seem even better than all the above, I wish epic free the code at some point.
Oodle is closed source. That also means the large community of crazy compression experts (such as CYann) don't get to play with it and make it better. Hence my "meh" opinion of it.

THOUGH!

It seems that some enterprising coders have reverse-engineered it and made a compatible open-source compressor and decompressor.


Also, Zstd isn't the last word in compression, either. It's mostly the best of what we can do with (relatively) simple compressors, the same category as the venerable gzip. There are other ways to do even better, at the cost of more CPU. They mostly involve putting a layer of AI/ML into the prediction of symbols before entropy coding.
e.g. LZMA uses a Markov model (hence its name), PAQ relies on neural nets.



                  • #19
                    Originally posted by rmfx View Post
Why is Zstd preferred over LZ4? Genuine question.
LZ4's performance still seems unmatched, and its compression ratio isn't that much lower than Zstd's. ...
                    It depends on the use case.

                    Zstd is great for file compression and where you intentionally want to trade speed versus space. The better the compression, the more space you have left.

For a RAM disk, Zstd is not that great, but it can be for someone who wants to squeeze the most RAM out of their system. Still, you are right that here you probably want LZ4 instead.

For compressing the kernel image and initramfs to boot up faster, either LZ4 or Zstd can be useful. A slow HDD, or an even slower memory card (i.e. microSD), benefits from a Zstd-compressed kernel & initramfs, because the smaller they are the faster they load, and this can outweigh the decompression time. Booting from a fast SSD will likely put LZ4 ahead, but the margin between the two will be small.
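For reference, mainline already exposes these choices as Kconfig options (a config fragment, assuming an architecture such as x86 where Zstd kernel compression is wired up):

```
# .config fragment: Zstd-compressed kernel image and initramfs
CONFIG_KERNEL_ZSTD=y
CONFIG_RD_ZSTD=y
```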



                    • #20
I have a zstd-compressed f2fs root. Will I still benefit from this (at least for compression), or do I have to reinstall using the latest 5.16-rc1?

