Announcement

**skeevy420** · 14 November 2021, 03:07 PM

Originally posted by sdack View Post

It depends on the use case.

Zstd is great for file compression and where you intentionally want to trade speed versus space. The better the compression, the more space you have left.

For a RAM disk is Zstd not that great, but it can be for someone who wants to squeeze the most RAM out of their system. But you are right that here you probably want LZ4 instead.

For compressing the kernel image and initramdisk and to boot up faster can either LZ4 or Zstd be useful. A slow HDD, or even slower memory card (i.e. microSD), benefits from a Zstd compressed kernel & initramdisk because the smaller these are the faster they load and this can outweigh the decompression time. Booting from a fast SSD will likely put LZ4 ahead, but the margin between the two will be small.

That's due to kernel and configuration limitations, not Zstd itself. Zstd has both an LZ4 mode and various fast levels. With kernel images I was getting LZ4 speeds and ratios with Zstd-fast:1000 back with Zstd 1.4.0. If both ramdisks and high speed disks could be configured to use the faster Zstd levels they'd be a little more competitive with LZ4. OpenZFS supports some Zstd fast levels while, AFAIK, none of the in-kernel file systems or ramdisks do.

**brucethemoose** · 14 November 2021, 03:36 PM

Originally posted by skeevy420 View Post

That's due to kernel and configuration limitations, not Zstd itself. Zstd has both an LZ4 mode and various fast levels. With kernel images I was getting LZ4 speeds and ratios with Zstd-fast:1000 back with Zstd 1.4.0. If both ramdisks and high speed disks could be configured to use the faster Zstd levels they'd be a little more competitive with LZ4. OpenZFS supports some Zstd fast levels while, AFAIK, none of the in-kernel file systems or ramdisks do.

I don't know, I think zstd decoding is just fundamentally more complex than LZ4, even at the minimum level.

EDIT: F2FS and btrfs both support zstd level 1, if thats what you mean. IDK about the LZ4 modes.

**sdack** · 14 November 2021, 03:48 PM

Originally posted by skeevy420 View Post

That's due to kernel and configuration limitations ...

... And in-kernel Zstd compression happens to be the topic of the article. I only answered rmfx's question with regards to the article and who also already knew some of the differences himself. There was no further need for me to create tangents of what-ifs and did-you-know, when two previous comments explained much of the wider details already. Sorry if you expected to get a wall of text from me.

**dominikoeo** · 14 November 2021, 04:58 PM

sdack wrote:
> Zstd being a strict superset of LZ4 (it began it's life as an attempt at doing LZ4+FSE), you can simply use which compression mode of Zstd maps exactly to LZ4 and obtain quite similar results

This is not true. LZ4 and zstd have their pros and cons. Which one to use depends on use cases. The main benefit of LZ4 is that is is much faster at decompression than zstd. LZ4 decompression speed is unmatched by any other algorithms. So LZ4 is useful for mostly read-only data where decompression speed matters more than compression speed and when we can sacrifice a bit of decompression ratio for speed. The zstd github page https://github.com/facebook/zstd has benchmarks of zstd vs LZ4 among other algorithms. Snippet of the benchmark there:

Columns: algorithm, compression ratio, compression speed MB/s, decompression speed MB/s

zstd 1.4.5 -1	2.884	500 MB/s	1660 MB/s
zlib 1.2.11 -1	2.743	90 MB/s	400 MB/s
brotli 1.0.7 -0	2.703	400 MB/s	450 MB/s
zstd 1.4.5 --fast=1	2.434	570 MB/s	2200 MB/s
zstd 1.4.5 --fast=3	2.312	640 MB/s	2300 MB/s
quicklz 1.5.0 -1	2.238	560 MB/s	710 MB/s
zstd 1.4.5 --fast=5	2.178	700 MB/s	2420 MB/s
lzo1x 2.10 -1	2.106	690 MB/s	820 MB/s
lz4 1.9.2	2.101	740 MB/s	4530 MB/s
zstd 1.4.5 --fast=7	2.096	750 MB/s	2480 MB/s
lzf 3.6 -1	2.077	410 MB/s	860 MB/s
snappy 1.1.8	2.073	560 MB/s	1790 MB/s

Some remarks:

benchmarks do not use the latest version of zstd & LZ4
benchmarks are on x86_64. zstd algorithm does not perform as well on 32-bits architecture. I recall bench-marking on armv7 (years ago) and zstd was slower than zlib at decompression, whereas on x86_64 zstd was much faster than zlib.
LZHC can improve compression ratio over LZ4 (at the cost of slower compression). LZHC is compatible with LZ4 i.e. LZ4 decompressors can read it. This is useful when decompression speed does not matter e.g. when data is compressed once but read many times, which is a frequent use case. LZ4 decompression is even slightly faster when data was compressed with LZ4HC than with LZ4.

**sdack** · 14 November 2021, 05:18 PM

Originally posted by dominikoeo View Post

sdack wrote:
> Zstd being a strict superset of LZ4 (it began it's life as an attempt at doing LZ4+FSE), you can simply use which compression mode of Zstd maps exactly to LZ4 and obtain quite similar results ...

I did not write this, sorry.

**sdack** · 14 November 2021, 07:23 PM

Originally posted by ashetha View Post

I have zstd-compressed f2fs root. Will I still benefit from this (at least for compression) or do I have to reinstall using the latest 5.16-rc1?

Yes, you will benefit from this, and no, you do not need to reinstall.

Some of the changes improve compression speeds and compression ratio. If you want to benefit from those as well then you obviously will have to recompress the data, meaning, you will have to reinstall them. However, there are also improvements to the speed of decompression (I am assuming this is what you meant) and for those will you not need to reinstall the files nor has the data format changed.

For a more detailed list of changes since 1.3.1 to 1.4 / 1.5 see here:

zstd/CHANGELOG at dev · facebook/zstd

https://github.com/facebook/zstd/blob/dev/CHANGELOG

Zstandard - Fast real-time compression algorithm. Contribute to facebook/zstd development by creating an account on GitHub.

**cl333r** · 14 November 2021, 08:30 PM

Originally posted by brucethemoose View Post

I don't know, I think zstd decoding is just fundamentally more complex than LZ4, even at the minimum level.

Zstd was developed buy a guy who also developed another famous compression algorithm, I forget which one, perhaps LZ4.

**arQon** · 14 November 2021, 09:29 PM

Originally posted by Old Grouch View Post

Given the usefulness of Zstd, is there an argument for incorporating hardware acceleration for Zstd on CPU dies, in much the same way as the hardware acceleration for encryption?

Short answer: no.
Longer answer: no, for multiple reasons...

Dedicated cipher blocks have value because the bulk of the work is repeated operations on proportionally-small amount of data - only moderately complex ops, but often with very large loop counts - so basically you're bound by the speed you can perform those calcs far more than you are by anything else, like RAM or IO. Compression is almost the exact opposite scenario, where pretty much *all* you're doing (outside of final-stage entropy coding) is trawling through RAM. How quickly you can do math is basically irrelevant.

Ciphers are also constant - rigidly so, obviously, because it doesn't matter how clever your XYZ algorithm is, it HAS to produce the same hashes as the reference or it's literally useless. On top of that, the actual operations tend to be trivial, so once you make that block you know it's never changing. Compression algorithms tend to be much more dynamic, so even if you can avoid breaking the datastream format as you extend the algorithm over time, you're almost guaranteed to not be able to actually implement ALL of the algorithm as time passes and things evolve. By the time a compression format actually becomes stable, it's typically obsolete.

So yeah: on paper, it sort-of looks like it might sort-of be a good idea, because compression does burn a TON of CPU - but none of that work is really suited to an FPGA or a discrete block, because all you're actually doing is chewing through RAM/cache anyway. The only scenario in which it has any value is if you're doing SO much compression that you're basically tying up cores that could potentially be doing "real" work instead (ignoring the cache thrash issues), and there aren't many of those. It *might* be worth it for e.g. Google Docs or something like that, where you know you're going to have a lot of high-entropy data, and you're operating at a scale where you're buying enough hardware for the extra cost of the block to be worth it, and you're used to throwing away working hardware every couple of years, but that and "generic" data warehousing services like Backblaze are about the only scenarios I can come up with for it.

Doesn't mean someone won't do it anyway, but at a consumer level and even on a general-purpose basis it's definitely not worth the silicon it would waste.

**geearf** · 15 November 2021, 12:46 AM

Originally posted by sdack View Post

If you want to benefit from those as well then you obviously will have to recompress the data, meaning, you will have to reinstall them.

You can recompress as part of running defrag, it may be easier than reinstalling.

It does seem that waiting for 1.5.x may be better before recompressing a lot of data, but who knows how long that may take.

**sdack** · 15 November 2021, 08:24 AM

Originally posted by geearf View Post

You can recompress as part of running defrag, it may be easier than reinstalling.

It does seem that waiting for 1.5.x may be better before recompressing a lot of data, but who knows how long that may take.

Good to know, but just as a warning that the compression gains only apply to some of the compression levels and are also only minor. He mentioned F2FS so I am assuming he is using an SSD. I would then not rewrite/defragment the files when these are stored on an SSD, because it shortens the SSD's lifespan. Better to let the distro update the files under root "naturally" little by little.

ashetha : Kernel developers have a policy to not make a kernel incompatible with previous versions when there is a way to avoid it. So one can be sure that any changes to Zstd or a file system will not render old files unreadable suddenly. They would rather implement it side by side and allow the original to persist for as long as possible.

Announcement

Modernized Zstd Merged Into Linux 5.16 For Much Greater Performance

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment