Linux 5.13 To Allow Zstd Compressed Modules, Zstd Update Pending With Faster Performance

  • birdie
    replied
    Originally posted by jabl View Post
    Do any distros actually make use of module compression? gz and xz have apparently been there a while, and at least Ubuntu seems to ship them uncompressed (of course the .deb package itself is compressed, so this just wastes a bit of disk space).

    Same for firmware, FWIW.
    Fedora and RHEL (as well as RHEL derivatives) use XZ compression by default. It actually slows down the boot process quite significantly; hopefully Fedora will switch to Zstd as soon as 5.13 gets released.
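    If you want to get a feel for the difference on such a box, here is a rough sketch (the module path is just a placeholder, the zstd level the kernel build will actually use may differ, and the xz and zstd CLI tools are assumed to be installed):

        # pick any reasonably large xz-compressed module (placeholder path)
        MOD=/lib/modules/$(uname -r)/kernel/fs/btrfs/btrfs.ko.xz
        # make a zstd-compressed copy of the same module
        xz -dkc "$MOD" > /tmp/test.ko
        zstd -19 -q /tmp/test.ko -o /tmp/test.ko.zst
        # compare decompression times, since decompression is what happens on every boot
        time xz -dc "$MOD" > /dev/null
        time zstd -dc /tmp/test.ko.zst > /dev/null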



  • oleid
    replied
    Originally posted by jabl View Post
    Do any distros actually make use of module compression?
    I would have guessed it might be interesting for embedded distributions. But often squashfs is used there anyway, so there's no point in compressing the modules separately.



  • DrYak
    replied
    Originally posted by timofonic View Post
    Zstd is becoming more interesting these days. I hope some crazy geniuses are able to optimize it even more than it currently is.
    Zstd is the brainchild of Yann Collet (a.k.a. Cyan),
    of LZ4, XXH64 and FSE fame,
    i.e. the human being who can understand Jarek Duda's papers on ANS and turn them into code (the above-mentioned FSE).
    That's crazy-genius enough in my book.

    Originally posted by Joe2021 View Post
    What is the situation with the compression ratio? Is it already maxed out, or can we hope for an increased compression ratio in the future?
    That's pretty close to the theoretical limit of what you can achieve without also introducing some complex modelling of the input.
    - Entropy coding is done with FSE (a tANS variant), which is provably equivalent to range and arithmetic coding, i.e. it hits the Shannon limit (unlike the Huffman trees of older compression methods like gzip's Deflate) but is much faster (it boils down to a bit of bit-twiddling and table look-ups, no arithmetic involved).
    - In slower modes, the dictionary search is supposedly exhaustive and close to optimal.

    The only way one could improve beyond that is to incorporate some modelling / machine learning on top of it, which is what LZMA (remember that the MA stands for Markov) and PAQ do, but which is NOT what Zstd aims for (it aims for speed, so ML is out of the question).
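    If you want to see roughly where each family sits, a quick sketch on any large-ish test file (the file name is a placeholder, and the exact gap depends heavily on the data):

        gzip -9 -c testfile | wc -c    # Huffman-based Deflate
        zstd -19 -c testfile | wc -c   # dictionary search + FSE entropy stage
        xz -9 -c testfile | wc -c      # LZMA, which pays for its extra modelling in speed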


    Originally posted by birdie View Post
    I've followed quite a large number of compression algorithms over the past 25 years, and none of them have improved in terms of compression ratio by more than 3% since their inception. Some have gained multithreaded compression/decompression, but that's it.
    Yup, that's my opinion too. Sometimes some details can be tweaked a bit (a more exhaustive dictionary search, better optimization of the entropy tables).
    But with Zstd we're already pretty close to the optimum.

    Originally posted by birdie View Post
    I only wonder if zstd's `-22 --ultra --long` mode can be sped up (it's currently very, very slow) but otherwise the algo is great.
    Unless somebody has a revolutionary idea for a much better hashing / look-up algorithm, one that is much faster while still close to optimal, that's going to be very hard to beat.
    You're basically limited by the time it takes to search the dictionary space.
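    The one practical knob that does help today is threading; something along these lines (the file name is a placeholder, and -T0 trades a sliver of ratio for wall-clock time) spreads the work across all cores:

        # same ratio class, but using every core
        zstd -T0 --ultra -22 --long=31 big_input -o big_input.zst
        # window sizes above the default need a matching flag on the receiving side
        zstd -d --long=31 big_input.zst -o big_input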

    Originally posted by birdie View Post
    The compression ratio is a tad lower than LZMA2's but the decompression speed is up to eight times higher.
    That compression ratio is even crazier once you factor in that Zstd deliberately eschews any machine learning and relies entirely on a dictionary + entropy combo.
    That's what up-to-date research in information theory has enabled.
    (And no surprise about the speed of LZMA: that's the cost at which machine learning / modelling comes.)

    Originally posted by rene View Post
    This news is a bit misleading. Linux did support zstd compression before, as this is done in user space by kmod. This merely wires up compressing modules during the kernel build.
    That's literally in the news article:
    In user-space, KMOD 28 already supports dealing with Zstd-compressed modules.
    I thought commenting without actually RTFA was a /.-specific tradition :-D
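    And once kmod is new enough, the usual tooling just works on the compressed files; for example (btrfs is picked purely as an illustration, any module will do):

        # kmod's tools handle compressed modules transparently
        modinfo /lib/modules/$(uname -r)/kernel/fs/btrfs/btrfs.ko.xz   # or .ko.zst with kmod >= 28
        modprobe btrfs    # decompression happens behind the scenes, in user space

    So nothing changes from the admin's point of view.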

    Originally posted by rene View Post
    This also is not as useful as it sounds, as modern file systems with transparent compression have a similar benefit,
    File-system compression is block-based in order to allow random access.
    E.g. BTRFS compresses data in chunks of 128 KiB.
    And even after XZ compression I have more than 130 modules larger than that in my distro (probably more than 350 uncompressed modules span multiple 128 KiB chunks, but I am too busy to measure this properly).
    It's not the same result (it trades random read/write access against compression performance).
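    On BTRFS you can see that per-extent behaviour yourself; a small sketch, assuming / is a BTRFS mount and the compsize tool is installed:

        # enable transparent zstd compression for newly written data (level 3 here)
        mount -o remount,compress=zstd:3 /
        # compsize reports, extent by extent, how much actually got compressed and with what
        compsize /lib/modules/$(uname -r)/

    Files written before the remount stay as they were until rewritten or defragmented.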

    Also, depending on the device, the bootloader might not be able to load from a partition that supports compression at all.
    So usually your kernel is stuck on a particular partition format that the bootloader can understand, together with any further component needed to boot the system (be it the initramfs or modules).

    E.g. the Raspberry Pi up to the 3 can only boot from FAT32, because the main ARM CPU is bootstrapped by the GPU and the GPU only understands FAT32. It also doesn't support an initramfs by default, either.
    E.g. EFI can only boot from FAT. That's why most Linux distributions put a boot loader there (e.g. GRUB, ELILO, etc.) which then loads the actual kernel, initramfs and extra boot-time modules from a proper Linux-y boot/root partition.
    E.g. the U-Boot used on multiple Pine64 devices (e.g. my Pinebook Pro) only understands FAT and ext3/4 (without any of ext's compression extensions).

    Also note that most distributions' kernels don't come with every filesystem built in. Only a few are supported out of the box, and any other filesystem requires its driver to be loaded as a module.

    E.g. most embedded/ARM kernels only support ext4, f2fs, FAT and UDF out of the box.
    On a distro that doesn't use an initramfs, like Raspbian, that means you can only boot from a root partition that is either ext4 (the default used by the system) or f2fs (not documented, but supported).
    Transparent compression isn't mainstream in ext4 yet, and f2fs only added Zstd recently.
    The same also goes for some smartphones that block module loading over "safety" concerns.

    So although in theory you could use BTRFS for your boot/root partition, turn compression on and call it a day, there are numerous use cases where this isn't practically doable.

    Originally posted by rene View Post
    and the distribution packages are smaller w/o kernel module compression, as a pack of a thousand of them compressed as a whole compresses way better than a thousand individually compressed modules with no redundancy left between them.
    In case the specific use of Zstd wasn't a giveaway already:
    the main purpose here is speed. Zstd is one of the few ultra-fast compressors whose decompression speed is higher than the storage bandwidth.
    Individually Zstd-compressed files load much faster, no matter if the .rpm/.deb is slightly bigger.

    (And speaking of speed: decompressing the whole LZMA-compressed package and then individually recompressing all the files with Zstd at max level in the filesystem would be rather resource-intensive and time-consuming; one would need to balance the benefits and costs depending on the use case. Though it happens only once per update, so there might be some bandwidth-constrained use cases where it is worth it, like updating over 2G, over satellite, or remote IoT sensors over LoRaWAN.)

    On the other hand, an initramfs compressed with Zstd and containing most of the needed modules would definitely benefit from whole-archive compression of uncompressed modules.
    (I use that on my Pinebook Pro: its U-Boot doesn't support FS compression, but the default kernel in Manjaro ARM supports Zstd-compressed RAM disks.)
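    On Arch-derived setups like Manjaro that is a small change in /etc/mkinitcpio.conf (assuming the kernel was built with CONFIG_RD_ZSTD; the level is just an example, and dracut/initramfs-tools have equivalent switches):

        # /etc/mkinitcpio.conf
        COMPRESSION="zstd"
        COMPRESSION_OPTIONS=(-19)

        # then regenerate the image(s)
        mkinitcpio -P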



  • NateHubbard
    replied
    Originally posted by gigi View Post

    only two
    is there any particular reason?
    One is fast, the other compresses a lot.



  • birdie
    replied
    Originally posted by ms178 View Post

    Not with certainty, but if you've read the introduction of the newest patch set, it gives you an overview of the changes from the version used in the kernel to the updated version, and it shows significant improvements in the stated scenarios. It is an educated guess that similar improvements will materialize in future updates as well. Not necessarily specific to the compression ratio, but this could see further improvements, too.

    What makes you so doubtful? It is an active project and the devs seem to improve upon all metrics each release.
    I've followed quite a large number of compression algorithms over the past 25 years, and none of them have improved in terms of compression ratio by more than 3% since their inception. Some have gained multithreaded compression/decompression, but that's it.

    I only wonder if zstd's `-22 --ultra --long` mode can be sped up (it's currently very, very slow) but otherwise the algo is great. The compression ratio is a tad lower than LZMA2's but the decompression speed is up to eight times higher.
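    zstd's built-in benchmark mode makes those numbers easy to reproduce on your own data (the file name below is a placeholder):

        # benchmark levels 19 through 22: reports ratio, compression and decompression speed per level
        zstd -b19 -e22 --ultra somefile
        # and, for comparison, xz at its strongest preset
        time xz -9e -c somefile > /dev/null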



  • evergreen
    replied
    Originally posted by Joe2021 View Post
    What is the situation with the compression ratio? Is it already maxed out, or can we hope for an increased compression ratio in the future?
    Some progress is on the way:
    New experimental param, controlled by ZSTD_c_splitBlocks. Some rudimentary benchmarks (fastest of 4 runs with fullbench.c): (Decompression speed not measured, though I'd assume that to be a bit slo...




  • zxy_thf
    replied
    Originally posted by jabl View Post
    Do any distros actually make use of module compression? gz and xz have apparently been there a while, and at least Ubuntu seems to ship them uncompressed (of course the .deb package itself is compressed, so this just wastes a bit of disk space).

    Same for firmware, FWIW.
    Can confirm both CentOS 7 and Fedora 33 use xz to compress loadable modules.
    Their firmware files still seem to be uncompressed, though.
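    Easy enough to check on any box (standard /lib paths assumed), e.g.:

        # count compressed vs. plain kernel modules for the running kernel
        find /lib/modules/$(uname -r) -name '*.ko*' | sed 's/.*\.ko//' | sort | uniq -c
        # is anything in the firmware tree compressed?
        find /lib/firmware \( -name '*.xz' -o -name '*.zst' \) | wc -l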
    Last edited by zxy_thf; 04 May 2021, 08:00 AM.



  • rene
    replied
    This news is a bit misleading. Linux did support zstd compression before, as this is done in user space by kmod. This merely wires up compressing modules during the kernel build. This also is not as useful as it sounds, as modern file systems with transparent compression have a similar benefit, and the distribution packages are smaller w/o kernel module compression, as a pack of a thousand of them compressed as a whole compresses way better than a thousand individually compressed modules with no redundancy left between them.
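    That last part is easy to quantify; a rough sketch, assuming an uncompressed module tree (paths are illustrative, and numbers will vary by distro):

        MODDIR=/lib/modules/$(uname -r)/kernel
        # one solid archive of every module, compressed as a whole
        tar -C "$MODDIR" -cf - . | zstd -19 -qc | wc -c
        # versus the sum of the individually compressed modules
        find "$MODDIR" -name '*.ko' -exec zstd -19 -qc {} \; | wc -c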



  • julemand101
    replied
    Originally posted by jabl View Post
    Do any distros actually make use of module compression? gz and xz have apparently been there a while, and at least Ubuntu seems to ship them uncompressed (of course the .deb package itself is compressed, so this just wastes a bit of disk space).

    Same for firmware, FWIW.
    Arch Linux uses xz compression for its modules. You can see the list of files in the "linux" package here, where the module files end in .ko.xz.
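    Or straight from the package database on an installed Arch system:

        # count the xz-compressed module files shipped by the linux package
        pacman -Ql linux | grep -c '\.ko\.xz$'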



  • jabl
    replied
    Do any distros actually make use of module compression? gz and xz have apparently been there a while, and at least Ubuntu seems to ship them uncompressed (of course the .deb package itself is compressed, so this just wastes a bit of disk space).

    Same for firmware, FWIW.

