Announcement

**hotaru** · 27 July 2020, 06:57 AM

Originally posted by intelfx View Post

Incorrect. Most of that code is scheduling, inter-processor communication and synchronization, which is absolutely required for anything multithreaded.

no, it's not. if there's only one thing running and the threads don't need to communicate with each other, there's no need for scheduling or inter-processor communication. and synchronization is as simple as just starting the decompression code on each core an them waiting until they're all done.

Originally posted by intelfx View Post

It's not there. It may be there on your unique workstation, but it's not there not on a typical Linux system. Typical Linux system does not have 26 cores.

so everyone else should be forced to have slower boot times because some people chose to buy a single-core potato instead of a modern system?

Originally posted by intelfx View Post

Do you even follow the context? I'm talking about compression, in context of your claim that "the bz2 file is smaller than the zst file".

bzip2 was faster at both compression and decompression. the difference in compression time was even bigger.

Originally posted by intelfx View Post

It's one of the slowest compression methods that are currently supported by the kernel. The only thing slower than bzip2 is lzma/xz, everything else is faster.

it's faster than lzma/xz, gzip, and zstd. the only one actually faster than bzip2 is lz4, but it also suffers from a horribly slow implementation in the kernel.

**hotaru** · 27 July 2020, 07:06 AM

Originally posted by gnulinux82

How to fail at benchmarking: the post.

how to fail at reading and understanding context: this post.

that post was in reply to another post that used the same methodology. I have done pretty extensive benchmarks of different compressors and decompressors with various settings, so I already knew that multithreaded bzip2 is usually faster than zstd on reasonably modern hardware. of course there are still systems around with ridiculously small numbers of CPU cores, but that's no excuse not to try to get the best performance we can get out of more capable hardware.

**hotaru** · 27 July 2020, 07:35 AM

Originally posted by gnulinux82

If you need 32-core parallel decompression before you can beat single-threaded Zstd, all that demonstrates is how bad the bzip2 algorithm is.

it also beats multithreaded zstd. if you use small enough blocks for multithreaded zstd decompression to match bzip2's speed, you end up with significantly worse compression ratio. zstd is ok for large files, but for files smaller than a few hundred MB, bzip2 usually wins on both speed and compression ratio.

Originally posted by gnulinux82

The vast majority of "modern" devices still don't have 32 cores.

that's not a good reason to artificially limit the performance of devices that do have a decent number of cores.

**microcode** · 27 July 2020, 01:07 PM

Originally posted by hotaru View Post

only if you limit decompression to a single thread. if you use multiple threads, bzip2 can be quite a bit faster than zstd.

Good luck starting up SMP before the kernel is even decompressed.

**microcode** · 27 July 2020, 01:09 PM

Originally posted by hotaru View Post

it also beats multithreaded zstd. if you use small enough blocks for multithreaded zstd decompression to match bzip2's speed, you end up with significantly worse compression ratio. zstd is ok for large files, but for files smaller than a few hundred MB, bzip2 usually wins on both speed and compression ratio.

That is simply not true. Look at any comparison (including the one linked in the article, however flawed) and you will find that zstd even at the highest ratios is only taking a couple nanoseconds per byte. There is no universe in which decompressing bzip2 is anywhere near as fast on any conventional computer as decompressing zstd. bzip2 is the quintessential unnecessarily slow decompressor. Even if you could get SMP it would get its ass handed to it.

And you aren't going to get SMP before the kernel is decompressed anyway, nor will you necessarily care to compress the kernel image on the kind of machine that has 32+ harts.

**hotaru** · 27 July 2020, 03:22 PM

Originally posted by gnulinux82

Who or what is "artificially limiting" anything, in the context of this use case?

the kernel's horribly slow bzip2 decompression code, and the people who want to remove bzip2 instead of improving the implementation.

Originally posted by microcode View Post

Good luck starting up SMP before the kernel is even decompressed.

I've witten software that used SMP without a kernel at all before. it's trivial to do when the different threads don't have to talk to each other and you don't have to worry about anything else wanting to use the CPU.

Originally posted by microcode View Post

There is no universe in which decompressing bzip2 is anywhere near as fast on any conventional computer as decompressing zstd. bzip2 is the quintessential unnecessarily slow decompressor. Even if you could get SMP it would get its ass handed to it.

in real world tests using kernel images, not only does bzip2 not get it's ass handed to it, it actually beats zstd on both compression ratio and decompression speed. I don't care about theoretical nanoseconds per byte. what I care about is how quickly my system can get to the point where it drops to a low-power sleep state and is just waiting for input. longer boot times use more power and increase the amount of time that the system isn't ready to do useful work.

Originally posted by microcode View Post

nor will you necessarily care to compress the kernel image on the kind of machine that has 32+ harts.

I don't care about compression speed for the kernel image, only decompression speed.

**arQon** · 27 July 2020, 03:55 PM

Originally posted by intelfx View Post

zstd -1 generally happens at I/O speed or faster, while being quite better than lzo.

Right, and that was my point - even low-power (Atom/etc) devices will have SSDs in them, but if you're stuck waiting for a dirt-slow algorithm (and there have been MUCH faster options since long before ztsd existed, like lz4) to repeatedly do single-threaded compression of an initramfs at 1.x GHz, you're going to have a bad time. :P

I can't tell from the docs whether initramfs.conf actually supports passing *arguments* to the compressor used or not, but now that this has tickled my curiosity I think I'll see what I can do. If arguments do work then even just using zip -1 rather than the default -6 should make a huge difference on those boxes.

**dc_coder_84** · 27 July 2020, 05:21 PM

What are the use cases of this? Does a regular Ubuntu installation benefit from this improvement?

**hotaru** · 27 July 2020, 05:25 PM

Originally posted by arQon View Post

I can't tell from the docs whether initramfs.conf actually supports passing *arguments* to the compressor used or not, but now that this has tickled my curiosity I think I'll see what I can do. If arguments do work then even just using zip -1 rather than the default -6 should make a huge difference on those boxes.

mkinitcpio does support passing arguments to the compressor, and I currently use "pigz -9" to compress my initramfs on machines with extremely slow storage (Raspberry Pi running from an SD card, for example) because the gzip decompressor is currently the fastest one in the kernel. lz4 should be faster, but unfortunately the implementation that decompresses the initramfs just isn't. decompressing an lz4 initramfs at boot time takes 4-6 times as long on a Raspberry Pi 3 than running "lz4 -d" on the same file after the system is up and running.

**hotaru** · 27 July 2020, 05:56 PM

Originally posted by gnulinux82

They'd be much better off improving multi-threaded Zstd, instead of trying to polish the turd that is bzip2. Comparing a heavily optimized, multi-threaded implementation of bzip2 and a poorly multi-threaded Zstd proves nothing. Especially given the absolutely terrible methodology used (whether you copied it from someone else or not).

the whole point was to show that it is terrible methodology, but even with that terrible methodology bzip2 still wins over zstd. I'm starting to wonder if you're actually trying to miss the point because I don't see how you could miss something so obvious without trying.

sure, they could try to focus on improving multi-threaded zstd, but the implementation would be a lot more complicated and much less likely to actually make it into the kernel. multi-threaded bzip2 is much simpler and already performs better than existing implementations of zstd. zstd is great for larger files and legacy or embedded systems with few CPU cores, but for this use case on modern systems with a decent number of cores, it's not the best option.

Announcement

Zstd-Compressed Linux Kernel Images Look Very Close To Mainline With Great Results

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment