**ermo** · 15 March 2019, 04:17 PM

Taking the numbers in @andreano's post at face value, why would you ever NOT go with lz4?

Looking at e.g. zstd vs. lz4, you trade +33% better compression rate for +66.666...% more time spent compressing and a whopping +268% more time spent decompressing?

What am I missing here?

**sdack** · 15 March 2019, 04:35 PM

Originally posted by andreano

In other words, can zstd beat lzo and lz4 in their own domain, namely speed: ...
Of course, the result will depend a bit on your test data.

The numbers you're listing are from lzbench on an i7, using a difficult-to-compress input file. To get an idea of the difficulty: zstd -1 compresses source code files with a ratio of about 1:7, whereas here it achieves a ratio of less than 1:3. So this is definitely a completely different kind of input that's being compressed.

lzo-rle is being developed specifically for compressing memory pages, where it's known that much of the content is zeros. They could have developed a zstd-rle or lz4-rle, only they didn't. My guess is that the added RLE is such an advantage that it outweighs the choice between lzo and lz4. An lz4-rle could possibly be better than lzo-rle, but likely only marginally.
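To illustrate why run-length encoding pays off on mostly-zero memory pages, here is a minimal Python sketch. It is not the kernel's lzo-rle format; the function `rle_zero_runs` and the sample page are hypothetical and only show the general idea of collapsing zero runs into a count.

```python
def rle_zero_runs(page: bytes) -> list:
    """Split a page into ("zeros", count) and ("literal", bytes) tokens."""
    tokens = []
    i = 0
    n = len(page)
    while i < n:
        j = i
        if page[i] == 0:
            # Consume the whole run of zero bytes and store only its length.
            while j < n and page[j] == 0:
                j += 1
            tokens.append(("zeros", j - i))
        else:
            # Consume non-zero bytes and keep them verbatim.
            while j < n and page[j] != 0:
                j += 1
            tokens.append(("literal", page[i:j]))
        i = j
    return tokens


# A hypothetical 4096-byte page that is almost entirely zeros.
page = b"\x00" * 4000 + b"hello" + b"\x00" * 91
tokens = rle_zero_runs(page)
print(tokens)  # [('zeros', 4000), ('literal', b'hello'), ('zeros', 91)]
```

In this toy model the whole page collapses into three tokens, which is the kind of saving a zero-aware RLE step can add on top of any LZ-style compressor.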

**andreano** · 15 March 2019, 05:52 PM

Originally posted by sdack

completely different kind of input

Thanks for pointing that out, it was just a test I found. But I'm glad zram works with less compressible data too, when the OOM-killer is the next alternative.

**Weasel** · 16 March 2019, 08:50 AM

I don't know about you guys, but when I want to use compression to swap, it needs to compress fairly well. Otherwise why even use it at all? If you want maximum speed don't use compressed swap at all. (of course you'll run out of memory faster, you can't have it both ways)

In general you expect to swap stuff out that you won't need soon, so that you have more memory for other things (even caching). You can't make it too slow, though. I'm referring to those who think LZ4 is the solution here. But it doesn't compress so well.

**ipsirc** · 16 March 2019, 02:55 PM

Originally posted by mcloud

Wish one could use zram for more things than just swap, pretty much like Windows 10

I'm using it as my rootfs.

**ermo** · 16 March 2019, 02:58 PM

Originally posted by Weasel

I don't know about you guys, but when I want to use compression to swap, it needs to compress fairly well. Otherwise why even use it at all? If you want maximum speed don't use compressed swap at all. (of course you'll run out of memory faster, you can't have it both ways)

In general you expect to swap stuff out that you won't need soon, so that you have more memory for other things (even caching). You can't make it too slow, though. I'm referring to those who think LZ4 is the solution here. But it doesn't compress so well.

On the corpus referenced above, LZ4 compresses 2.101:1 while LZO compresses 2.108:1. That's not a very big difference, is it?

And depending on your swappiness setting, when you swap stuff out (or back in), isn't it usually because something needs more RAM right now? In that scenario, I'd argue that a +66% to +268% speed improvement trumps a measly .007 ratio increase in memory efficiency? Granted, the numbers might be different when it comes to swappable pages, but still...
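As a rough worked example of how small that difference is per page, the two ratios can be turned into compressed sizes. This sketch assumes a 4096-byte page and assumes the corpus ratios carry over to swapped pages, which they may not; `compressed_size` is a hypothetical helper.

```python
PAGE_SIZE = 4096  # bytes; assumed page size


def compressed_size(ratio: float, size: int = PAGE_SIZE) -> float:
    """Average compressed size in bytes for a given compression ratio."""
    return size / ratio


lz4 = compressed_size(2.101)
lzo = compressed_size(2.108)
saving = lz4 - lzo

print(f"LZ4: {lz4:.1f} B/page, LZO: {lzo:.1f} B/page")
print(f"LZO saves {saving:.1f} B/page ({saving / lz4:.2%} of the LZ4 size)")
```

Under these assumptions the gap works out to roughly 6.5 bytes per page, or about a third of a percent.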

I don't pretend to know the answer here -- I'm just asking whether I'm the only one who doesn't get why a hypothetical lz4 with the rle improvements isn't the obvious choice if indeed compression and decompression speed is a factor?

Does anyone here happen to have links where I can learn more?

EDIT: Yann Collet (the author of LZ4 and zstd) has stated the following re. the LZ4 specification:

Originally posted by Yann Collet

With matches autorised to overlap forward, it makes the equivalent of RLE (Run Length Encoding) for free, and even repeated 2-bytes / 4-bytes sequences, which are very common.

(source)
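A minimal sketch of what "matches overlapping forward" means in an LZ77-style decoder. This is not the LZ4 reference implementation; `copy_match` is a hypothetical helper that only shows the byte-by-byte copy that lets a match read bytes it has just written.

```python
def copy_match(out: bytearray, offset: int, length: int) -> None:
    """Append `length` bytes, copied from `offset` bytes back in `out`.

    The copy runs byte by byte, so when length > offset the source region
    overlaps bytes produced earlier in this same copy.
    """
    if not 0 < offset <= len(out):
        raise ValueError("offset must point inside the already-decoded data")
    start = len(out) - offset
    for i in range(length):
        out.append(out[start + i])


# offset=1 repeats the last byte: effectively run-length encoding.
buf = bytearray(b"a")
copy_match(buf, offset=1, length=7)
print(buf)  # bytearray(b'aaaaaaaa')

# offset=2 repeats a 2-byte pattern.
buf2 = bytearray(b"ab")
copy_match(buf2, offset=2, length=6)
print(buf2)  # bytearray(b'abababab')
```

A single (offset, length) pair therefore encodes an arbitrarily long run or short repeating pattern, with no separate RLE token type needed.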

**sdack** · 16 March 2019, 03:42 PM

Originally posted by Weasel

I don't know about you guys, but when I want to use compression to swap, it needs to compress fairly well. Otherwise why even use it at all? ...

The way it works currently is that it compresses 2 memory pages into 1 (ZBUD), or, 3 into 1 (Z3FOLD) depending on which option you have configured your kernel with. So any compression ratio beyond a 2:1 (or 3:1) is a waste of time currently and you want a fast algorithm instead.
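A toy model of that cap, as a sketch only: it assumes each physical page holds a whole number of compressed pages, up to the allocator's limit, and it ignores headers and chunk granularity. `pages_per_slot` is a hypothetical name, not a kernel function.

```python
def pages_per_slot(ratio: float, max_buddies: int) -> int:
    """Compressed pages sharing one physical page in this toy model.

    max_buddies is 2 for a zbud-like allocator and 3 for a z3fold-like one.
    """
    fits = int(ratio)  # whole compressed pages that fit into one page
    return max(1, min(max_buddies, fits))


for ratio in (2.1, 3.5, 7.0):
    print(ratio, "zbud:", pages_per_slot(ratio, 2), "z3fold:", pages_per_slot(ratio, 3))
```

In this model a 7:1 ratio stores no more pages per slot than 2:1 does with two buddies, which is the point being made about preferring a faster algorithm.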

The idea of using compression here is not necessarily to save space; it can also make swapping faster when the swap drive is slow. If a swap drive can only store data at, e.g., 100 MB/s and you use a compression algorithm that achieves a 2:1 or 3:1 ratio at, e.g., >500 MB/s, then you'll be able to swap out data at a rate of 200-300 MB/s onto the 100 MB/s drive.
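The arithmetic can be sketched as follows, assuming compression and disk writes overlap perfectly so that the slower stage sets the pace. `effective_swap_rate` is a hypothetical helper and the figures are the example numbers from the paragraph above.

```python
def effective_swap_rate(disk_mb_s: float, ratio: float, compress_mb_s: float) -> float:
    """Uncompressed data swapped out per second, in MB/s.

    The disk absorbs disk_mb_s of compressed data, i.e. disk_mb_s * ratio of
    uncompressed data, but never more than the compressor can produce.
    """
    return min(compress_mb_s, disk_mb_s * ratio)


print(effective_swap_rate(100, 2, 500))  # 200
print(effective_swap_rate(100, 3, 500))  # 300
print(effective_swap_rate(100, 7, 500))  # 500, now limited by the compressor
```

The last line shows the flip side: once the ratio is high enough, the compressor's own speed becomes the bottleneck.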

**Weasel** · 16 March 2019, 06:16 PM

Originally posted by ermo

On the corpus referenced above, LZ4 compresses 2.101:1 while LZO compresses 2.108:1. That's not a very big difference, is it?

And depending on your swappiness setting, when you swap stuff out (or back in), isn't it usually because something needs more RAM right now? In that scenario, I'd argue that a +66% to +268% speed improvement trumps a measly .007 ratio increase in memory efficiency? Granted, the numbers might be different when it comes to swappable pages, but still...

I don't pretend to know the answer here -- I'm just asking whether I'm the only one who doesn't get why a hypothetical lz4 with the rle improvements isn't the obvious choice if indeed compression and decompression speed is a factor?

Does anyone here happen to have links where I can learn more?

swappiness controls the relative amount of swapping memory pages and dropping the cache. For example with 100 swappiness, memory and cache have same priority. With 0 swappiness, memory will never be swapped out in favor of cache, so the kernel will always drop cache first because it will never swap out any memory.
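For anyone who wants to check the current value, here is a small sketch that reads it from `/proc/sys/vm/swappiness` on Linux. The helper names are hypothetical, and the function returns `None` if the file is not there (for example on a non-Linux system).

```python
from pathlib import Path


def parse_swappiness(text: str) -> int:
    """Parse the contents of the swappiness file, e.g. '60\\n' -> 60."""
    return int(text.strip())


def read_swappiness(path: str = "/proc/sys/vm/swappiness"):
    """Return the current vm.swappiness value, or None if unavailable."""
    p = Path(path)
    return parse_swappiness(p.read_text()) if p.is_file() else None


print("vm.swappiness =", read_swappiness())
```

Changing the value requires root, typically via `sysctl vm.swappiness=<value>`.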

Typically, swap is used when you need more memory indeed, but the way it works is that it swaps out the least used pages to "store" them in case they are needed later. In many cases, what you swap isn't even going to come back (memory leaks, bloat, etc), but the kernel can't know that, so it can't discard it. Unless it's cache, in which case it can easily discard it (just like any other file-backed memory mapping) because, well, it can always re-read it from the file if it needs to. But it can't do that with memory in use by programs, so it has to swap it out. In this case, this is where ZRAM kicks in and compresses those pages that will be "stored for later".

But perhaps on a system that truly swaps out constantly, speed is more important; it depends on your use case. For me zstd makes more sense. (I also set swappiness to 100 because of ZRAM. I wish I could set it to 150 or so, so that swapping memory has more priority than dropping the cache; it's been discussed before but it wasn't mainlined.)

**Weasel** · 16 March 2019, 06:18 PM

Originally posted by sdack

The way it works currently is that it compresses 2 memory pages into 1 (ZBUD), or, 3 into 1 (Z3FOLD) depending on which option you have configured your kernel with. So any compression ratio beyond a 2:1 (or 3:1) is a waste of time currently and you want a fast algorithm instead.

The idea of using compression here is not necessarily to save space; it can also make swapping faster when the swap drive is slow. If a swap drive can only store data at, e.g., 100 MB/s and you use a compression algorithm that achieves a 2:1 or 3:1 ratio at, e.g., >500 MB/s, then you'll be able to swap out data at a rate of 200-300 MB/s onto the 100 MB/s drive.

But ZRAM is only for RAM though?

And zswap also, last I heard, writes uncompressed stuff to disk, for some idiotic reason (which can easily be denial of service'd).

**ermo** · 16 March 2019, 06:38 PM

Originally posted by Weasel

swappiness controls the relative amount of swapping memory pages and dropping the cache. For example with 100 swappiness, memory and cache have same priority. With 0 swappiness, memory will never be swapped out in favor of cache, so the kernel will always drop cache first because it will never swap out any memory.

Typically, swap is used when you need more memory indeed, but the way it works is that it swaps out the least used pages to "store" them in case they are needed later. In many cases, what you swap isn't even going to come back (memory leaks, bloat, etc), but the kernel can't know that, so it can't discard it. Unless it's cache, in which case it can easily discard it (just like any other file-backed memory mapping) because, well, it can always re-read it from the file if it needs to. But it can't do that with memory in use by programs, so it has to swap it out. In this case, this is where ZRAM kicks in and compresses those pages that will be "stored for later".

But perhaps on a system that truly swaps out constantly, speed is more important; it depends on your use case. For me zstd makes more sense. (I also set swappiness to 100 because of ZRAM. I wish I could set it to 150 or so, so that swapping memory has more priority than dropping the cache; it's been discussed before but it wasn't mainlined.)

I have an old 4-PC Phenom II cluster on which I dabble with Exherbo. When I compile e.g. llvm-6 on my hexa-core 16 GB head node using the icecream distributed compiler to send jobs to the 3 agent nodes, the linking process (which runs on the head node) can easily go past 20 GiB RAM usage. So I use zswap (not ZRAM) with my normal 2 GB swap partition and additionally enable an 8 GB swap file to be on the safe side. I think I run with a swappiness of 20-40 (can't quite recall off the cuff). In that specific scenario, zswap with LZ4 routinely saves me from the OOM killer. When I looked through the kernel source for the lz4 implementation, it looked as if the kernel didn't employ more than a single thread when using it with zswap, so I figured that lz4 was the best option for me for speed reasons on my relatively sedate hardware.
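A quick way to see which compressor and pool allocator zswap is using on a given machine is to read its module parameters from sysfs. The sketch below assumes the usual `/sys/module/zswap/parameters` directory; `read_zswap_params` is a hypothetical helper and simply returns an empty dict where zswap is not available.

```python
from pathlib import Path

# Parameter files exposed by the zswap module on kernels that include it.
ZSWAP_PARAMS = ("enabled", "compressor", "zpool", "max_pool_percent")


def read_zswap_params(base: str = "/sys/module/zswap/parameters") -> dict:
    """Return the zswap parameters that exist under `base` as strings."""
    params = {}
    for name in ZSWAP_PARAMS:
        f = Path(base) / name
        if f.is_file():
            params[name] = f.read_text().strip()
    return params


print(read_zswap_params())
```

On a setup like the one described above, one would expect `compressor` to read `lz4`, but the exact output depends on the kernel configuration and boot parameters.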

As an aside, compared to the stock kernel with 1000Hz and preemptive set, the -ck1 patchset with MuQSS *noticeably* improved latency and perceived smoothness under heavy compilation loads. Watching a YT video during compilation on the -ck1 kernel would be buttery smooth while it would visibly hitch on the stock 1000Hz + preempt kernel (both were using a SATA SSD and the multiqueue bfq I/O scheduler and the same compilation load). But the plural of anecdote obviously isn't data...

ZRAM Will See Greater Performance On Linux 5.1 - It Changed Its Default Compressor
