ZRAM Will See Greater Performance On Linux 5.1 - It Changed Its Default Compressor

  • ZRAM Will See Greater Performance On Linux 5.1 - It Changed Its Default Compressor

    Phoronix: ZRAM Will See Greater Performance On Linux 5.1 - It Changed Its Default Compressor

    For those relying upon ZRAM to provide a compressed block device in RAM for cases like using it for SWAP or /tmp, with Linux 5.1 you might find it performing better than earlier kernels...

    http://www.phoronix.com/scan.php?pag...Better-Perform

  • daverodgman
    replied
    Originally posted by andreano View Post

    I'm amazed to see that zstd in its fastest setting almost keeps up with these special-purpose fast compressors, and actually manages to beat regular lzo in decompression! We don't have the numbers for lzo-rle, and it's hard to extrapolate 30% from regular lzo, since we don't know how much comes from compression and decompression, but assuming it's a pure decompression speedup (since that's what you get by making the algorithm more complex), that would be upwards of 60%, and a close race between lzo-rle and zstd on the decompression side. However, nothing that would dethrone lz4 as the bilateral speed king. Of course, the result will depend a bit on your test data.
    Actually the perf benefits were split between compression and decompression (it's much faster to detect a run of zeros than run through the lzo compression loop). I don't have the data to hand, but roundtrip perf ends up being a win over lz4 if I remember rightly, as the benefits to improving the slowest part (compression) have more impact than the fastest part (decompression) - because we spend more time on the slowest part. Zram does about 2.25x more compression than decompression (some pages are never decompressed again), so this also skews the importance to compression.
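The run-of-zeros point is easy to illustrate with any general-purpose compressor; a rough sketch using gzip as a stand-in (zram's lzo-rle has no CLI tool, so the absolute numbers differ, but the shape is the same):

```shell
# A 4 KiB page of zeros collapses almost instantly; random data does not.
head -c 4096 /dev/zero    | gzip -1 | wc -c   # a few dozen bytes
head -c 4096 /dev/urandom | gzip -1 | wc -c   # ~4 KiB, incompressible
```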

    Dave



  • StuartIanNaylor
    replied
    Originally posted by StuartIanNaylor View Post

It's highly likely it was the poor zram-config script you were using, as nothing kernel-wise has changed much since 3.14.

There has been a zram-config script in the wild for over 7 years that has seemingly been emulated and copied blindly, with no actual reference to the kernel docs and little thought.
It pointlessly takes half the currently available RAM and partitions it so that the number of zram swap devices equals the CPU core count, capping each device at (50% of RAM) / core count.
It's pure voodoo, and bad voodoo, because since 3.15 zram has been multi-stream, as zramctl will show. You end up with as many devices as cores, each already supporting core-count compression streams, but with each swap shrunk by the core-count division.

That is only the start of how poor the Ubuntu zram-config script is, and it's absolutely amazing that it seems to have been emulated and copied into practically every distro.
Try my rough hacks at https://github.com/StuartIanNaylor/ with zram; my scripting isn't up to much, but at least I bothered to read the kernel docs.
Whoops, I should say 3.15, as streams / mem_limit and various updates were added then; zram has been relatively static since, but the scripts haven't changed since 3.14.



  • StuartIanNaylor
    replied
    Originally posted by Licaon View Post
I've tried to use ZRAM on my first-gen RPi1 with 256 MB; you'd imagine stuff would fill RAM pretty fast, and when it works, it works.

    But in the end I had to disable it since, even using a small size, it would end up OOPSing the kernel out of the blue.

    Hope they improved it since 4.14.
It's highly likely it was the poor zram-config script you were using, as nothing kernel-wise has changed much since 3.14.

There has been a zram-config script in the wild for over 7 years that has seemingly been emulated and copied blindly, with no actual reference to the kernel docs and little thought.
It pointlessly takes half the currently available RAM and partitions it so that the number of zram swap devices equals the CPU core count, capping each device at (50% of RAM) / core count.
It's pure voodoo, and bad voodoo, because since 3.15 zram has been multi-stream, as zramctl will show. You end up with as many devices as cores, each already supporting core-count compression streams, but with each swap shrunk by the core-count division.

That is only the start of how poor the Ubuntu zram-config script is, and it's absolutely amazing that it seems to have been emulated and copied into practically every distro.
Try my rough hacks at https://github.com/StuartIanNaylor/ with zram; my scripting isn't up to much, but at least I bothered to read the kernel docs.
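For reference, a minimal hand-rolled setup following the kernel docs, i.e. a single multi-stream device rather than one device per core, might look like the sketch below (sizes and priorities are illustrative, and it needs root):

```shell
# Minimal single-device zram swap per the kernel zram docs (illustrative sizes).
modprobe zram num_devices=1
echo lz4 > /sys/block/zram0/comp_algorithm     # or lzo / lzo-rle / zstd
echo 4   > /sys/block/zram0/max_comp_streams   # one stream per CPU core
echo 1G  > /sys/block/zram0/disksize           # uncompressed capacity
mkswap /dev/zram0
swapon -p 75 /dev/zram0                        # higher priority than any disk swap
zramctl                                        # confirm algorithm and streams
```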



  • StuartIanNaylor
    replied
    Originally posted by mcloud View Post
    Wish one could use zram for more things than just swap, pretty-much like windows 10
You can use zram for far more than just swap; unfortunately the main distro utilities for configuring it suck badly.
Both Ubuntu's and Debian's zram config tools should really be called zram_make_some_illogical_swaps_whilst_overwriting_existing_zram_devices.

Have a look at https://github.com/StuartIanNaylor/zram-config or my other repos, as I did some examples out of frustration at the general ignorance of https://www.kernel.org/doc/Documenta...ckdev/zram.txt
The same bad script has been emulated and copied for about 8 years now, and not one of the copies has bothered to take the time to actually read the kernel documentation, valid since 3.15.

I have been using LZ4 with the Raspberry Pi and the results are a lot closer than the benchmarks suggest, with little difference from LZO on Arm; you may find LZO-RLE even faster, which I presume is why it has been made the default.

I have been working off the list below; for swap it's LZO or LZ4. Deflate, or zstd (which is not in /proc/crypto), gives up to 200% of the text compression ratio of the LZ family, but currently the only real choice is LZ or deflate (zlib), as the others are inferior or not in the kernel.

| Compressor name  | Ratio | Compression | Decompress. |
|------------------|-------|-------------|-------------|
| zstd 1.3.4 -1    | 2.877 | 470 MB/s    | 1380 MB/s   |
| zlib 1.2.11 -1   | 2.743 | 110 MB/s    | 400 MB/s    |
| brotli 1.0.2 -0  | 2.701 | 410 MB/s    | 430 MB/s    |
| quicklz 1.5.0 -1 | 2.238 | 550 MB/s    | 710 MB/s    |
| lzo1x 2.09 -1    | 2.108 | 650 MB/s    | 830 MB/s    |
| lz4 1.8.1        | 2.101 | 750 MB/s    | 3700 MB/s   |
| snappy 1.1.4     | 2.091 | 530 MB/s    | 1800 MB/s   |
| lzf 3.6 -1       | 2.077 | 400 MB/s    | 860 MB/s    |
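Not all of these exist in the kernel; a quick sanity check of what a given system actually offers (assuming the zram module is loaded and the device is zram0):

```shell
# List the compressors the running kernel offers for zram; the bracketed
# entry is the one currently selected for the device.
cat /sys/block/zram0/comp_algorithm
# Cross-check against the algorithms the kernel crypto layer registers:
grep '^name' /proc/crypto | sort -u
```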

It would be great if someone could do some benchmarks on system load via zram. I have been using my zram-config and simply commenting out the swap line of ztab for the no-zram case.
When you switch to zram you move away from the assumption of HDD swap that the static defaults are tuned for, and swappiness can be approximately 80-100 rather than the default of 60.
The page-cluster should also be set to zero, as dropping the HDD-tuned batching of 8 pages per swap read greatly reduces latency.
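The two knobs above map onto sysctls; a hedged sketch (the values are this post's suggestions for zram-backed swap, not universal defaults, and setting them needs root):

```shell
# VM tuning for zram-backed swap, per the discussion above (needs root).
sysctl vm.swappiness=100    # zram swap is cheap, so swap more eagerly
sysctl vm.page-cluster=0    # fetch single pages; the default 3 (2^3 = 8 pages) suits HDDs
# To persist across reboots:
printf 'vm.swappiness=100\nvm.page-cluster=0\n' > /etc/sysctl.d/99-zram.conf
```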

    There is a LibreOffice spreadsheet with runs of 15 mins logged every 2secs of /proc/loadavg with no zram, zram-sw80-pc3, zram-sw80-pc0, zram-sw100-pc3, zram-sw100-pc0 in https://github.com/StuartIanNaylor/z...ap-performance

If your load is medium or lower then zram can be pushed to swappiness 100, and the Pi 3B+ is generally up to most tasks with zram maximised, even if the boot process queue takes a hit because of the extra zram overhead.
It's a shame swappiness isn't dynamic; I did an extremely crude swappiness load balancer in https://github.com/StuartIanNaylor/z...-load-balancer, as the problem with static set points is that swappiness often has to be reduced for the intense boot / startup period, since with zram it increases the process queue.
If you get past that period and load is moderately normal, then zram is unnoticeable and the extra RAM provided by swappiness=100 comes into play, but you end up with a compromise between the two, somewhere between 80-100, depending on how well the CPU copes with load.
You can try this on a Pi Zero, where the effect is pretty drastic, but you can see the same curve happening with later models.



  • Licaon
    replied
I've tried to use ZRAM on my first-gen RPi1 with 256 MB; you'd imagine stuff would fill RAM pretty fast, and when it works, it works.

    But in the end I had to disable it since, even using a small size, it would end up OOPSing the kernel out of the blue.

    Hope they improved it since 4.14.



  • sdack
    replied
    Originally posted by Weasel View Post
    But ZRAM is only for RAM though?

    And zswap also, last I heard, writes uncompressed stuff to disk, for some idiotic reason (which can easily be denial of service'd).
Yes. ZRAM and ZSWAP are closely related kernel features that hook into the virtual memory management. ZRAM is for when you have no swap drive and want to store compressed memory pages in system RAM; it acts as an in-memory swap drive. ZSWAP is for when you do have a dedicated swap drive: it keeps a compressed pool of swapped pages in RAM and only writes pages out to the HDD or SSD swap when that pool fills, without creating an in-memory swap drive.
    Last edited by sdack; 03-16-2019, 07:17 PM.
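A minimal sketch of flipping zswap on at runtime, assuming the kernel was built with CONFIG_ZSWAP (the paths are the zswap module's standard parameters; values are illustrative):

```shell
# Enable zswap at runtime (needs root and a CONFIG_ZSWAP kernel).
echo 1      > /sys/module/zswap/parameters/enabled
echo lz4    > /sys/module/zswap/parameters/compressor
echo z3fold > /sys/module/zswap/parameters/zpool            # 3-pages-in-1 packing
echo 20     > /sys/module/zswap/parameters/max_pool_percent # cap pool at 20% of RAM
grep -r . /sys/kernel/debug/zswap/ 2>/dev/null              # runtime stats, if debugfs is mounted
```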



  • ermo
    replied
    Originally posted by Weasel View Post
swappiness controls the relative amount of swapping memory pages versus dropping the cache. For example, with 100 swappiness, memory and cache have the same priority. With 0 swappiness, memory will never be swapped out in favor of cache, so the kernel will always drop cache first.

    Typically, swap is used when you need more memory indeed, but the way it works is that it swaps out the least used pages to "store" them in case they are needed later. In many cases, what you swap isn't even going to come back (memory leaks, bloat, etc), but the kernel can't know that, so it can't discard it. Unless it's cache, in which case it can easily discard it (just like any other file-backed memory mapping) because, well, it can always re-read it from the file if it needs to. But it can't do that with memory in use by programs, so it has to swap it out. In this case, this is where ZRAM kicks in and compresses those pages that will be "stored for later".

    But perhaps on a system that truly swaps out constantly, speed is more important. It depends on your use case. For me zstd makes more sense but it depends on your use case I guess. (I also set swappiness to 100 because of ZRAM, I wish I could set it to 150 or so, it's been discussed before but it wasn't mainlined, so that swapping memory has more priority than dropping the cache).
    I have an old 4 PC Phenom II cluster on which I dabble with Exherbo. When I compile e.g. llvm-6 on my hexa-core 16 GB head node using the icecream distributed compiler to send jobs to the 3 agent nodes, the linking process (which runs on the head node) can easily go past 20 GiB RAM usage. So I use zswap (not ZRAM) with my normal 2GB swap partition and additionally enable an 8 GB swap file to be on the safe side. I think I run with a swappiness of 20-40 (can't quite recall off the cuff). In that specific scenario, zswap with LZ4 routinely saves me from the OOM killer. When I looked through the kernel source for the lz4 implementation, it looked as if the kernel didn't employ more than a single thread when using it with zswap, so I figured that lz4 was the best option for me for speed reasons on my relatively sedate hardware.

    As an aside, compared to the stock kernel with 1000Hz and preemptive set, the -ck1 patchset with MuQSS *noticeably* improved latency and perceived smoothness under heavy compilation loads. Watching a YT video during compilation on the -ck1 kernel would be buttery smooth while it would visibly hitch on the stock 1000Hz + preempt kernel (both were using a SATA SSD and the multiqueue bfq I/O scheduler and the same compilation load). But the plural of anecdote obviously isn't data...
    Last edited by ermo; 03-16-2019, 06:42 PM.



  • Weasel
    replied
    Originally posted by sdack View Post
The way it works currently is that it compresses 2 memory pages into 1 (ZBUD), or 3 into 1 (Z3FOLD), depending on which option you have configured your kernel with. So any compression ratio beyond 2:1 (or 3:1) is currently a waste of time, and you want a fast algorithm instead.

The idea of using compression here does not necessarily mean saving space; it can also mean making swapping faster, because one may have a slow swap drive. When a swap drive can only store data at a rate of, e.g., 100 MB/s and you use a compression algorithm that achieves a 2:1 or 3:1 ratio at a speed of, e.g., >500 MB/s, then you'll be able to swap out data at 200-300 MB/s onto the 100 MB/s drive.
    But ZRAM is only for RAM though?

    And zswap also, last I heard, writes uncompressed stuff to disk, for some idiotic reason (which can easily be denial of service'd).
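The throughput arithmetic in the quoted post can be sanity-checked: the effective swap-out rate is the drive speed times the compression ratio, capped by the compressor's own throughput (the numbers below are the post's illustrative ones):

```shell
# Effective swap-out rate from the quoted example.
drive=100   # MB/s, raw swap device speed
ratio=3     # 3:1 packing (Z3FOLD)
comp=500    # MB/s, compressor throughput
eff=$(( drive * ratio ))
if [ "$comp" -lt "$eff" ]; then eff=$comp; fi   # can't exceed the compressor
echo "${eff} MB/s effective"
```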



  • Weasel
    replied
    Originally posted by ermo View Post
    On the corpus referenced above, LZ4 compresses 2.101:1 while LZO compresses 2.108:1. That's not a very big difference, is it?

    And depending on your swappiness setting, when you swap stuff out (or back in), isn't it usually because something needs more RAM right now? In that scenario, I'd argue that a +66% to +268% speed improvement trumps a measly .007 ratio increase in memory efficiency? Granted, the numbers might be different when it comes to swappable pages, but still...

    I don't pretend to know the answer here -- I'm just asking whether I'm the only one who doesn't get why a hypothetical lz4 with the rle improvements isn't the obvious choice if indeed compression and decompression speed is a factor?

    Does anyone here happen to have links where I can learn more?
swappiness controls the relative amount of swapping memory pages versus dropping the cache. For example, with 100 swappiness, memory and cache have the same priority. With 0 swappiness, memory will never be swapped out in favor of cache, so the kernel will always drop cache first.

    Typically, swap is used when you need more memory indeed, but the way it works is that it swaps out the least used pages to "store" them in case they are needed later. In many cases, what you swap isn't even going to come back (memory leaks, bloat, etc), but the kernel can't know that, so it can't discard it. Unless it's cache, in which case it can easily discard it (just like any other file-backed memory mapping) because, well, it can always re-read it from the file if it needs to. But it can't do that with memory in use by programs, so it has to swap it out. In this case, this is where ZRAM kicks in and compresses those pages that will be "stored for later".

    But perhaps on a system that truly swaps out constantly, speed is more important. It depends on your use case. For me zstd makes more sense but it depends on your use case I guess. (I also set swappiness to 100 because of ZRAM, I wish I could set it to 150 or so, it's been discussed before but it wasn't mainlined, so that swapping memory has more priority than dropping the cache).
