Ubuntu 19.10 To Boot Faster Thanks To LZ4 Compression
  • #21
    This is great. I'll save up to a few seconds every six months when I reboot. I don't know what I will do with all the extra free time... maybe blink or something.



    • #22
      Originally posted by skeevy420:

      On my PC with spinning drives, ZSTD does make a difference over XZ; it's barely noticeable, but enough to mention because it could matter on really low-end devices.

      Comparing ZSTD to LZ4 on boot, I can't tell a difference between the two until I look at raw numbers from benchmarks.

      It's how well ZSTD compresses at the really high --fast modes that is really piquing my interest. My ramdisks would love that.
      Now I did actually try it.
      With the initramfs set to raw (no compression) it took 2 seconds.
      With the initramfs set to lz4 it took 1.8 seconds.
      Total boot time (from the moment the boot manager spins this stuff up) is ~4 seconds till login.

      It actually was ~8 seconds a few minutes ago, but it turns out there is a "NetworkManager-wait-online" service that blocks and is totally useless in my setup.

      Still, nearly 2 seconds for the kernel to boot sounds like there is lots of room to get it down to around 500 ms if I were to recompile it and throw out all the stuff I don't need. But, again, I'm not adventurous enough to do that; sub 5 seconds is fast enough, I can wait for that :P



      • #23
        Afaik, LZ4 decompression speed does not depend on the compression level. So at maximum compression level LZ4 can achieve Zstd's ratio and outperform it in decompression speed. Best fit.



        • #24
          Originally posted by AlB80:
          Afaik, LZ4 decompression speed does not depend on the compression level. So at maximum compression level LZ4 can achieve Zstd's ratio and outperform it in decompression speed. Best fit.
          1) LZ4 decompression speed does depend on the compression level. It sounds funny, but it comes down to how decoding works, how it interacts with compression levels, and what a compression level really means. That said, the dependence isn't that sharp and overall it stays in the same speed league. But it is possible to lose "some" ratio in favor of faster decompression, if speed is all that matters. If someone is curious: find the "lzbench" project on GitHub and give it a try, varying levels, data or whatever (lzbench benchmarks a ton of various LZ derivatives; LZ4 is included, of course). If I remember correctly, there are even some experimental LZ4 encoders that intentionally optimize the compressed data for the best possible decompression speed, even if it costs some ratio. As a rule of thumb, LZ4 decompression slows down as the ratio increases. Most of the time we can live with it, since it's still fast. But the sweet spot between read-time reduction and decompression-time increase isn't set in stone; it is a moving target, specific to the hardware and data.
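For the curious who don't want to build lzbench, the level effect is easy to poke at with Python's stdlib `zlib` as a stand-in (lz4 itself isn't in the stdlib, and zlib's decode-speed sensitivity to level differs from LZ4's, so treat the numbers as illustrative only):

```python
import time
import zlib

# Sample data with mixed redundancy: repeated text plus small counters.
data = b"".join(b"the quick brown fox %d " % (i % 100) for i in range(50_000))

for level in (1, 6, 9):
    packed = zlib.compress(data, level)
    t0 = time.perf_counter()
    for _ in range(20):                      # repeat to get a stable timing
        assert zlib.decompress(packed) == data
    elapsed = time.perf_counter() - t0
    print(f"level {level}: ratio {len(data) / len(packed):.2f}, "
          f"decompress {elapsed * 1000 / 20:.2f} ms")
```

Varying the input data changes the picture a lot, which is exactly the "moving target" point above.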

          2) LZ4 generally can't achieve ZSTD-like ratios, for a few major reasons.
          • First, zstd brings entropy coding, being a 2-phase scheme. This alone can give a fair gain in achieved ratio, but doing decompression "twice" inevitably hurts decoding speed. Zstd does a rather good job of minimizing this impact, but there is no way to fully negate the fact that the decoding algorithm is more complicated and needs more processing.
          • Second, LZ4 has a rather small dictionary, short offset encoding and so on. It wasn't really meant to compress large contiguous chunks of data efficiently, so it can't reference far-away data. Even if distant data is redundant, LZ4 won't benefit from it: it fails to spot the repetition and replace the repeated occurrence with a mere reference to previous data. This hurts ratio, but it somewhat improves speed and memory use and keeps the decoder simple. This is one of the tradeoffs you have to make when designing a compression algorithm. The LZ4 author was all about speed, even if he had to pay a price for it, and the most obvious price for a compression algorithm is the ratio it can achieve. Zstd has a larger dictionary, and that surely helps to improve ratios on something of typical Linux kernel size.
          • Third, the LZ4 bytestream format isn't terribly efficient at squeezing out compression ratio, especially on "large" chunks of data of several megabytes or more. From a compression point of view, the "decompressed" Linux kernel contains funny areas: for example, there is about a megabyte of zeroes, likely some "initialized variables" all set to zero. Sure, this area is very redundant, over a fairly large scale, but LZ4 can't encode it really efficiently, nor has it been designed to do so. You can look at what it produces and see that it still looks quite redundant even after compression. However, this tradeoff keeps the decoder simple and therefore fast. So I'd say there is room for improvement.
          TL;DR: LZ4 and Zstd play in different "speed vs ratio" leagues. As simple as that. LZ4 is so simple and fast to decode that people have even ported LZ4 decompression to e.g. the Z80/Speccy. Hopefully that gives some clue about LZ4's decompression properties.
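That "megabyte of zeroes" observation is easy to check with the stdlib: a format with entropy coding on top of LZ matching (here `zlib`, since lz4 isn't in the stdlib) squeezes pure redundancy down to almost nothing:

```python
import zlib

# ~1 MiB of zeroes, like the zero-initialized data regions mentioned above.
zeroes = b"\x00" * (1024 * 1024)
packed = zlib.compress(zeroes, 9)
print(len(packed))          # roughly a kilobyte, i.e. a ~1000:1 ratio
assert zlib.decompress(packed) == zeroes
```

How close any given LZ4 encoder gets to that on the same input is exactly the format-efficiency question raised in the third bullet.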

          And now for the funny edge cases. Reading from CDs can be ridiculously slow, and the same goes for some strange computers with some USB sticks at early boot time (some odd BIOS/UEFI vs USB interaction?). It can be as bad as maybe 1 megabyte per second or so. At that point "live" or "install" media/environments risk getting far slower to boot: even the slowest prehistoric PCs can decompress e.g. zstd faster than 1 MB/second, at which point the improved ratio saves overall boot time. So if Canonical were to use LZ4 for e.g. live session images, it could do quite the reverse for some configurations.
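A back-of-envelope model makes the slow-media case concrete. Every number below is an illustrative assumption (hypothetical image size, ratios and decode speeds), not a measurement:

```python
# Model: total boot contribution = read(compressed)/media_speed + size/decomp_speed,
# assuming read and decompression happen sequentially.
KERNEL_MB = 60.0                 # hypothetical uncompressed image size

def boot_time(ratio, media_mbs, decomp_mbs):
    """Seconds to read the compressed image, then decompress it."""
    return (KERNEL_MB / ratio) / media_mbs + KERNEL_MB / decomp_mbs

# On a ~1 MB/s medium (bad USB/CD path), the better ratio wins easily:
t_lz4  = boot_time(ratio=2.0, media_mbs=1.0, decomp_mbs=2000.0)   # lz4-ish figures
t_zstd = boot_time(ratio=2.9, media_mbs=1.0, decomp_mbs=800.0)    # zstd-ish figures
print(f"lz4 ~{t_lz4:.1f}s, zstd ~{t_zstd:.1f}s")
```

With these assumed figures, the read time utterly dominates, so the denser format wins despite decompressing slower.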

          p.s. When it comes to compressing the kernel, we generally want a good ratio and fast decompression, but care much less about compression speed (after all, the kernel is compressed once per release). And isn't it logical that the more a compressor chews on data, the more redundancy it can spot and eliminate? That's how "compression levels" came to be: on fast levels the compressor gives up early, maybe failing to find the best match but handing out the result faster, etc.
          Last edited by SystemCrasher; 09-11-2019, 12:16 AM.



          • #25
            Originally posted by Chaython:
            So this is a non-transparent compression? How does the system know to decompress the kernel?
            The kernel includes an (uncompressed) entry point, which decompresses the rest of the kernel.

            Originally posted by Chaython:
            How does this compare to LZX in NTFS?
            Totally different things.



            • #26
              So /boot will run out of space 25% faster now. I see what you did there, Ubuntu devs.

              Good thing I have 25 GB for /boot now..



              • #27
                Originally posted by bregma:
                This is great. I'll save up to a few seconds every six months when I reboot. I don't know what I will do with all the extra free time... maybe blink or something.
                Meanwhile I'll waste hours trying to fix a broken boot after a system update.



                • #28
                  Originally posted by Spacefish:
                  So /boot will run out of space 25% faster now. I see what you did there, Ubuntu devs. Good thing I have 25 GB for /boot now..
                  Lol, my whole system, with some CAD programs, a ton of cross-compilers, an office suite, video player, torrent client, several advanced graphics programs and so on, occupies just 15 GB. What the hell do you plan to put in /boot? Has grub finally grown into a full-fledged OS?



                  • #29
                    Originally posted by SystemCrasher:
                    Has grub finally grown into full-fledged OS?
                    They seem to be working towards that:
                    https://www.gnu.org/software/grub/ma...grub.html#play

                    I'm expecting GRUB3 will have mouse support and GRUB4 will have a full web browser. :P

                    I am interested in the trade-off between gzip and lz4 when the kernel is transferred over TFTP. They seem to be assuming at least 5400 RPM HDD transfer speeds, which should be around 100 MB/s (800 Mbps)? Most network cards provide a sad implementation of TFTP for PXE booting. Even if both the client and the server are connected via gigabit NICs, PXE/TFTP does not get nearly the same transfer rate as a slow hard drive. It would be nice if everyone were on systems set to use UEFI for network booting at this point, but I don't think everyone is there yet. I am curious whether Ubuntu 19.10's kernel image for PXE will also be lz4, or whether they will make an exception for that kernel.



                    • #30
                      Originally posted by chilinux:
                      I'm expecting GRUB3 will have mouse support and GRUB4 will have a full web browser. :P
                      Sounds very reasonable, especially given that UEFI eventually comes with mouse support. So GRUB3 has a semi-competitor, though that one can cheat, being a "UEFI program".

                      I am interested in the trade-off between gzip and lz4 when the kernel is transferred over TFTP. They seem to be assuming at least 5400 RPM HDD transfer speeds which should be around 100 MB/s (800 Mbps)?
                      Achievable speed depends on many factors. HDDs only perform well on long linear reads; otherwise speed crashes badly. So it has to be just one computer booting at a time, and/or the host needs RAM large enough to keep the "working set" buffered.

                      Then, if I remember correctly, TFTP wasn't designed with speed in mind, only simplicity. That wasn't a big problem at low networking speeds, where round-trip time isn't a big deal: most time is spent sending data packets rather than waiting for ACKs. But at gigabit speeds and above, things are different. Sending a packet is quite fast, so waiting for the ACK can take a very sizeable percentage of the time; with two network stacks plus the data path on the way, that can be a lot of time compared to what a data packet spends "on the wire" at gigabit speed. If I remember correctly, TFTP won't transmit the next data block until it gets the ACK for the previous one.
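The lockstep one-block-per-ACK behaviour caps throughput no matter how fast the wire is. A toy model (round-trip time and the 512-byte default block from RFC 1350; the RTT figure is an assumption):

```python
def tftp_throughput(block_bytes, wire_mb_per_s, rtt_s):
    """Effective MB/s when each data block must be ACKed before the next is sent."""
    wire_time = block_bytes / (wire_mb_per_s * 1e6)   # seconds the block spends on the wire
    return block_bytes / (wire_time + rtt_s) / 1e6

# Assumed numbers: gigabit wire (~125 MB/s), 0.2 ms round trip per block.
print(f"512 B blocks:  {tftp_throughput(512, 125, 2e-4):.1f} MB/s")
print(f"1428 B blocks: {tftp_throughput(1428, 125, 2e-4):.1f} MB/s (blksize option, RFC 2348)")
```

Even on gigabit hardware, the model lands in single-digit MB/s: the RTT, not the wire, is the bottleneck, which matches the "slower than a slow hard drive" experience above.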

                      Most network cards provide a sad implementation of TFTP for PXE booting. Even if both the client and server are connected via gigabit NICs, PXE/TFTP does not get nearly the same transfer rate as a slow hard drive.
                      TFTP was designed to be simple. At the end of the day, NICs historically had just a small ROM with a rather trivial boot program invoked by the BIOS; it wasn't very smart.

                      At which point gzip can eventually win due to the smaller amount of data to transfer, especially if the target has a powerful CPU. However, it depends on configuration details. Zstd could be an interesting option, since it both compresses even better than gzip (thanks to a larger dictionary, etc.) and on most x86 hardware decompresses faster than gzip (in the worst case, e.g. on simple non-OoO ARM, it's about gzip's speed, and even there it tends to be somewhat faster). Recent kernels have zstd self-decompression support if I remember correctly, and it looks like a tradeoff to consider, especially if building your own kernel.

                      LZ4 is the obvious winner on SSDs and so on. These are so fast that read time can be small compared to decompression, especially if the CPU is relatively slow and the SSD is fast. However, the exact outcome depends on how these speeds compare in a particular system; it isn't firmly set in stone. I think I've even seen some compression benchmark at least trying to calculate the "winning" algorithm taking both transfer speed and decompression speed into account. Maybe it was lzturbo's turbobench (also on GitHub). However, benchmarks can only give a coarse synthetic approximation, as they don't really take other things into account (filesystem, fragmentation, resulting overhead, ...).
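The "winning algorithm" calculation those benchmarks attempt is just a read-then-decompress sum swept across media speeds; here is a sketch with assumed (not measured) ratios and decode speeds, showing the crossover:

```python
# total time = (size/ratio)/media_speed + size/decomp_speed; all numbers assumed.
SIZE_MB = 60.0                   # hypothetical uncompressed kernel size

def total_s(ratio, decomp_mbs, media_mbs):
    return (SIZE_MB / ratio) / media_mbs + SIZE_MB / decomp_mbs

for media in (1, 10, 100, 500, 3000):          # MB/s: CD-ish .. NVMe-ish
    lz4  = total_s(2.0, 2000.0, media)         # lz4-ish: worse ratio, fast decode
    zstd = total_s(2.9, 800.0, media)          # zstd-ish: better ratio, slower decode
    print(f"{media:>5} MB/s media: {'lz4' if lz4 < zstd else 'zstd'} wins")
```

With these particular assumptions the winner flips somewhere between slow USB and SSD speeds; real systems move that crossover around, which is why measuring beats calculating.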

                      So speaking for myself, when I get curious, I just try this, measure, then try that and measure, and eventually choose what works best in the particular situation. Though measuring small times can get rather complicated.
                      Last edited by SystemCrasher; 09-20-2019, 03:08 PM.

