Now That The Linux Kernel Can Be Zstd-Compressed, The Next Step Is The Firmware


  • #31
    Originally posted by bug77
    And the question remains: is the embedded CPU able to uncompress faster than reading the whole thing from storage?
    I can't speak for all embedded systems, as it's a massive field, but for network devices (routers/wifi/NAS), CPUs from 4-5 years ago can decompress faster than I/O reads from raw flash. The CPU will probably be maxed out, but it's not doing anything else during boot anyway, so who cares.

    Decompressing that 2 MB kernel takes at most one second, while reading it takes 2-4 seconds depending on the device. You can see this if you connect to the device's (debug) serial console, as the bootloader prints what it is doing.
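
    As a rough sanity check of those numbers, here is a minimal back-of-envelope sketch. The flash read speed, compression ratio, and decompression throughput are assumed values for a router-class board, not measurements from the post or from any particular device.

    Code:
    # Back-of-envelope check of the claim above, with assumed figures for a
    # router-class board: a ~2 MB compressed kernel, bootloader flash reads of
    # 0.5-1 MB/s, a ~3:1 compression ratio and ~20 MB/s of decompressed output.
    # None of these values are measurements; they are placeholders.

    COMPRESSED_MB = 2.0
    RATIO = 3.0               # assumed compression ratio (uncompressed / compressed)
    DECOMP_OUT_MBPS = 20.0    # assumed decompression throughput (output side)

    for flash_mbps in (0.5, 1.0):
        read_s = COMPRESSED_MB / flash_mbps                 # time to read the image
        decomp_s = COMPRESSED_MB * RATIO / DECOMP_OUT_MBPS  # time to expand it
        print(f"flash {flash_mbps} MB/s: read {read_s:.1f} s, decompress {decomp_s:.1f} s")

    With those assumptions, reading takes 2-4 seconds while decompression finishes in a fraction of a second, which is consistent with the figures quoted above.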



    • #32
      Originally posted by jabl
      I don't understand this whining about why bother when the benefits are small. Linux today is a pretty mature kernel; the time for major improvements is largely gone, except in specialized circumstances. OTOH, even a minor improvement multiplied over millions of deployed systems makes for a large aggregate amount of time and energy saved.

      Particularly in this case of using zstd for compressing firmware, considering the zstd code is already in the kernel for other reasons.
      It's not about "why bother". I was just asking if anyone has ever measured the benefits because I have never seen a single benchmark. Sorry if that sounded like whining, it was not my intention.



      • #33
        Originally posted by bug77

        It's not about "why bother". I was just asking if anyone has ever measured the benefits because I have never seen a single benchmark. Sorry if that sounded like whining, it was not my intention.
        Well, if you click on the first link in the article, you can see some timing results that Facebook posted comparing their zstd-compressed kernel + initrd with an xz-compressed one.

        (Now, I suspect xz is not a good general-purpose solution due to its slow decompression speed, but maybe Facebook boots its server farms over some slow 1 GbE management network, in which case optimizing for slow I/O makes sense, particularly if they boot many servers in parallel and the bandwidth of the provisioning server becomes the bottleneck.)

        But if you take the size of the kernel image with different compressors from
        https://lore.kernel.org/lkml/CA+icZUUXHXXC9C47mZd1JamVnvZhpru-GWmgHQMERF7Y3AQKgw@mail.gmail.com/T/#m959171bd6428dd9575e0b7f21f7ec6b34dea199d

        and the decompression speeds from https://github.com/facebook/zstd (heck, let's halve those speeds, since they were measured on an i9 CPU @ 5 GHz, which is substantially faster than a typical laptop or server core), and plot the load time (I/O + decompression), you end up with a curve like the one at https://ibb.co/PxwfSxv

        (apologies for the log-log scale; otherwise the crossover points are impossible to see)

        So basically the fastest choice as a function of I/O speed is:

        [0, 130 MB/s] : zstd
        [130 MB/s, 1.5 GB/s]: lz4
        [1.5 GB/s, inf]: uncompressed

        Also consider that in the limit of infinitely fast I/O, zstd costs about 40 ms and lz4 about 15 ms, whereas at the lower end of the spectrum, say 10 MB/s, the difference between the fastest (zstd) and the slowest (uncompressed) is about 2.4 s. This suggests that if you're a distro deciding on a general-purpose solution for your users, you should bias the choice towards the lower end, particularly since, as has been mentioned in this thread, boot loaders often use very simplistic I/O that is unable to drive the hardware to its limit.
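
        To make the model behind that plot concrete, here is a minimal sketch of the load-time calculation (compressed-read time plus decompression time). The image sizes and per-core decompression throughputs are illustrative placeholders rather than the exact figures from the lore thread or the zstd README, so the crossover points only roughly match the ranges quoted above.

        Code:
        # Minimal sketch of the load-time model behind the plot:
        #   load_time = compressed_size / io_speed + uncompressed_size / decompress_speed
        # Sizes and per-core decompression throughputs are illustrative placeholders,
        # not the exact numbers from the lore thread or the zstd README.

        SIZE_MB = {                 # assumed kernel image sizes
            "uncompressed": 32.0,
            "zstd": 9.0,
            "lz4": 13.0,
        }
        DECOMP_MBPS = {             # assumed single-core decompression throughput
            "uncompressed": float("inf"),   # nothing to decompress
            "zstd": 800.0,
            "lz4": 2200.0,
        }

        def load_time_s(method, io_mbps):
            """Time to read the (possibly compressed) image plus time to expand it."""
            read = SIZE_MB[method] / io_mbps
            decompress = SIZE_MB["uncompressed"] / DECOMP_MBPS[method]
            return read + decompress

        for io_mbps in (10, 130, 500, 1500, 5000):
            best = min(SIZE_MB, key=lambda m: load_time_s(m, io_mbps))
            details = ", ".join(f"{m} {load_time_s(m, io_mbps) * 1000:.0f} ms" for m in SIZE_MB)
            print(f"I/O {io_mbps:5d} MB/s -> fastest: {best:12s} ({details})")

        With those placeholder numbers the sketch picks zstd at slow I/O speeds, lz4 in the middle range, and uncompressed once I/O exceeds roughly 1.3 GB/s, which is in the same ballpark as the crossover points listed above.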
