Zstd-Compressed Linux Kernel Images Look Very Close To Mainline With Great Results


  • #51
    Originally posted by gnulinux82

    That's not how it works. If you use a shitty methodology then nothing "wins"; the whole benchmark is just invalid.
    So you're just a troll who's intentionally missing the point. Anything you have to say is just invalid.

    Comment


    • #52
      hotaru, you're really funny; I registered this account just for you.

      BTW, I've seen discussion about the kernel LZ4 being slower, probably also here. It's simple: the kernel's LZ4 code hasn't been updated for quite a while. But for your case I'd suggest the file size also needs to be considered, since the Pi's SD I/O performance is horrible, probably even worse at boot stage, so LZ4 with its lower compression ratio suffers more. I'd still use LZ4, though, since I also care about compression speed.
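      A back-of-envelope model of that tradeoff, with every number below made up purely for illustration: total boot cost is roughly SD read time plus decompression time, so on slow storage the better-compressing codec can come out ahead even though it decompresses more slowly.

```c
/* Hypothetical numbers only: illustrates why slow SD I/O can offset
 * LZ4's faster decompression. Total cost = read time + decompress time. */
#include <stdio.h>

int main(void)
{
    const double image_mb = 40.0;  /* assumed uncompressed image size (MB) */
    const double sd_mb_s  = 20.0;  /* assumed SD card read throughput (MB/s) */

    struct { const char *name; double ratio, decomp_mb_s; } codecs[] = {
        { "lz4",  2.0, 500.0 },    /* assumed compression ratio / decode speed */
        { "zstd", 3.0, 300.0 },
    };

    for (int i = 0; i < 2; i++) {
        double read_s   = (image_mb / codecs[i].ratio) / sd_mb_s;
        double decomp_s = image_mb / codecs[i].decomp_mb_s;
        printf("%-4s  read %.2fs + decompress %.2fs = %.2fs total\n",
               codecs[i].name, read_s, decomp_s, read_s + decomp_s);
    }
    return 0;
}
```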

      Comment


      • #53
        Originally posted by hotaru View Post
        I've written software that used SMP without a kernel at all before. It's trivial to do when the different threads don't have to talk to each other and you don't have to worry about anything else wanting to use the CPU.
        Great, I await your functioning patch for SMP decompression of kernel images; it is trivial, after all, you say.

        Originally posted by hotaru View Post
        In real-world tests using kernel images, not only does bzip2 not get its ass handed to it, it actually beats zstd on both compression ratio and decompression speed.
        I live in the real world, and all of my kernel images on all of my computers decompress faster with zstd than with bzip2; I checked all of them (with every preset). If there's an issue with the way they're being invoked in the specific context of decompressing the kernel, I guess let our friends on the mailing list know, but otherwise I don't accept your basic premise.
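        For reference, a sketch of the kind of userspace check this amounts to, assuming libzstd and libbz2 are installed (build with -lzstd -lbz2); note it exercises the userspace libraries, not the kernel's built-in decompressors.

```c
/* Rough userspace comparison: compress a file (e.g. an uncompressed
 * vmlinux) with zstd at every level and with bzip2 -9, then time one
 * decompression of each. Build with: cc bench.c -lzstd -lbz2 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <zstd.h>
#include <bzlib.h>

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static char *read_file(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    if (!f) { perror(path); exit(1); }
    fseek(f, 0, SEEK_END);
    *len = (size_t)ftell(f);
    rewind(f);
    char *buf = malloc(*len);
    if (!buf || fread(buf, 1, *len, f) != *len) { perror("read"); exit(1); }
    fclose(f);
    return buf;
}

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

    size_t srclen;
    char *src = read_file(argv[1], &srclen);
    char *out = malloc(srclen);
    size_t bound = srclen + srclen / 50 + 4096;  /* covers both codecs' worst case */
    char *comp = malloc(bound);

    /* zstd at every standard compression level */
    for (int lvl = 1; lvl <= ZSTD_maxCLevel(); lvl++) {
        size_t csize = ZSTD_compress(comp, bound, src, srclen, lvl);
        if (ZSTD_isError(csize)) continue;
        double t = now();
        ZSTD_decompress(out, srclen, comp, csize);
        printf("zstd -%-2d  %9zu bytes  decompress %.3fs\n", lvl, csize, now() - t);
    }

    /* bzip2 at its maximum block size (equivalent to bzip2 -9) */
    unsigned int csize = (unsigned int)bound, dsize = (unsigned int)srclen;
    if (BZ2_bzBuffToBuffCompress(comp, &csize, src, (unsigned int)srclen, 9, 0, 0) == BZ_OK) {
        double t = now();
        BZ2_bzBuffToBuffDecompress(out, &dsize, comp, csize, 0, 0);
        printf("bzip2 -9  %9u bytes  decompress %.3fs\n", csize, now() - t);
    }

    free(src); free(out); free(comp);
    return 0;
}
```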
        Last edited by microcode; 07-28-2020, 12:44 PM.

        Comment


        • #54
          Originally posted by hotaru View Post
          mkinitcpio does support passing arguments to the compressor, and I currently use "pigz -9" to compress my initramfs on machines with extremely slow storage
          Thanks: saves me having to try it.

          Originally posted by hotaru View Post
          the gzip decompressor is currently the fastest one in the kernel. lz4 should be faster, but unfortunately the implementation that decompresses the initramfs just isn't.
          wut?! That copy must be broken as hell then, as evidenced by the "4-6 times as long after the system is up and running". And even then it seems extremely strange: I'd suspect something pathological in the surrounding conditions, because even a nearly-broken version of the algorithm shouldn't be anything LIKE that much slower...

          Comment


          • #55
            Originally posted by intelfx View Post
            Incorrect. Most of that code is scheduling, inter-processor communication and synchronization, which is absolutely required for anything multithreaded.
            It doesn't take many kB of code to get a simple multitasking kernel running. When you don't need to care about priority queues etc., it's quite simple. Most tiny microcontroller systems manage threaded code on their single core with maybe 4 kB of kernel code. You don't need too much more to handle multiple x86 cores; in this case there's no need to maintain any FPU state, do individual page protection between the threads, etc.
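            As a sketch of the kind of thing a few kB buys you on a single core (names and structure invented for illustration): a cooperative, run-to-completion round-robin task table. The multi-core case described above would amount to running one such loop per core, plus whatever synchronization the workload actually needs.

```c
/* Illustrative only: a cooperative round-robin "scheduler" of the kind
 * tiny microcontroller kernels provide in a few kB. No priority queues,
 * no per-thread page protection, no FPU state handling. */
#include <stddef.h>

typedef void (*task_fn)(void);

#define MAX_TASKS 8
static task_fn tasks[MAX_TASKS];
static size_t  ntasks;

static int task_add(task_fn fn)
{
    if (ntasks >= MAX_TASKS)
        return -1;
    tasks[ntasks++] = fn;
    return 0;
}

static void scheduler_run(void)
{
    for (;;)                              /* round-robin forever */
        for (size_t i = 0; i < ntasks; i++)
            tasks[i]();                   /* each task "yields" by returning */
}

/* Hypothetical tasks */
static void blink(void)     { /* toggle an LED */ }
static void uart_poll(void) { /* drain a UART FIFO */ }

int main(void)
{
    task_add(blink);
    task_add(uart_poll);
    scheduler_run();                      /* never returns */
}
```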

            Comment


            • #56
              Originally posted by hotaru View Post

              lz4 should be faster, but unfortunately the implementation that decompresses the initramfs just isn't. Decompressing an lz4 initramfs at boot time takes 4-6 times as long on a Raspberry Pi 3 as running "lz4 -d" on the same file after the system is up and running.
              Thanks for pointing that out!

              I suspect the only change needed is to replace memcpy() with __builtin_memcpy(). LZ4 expects memcpy(dst, src, CONSTANT) to be inlined, but in a freestanding environment it can't be by default. When compiling for the kernel, memcpy() gets inlined, but when compiling for kernel decompression it doesn't.
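              A minimal illustration of that inlining difference (a standalone sketch, not the actual kernel patch): with -ffreestanding the compiler no longer treats a call to memcpy() as a builtin, so even a small constant-size copy stays an out-of-line call, while __builtin_memcpy() can still be expanded inline.

```c
/* Sketch of the issue, not kernel code. Compile with
 * "cc -c -O2 sketch.c" and again with "-ffreestanding",
 * then compare the generated assembly. */
#include <stddef.h>

void *memcpy(void *dst, const void *src, size_t n);  /* out-of-line copy */

void copy_via_memcpy(void *dst, const void *src)
{
    /* Under -ffreestanding (which implies -fno-builtin) this stays a
     * real function call, even though the size is a small constant. */
    memcpy(dst, src, 8);
}

void copy_via_builtin(void *dst, const void *src)
{
    /* __builtin_memcpy() can still be expanded by the compiler into
     * a single 8-byte load/store. */
    __builtin_memcpy(dst, src, 8);
}
```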

              Zstd had the same problem, but I fixed it with https://lkml.org/lkml/2020/7/30/974 by switching to __builtin_memcpy(). I'll measure, and put up a patch if it fixes it.

              Comment


              • #57
                Originally posted by terrelln View Post

                Thanks for pointing that out!

                I suspect the only change needed is to replace memcpy() with __builtin_memcpy(). LZ4 expects memcpy(dst, src, CONSTANT) to be inlined, but in a freestanding environment it can't be by default. When compiling for the kernel, memcpy() gets inlined, but when compiling for kernel decompression it doesn't.

                Zstd had the same problem, but I fixed it with https://lkml.org/lkml/2020/7/30/974 by switching to __builtin_memcpy(). I'll measure, and put up a patch if it fixes it.
                https://lkml.org/lkml/2020/8/3/1143 fixes LZ4 kernel decompression speed, which gets a ~10x speedup.

                However, I don't see any obvious problems with initramfs decompression speed. Note that decompressing the initramfs will be slower than plain lz4 decompression, because the kernel builds the filesystem as it decompresses, so the measured time also includes the time to create the files. If your initramfs has a lot of small files, the time to build the filesystem can dominate. If you wanted to measure initramfs decompression time only, you could create an initramfs that consists of a single large file, so the time spent building the filesystem is minimal.

                Comment
