Patch Proposed For Removing BZIP2 Support From The Linux Kernel


  • Patch Proposed For Removing BZIP2 Support From The Linux Kernel

    Phoronix: Patch Proposed For Removing BZIP2 Support From The Linux Kernel

    For at least a second time, a patch sent out under "request for comments" would strip out the existing BZIP2 code within the Linux kernel...


  • #2
    Hopefully this makes it through. bzip2 was abandoned for a long time.

    For some bzip2 related laughs, https://youtu.be/yG1OZ69H_-o?t=2357

    Last I checked, that issue is still present in the current bzip2 version.



    • #3
      "bzip2 is either slower or larger than every other supported algorithm"
      Slower? Depends, xz is not fast either. But larger? That highly depends on your data. We log CAN frames of sensor data. And thanks to its BWT bzip2 produces the smallest file sizes here (smaller than xz with highest compression setting). But it is slow, so we switched to zstd nevertheless.



      • #4
        Originally posted by oleid

        And thanks to its BWT bzip2 produces the smallest file sizes here (smaller than xz with highest compression setting).
        Yeah, bzip2 is better than xz for small files like man pages.



        • #5
          Originally posted by Mangix
          For some bzip2 related laughs, https://youtu.be/yG1OZ69H_-o?t=2357
          Thanks for that link, it was very interesting. I had no idea things like that would happen so easily.



          • #6
            Not disputing the above comments, but they don't seem relevant?

            As the article says, the kernel's algorithms are used for (de)compressing a limited set of data -- most often the kernel binary itself and the initramfs, sometimes swap or tmpfs which are compressed in 1MB blocks. The characteristics are quite predictable and the benchmarks mentioned were run on representative examples.



            • #7
              Originally posted by oleid
              Slower? Depends, xz is not fast either.
              xz is faster than bzip2 at decompression.
              Originally posted by oleid
              That highly depends on your data. We log CAN frames of sensor data.
              And they are compressing the kernel.



              • #8
                bzip2 is used to compress SELinux modules, and it is not bad. I tested the alternatives: lzip was best with HLL files (binary representation), and brotli/bzip2 were best with CIL (a Lisp-like ASCII format). But when I used zstd with trained, format-specific dictionaries, it was almost always the best. It produced huge savings with HLL and some savings with CIL. So my guess is that zstd with a properly trained dictionary is best for small files.

                And for any data that is decompressed multiple times, bzip2 is almost always a bad choice if decompression time matters even a little.



                • #9
                  Originally posted by maage

                  And for any data that is decompressed multiple times, bzip2 is almost always a bad choice if decompression time matters even a little.
                  For decompressing a big chunk of data xz is faster. But if you decompress multiple small files, the result may be different. Here are tests I performed several years ago:

                  Code:
                  $ time find man-bz2/ -type f -name "*.bz2" -exec bzcat '{}' > /dev/null \;
                  
                  real 0m35.895s
                  user 0m14.232s
                  sys 0m14.121s
                  
                  $ time find man-xz/ -type f -name "*.xz" -exec xzcat '{}' > /dev/null \;
                  
                  real 0m44.342s
                  user 0m16.842s
                  sys 0m21.459s
                  
                  $ time find man-bz2/ -type f -name "*.bz2" -exec bzcat '{}' > /dev/null \+
                  
                  real 0m10.096s
                  user 0m9.000s
                  sys 0m0.787s
                  
                  $ time find man-xz/ -type f -name "*.xz" -exec xzcat '{}' > /dev/null \+
                  
                  real 0m7.846s
                  user 0m7.108s
                  sys 0m0.487s



                  • #10
                    Originally posted by Mangix
                    For some bzip2 related laughs, https://youtu.be/yG1OZ69H_-o?t=2357
                    I don't know assembly, so I didn't understand why using signed instead of unsigned would be 2x more efficient:

                    uint8_t *block;
                    block[uint32_t] // much slower
                    block[int32_t] // much faster .. why?

                    Both index types wrap when they reach their maximum values. It's just that uint32_t has a larger positive range, so what's the deal?
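A possible explanation, sketched under assumptions rather than taken from the bzip2 source: in C the two index types do not behave the same at their maximum. Arithmetic on uint32_t is defined to wrap modulo 2^32, while overflowing an int32_t is undefined behavior. On a 64-bit target the compiler therefore has to preserve the 32-bit wrap for the unsigned index before each address calculation, and in an unrolled loop it may be unable to turn i1 + 1, i1 + 2, and so on into fixed offsets from one pointer. With the signed index it is allowed to assume the overflow never happens and widen the index once. The function names and the loop shape below are hypothetical stand-ins for illustration only; both versions return the same results for in-range inputs, and any speed difference depends on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bzip2-style comparison loop; not the real code.
 * Returns 1 if the first differing byte is larger on the i1 side, else 0.
 * uint32_t arithmetic is defined to wrap modulo 2^32, so the compiler must
 * keep that wrap observable when it turns i1 and i2 into 64-bit addresses. */
int run_greater_unsigned(const uint8_t *block, uint32_t i1, uint32_t i2,
                         uint32_t n)
{
    assert(block != NULL);
    for (uint32_t k = 0; k < n; k++) {
        uint8_t a = block[i1];
        uint8_t b = block[i2];
        if (a != b)
            return a > b;
        i1++; /* wraps to 0 after UINT32_MAX; this is defined behavior */
        i2++;
    }
    return 0;
}

/* Same logic with signed indices. Overflowing an int32_t is undefined
 * behavior, so the compiler may assume i1 and i2 never overflow and can
 * widen them to 64 bits once. The caller must keep the indices in range. */
int run_greater_signed(const uint8_t *block, int32_t i1, int32_t i2,
                       int32_t n)
{
    assert(block != NULL && i1 >= 0 && i2 >= 0);
    for (int32_t k = 0; k < n; k++) {
        uint8_t a = block[i1];
        uint8_t b = block[i2];
        if (a != b)
            return a > b;
        i1++; /* overflow here would be undefined, not a wrap */
        i2++;
    }
    return 0;
}
```

Compiling both functions with optimizations enabled and comparing the generated assembly is one way to check whether a given compiler actually treats them differently.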

