LZ4 Compression Support Is Unlikely For Btrfs


  • #21
    Originally posted by Beherit View Post
    I agree with the btrfs team that it's not justifiable to alter the entire disk format just to fit one single new compression algorithm into it. So what about altering it by making it modular?
    The btrfs on-disk format does allow plugging in another compressor without major changes to btrfs structures, though the number of compressor IDs it can use is limited, 256 or so. But that is only part of the job to be done.

    What they mean is: imagine GRUB reading btrfs to load the kernel and RAM disk. Older GRUBs would hit an "unknown compression algo" error when trying to read LZ4-compressed blocks. Systems using older GRUB, or other boot loaders that rely on fetching kernels, ramdisks and so on from the filesystem, would just break. The same goes for older filesystem tools. A system that fails to boot does not sound cool, right?

    And what justifies these woes? If we take a look around (I have tested LZ4, LZO and many others very extensively): LZ4 usually compresses a bit worse than LZO and decompresses faster than LZO (on x86 it can be noticeably faster to decompress in some cases, depending on the data; on ARM the speed is almost the same). So my overall impression is that LZ4 is in the same class as LZO for btrfs compression: a slightly lower ratio, exchanged for even more speed. Overall it is a nice tradeoff and makes a lot of sense e.g. for zram, where compression and decompression speed should be comparable to RAM. But it makes much less sense for btrfs.
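The kind of side-by-side testing described above is easy to reproduce. LZ4/LZO bindings may not be installed everywhere, so this sketch uses Python's stdlib zlib and lzma as stand-ins for the "fast LZ" vs "heavier codec" comparison; the harness is the point, not the specific codecs:

```python
import lzma
import time
import zlib

def bench(name, compress, decompress, data, repeats=5):
    """Return (name, compressed ratio, decompression MB/s) for one codec."""
    blob = compress(data)
    start = time.perf_counter()
    for _ in range(repeats):
        out = decompress(blob)
    elapsed = (time.perf_counter() - start) / repeats
    assert out == data  # round-trip sanity check
    return name, len(blob) / len(data), len(data) / elapsed / 1e6

data = b"the quick brown fox jumps over the lazy dog " * 20000

results = [
    bench("zlib-1", lambda d: zlib.compress(d, 1), zlib.decompress, data),
    bench("zlib-9", lambda d: zlib.compress(d, 9), zlib.decompress, data),
    bench("lzma",   lzma.compress, lzma.decompress, data),
]
for name, ratio, mb_s in results:
    print(f"{name:7s} ratio={ratio:.4f} decompress={mb_s:,.0f} MB/s")
```

On repetitive data like this every codec shines; real files (and real LZ4/LZO via their own bindings) spread the results out much more.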

    P.S. As for "successors" of LZ4, it turns out there is LZ5, based on LZ4 code but heavily reworked. It compresses much better, and in terms of ratio beats both LZ4 and LZO into the dust. Its decompression speed is usually around LZO's. But it shines on large chunks of data. Filesystems use small blocks to allow random access, so they would not benefit much from things like this. But it can look good for other uses, e.g. compressing the kernel or ramdisk itself.

    Comment


    • #22
      More problems:
      - GRUB is far from the only bootloader that needs to be able to extract kernel and initrd files from /boot. There's a whole bestiary of alternative options.
      - The situation is entirely different from an XZ-packed *kernel and initrd* (vs. an LZ4-compressed *filesystem*): the bootloader doesn't give a damn about those, it loads the data as-is from the disk, and it's the kernel itself which does the unpacking. The same goes for RAM and swap compression.

      Modular is okay for kernel/initrd and swap packing (handled by the kernel only); modular is harder for a filesystem (every single tool needs to handle the same modules, and that includes a whole zoo of bootloaders and several filesystem tools).
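      The modularity problem described above boils down to a shared table of compression-type IDs that every reader (kernel, bootloader, fsck) must agree on. A toy sketch, with entirely made-up IDs and stdlib codecs standing in for the real btrfs compressors:

```python
import lzma
import zlib

# Hypothetical registry: one compression-type ID per extent, ~256 possible
# values as suggested earlier in the thread. IDs and codecs are illustrative.
CODECS = {
    0: (lambda d: d, lambda d: d),         # no compression
    1: (lambda d: zlib.compress(d), zlib.decompress),
    2: (lzma.compress, lzma.decompress),   # stand-in for "yet another compressor"
}

def read_extent(comp_id, payload):
    try:
        _, decompress = CODECS[comp_id]
    except KeyError:
        # The failure mode described above: an old bootloader or fsck that
        # doesn't know the new ID simply cannot read the block.
        raise ValueError(f"unknown compression algo {comp_id}")
    return decompress(payload)

print(read_extent(1, zlib.compress(b"kernel and initrd bytes")))
```

      Every tool that reads the disk needs the same table, which is exactly why adding an entry is a format change, not just a kernel patch.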

      Luckily:
      - as some pointed out, compression in btrfs is on a per-file basis.
      - in addition to that, only files that are actually compressible get compressed. (If compressing the first 64~256k - I don't remember exactly how big a compressed block is - doesn't produce a smaller size, the file is considered not very compressible. Only the first few k of a kernel - its unpacking loader - are compressible, so the kernel and the initrd would fail the test and not be compressed.)
      - some extended attribute (beyond the standard -c / +c) could be used to flag files against compression.
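      The "compress the first chunk and bail out" heuristic from the second bullet can be sketched as follows; the 128 KiB probe size is just an illustrative constant, since the exact btrfs figure is remembered only vaguely above:

```python
import os
import zlib

PROBE_SIZE = 128 * 1024  # illustrative; the exact btrfs probe size differs

def looks_compressible(data: bytes, probe_size: int = PROBE_SIZE) -> bool:
    """Compress only the leading chunk; if it doesn't shrink, skip the file."""
    probe = data[:probe_size]
    return len(zlib.compress(probe, 1)) < len(probe)

random_like = os.urandom(PROBE_SIZE)     # stands in for an already-packed kernel image
plain_text = b"GNU GRUB configuration\n" * 5000

print(looks_compressible(random_like))   # False: high-entropy data doesn't shrink
print(looks_compressible(plain_text))    # True
```

      An already-compressed kernel body looks like random data to the probe, which is why it fails the test and stays uncompressed on disk.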

      But somebody would need to write the code for that and especially, would need to MAINTAIN the code for that.

      Comment


      • #23
        Originally posted by SystemCrasher View Post
        What they mean is: imagine GRUB reading btrfs to load the kernel and RAM disk. Older GRUBs would hit an "unknown compression algo" error when trying to read LZ4-compressed blocks. Systems using older GRUB, or other boot loaders that rely on fetching kernels, ramdisks and so on from the filesystem, would just break. The same goes for older filesystem tools. A system that fails to boot does not sound cool, right?
        Whilst all your arguments are valid (not just the quoted part above), none of them stopped the btrfs team from adding LZO compression back when zlib was the only one available.

        Backwards compatibility is always an issue; I acknowledge and respect that. But if someone chose to use LZ4 compression in a newer btrfs version (were it to become available), I can't think of many plausible real-life scenarios that would lead to someone trying to boot the same drive using an older version.

        The same argument could have been used for not including btrfs support in the Linux kernel years ago, claiming it would break compatibility with older kernels only supporting ext2/3/4.

        Comment


        • #24
          Originally posted by Beherit View Post
          Whilst all your arguments are valid (not just the quoted part above), none of them stopped the btrfs team from adding LZO compression back when zlib was the only one available.
          LZO and zlib are fundamentally different things. They play in different leagues, and my experience with compression suggests it is unwise to compare zlib against LZ4 and LZO; it makes more sense to compare LZO vs LZ4. Zlib is really different in all kinds of properties: it targets "medium" compression. It would not beat state-of-the-art heavy trucks like LZMA, but it gives a noticeably better ratio than LZO or LZ4 any day. There is a price, though: it is slower to compress and decompress (but noticeably faster than e.g. LZMA).

          In compression there is no free lunch; there are tradeoffs, and it is hard to get around them. From a technical standpoint, both LZ4 and LZO can be seen as plain Lempel-Ziv schemes: they both output streams classified as "byte-aligned LZ". They differ in some details and tradeoffs, but the core idea is quite similar. This approach gives modest compression ratios, but allows turbo-fast decompression (and compression, especially if you are okay with losing some compression ratio in exchange for speed).

          OTOH, zlib runs LZ first and then Huffman-encodes the result. This two-phase scheme can't reach LZ4/LZO speeds because it simply does more work, both on compression and decompression. Though for fast compression there are some ways to cheat: zlib streams allow plenty of options, and e.g. the SLZ project cheated a bit to give super-fast compression in a zlib-compatible way; they do LZ but use a clever trick to save on the Huffman phase, at the cost of compression ratio. But a zlib decompressor must always be prepared to handle all zlib options, so it can't be that fast.
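          zlib itself exposes a knob that illustrates the same lever SLZ pulls: the deflate *strategy*. `Z_FIXED` keeps LZ matching but uses the fixed Huffman tables (roughly the shortcut described above), and `Z_HUFFMAN_ONLY` drops LZ matching entirely; either way a stock decompressor still reads the stream. This is a stdlib demonstration of the tradeoff, not SLZ's actual code:

```python
import zlib

data = b"a byte-aligned LZ stream compresses this text " * 4000

def deflate(strategy):
    co = zlib.compressobj(level=6, strategy=strategy)
    return co.compress(data) + co.flush()

full       = deflate(zlib.Z_DEFAULT_STRATEGY)  # LZ matching + dynamic Huffman tables
fixed_huff = deflate(zlib.Z_FIXED)             # LZ matching, fixed tables (SLZ-style shortcut)
huff_only  = deflate(zlib.Z_HUFFMAN_ONLY)      # Huffman coding only, no LZ matching

for name, blob in [("default", full), ("Z_FIXED", fixed_huff), ("Z_HUFFMAN_ONLY", huff_only)]:
    # every variant is still a standard zlib stream a stock decompressor can read
    assert zlib.decompress(blob) == data
    print(f"{name:15s} {len(blob):7d} bytes")
```

          Skipping a phase costs ratio but saves work on the compression side, which is exactly the kind of bargain the post describes.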

          So unless you've got a very powerful CPU and slow storage, zlib decompression can easily get CPU-bound. That's where you stop being happy about the better ratio and start swearing about slowdowns. LZ4 and LZO are a fundamentally different tradeoff: they can be so fast that storage compression actually increases performance. Since they are light on the CPU, speed often increases, because you have to read/write less data, and modern CPUs rarely have a problem exceeding storage speed with something as lightweight as LZO or LZ4. This is especially true for read operations. SSDs can get too fast even for these beasts, but in that case you still get more space and less wear due to the smaller amount of writes. It is up to you to decide which tradeoff to prefer.

          Bringing in LZ4 to complement LZO makes much less sense, because they both play in the same league and the difference in behavior is waaaay smaller than zlib vs LZO. LZ4 can perform somewhat better, etc., but one can think of these as two slightly different tunings of quite similar schemes. E.g. one can get even faster LZO compression at the cost of some ratio loss, making it even more similar to LZ4 in this regard; actually, Markus F.X.J. Oberhumer posted a patch doing exactly that on LKML some time ago. Not sure if it was committed; those interested can check the source themselves and even patch it if they prefer their own tradeoff over the defaults. Yeah, there is a possibility to get a better ratio at the cost of compression speed (decompression is not affected), or better speed at the cost of ratio. But only to some degree.

          [quote]Backwards compatibility is always an issue; I acknowledge and respect that. But if someone chose to use LZ4 compression in a newer btrfs version (were it to become available), I can't think of many plausible real-life scenarios that would lead to someone trying to boot the same drive using an older version.[/quote]
          Bringing (LZ4 or LZO) in to complement zlib makes some sense, because zlib is one league and (LZO and LZ4) are a different league, so they complement each other. Zlib is for cases where you care about space and are not focused on speed. LZO and LZ4 are good for the complementary idea: "at least some compression" while being blazing fast. You can't easily tweak zlib to get speed comparable to LZO/LZ4, nor can you get zlib's compression ratio out of simple LZ schemes.

          [quote]The same argument could have been used for not including btrfs support in the Linux kernel years ago, claiming it would break compatibility with older kernels only supporting ext2/3/4.[/quote]
          And actually, if you bring "yet another filesystem" without any distinct properties, you can easily face "inconvenient" questions like "what benefits would it provide?". The btrfs devs have quite a long list of advanced features as an answer. Unless you can offer something like that, I would not bet on it being accepted.

          ...and ZFS used LZ4 because LZO is GPL and the CDDL isn't compatible with it, so they can't use LZO, as far as I understand. That was a deliberate decision by Sun to be GPL-incompatible, to hurt Linux in favor of Solaris; yet it mostly backfired on Sun itself. So they used some LZJB thing developed inside Sun. I gave it a try and can confirm both LZ4 and LZO beat Sun's effort into the dust in terms of the compression-vs-speed tradeoff, while formally LZJB seems to be in the same league as LZ4/LZO, i.e. simplistic LZ. But it has far worse compression and, somehow, far worse speed. It is a hallmark of Sun to roll out some mediocre solution and pretend it is superb through loud marketing instead of advanced tech.

          Comment


          • #25
            Originally posted by Beherit View Post
            The same argument could had been used for not including btrfs support in the Linux kernel years ago, claiming it would break compatibility with older kernels only supporting ext2/3/4.
            Not exactly. The same argument could be made about adding a new feature to a filesystem that is officially supported to boot from.
            Not only would you need to modify Linux to add your new feature, you would also need to update the whole zoo of bootloaders and all the various tools (partition editors, filesystem checkers, etc.).

            It's not adding something new, it's modifying something established, while needing to make sure that everyone else follows, otherwise you risk very serious problems (an unbootable system, or even worse: corrupted data).

            It's for the same reason that ext3 and ext4 followed ext2 as separate filesystems instead of being just extra features:
            - ext2-designed 3rd-party software could still access an ext3 partition (albeit losing the advantage of the journal),
            - but an ext4 partition with extents CANNOT be accessed by ext2/3 3rd-party software.
            Thus creating ext4 as a separate filesystem helps make things clear, and nobody will complain when an installation of distro X can't access, or corrupts, the data used by an installation of distro Y.

            Comment


            • #26
              I would rate LZO, LZ4 and zlib as all being in different leagues anyway. First, everyone always seems to ignore that where LZO decompresses at 700 MB/s, LZ4 decompresses at 2000 MB/s with only a 1% space penalty. So just figure out the first reason why ZFS opted for LZ4...
              On SSD storage or embedded/low-power processors I see LZ4 as clearly crucial.
              An important side note as well: while LZO is symmetrical in (de)compression speed, zlib and LZ4 are interesting in the "compress once / decompress many" scenario.
              zlib easily does 250 MB/s *per core* in decompression on a modern CPU, regardless of the data's compression level (1-9).
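              The "regardless of the compression level" part of that claim is easy to check with the stdlib; absolute MB/s numbers depend entirely on the CPU and the data, so treat the printed figures as illustrative:

```python
import time
import zlib

data = (b"The quick brown fox jumps over the lazy dog. " * 3) * 15000  # ~2 MB of text

def decompress_speed(level, repeats=10):
    """MB/s of decompressed output for a stream compressed at the given level."""
    blob = zlib.compress(data, level)
    start = time.perf_counter()
    for _ in range(repeats):
        out = zlib.decompress(blob)
    elapsed = (time.perf_counter() - start) / repeats
    assert out == data
    return len(data) / elapsed / 1e6

for level in (1, 6, 9):
    print(f"level {level}: {decompress_speed(level):,.0f} MB/s decompression")
```

              The higher levels only spend more time searching during compression; the decode path is the same, which is why the throughputs come out in the same ballpark.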

              Comment


              • #27
                Slight update on this: it appears btrfs may be getting zstd support, which is overall superior (IMO) to LZ4 for most storage devices: better compression and slightly worse decompression speed, but it will still keep up with the read speed of a regular SSD.


                Zstandard - fast real-time compression algorithm (github.com/facebook/zstd).

                Comment


                • #28
                  LZ4 is extremely useful for ARM systems and has significant benefits over LZO and zstd. It should be re-evaluated, because it matters on sub-1GHz 32-bit and 64-bit ARM systems with limited memory.

                  Comment


                  • #29
                    I noticed David Sterba, the btrfs maintainer, rubber-stamped the latest LKML request last week with this FAQ link, or basically echoed it, and someone else linked the LZ4 FAQ--

                    This debate is not the same as in 2014, when the FAQ item was added:

                    * Two other Linux kernel filesystems AND the zImage loader AND the compressed-swap feature now have kernel support for LZ4.

                    * zstd lists a benchmark that is an absolute endorsement of LZ4's decompression speed: about 4x zstd and way above everything else.

                    * Block size and compression ratios are also not concrete limitations of the present LZ4 on GitHub, or of the one in my distribution.


                    In summary, the recommendation to use LZO because "it's good enough" now basically conflicts with zstd's own README recommendation to simplify life between LZMA, zstd, and LZ4 depending on your I/O use cases.

                    Comment
