Fedora Workstation 34 Looking To Employ Btrfs Zstd Transparent Compression By Default


  • #31
    Originally posted by AndyChow View Post
    Forcing compress means any file or archive which is non-compressible, will be compressed, which can often actually result in a larger file than the original. For example any lzma or 7z file you have will end up being compressed in zstd, which will often result in a larger file than the original. Compressing a compressed file doesn't result in magic.
    I don't think that's true with zstd, as it also checks if the file is compressible to not waste space.
    I've just tried with an ISO, and with the same ISO compressed with xz. The result is as expected: zstd was not used for the xz-compressed copy, unlike the original file.
    Your logic may apply to LZO/gzip though.
    edit: Well out of curiosity I tried with zlib and lzo, and the same happened.
    Last edited by geearf; 31 December 2020, 08:22 AM.


    • #32
      Originally posted by AndyChow View Post
      Forcing compress means any file or archive which is non-compressible, will be compressed, which can often actually result in a larger file than the original.
      No, Btrfs checks twice: 1) the zstd implementation in Btrfs has its own heuristics and will not pass along data that didn't compress, and 2) Btrfs will not store compressed extents unless they are at least 4 KiB smaller after compression.
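
The second check described in this post can be sketched in a few lines. This is only an illustration, not the Btrfs code: it uses Python's zlib as a stand-in for zstd, and the names `SECTOR`, `CHUNK` and `store_compressed` are made up for the example. It assumes a 4 KiB block and the 128 KiB chunk size mentioned elsewhere in the thread.

```python
import os
import zlib

SECTOR = 4096        # assumed on-disk block size (4 KiB)
CHUNK = 128 * 1024   # chunk size taken from the thread (128 KiB)


def store_compressed(chunk: bytes, level: int = 3) -> bool:
    """Return True only if compression saves at least one full sector.

    zlib stands in for zstd here; the decision rule is the point,
    not the codec.
    """
    compressed = zlib.compress(chunk, level)
    return len(chunk) - len(compressed) >= SECTOR


compressible = b"A" * CHUNK          # highly repetitive data
incompressible = os.urandom(CHUNK)   # random bytes, like an xz or 7z payload

print(store_compressed(compressible))
print(store_compressed(incompressible))
```

Under a rule like this, an already-compressed archive is simply kept as an uncompressed extent, so the file never grows. The only cost is the CPU time spent on the attempt.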


      • #33
        Originally posted by intelfx View Post
        The only actual problem with compress-force is that it forces btrfs to break files into 128 KiB (or so) extents, because this is the maximum size of a compressed extent. If several extents in a row end up rejected by the encoder and stored uncompressed, they will not be "fused". This leads to metadata bloat and very high fragmentation levels.
        Bullshit. The extents are anyway going to be 128 KiB and the same metadata bloat and fragmentation affects all compression-enabled partitions including compressible data, not just the data that is incompressible. So if you don't view this fragmentation as an issue with compressed data, it hardly is an issue with non-compressed data. (My feel is that the incompressible data would have to present a majority of the data on the partition in order to be considered problematic.)


        • #34
          Originally posted by AndyChow View Post

          My understanding is that when just compress is used, btrfs checks the first block is compressible, and if not, skips compressing the whole file. If you have things like virtual disks in raw, this can trigger false-negatives, i.e. the file could be compressed efficiently, but won't be. Forcing compress means any file or archive which is non-compressible, will be compressed, which can often actually result in a larger file than the original. For example any lzma or 7z file you have will end up being compressed in zstd, which will often result in a larger file than the original. Compressing a compressed file doesn't result in magic.

          I'm curious your choice of compression level. Default is 3, why go with 1? I keep default at 3 for home, 9 for archive arrays. ZSTD goes from 1 to 15 (if you set at 0, it's 3).
          Because 1 is basically transparent on SSDs. Once you go to 3 and up it starts to be noticeable regardless of the underlying storage medium. Spinning HDDs can get away with 7-11 because it's still faster than their write speeds. For distribution defaults, I'd go with 2 for SSDs and 8 for rotational media and USB flash (at least for the flash drives I own).

          IMHO, this situation is what Zstd-fast:1000 & LZ4 are for...fire & forget transparent compression.

          What I find funny is the kernel's Zstd is something like version 1.3 or 1.3.3 and doesn't have access to Zstd-fast. It's worth mentioning because ZFS uses Zstd 1.4.5 (it brings its own Zstd). Basically, the same data will compress faster and better on ZFS with the exact same Zstd levels in use. Just food for thought.


          • #35
            Originally posted by curfew View Post
            Bullshit. The extents are anyway going to be 128 KiB and the same metadata bloat and fragmentation affects all compression-enabled partitions including compressible data, not just the data that is incompressible.
            You are attempting to refute what I never said.

            Originally posted by curfew View Post
            So if you don't view this fragmentation as an issue with compressed data, it hardly is an issue with non-compressed data.
            It is always an issue. However, with compressed data I acknowledge this as a fundamental and unavoidable property of any transparent compression system with random access. With incompressible files, though, there is nothing forcing btrfs to split data into small extents, it is purely an implementation deficiency.

            Originally posted by curfew View Post
            (My feel is that the incompressible data would have to present a majority of the data on the partition in order to be considered problematic.)
            And this is another problem with compress-force — you cannot use it for a subset of files on a btrfs volume. You have to enable that mode globally.


            • #36
              Originally posted by intelfx View Post
              Zstd has its own compressibility check within the encoder itself, and another compressibility check on the encoder-filesystem transition. If an extent turns out incompressible, it will be rejected by the encoder and stored uncompressed on the btrfs level (i. e. not "compressed" with ratio 1.0, but as an actual uncompressed extent). Thus, using compress-force=zstd will not, under any circumstances, result in files larger than the original. It will potentially waste cycles trying to compress incompressible data, but that's about it.
              I don't think this pointless compression can be simply shrugged off, especially considering how big of a noise you made about the fragmentation of non-compressible data just a bit later. If the chosen compression level is high, it will be quite slow, and doing pointless compression will equally waste larger amounts of time and CPU cycles.

              So now you have highlighted two considerable issues WRT forced (attempted) compression: 1) high levels of avoidable fragmentation 2) potentially high levels of wasted CPU time.

              The maximum disk space savings seem to be around 40-45 % for optimally compressible data. So that's the level we can achieve with compress-force, but we also have to accept the aforementioned drawbacks. How about the default compressibility heuristics? What is the amount of "unachieved gains" when using simpler heuristics that -- based on your conclusion -- will not incur the two major drawbacks? My gut feeling is that the little bit of extra compression gained isn't worth it with all the downsides that come with it.


              • #37
                Originally posted by cynic View Post

                yes, especially on some workloads and on HDD, btrfs is slow, but the extra features, if you need them, make this bearable.

                are your bad experiences on HDD or SSD?
                Both, actually. SSD "seemed" (a very unscientific statement, I would agree) worse than HDD in my case. This is one of the many reasons I really appreciate Linux: there are options for things like file systems, so you can set up your system the way you want it depending on your workload.


                • #38
                  I would expect something like lz4 just to make sure that compression will never bottleneck SSD performance.
                  Very surprised about choosing zstd. Even level "1" is very slow for modern SSDs (and even worse for SSDs in RAID): https://github.com/lz4/lz4


                  • #39
                    Originally posted by C8292 View Post
                    I would expect something like lz4 just to make sure that compression will never bottleneck SSD performance.
                    Very surprised about choosing zstd. Even level "1" is very slow for modern SSDs (and even worse for SSDs in RAID): https://github.com/lz4/lz4
                    BTRFS doesn't support LZ4. You have to use F2FS or ZFS for that. The cool thing about F2FS is it also supports LZ4-HC.


                    • #40
                      Originally posted by curfew View Post
                      Bullshit. The extents are anyway going to be 128 KiB and the same metadata bloat and fragmentation affects all compression-enabled partitions including compressible data, not just the data that is incompressible. So if you don't view this fragmentation as an issue with compressed data, it hardly is an issue with non-compressed data. (My feel is that the incompressible data would have to present a majority of the data on the partition in order to be considered problematic.)
                      No need to write in bad words.

                      With the normal compress mount option, uncompressed extents can be larger than 128 KiB, while compress-force always limits them to 128 KiB or less. If you were to store a nice big 10 GiB video file, it would create at least 81920 extents with the compress-force option, even if the file wasn't compressed.
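
The extent count quoted above is plain ceiling division, and a short sketch makes the arithmetic easy to repeat for other file sizes. The helper name `min_extents` is made up for this example. It assumes every extent is capped at 128 KiB, as the post describes for compress-force.

```python
KIB = 1024
GIB = 1024 ** 3


def min_extents(file_size: int, max_extent: int = 128 * KIB) -> int:
    """Smallest number of extents needed when each holds at most max_extent bytes."""
    return -(-file_size // max_extent)  # ceiling division without floats


print(min_extents(10 * GIB))  # 81920
```

This is a lower bound: any extent shorter than 128 KiB pushes the real count higher.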
