Bcachefs File-System Plans To Try Again To Land In Linux 6.6

  • #21
    Originally posted by flower View Post

    My biggest problem was VM performance, though. Since I switched to ZFS I don't have that problem anymore.
    Yeah, major plus that KVM and its associated support structure can directly use ZFS pools for VM storage. You don't lose much, if any, in the way of access performance that way.



    • #22
      Originally posted by Joe2021 View Post
      What I don't get so far is what I, as "Joe Average", can expect from it. I have been using BTRFS for some years and I am quite happy with it. I enjoy features like checksumming, snapshots, etc. BTRFS was a huge step forward for me, coming from ext4. So which aspect of Bcachefs is supposed to make me migrate?
      I've been quite happy with btrfs for basic single-disk root scenarios, which is probably all that "Joe Average" cares about. In the not-everybody-uses-it-but-still-common category, I'm excited about the possibility of an in-kernel COW filesystem with solid RAID 5 / 6 / 10 designs. People complain all the time about btrfs RAID 5 / 6, with good reason. But even RAID 10 is wack. The "RAID" is at the chunk level, not the block device level. This is not obvious even to people who are very comfortable with traditional RAID setups. A 4-disk btrfs RAID 10 can only survive a single disk failure. When any second disk fails, you basically have a 0% chance of avoiding data loss. In a traditional 4-disk RAID 10 (or ZFS pool of mirrors), you should have a 66% chance of avoiding data loss when any second disk fails. Crap like this is one of the many reasons so many of us use ZFS.
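      For what it's worth, the 66% figure falls out of a quick calculation (a sketch in Python, using the post's own numbers and assuming the classic two-mirror-pair layout for the traditional case):

      Code:
      # Survival odds for a second disk failure in a traditional 4-disk RAID 10
      # (two mirror pairs, striped together). Once one disk is dead, the array
      # is only lost if the second failure hits that disk's mirror partner.
      surviving_disks = 3   # disks left after the first failure
      fatal_choices = 1     # only the dead disk's mirror partner is fatal
      print(f"{1 - fatal_choices / surviving_disks:.1%}")   # -> 66.7%

      # btrfs raid10 mirrors per chunk across whichever devices have the most
      # free space, so in practice every surviving disk ends up sharing the
      # only two copies of *some* chunk with the dead one. Any second failure
      # then loses data, i.e. the survival chance drops to roughly 0%.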
      Last edited by pWe00Iri3e7Z9lHOX2Qx; 12 July 2023, 02:41 PM.



      • #23
        I'm curious as to the performance of bcachefs. At home I use btrfs and am happy with it.

        But in benchmarks, ext4 still beats it as far as I know (though I haven't seen any recent file system benchmarks here on Phoronix, at least not ones covering my use case of compiling large C/C++ code bases).



        • #24
          Originally posted by pWe00Iri3e7Z9lHOX2Qx View Post

          I've been quite happy with btrfs for basic single-disk root scenarios, which is probably all that "Joe Average" cares about. In the not-everybody-uses-it-but-still-common category, I'm excited about the possibility of an in-kernel COW filesystem with solid RAID 5 / 6 / 10 designs. People complain all the time about btrfs RAID 5 / 6, with good reason. But even RAID 10 is wack. The "RAID" is at the chunk level, not the block device level. This is not obvious even to people who are very comfortable with traditional RAID setups. A 4-disk btrfs RAID 10 can only survive a single disk failure. When any second disk fails, you basically have a 0% chance of avoiding data loss. In a traditional 4-disk RAID 10 (or ZFS pool of mirrors), you should have a 66% chance of avoiding data loss when any second disk fails. Crap like this is one of the many reasons so many of us use ZFS.
          I think it's a common misconception that RAID is a function of filesystems. Filesystems sit on top of whatever volume management is underneath, which may or may not implement a RAID standard, whether that volume management is done in software or in hardware/firmware. Theoretically, there's no technical reason you can't have exFAT on top of a 12-disk RAID 5 array, although I can think of many reasons why you wouldn't from a pure data-integrity standpoint, independent of RAID failover. No RAID system by itself will save you from corrupted data being written to the volume.
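          As a sketch of that layering (hypothetical device names, requires root, and not something you'd actually want to run), the array is assembled first and the filesystem is then simply formatted onto the resulting block device, whatever that filesystem happens to be:

          Code:
          # Illustrative only: build a 12-member software RAID 5 with mdadm,
          # then put exFAT on top of it. The filesystem neither knows nor
          # cares how the block device underneath is assembled.
          import subprocess

          members = [f"/dev/sd{c}" for c in "bcdefghijklm"]  # 12 hypothetical disks

          subprocess.run(
              ["mdadm", "--create", "/dev/md0", "--level=5",
               f"--raid-devices={len(members)}", *members],
              check=True,
          )
          subprocess.run(["mkfs.exfat", "/dev/md0"], check=True)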

          Adding for clarity: In the ZFS case, the filesystem and volume management are part of a single integrated package. However, the basic architecture remains the same. ZFS has a file system that sits on top of the volume management layer.

          Personal note: I think the problem with Btrfs is that the only features that get enough attention to become stable and performant are the ones covering the use cases the maintainers (mainly Facebook & Oracle?) actually rely on. For everyone else, we have to use ZFS, which has a lot of big companies working on it, so there's a more diverse user and developer base.
          Last edited by stormcrow; 12 July 2023, 04:13 PM.



          • #25
            Originally posted by stormcrow View Post

            I think it's a common misconception that RAID is a function of filesystems. Filesystems sit on top of whatever volume management is underneath, which may or may not implement a RAID standard, whether that volume management is done in software or in hardware/firmware. Theoretically, there's no technical reason you can't have exFAT on top of a 12-disk RAID 5 array, although I can think of many reasons why you wouldn't from a pure data-integrity standpoint, independent of RAID failover. No RAID system by itself will save you from corrupted data being written to the volume.

            Adding for clarity: In the ZFS case, the filesystem and volume management are part of a single integrated package. However, the basic architecture remains the same. ZFS has a file system that sits on top of the volume management layer.

            Personal note: I think the problem with Btrfs is that the only features that get enough attention to become stable and performant are the ones covering the use cases the maintainers (mainly Facebook?) actually rely on. For everyone else, we have to use ZFS.
            I think the biggest problem with the RAID10 example I gave is that nobody who is familiar with RAID10 from other systems would expect a write pattern like this to be possible.

            Code:
            | SDA | SDB | SDC | SDD |
            |-----|-----|-----|-----|
            | A1  | A2  | A1  | A2  |
            | B1  | B1  | B2  | B2  |
            | C1  | D1  | D1  | C1  |
            | D2  | C2  | C2  | D2  |
            I think for btrfs they should have actually named these profiles something else, because a lot of assumptions get made based on a name and previous familiarity / experience. I certainly wouldn't instinctively assume that writes were like "mini RAID10s" going everywhere willy-nilly and that I was totally screwed if any second disk fails. But yes, agreed on the volume management. ZFS is wonderfully simple to set up and manage compared to the unholy combined hell of layers like dm-crypt + dm-integrity + dm-raid etc.
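            If you brute-force the layout from the table above, the difference is stark (a toy enumeration, not a real btrfs allocator; the fixed-pair case stands in for a traditional RAID 10 / ZFS pool of mirrors):

            Code:
            # Two copies of each stripe half, per the chunk-level table above,
            # versus the same halves pinned to fixed mirror pairs. Count which
            # two-disk failures are survivable in each case.
            from itertools import combinations

            chunk_layout = {
                "A1": {"SDA", "SDC"}, "A2": {"SDB", "SDD"},
                "B1": {"SDA", "SDB"}, "B2": {"SDC", "SDD"},
                "C1": {"SDA", "SDD"}, "C2": {"SDB", "SDC"},
                "D1": {"SDB", "SDC"}, "D2": {"SDA", "SDD"},
            }
            fixed_layout = {  # traditional RAID 10: copies always land on the same pair
                name: {"SDA", "SDB"} if name.endswith("1") else {"SDC", "SDD"}
                for name in chunk_layout
            }

            def survivable(layout):
                disks = ["SDA", "SDB", "SDC", "SDD"]
                ok = 0
                for failed in (set(p) for p in combinations(disks, 2)):
                    # Data is lost if both copies of any stripe half sit on the failed disks.
                    if not any(copies <= failed for copies in layout.values()):
                        ok += 1
                return ok

            print("chunk-level layout:", survivable(chunk_layout), "of 6 survivable")  # -> 0 of 6
            print("fixed mirror pairs:", survivable(fixed_layout), "of 6 survivable")  # -> 4 of 6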



            • #26
              Originally posted by pWe00Iri3e7Z9lHOX2Qx View Post
              I think for btrfs they should have actually named these profiles something else, because a lot of assumptions get made based on a name and previous familiarity / experience. I certainly wouldn't instinctively assume that writes were like "mini RAID10s" going everywhere willy-nilly and that I was totally screwed if any second disk fails. But yes, agreed on the volume management. ZFS is wonderfully simple to set up and manage compared to the unholy combined hell of layers like dm-crypt + dm-integrity + dm-raid etc.
              I agree that the RAID terminology used in BTRFS is not very smart, and lots of people who complain about BTRFS don't get this. There was some work done a while ago on suggesting a different naming scheme, although one might argue that using the RAID name draws people in because of familiarity and they "instantly know" what it is all about.

              And no, you are not totally screwed if one disk fails; it depends on how you configure metadata. The failure mode may be perfectly acceptable, and besides, with one disk lost it does not have to be as taxing for the remaining drives to re-duplicate the lost drive's data onto the others. E.g. with one drive lost, you *MAY* have a faster route to recovering the array than on a traditional RAID10, provided you have existing free space.

              Anyway, I am biased towards BTRFS so I am going to stick with what I know works if things go wrong.

              Regarding Bcachefs, which this thread is really about: I actually have my doubts that this will get merged for 6.6. There are a few familiar names in the email thread pointing out some issues with Kent Overstreet, Theodore Ts'o among them. The mailing list thread is actually rather interesting, with other names such as Dave Chinner chiming in too; worth a read...


              http://www.dirtcellar.net



              • #27
                Originally posted by Nelson View Post
                This is something else here: https://evilpiepirate.org/git/bcache...fs/bkey.c#n727

                Not sure it should get merged
                He claims it gives a 5% performance boost. There's been quite a lot of discussion about that bit of code on the mailing lists, and of course it can be configured out.



                • #28
                  Originally posted by fitzie View Post

                  He claims it gives a 5% performance boost. There's been quite a lot of discussion about that bit of code on the mailing lists, and of course it can be configured out.
                  I bet that's under specific conditions. I/O is still orders of magnitude slower than processing, so I wouldn't be at all surprised if the gain were well under 5% in a lot of circumstances. And it comes at the cost of portability and readability.

                  At the very least, he should add ARM64 support there too, but I'd scuttle that code entirely until the filesystem is solid.



                  • #29
                    Originally posted by kreijack View Post
                    Out of curiosity, could you describe your stack in more detail? Because btrfs over dm-raid/md-raid loses the capability to rebuild bad data from the good copy...
                    It's actually BTRFS | LVM | DM-Crypt | BCache | DM-RAID, where DM-Crypt is managed by cryptsetup and DM-RAID is managed by LVM. The top-most LVM layer is split into different filesystems for different purposes.

                    Although I'm not using it, LVM actually has the option to layer DM-Integrity over each RAID member for per-member corruption detection. Because DM-Integrity treats corruption as read errors, the other RAID members are automatically used if the data on one member is corrupt. The RAID layout is a 6x4TB “raid6_ls_6”, which is a non-standard layout: left-symmetric RAID5 (distributed parity), but with the last disk dedicated to the Q-syndrome parity. This has the benefit that I can switch between RAID5 and "RAID6" without reshaping, at the expense of losing one disk's worth (1/6) of read performance. In theory, RAID6 should also be able to tell which member is invalid in the case of a mismatch (without per-member DM-Integrity), but DM-RAID/LVM doesn't currently have that feature.
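                    On the "RAID6 could in theory finger the bad member" point, that really is just the P/Q algebra. A toy single-byte sketch in Python (not anything dm-raid exposes today; it assumes exactly one data member is silently wrong while P and Q themselves are intact, and the byte values are made up):

                    Code:
                    # GF(2^8) arithmetic with the usual RAID 6 polynomial (0x11d), one byte per member.
                    def gf_mul(a, b):
                        r = 0
                        for _ in range(8):
                            if b & 1:
                                r ^= a
                            b >>= 1
                            carry = a & 0x80
                            a = (a << 1) & 0xFF
                            if carry:
                                a ^= 0x1D
                        return r

                    def gf_pow(a, n):
                        r = 1
                        for _ in range(n):
                            r = gf_mul(r, a)
                        return r

                    LOG = {gf_pow(2, i): i for i in range(255)}  # discrete logs, base 2

                    def syndromes(data):
                        # P is the plain XOR parity; Q weights member i by g^i (g = 2).
                        p = q = 0
                        for i, d in enumerate(data):
                            p ^= d
                            q ^= gf_mul(gf_pow(2, i), d)
                        return p, q

                    data = [0x37, 0xA4, 0x5C, 0x09]        # four data members (arbitrary bytes)
                    p_stored, q_stored = syndromes(data)

                    corrupted = list(data)
                    corrupted[2] ^= 0x5A                   # silent bit rot on member 2

                    p_now, q_now = syndromes(corrupted)
                    delta_p = p_stored ^ p_now             # = good XOR bad
                    delta_q = q_stored ^ q_now             # = g^index * (good XOR bad)

                    # delta_q / delta_p = g^index, so the discrete log points at the bad member.
                    ratio = gf_mul(delta_q, gf_pow(delta_p, 254))   # x^254 = 1/x in GF(2^8)
                    print("corrupt member:", LOG[ratio])            # -> 2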

                    BCache is used in write-through mode, so the SSD can fail without data loss. My boot partition is a RAID1 at the beginning of all RAID members (thanks to Grub) so truly any 2 drives could fail without losing any data. I use the integrity checking of BTRFS as a sanity check of the RAID, BCache, and the SSD. It also functions as a janky method of "authenticated encryption". Besides the BTRFS RAID56 issues, at-rest encryption is important to me. So until BTRFS supports encryption, I'd need to encrypt all RAID members individually.

                    I honestly prefer having separate layers that I can manage myself. I can (and eventually will) switch BCache to DM-Cache, and move integrity checking from BTRFS to DM-Crypt for AEAD. A while ago I switched from mdadm to DM-RAID. With an all-in-one solution I couldn't mix and match implementations like that, and I probably couldn't tweak as many settings either.
                    Last edited by EphemeralEft; 12 July 2023, 06:31 PM.



                    • #30
                      Originally posted by waxhead View Post

                      Anyway, I am biased towards BTRFS so I am going to stick with what I know works if things go wrong.
                      Arguably, if things went wrong, then it doesn't work.

                      Regardless, RAID 10 is misnamed to begin with. It's often (but not always) a loose combination of RAID 0 and RAID 1 rather than a standard in itself, so it's unwise to make assumptions about its implementation details. Anything labeled "RAID 10", including and especially Btrfs's version, should have its implementation investigated before you rely on it, because assumptions can be wrong.

