Bcachefs Linux File-System Benchmarks vs. Btrfs, EXT4, F2FS, XFS


  • #21
    Originally posted by PuckPoltergeist View Post

    Why should this be impossible on NOCOW filesystems?
To put it in a grossly simplistic way, writing on an XFS or Ext4 filesystem with data checksums would need to work along these lines:

    1. Add journal entry
    2. Write an extent
    3. Calculate and write the checksum
    4. Close journal entry

Now what happens if the system crashes between steps 2 and 3? Upon reboot, the extent written has no valid checksum, so the FS safety guarantees go out the window. And we also can't just recalculate it, precisely because in the absence of a valid checksum in the first place, there is no way to tell if the data have been corrupted or not.
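Here is a minimal toy sketch in Python (invented names, nothing resembling real XFS/Ext4 code) of why that crash window is fatal: the extent is overwritten in place while its checksum lives elsewhere, so a crash between the two writes leaves a mismatch that cannot be told apart from genuine corruption:

```python
import zlib

# Toy "disk": an in-place extent plus a checksum stored separately.
disk = {"extent": b"old data", "csum": zlib.crc32(b"old data")}

def nocow_write(new_data, crash_after_step2=False):
    disk["extent"] = new_data                # step 2: overwrite in place
    if crash_after_step2:
        return                               # crash between steps 2 and 3
    disk["csum"] = zlib.crc32(new_data)      # step 3: write matching checksum

nocow_write(b"new data", crash_after_step2=True)
# After "reboot" the checksum no longer matches, and nothing distinguishes
# this torn update from real on-disk corruption -- the old data is gone.
assert zlib.crc32(disk["extent"]) != disk["csum"]
```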



    • #22
      Originally posted by AndyChow View Post
By now, it's evident that BTRFS is badly designed. Bcachefs might not have all the features, but they are building them slowly, in a sane way, once the basics have been mastered and work. With BTRFS, everything was thrown together, and it doesn't work. A couple of weeks ago, a PCIe hardware failure caused my system to require a hard reboot. My RAID-1 btrfs array had the last leaf of the most recent tree corrupted. There was absolutely no way to repair it. The only thing I could do was dump the files onto another array, then destroy and re-format the btrfs array. All the recovery attempts and commands were guided by a btrfs developer.

So with BTRFS, a RAID-1 array that has the very last block of the very last write broken due to a power failure corrupts the entire filesystem, and there is no way to recover. So how is btrfs even COW? My understanding of COW is that you could just truncate the last modifications and recover everything that isn't too new. But no, it doesn't work that way, not with BTRFS.
AFAIK RAID1 has been production quality for many years and has been tested in similar scenarios countless times. Your problem probably has some deeper cause that may or may not be related to BTRFS.



      • #23
        Would you use XFS for your boot drive?



        • #24
          Originally posted by jacob View Post

          I think it's the other way around. ZFS and BTRFS have CoW as one of their main design features, and incidentally it allowed them to support data checksum.
No, it's not the other way around. Both ZFS and BTRFS chose to go with CoW for a reason, and checksumming without a write hole was one (of a long list) of those reasons.



          • #25
            Originally posted by jacob View Post

To put it in a grossly simplistic way, writing on an XFS or Ext4 filesystem with data checksums would need to work along these lines:

            1. Add journal entry
            2. Write an extent
            3. Calculate and write the checksum
            4. Close journal entry

Now what happens if the system crashes between steps 2 and 3? Upon reboot, the extent written has no valid checksum, so the FS safety guarantees go out the window. And we also can't just recalculate it, precisely because in the absence of a valid checksum in the first place, there is no way to tell if the data have been corrupted or not.
There is no difference from a COW filesystem. The commit was not done, so you have to discard it.



            • #26
              Originally posted by PuckPoltergeist View Post

There is no difference from a COW filesystem. The commit was not done, so you have to discard it.
              No. In a COW system it goes like this:

              1. write new extent, alongside the old one
              2. write new checksum alongside the old one
              3. write new metadata, with their own checksums etc., alongside old metadata
              4. commit

              Until step 4, the old data remain unchanged. If a crash occurs at any time during that period, the newly written data will be lost but we will use the old data, with a valid checksum.

              The commit itself is atomic and is basically akin to a single update of a field in the superblock. In other words it can't crash mid-way. After the commit, we use the newly written data and a valid checksum is in place.
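A toy sketch of that sequence (hypothetical names, not real btrfs/ZFS code): old blocks are never touched, and the commit is a single pointer update, so a crash at any earlier point leaves the old data with its matching checksum fully intact:

```python
import zlib

blocks = {0: b"old data"}                  # block number -> contents
csums = {0: zlib.crc32(b"old data")}       # per-block checksums
superblock = {"root": 0}                   # points at the live version

def cow_write(new_data, crash_before_commit=False):
    new_blk = max(blocks) + 1
    blocks[new_blk] = new_data             # step 1: new extent beside the old
    csums[new_blk] = zlib.crc32(new_data)  # step 2: new checksum beside the old
    if crash_before_commit:
        return                             # crash anywhere before step 4
    superblock["root"] = new_blk           # step 4: one atomic commit

cow_write(b"new data", crash_before_commit=True)
live = superblock["root"]
# The live version is still the old extent, and its checksum still matches.
assert blocks[live] == b"old data"
assert zlib.crc32(blocks[live]) == csums[live]
```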



              • #27
                Originally posted by jacob View Post

                No. In a COW system it goes like this:

                1. write new extent, alongside the old one
                2. write new checksum alongside the old one
                3. write new metadata, with their own checksums etc., alongside old metadata
                4. commit

                Until step 4, the old data remain unchanged. If a crash occurs at any time during that period, the newly written data will be lost but we will use the old data, with a valid checksum.

                The commit itself is atomic and is basically akin to a single update of a field in the superblock. In other words it can't crash mid-way. After the commit, we use the newly written data and a valid checksum is in place.
You're speaking about data consistency. That's something different. In a COW filesystem this is implicit, but you can achieve it with full data journaling on NOCOW filesystems too. Nevertheless, this is not relevant to data integrity, which is what checksums provide. Be careful not to mix up these two things.

PS: And to solve the problem you outlined above, we simply need to change the ordering a little:

1. Add journal entry, with the checksum included
                2. Write an extent
                3. write the checksum
                4. Close journal entry

Now you have the checksum in the log and can verify it on log replay.
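A toy sketch of that reordering (illustrative names only, not real journaling code): the checksum is computed from the data in RAM and logged before the extent is touched, so replay can decide whether the write landed completely:

```python
import zlib

journal = []                               # toy journal, newest entry last
disk = {"extent": b"old data"}

def logged_write(new_data, crash_after_data=False):
    journal.append({"csum": zlib.crc32(new_data), "closed": False})  # step 1
    disk["extent"] = new_data                                        # step 2
    if crash_after_data:
        return                             # crash before the entry is closed
    journal[-1]["closed"] = True                                     # step 4

def replay():
    entry = journal[-1]
    if entry["closed"]:
        return "clean shutdown, nothing to do"
    if zlib.crc32(disk["extent"]) == entry["csum"]:
        return "write completed, keep it"
    return "torn write, discard it"

logged_write(b"new data", crash_after_data=True)
print(replay())                            # -> "write completed, keep it"
```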
                Last edited by PuckPoltergeist; 02 June 2018, 05:30 PM.



                • #28
                  Originally posted by timofonic View Post

                  Would you like to add XFS to the table? Please...
Sure! Not much point, but here you go... I added EXT4 as well and differentiated between metadata checksums (checksums over the data that describes the filesystem itself) and data checksums (checksums over your stored files). I cleaned up the list a tad as well.

NOTE: Some of the features below may be achievable by utilizing other tools; this table represents only the filesystems' native support.
Feature           | Bcachefs                    | Btrfs            | XFS    | EXT4
Data checksum     | Yes, but not yet usable     | Yes              | No     | No
Metadata checksum | Yes, but not yet usable     | Yes              | Usable | Usable
Compression       | Yes, but not yet usable     | Yes              | No     | No
Scrubbing         | Not yet implemented         | Yes              | No     | No
Writeback caching | Yes                         | Not implemented* | No     | No
Replication       | Not yet implemented         | Yes              | No     | No
Encryption        | Yes, but advised not to use | Not implemented* | No     | Yes
Snapshots         | Not yet implemented         | Yes              | No     | No

                  http://www.dirtcellar.net



                  • #29
                    Originally posted by PuckPoltergeist View Post

                    You're speaking about data consistency.

                    [...]

                    Now you have the checksum in the log and can verify on log replay.
I'm speaking about consistency between data and the corresponding checksum. It basically boils down to the fact that after the data are written, you need to write the checksum using a separate disk operation. With a NOCOW filesystem there is no way to ensure that these two things are either executed consistently together, or not at all (remember ACID?).

Your solution does not work because, except in the most trivial cases, you don't know all the data in advance to be able to precalculate the checksums. It also doesn't cater for more complicated scenarios, like partially overwriting an existing extent.



                    • #30
                      Originally posted by jacob View Post

I'm speaking about consistency between data and the corresponding checksum. It basically boils down to the fact that after the data are written, you need to write the checksum using a separate disk operation. With a NOCOW filesystem there is no way to ensure that these two things are either executed consistently together, or not at all (remember ACID?).
This doesn't matter for integrity; it's about consistency. And you can achieve this with explicit full data journaling too. Without it, you may lose data, but that is pretty normal for filesystems without data journaling. It doesn't matter how the journaling is done (implicitly or explicitly), and it's totally independent of checksums.

edit: to make it clearer that this is independent of checksums, look at your example without the checksums:

                      1. Add journal entry
                      2. Write an extent
                      3. Close journal entry

If you have a crash between steps 1 and 3, your data is lost. It doesn't matter if you add any checksums: you can add them to detect data corruption, but the writeout still suffers from the same problem. That is a different problem, and one that checksums won't solve.
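In the same toy style as above (invented names, purely illustrative), the outcome of a mid-write crash is identical with or without checksums, which is the point: durability comes from journaling, not from checksums:

```python
import zlib

journal, disk = [], {"extent": b"old data"}

def journaled_write(new_data, with_csum, crash_mid_write=False):
    entry = {"closed": False}
    if with_csum:
        entry["csum"] = zlib.crc32(new_data)   # optional integrity info
    journal.append(entry)                      # step 1: open journal entry
    if crash_mid_write:
        return                                 # crash between steps 1 and 3
    disk["extent"] = new_data                  # step 2: write the extent
    journal[-1]["closed"] = True               # step 3: close the entry

for with_csum in (False, True):
    journaled_write(b"new data", with_csum, crash_mid_write=True)
    # Identical either way: the entry never closed, the new data is gone.
    assert not journal[-1]["closed"] and disk["extent"] == b"old data"
```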

Your solution does not work because, except in the most trivial cases, you don't know all the data in advance to be able to precalculate the checksums. It also doesn't cater for more complicated scenarios, like partially overwriting an existing extent.
It does. The data doesn't change during writeout, and the checksum is calculated over the data in RAM, so it doesn't matter whether it is calculated before or after writeout.
                      Last edited by PuckPoltergeist; 02 June 2018, 08:07 PM.

