OpenZFS 2.2.1 Released Due To A Block Cloning Bug Causing Data Corruption

• #11
  Originally posted by Rallos Zek View Post
  Nothing new! ZFS always had a history of eating data and being unstable.

  lol. I think you misspelled BTRFS.



• #12
  Originally posted by fong38 View Post
  Should already be disabled across the board if I understood correctly. Does cp --reflink=always work?

  Thank you for your response, fong38. I apologize for taking so long to get back to you, but I didn't get the usual email telling me someone had responded to my post.

  In any case, I've never used --reflink for anything; I just use regular cp. I switched to ZFS about 4 months ago because I wanted to use its bit rot detection, and so far it's worked incredibly well, but other than that I haven't explored it much. If you have some suggestions on how to check for corruption I'd appreciate it, because while my media server hasn't used block cloning, I've since discovered that two of the pools on my desktop have.

  Of course I have both local and cloud backups for everything on both computers, but at this point I have no idea how to check for corruption (other than scrub, which apparently won't detect this issue), how to fix it, or how to ensure that block cloning is disabled. All I know how to do for this problem right now is use "zpool get all".
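  The closest I've gotten beyond that is querying the feature flag per pool instead of wading through the full property list. Something like the following, where "tank" is just a stand-in for your own pool name:

      # Show the block_cloning state for one pool (enabled = unused, active = in use)
      zpool get feature@block_cloning tank

      # Omit the pool name to check every imported pool at once
      zpool get feature@block_cloning

      # List any files ZFS has already flagged with permanent errors
      zpool status -v tank

  Whether that last command catches corruption from this particular bug I honestly don't know, so any pointers would be welcome.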

  And by the way, I still plan on using ZFS, as it has other features like snapshots that I want to use, and despite this bug its data integrity track record still appears better than that of other advanced file systems. Heck, I've even been bitten by plain old ext4 bugs in the past.

  In fact, what I've learned is that when new features are introduced I should wait a while before upgrading my pools. And since I'm a retired hardware/firmware/software engineer, I really don't have any excuse for not realizing that. So I consider this predicament to be partly my fault.

  Last edited by muncrief; 22 November 2023, 11:13 PM.



• #13
  This is one of the reasons why I stick with the tried and true ext4 on all Linux installs I do. I've played around with XFS, ReiserFS, and BTRFS, and I use exFAT when I need to share files with a Windows install, but for pure Linux it's ext4 all the way.

  The only time I ever had a problem was ext4 on LUKS, but then again I have come to despise full disk encryption and recommend to everyone not to use it, regardless of whether it's BitLocker, True/Veracrypt, LUKS, or whatever.



• #14
  Originally posted by muncrief View Post
  Does anyone know how to disable block cloning?

  I upgraded to 2.2.1 but block cloning is still enabled, even though none of my pools are using it. The output of "zpool get all | grep block_cloning" shows "feature@block_cloning enabled", not active, and from what I've found that means it's not being used. But I can't find a way to set it to disabled.

  You can't set it to disabled. Enabling a feature is irreversible. Enabled means able to be used but inactive. Active means in use. As long as it stays "enabled" you're not using it. I think that's confusing, too.

  What you'll have to do is something like setting "cp --reflink=never" globally to make sure that the default "--reflink=auto" doesn't accidentally use reflinks, which would flip your pool into an "active" state (see the sketch below). If you ever do go active, you can delete the files that were copied with reflinks and it should switch from "active" back to "enabled".

  IMHO, disabled, inactive, and active would be better names, since "enabled" can be mistaken for "active", while "inactive" and "active" both unambiguously describe the state of a feature that has been enabled.
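  As a rough sketch, something like this in your shell profile, with "tank" standing in for your pool name:

      # Make interactive cp default to never cloning blocks
      alias cp='cp --reflink=never'

      # Confirm the feature is still merely "enabled" rather than "active"
      zpool get feature@block_cloning tank

  Keep in mind an alias only covers your own interactive shell; scripts and other programs that clone files on their own won't be affected by it.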



• #15
  Originally posted by skeevy420 View Post
  You can't set it to disabled. Enabling a feature is irreversible. Enabled means able to be used but inactive. Active means in use. As long as it stays "enabled" you're not using it.

  Thank you for the information, skeevy420.



• #16
  Yeah, I already noticed that block cloning is not quite "ready". I ran into a bug just a few days ago when I copied two small text files from one dataset to another. The files appeared to copy, but when I tried opening one of them the system hung to the point where I had to manually reset it. After rebooting, I tried opening the file again, and the system hung once more. I soon noticed that "zpool status" was reporting a problem with those two files, so I deleted them and ran a full scrub. Everything seems fine now.
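  For anyone who hits the same thing, the cleanup sequence that worked for me was roughly the following, with "tank" standing in for your pool:

      # List the files ZFS has flagged as having permanent errors
      zpool status -v tank

      # After deleting the affected files, reset the error counters and re-verify
      zpool clear tank
      zpool scrub tank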



• #17
  Not that surprising that a bug could eventually slip through, especially in an area that was heavily modified in the latest version.

  Disappointing that it wasn't caught by ztest, though. Hopefully they expand the test suite coverage to catch this bug and any others like it.



• #18
  Originally posted by skeevy420 View Post
  IMHO, this does highlight that OpenZFS's internal and beta/RC testing might not be as robust as it could or should be.

  I would hope the lesson that good CI tests must be required for all new features is well learned. However, some developers in the GitHub issue seemed to suggest that this is too hard to accomplish to be made a hard requirement (i.e., breaking things and losing data may be undesirable, but still an acceptable, way forward). If that position is not reined in by the elders, then OpenZFS will lose a valuable claim to robustness and reliability.



• #19
  Originally posted by Rallos Zek View Post
  Nothing new! ZFS always had a history of eating data and being unstable.

  When compared to the clown show that has been btrfs, OpenZFS has been rock solid.

  Of the 6 or so times I have tried using btrfs, only one did not end in an unmountable filesystem, with btrfs restore getting me back about 40% of the data. The one that wasn't a total loss wasn't a win either: it was a 4-disk stripe of mirrors, and I just got lucky when btrfs kicked the "right" two drives out, leaving the fs mountable.

  Rebuilt the array on OpenZFS and it has been rock solid since.



• #20
  Originally posted by waxhead View Post
  Yes it is, very much so. And this comes from a btrfs evangelist. But it has bugs, like all software that gets updates. The important thing is that it was found....

  This statement is nonsensical. It was found because it ate someone's data. If it hadn't, well, then it wouldn't even be a real bug, because it wouldn't affect anyone.

  You should've finished off with "hopefully it was found and fixed before too many people unknowingly updated to the broken version."
