OpenZFS 2.2.1 Released Due To A Block Cloning Bug Causing Data Corruption


  • JanW
    replied
Originally posted by woddy
    with Btrfs I have never lost any data, while it turns out that OpenZFS users have lost data ... stop saying that Btrfs is unreliable.
    Fallacy of composition: Just because you (one member of the population) never lost any data using Btrfs does not mean no one in that population has. You seem to conclude from your own experience with Btrfs that there are fewer data losses with Btrfs than OpenZFS in a given population of users.





  • waxhead
    replied
Originally posted by curfew
    This statement is nonsensical. It was found because it ate someone's data. If it didn't, well, then it wouldn't even be a real bug because it doesn't affect anyone.

    You should've finished off with "hopefully it was found and fixed before too many people unknowingly updated to the broken version."
You are incorrect. Updates introduce things that, by nature, are not as well tested as things that have been used for years without anyone noticing a problem.

    Define "real bug", please.



  • Yoshi
    replied
Currently there are at least two fundamental issues in OpenZFS; one may also affect the older, stable 2.1.x branch.

System information: Distribution Name: Gentoo; Distribution Version: (rolling); Kernel Version: 6.5.11; Architecture: amd64; OpenZFS Version: 2.2.0; Reference: https://bugs.gentoo.org/917224 ...

I build and regularly test ZFS from the master branch. A few days ago I built and tested the commit specified in the headline of this issue, deploying it to three machines. On two of them (the ones ...


    2.2.1 doesn't fix that.
    Last edited by Yoshi; 23 November 2023, 04:55 AM. Reason: fixed typo



  • curfew
    replied
Originally posted by waxhead

Yes it is, very much so. And this comes from a btrfs evangelist. But it has bugs like all software that has updates. The important thing is that it was found....
    This statement is nonsensical. It was found because it ate someone's data. If it didn't, well, then it wouldn't even be a real bug because it doesn't affect anyone.

    You should've finished off with "hopefully it was found and fixed before too many people unknowingly updated to the broken version."



  • partcyborg
    replied
Originally posted by Rallos Zek
    Nothing new! ZFS always had a history of eating data and being unstable.
Compared to the clown show that btrfs has been, OpenZFS has been rock solid.

Of the six or so times I have tried using btrfs, only one did not end in an unmountable filesystem, with btrfs recover getting me about 40% of the data back. The one that wasn't a total loss wasn't a win either: it was a 4-disk stripe of mirrors, and I just got lucky when btrfs kicked the "right" two drives out, leaving the fs mountable.

Rebuilt the array on OpenZFS and it has been rock solid since.



  • CommunityMember
    replied
Originally posted by skeevy420
    IMHO, this does highlight that OpenZFS's internal and beta/RC testing might not be as robust as it could or should be.
I would hope that the lesson, that good CI tests are required for all new features, is well learned. However, some developers seemed to suggest in the GitHub issue that that is too hard to accomplish to be made a hard requirement (i.e., breaking things and losing data may be undesirable, but still an acceptable way forward). If that position is not reined in by the elders, then OpenZFS will lose a valuable claim to robustness and reliability.



  • Developer12
    replied
    Not that surprising that a bug could eventually slip through, especially in an area that has been heavily modified in the latest version.

    Disappointing that it wasn't caught by ztest though. Hopefully they expand the testsuite coverage to catch this bug and any others like it.



  • Chugworth
    replied
Yeah, I already noticed that block cloning is not quite "ready". I ran into a bug just a few days ago when I copied two small text files from one dataset to another. The files appeared to copy, but when I tried opening one of them the system hung to the point where I had to manually reset it. After rebooting, I tried opening the file again, and that caused the system to hang again. I soon noticed that "zpool status" was reporting a problem with those two files, so I deleted them and ran a full drive scrub. Everything seems fine now.



  • muncrief
    replied
Originally posted by skeevy420

You can't set it to disabled; it's irreversible. "Enabled" means able to be used but inactive; "active" means in use. As long as it stays "enabled" you're not using it. I think that's confusing, too.

    What you'll have to do is something like setting "cp --reflink=never" globally, to make sure that the default "--reflink=auto" doesn't accidentally use reflinks, which would flip your pool into the "active" state. If you ever do go active, you can delete the files that were copied with reflinks and it should switch back from "active" to "enabled".

    IMHO, disabled, inactive, and active would be better labels: "enabled" can be mistaken to mean "active", while "inactive" and "active" would both clearly imply that the feature is enabled.
    Thank you for the information, skeevy420.



  • skeevy420
    replied
Originally posted by muncrief
    Does anyone know how to disable block cloning?

    I upgraded to 2.2.1 but block cloning is still enabled, even though none of my pools are using it. The output of "zpool get all | grep block_cloning" shows "feature@block_cloning enabled", not active, and from what I've found that means it's not being used. But I can't find a way to set it to disabled.
You can't set it to disabled; it's irreversible. "Enabled" means able to be used but inactive; "active" means in use. As long as it stays "enabled" you're not using it. I think that's confusing, too.

    What you'll have to do is something like setting "cp --reflink=never" globally, to make sure that the default "--reflink=auto" doesn't accidentally use reflinks, which would flip your pool into the "active" state. If you ever do go active, you can delete the files that were copied with reflinks and it should switch back from "active" to "enabled".

    IMHO, disabled, inactive, and active would be better labels: "enabled" can be mistaken to mean "active", while "inactive" and "active" would both clearly imply that the feature is enabled.
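    A minimal sketch of what that looks like in practice. The pool name "tank" is a placeholder, and the zpool line is commented out because it needs a real ZFS pool; the cp part works on any Linux filesystem:

    ```shell
    # Check whether block cloning has ever been used on a pool:
    # "enabled" = usable but never used, "active" = cloned blocks exist.
    # (hypothetical pool name "tank")
    #   zpool get feature@block_cloning tank

    # Force cp to make a full byte-for-byte copy instead of a reflink/clone:
    printf 'hello\n' > /tmp/reflink-demo-src.txt
    cp --reflink=never /tmp/reflink-demo-src.txt /tmp/reflink-demo-dst.txt
    cmp -s /tmp/reflink-demo-src.txt /tmp/reflink-demo-dst.txt && echo "full copy OK"

    # To make this the default in interactive shells (bash/zsh):
    alias cp='cp --reflink=never'
    ```

    Note that an alias only covers interactive use; scripts calling cp directly still get "--reflink=auto" unless changed.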

