OpenZFS Is Still Battling A Data Corruption Issue


  • #11
    The fix will have obvious performance implications, so it will be interesting to see the next round of benchmarks.

    • #12
      Originally posted by CommunityMember View Post

      Tested backups, with a process to restore the data and to validate it. If you don't test restoring and validating your backup, you do not actually have a backup. It is unfortunately true that there are those who learn that lesson the hard way.
      So true.

      Well, at least it happened 3+ decades ago, and I didn't lose everything, just some important (to me) C code I had worked hard to develop. Luckily, back then we developers used to write down the important aspects of our work on paper, so it was just a matter of painful rewriting to get almost everything back. Almost.

      Hard lessons leave a persistent imprint.
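
      To make the "restore and validate" part of that advice concrete, the check I have in mind is roughly this minimal sketch (the paths are placeholders, and it assumes the test restore lands in a separate directory that mirrors the source tree):

      #!/usr/bin/env python3
      # Minimal sketch of a restore-and-validate check; paths are placeholders.
      import hashlib
      from pathlib import Path

      SOURCE_DIR = Path("/srv/data")           # placeholder: the data that was backed up
      RESTORE_DIR = Path("/tmp/restore-test")  # placeholder: where the test restore landed

      def sha256(path: Path) -> str:
          # Hash in 1 MiB chunks so large files don't blow up memory.
          h = hashlib.sha256()
          with path.open("rb") as f:
              for chunk in iter(lambda: f.read(1 << 20), b""):
                  h.update(chunk)
          return h.hexdigest()

      def verify() -> bool:
          # Compare every file in the source tree against its restored copy.
          ok = True
          for src in SOURCE_DIR.rglob("*"):
              if not src.is_file():
                  continue
              restored = RESTORE_DIR / src.relative_to(SOURCE_DIR)
              if not restored.is_file():
                  print(f"MISSING  {restored}")
                  ok = False
              elif sha256(src) != sha256(restored):
                  print(f"MISMATCH {restored}")
                  ok = False
          return ok

      if __name__ == "__main__":
          raise SystemExit(0 if verify() else 1)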

      • #13
        Originally posted by CommunityMember View Post

        Tested backups, with a process to restore the data and to validate it. If you don't test restoring and validating your backup, you do not actually have a backup. It is unfortunately true that there are those who learn that lesson the hard way.
        Yeah, I'm sure the GitLab DB restore is still somewhere on YouTube.

        • #14
          Some people say that OpenZFS is a reliable and resilient file system that protects data from corruption.

          However, recent reports show that OpenZFS 2.2 has a serious bug that causes data loss when copying files.

          This is unacceptable for a file system that claims to have advanced features like checksums, snapshots, and clones.

          How can users trust OpenZFS when it fails to perform basic operations without damaging data? OpenZFS should fix this bug as soon as possible and restore its reputation as a robust and secure file system.

          OpenZFS is not part of Linux and does not work well with it. Avoid OpenZFS and use something better.

          • #15
            The overriding, and surprising, issue for me after the last week is that no one developing ZFS appears to know how it works. And this is the exact same problem I observed with the BTRFS developers.

            I just switched to ZFS a few months ago because after browsing through the BTRFS developer bug threads I was shocked at their haphazard approach to development and bug detection and correction. Everyone was just guessing, and no one was referring to overall architectural documents, flow charts, or any of the common organizational project structures that were mandatory in my day.

            But it now appears that ZFS suffers the exact same disorganization. Everyone, literally, is just guessing. And then running scripts to evaluate the odds that things are actually working correctly. Just like BTRFS.

            Good god, people, this is not the way something as complex as a file system should be developed. There need to be lead architects who define the operation of the system, document it clearly and concisely, and create verification systems using both coverage and fuzz testing. And as new changes and/or features are introduced, the verification systems must be updated to accommodate them.
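
            Just so it's clear what I mean by fuzz-style verification, here's a toy property-based round-trip check. It uses the Hypothesis library against plain file I/O in a temporary directory, not ZFS internals; purely illustrative:

            # Toy property-based round-trip check, purely to illustrate the idea.
            # It exercises ordinary file I/O in a temporary directory, not ZFS itself.
            import os
            import tempfile

            from hypothesis import given, settings, strategies as st

            @settings(max_examples=200)
            @given(data=st.binary(max_size=1 << 16))
            def test_write_read_roundtrip(data: bytes) -> None:
                # Whatever bytes get written must come back unchanged after a reopen.
                with tempfile.TemporaryDirectory() as d:
                    path = os.path.join(d, "blob")
                    with open(path, "wb") as f:
                        f.write(data)
                    with open(path, "rb") as f:
                        assert f.read() == data

            Run it with pytest; Hypothesis then feeds it a few hundred random byte strings and shrinks any failure to a minimal example.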

            But I don't know; as I've said many times before, I'm just a retired old-time hardware/firmware/software developer. Is this really the way projects are organized today? Essentially just a disorganized mess where everyone throws in their two cents and then creates arbitrary scripts to "prove" they're correct?

            If so then we're in a world of hurt, because entropy is eventually going to demand that things break down.

            As for what I'm going to do now personally, I just don't know. I have around 12.5 TB of data, with the majority, 11 TB, residing on my media server. And though I have local and cloud backups going back years, the problem with ext4 was that I couldn't detect bit rot. And by the time I discovered a bad file I would have no idea when it became corrupted. My hope was that with a monthly ZFS scrub I could detect bit rot and restore good files from my backups, rather than buying two or three times the amount of storage I need and creating complex RAID systems that have also failed me so many times over the decades.
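
            For what it's worth, the monthly check I had in mind was along these lines. Just a sketch; the pool name is a placeholder, and the exact status strings being matched are assumptions since they vary between OpenZFS releases:

            #!/usr/bin/env python3
            # Rough sketch of a monthly scrub-and-check job; the pool name and the
            # status strings being matched are assumptions, not gospel.
            import subprocess
            import time

            POOL = "tank"  # placeholder pool name

            def zpool_status(pool: str) -> str:
                # `zpool status <pool>` reports scan progress and an error summary.
                return subprocess.run(
                    ["zpool", "status", pool],
                    capture_output=True, text=True, check=True,
                ).stdout

            def scrub_and_wait(pool: str) -> str:
                # Scrubs run asynchronously, so poll until the status output no
                # longer reports one in progress.
                subprocess.run(["zpool", "scrub", pool], check=True)
                while "scrub in progress" in zpool_status(pool):
                    time.sleep(60)
                return zpool_status(pool)

            if __name__ == "__main__":
                status = scrub_and_wait(POOL)
                if "No known data errors" in status:
                    print(f"{POOL}: scrub finished, no errors detected")
                else:
                    # Checksum errors were found; restore the listed files from backup.
                    print(status)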

            I must say, it's really quite disheartening.

            It appears that, at least for now, there's simply no way to reliably detect bit rot and other data integrity issues and be assured they can be remedied.

            • #16
              Originally posted by AlanTuring69 View Post

              It was difficult to set up, but I ensure my backups are stored in mediums at every known vacuum state. The chemistry gets a bit weird but it's still not clear which is the true vacuum state so it's better to be safe.
              *True* vacuum state?!?! There is no such thing! ... It's turtles all the way down

              • #17
                Originally posted by muncrief View Post
                I just switched to ZFS a few months ago because after browsing through the BTRFS developer bug threads I was shocked at their haphazard approach to development and bug detection and correction. Everyone was just guessing, and no one was referring to overall architectural documents, flow charts, or any of the common organizational project structures that were mandatory in my day.

                But it now appears that ZFS suffers the exact same disorganization. Everyone, literally, is just guessing. And then running scripts to evaluate the odds that things are actually working correctly. Just like BTRFS.
                Unfortunately, this is the approach of many open source projects.
                It works well in many cases, but it is disastrous for complex filesystems.

                • #18
                  Originally posted by muncrief View Post
                  The overriding, and surprising, issue for me after the last week is that no one developing ZFS appears to know how it works. And this is the exact same problem I observed with the BTRFS developers.

                  I just switched to ZFS a few months ago because after browsing through the BTRFS developer bug threads I was shocked at their haphazard approach to development and bug detection and correction. Everyone was just guessing, and no one was referring to overall architectural documents, flow charts, or any of the common organizational project structures that were mandatory in my day.

                  But it now appears that ZFS suffers the exact same disorganization. Everyone, literally, is just guessing. And then running scripts to evaluate the odds that things are actually working correctly. Just like BTRFS.

                  Good god, people, this is not the way something as complex as a file system should be developed. There need to be lead architects who define the operation of the system, document it clearly and concisely, and create verification systems using both coverage and fuzz testing. And as new changes and/or features are introduced, the verification systems must be updated to accommodate them.

                  But I don't know; as I've said many times before, I'm just a retired old-time hardware/firmware/software developer. Is this really the way projects are organized today? Essentially just a disorganized mess where everyone throws in their two cents and then creates arbitrary scripts to "prove" they're correct?

                  If so then we're in a world of hurt, because entropy is eventually going to demand that things break down.

                  As for what I'm going to do now personally, I just don't know. I have around 12.5 TB of data, with the majority, 11 TB, residing on my media server. And though I have local and cloud backups going back years, the problem with ext4 was that I couldn't detect bit rot. And by the time I discovered a bad file I would have no idea when it became corrupted. My hope was that with a monthly ZFS scrub I could detect bit rot and restore good files from my backups, rather than buying two or three times the amount of storage I need and creating complex RAID systems that have also failed me so many times over the decades.

                  I must say, it's really quite disheartening.

                  It appears that, at least for now, there's simply no way to reliably detect bit rot and other data integrity issues and be assured they can be remedied.
                  I really hope Bcachefs follows a better approach, sincerely...

                  • #19
                    Originally posted by muncrief View Post
                    Is this really the way projects are organized today? Essentially just a disorganized mess where everyone throws in their two cents and then creates arbitrary scripts to "prove" they're correct?
                    Short answer: yes.

                    Long answer:
                    In the commercial world, time to market is everything. This means reducing anything that creates overhead. Planning and documenting create overhead, as they impede velocity. They don't fly well with managers.
                    In the open source world, it looks like developer burn-out is increasing year over year (at least in my sphere), leading to an "only scratching your own itch is enough" mentality. And it is never anyone's itch to document and architect things properly. Or to spend sleepless nights trying to reproduce heisenbugs. Works-on-my-computer is king.
                    In both environments, "agile" methodologies have been widely misused and misunderstood, leading developers to shun documentation, architecture and planning in favour of TDD and build-incrementally, refactor-frequently approaches (of course, no one refactors often in reality). If you have good test coverage, what use is there for (always-out-of-sync) documentation?
                    Last but not least, no electronics product is designed to last more than a couple of years, so why would software products be? Funnily enough, we are all running decades-old software now...

                    Now, get off my lawn!

                    • #20
                      Originally posted by timofonic View Post
                      Some people say that OpenZFS is a reliable and resilient file system that protects data from corruption.

                      However, recent reports show that OpenZFS 2.2 has a serious bug that causes data loss when copying files.

                      This is unacceptable for a file system that claims to have advanced features like checksums, snapshots, and clones.

                      How can users trust OpenZFS when it fails to perform basic operations without damaging data? OpenZFS should fix this bug as soon as possible and restore its reputation as a robust and secure file system.

                      OpenZFS is not part of Linux and does not work well with it. Avoid OpenZFS and use something better.
                      Lol. No, I think the lesson here is: don't jump on the latest and greatest release branch for anything enterprise until it's been well vetted, and/or make sure you have backups. There is a reason RHEL isn't running the latest release kernel. Bugs happen. Heck, it took BTRFS years to figure out its RAID5/6 issues, and the recommendation was to only use BTRFS in a RAID mirror.
