OpenZFS 2.2.2 & OpenZFS 2.1.14 Released To Fix Data Corruption Issue

  • #41
    Originally posted by Developer12 View Post

    Have you even looked at ANY of the papers on ZFS' architecture? Here's the first one, to get you started:



    Another example: One of the lead architects of ZFS is Matthew Ahrens, as are people who signed off on the raidz expansion work last month.

    Just because you're totally ignorant of all the things you mention doesn't mean they don't exist.
    I did. And was impressed at first. I just thought there would be more than just vague pictures.



    • #42
      Originally posted by Developer12 View Post

      The bug doesn't occur when using older versions of coreutils that rely on functions ZFS doesn't support.
      That explains everything! Next time, before updating my system, I'll make sure to check the changelog of every package and cross-reference it against the ZFS compatibility list to make sure Unhandled Exceptions don't corrupt my data. You're clever, thanks!



      • #43
        Originally posted by User29 View Post

        Ok, but if it was only triggered by the coreutils change, why were FBSD people panicking? BSDs don't use GNU stuff.

        They should have just watched the show, munching popcorn.
        "BSD people" aren't a monolith. Plenty of places using BSD's still have the GNU utilities installed because people are much more familiar with Linux. The cp command in coreutils only added a copy optimization and using the API as it is supposed to and certainly not at fault. ZFS just has a data corruption/potential security bug because of incorrect assumptions they made in their code and anything else trying to optimize copy could have triggered the same bug.



        The OpenZFS pull request that fixes it (closes #15526) explains:


        "Over its history this the dirty dnode test has been changed between checking for a dnodes being on os_dirty_dnodes (dn_dirty_link) and dn_dirty_record... It turns out both are actually required."



        • #44
          Originally posted by Kjell View Post
          We need a ELI5
          True. Something like this is in the pipeline. I had a sneak preview.
          Originally posted by User29 View Post
          … why were FBSD people panicking? …
          I saw unnecessary alarm, nothing like panic, in only one area, an area that was only partly FreeBSD-related. The alarmists' refusal to describe their use cases suggested, to me, that they were not users of ZFS.

          The report in Bugzilla for FreeBSD was calm.

          Let's note that whilst CVE-2023-49298 referred to the report, the outcome of the report was an errata notice (distinctly not a security advisory).



          • #45
            Originally posted by Rallos Zek View Post

            ZFS fails again!

            ZFS has been proven pretty flakey and unreliable for many years.
            In the strictest sense, all production filesystems are flakey since they are not formally verified.

            However, ZFS sees an enormous amount of use, and the vast majority of users find it rock solid. The fact that this bug made the news underscores that. If this had happened to a filesystem in the kernel source tree, it would not have made the news like this did, since those have far more problems.



            • #46
              Originally posted by Siuoq View Post
              XFS or ext4 are probably good tho.

              What if a filesystem were as simple as possible, and all the funny stuff were handled by a layer above it?
              Have a block device randomly corrupt the wrong thing and those can blow up.

              As for handling things in a layer above the disk filesystem, that is the way that distributed storage often works. That is expensive unfortunately. It is not immune to data corruption bugs either. For example:



              The underlying problem is that we are unable to write perfect software for managing data storage. This affects all production filesystems. :/
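
              As a tiny illustration of what "a layer above" looks like in practice (a sketch, not any particular product; the block layout and CRC choice are arbitrary): the application stores a checksum next to each block and verifies it on read, so it can at least detect when the simple filesystem or block device below handed back garbage. Repair still needs a second copy somewhere, which is where the expense comes in.

              ```c
              /* Sketch: application-level integrity checking layered above a
               * "dumb" filesystem. Each block carries a CRC32 so corruption
               * below us is detected on read. (Detection only; repair would
               * need replicas.) */
              #include <stdint.h>
              #include <stdio.h>
              #include <string.h>

              static uint32_t crc32(const void *data, size_t len)
              {
                  const uint8_t *p = data;
                  uint32_t crc = 0xffffffffu;
                  for (size_t i = 0; i < len; i++) {
                      crc ^= p[i];
                      for (int b = 0; b < 8; b++)
                          crc = (crc >> 1) ^ (0xedb88320u & (-(crc & 1u)));
                  }
                  return ~crc;
              }

              struct block {                 /* what actually goes to the file */
                  uint32_t crc;
                  uint8_t  payload[4096];
              };

              int main(void)
              {
                  struct block blk;
                  memset(blk.payload, 0x5a, sizeof(blk.payload));
                  blk.crc = crc32(blk.payload, sizeof(blk.payload));

                  /* ...block is written out and later read back... */
                  blk.payload[100] ^= 0x01;  /* simulate silent corruption below us */

                  if (crc32(blk.payload, sizeof(blk.payload)) != blk.crc)
                      puts("corruption detected by the layer above the filesystem");
                  return 0;
              }
              ```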
              Last edited by ryao; 14 December 2023, 01:19 PM.



              • #47
                Originally posted by smitty3268 View Post

                It seems like it was a pretty clear problem in SEEK_HOLE/SEEK_DATA that could have easily been unit tested to see there was a problem.

                If your argument is that nothing used this codepath, and it wasn't feasible to unit test, then it should have just been deleted - since if nothing uses it, there's no reason to keep untested code around.

                That said, I get it - bugs happen. But your comment here is pretty obnoxious fanboy-sounding nonsense, so I had to reply.
                There were unit tests for SEEK_HOLE/SEEK_DATA in the ZFS Test Suite, but unit testing is not a foolproof way of catching bugs, and those tests predated the introduction of the block cloning feature. We could really use more unit tests aimed at block cloning, but at the same time, there are many features in a modern storage stack, and developing unit tests for all possible interactions is incredibly difficult if not intractable. There is the ztest tool, intended to help cover this blind spot, but it does not catch everything, much like fuzz testing does not catch everything. :/
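
                For concreteness, here is a much-simplified sketch of the property such a test would target (this is an illustration, not the actual reproducer or a real ZTS test; the real bug needed a race between writers and SEEK_DATA/cloning, so a single write-then-seek like this would usually pass even on affected releases):

                ```c
                /* Simplified sketch of a "dirty data must not look like a hole"
                 * check. Illustrates the property only; the real reproducer had
                 * to race concurrent writes against seeks/clones. */
                #define _GNU_SOURCE
                #include <assert.h>
                #include <fcntl.h>
                #include <stdio.h>
                #include <stdlib.h>
                #include <string.h>
                #include <unistd.h>

                int main(void)
                {
                    char path[] = "/tmp/seekdata_check_XXXXXX";
                    int fd = mkstemp(path);
                    assert(fd >= 0);

                    char buf[65536];
                    memset(buf, 0xab, sizeof(buf));
                    assert(pwrite(fd, buf, sizeof(buf), 0) == (ssize_t)sizeof(buf));

                    /* Freshly written, still-unsynced data must be reported as data... */
                    assert(lseek(fd, 0, SEEK_DATA) == 0);
                    /* ...and the first hole must not start before the end of it. */
                    assert(lseek(fd, 0, SEEK_HOLE) >= (off_t)sizeof(buf));

                    puts("ok: dirty region visible to SEEK_DATA");
                    unlink(path);
                    return 0;
                }
                ```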
                Last edited by ryao; 14 December 2023, 01:24 PM.



                • #48
                  Originally posted by AlanTuring69 View Post

                  I think it was a lot more complicated than that, otherwise they wouldn't have had to rely on some bespoke integration test to replicate it and prove it was fixed. It is nonetheless true that very solid + robust unit tests theoretically would have caught it, but it's also true that it's a kernel module which is not simple to unit test.

                  But because OpenZFS isn't named bcachefs or btrfs it's suddenly the most buggy filesystem ever to exist.
                  ZFS does have a test suite that does unit tests:

                  OpenZFS on Linux and FreeBSD (the openzfs/zfs repository on GitHub).


                  It even had tests for SEEK_HOLE/SEEK_DATA. It did not have a test for their interaction with block cloning. It is hard (intractable?) to make unit tests for all possible scenarios in advance, although it definitely could use more tests focused on block cloning. There is a stochastic testing tool called ztest meant to help cover the blind spot formed by the inability to make unit tests for every possible combination of events, but it cannot find all bugs either. :/

                  Note that these tests are run on every commit to the repository, plus every pull request, and every revision to every pull request.
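
                  To give a flavour of what that kind of stochastic testing looks like (a generic sketch, not ztest itself; file size, operation mix, and seed are arbitrary): keep a shadow copy in memory, apply random writes and hole punches to a file, and keep checking that what the filesystem returns matches the shadow.

                  ```c
                  /* Generic sketch of stochastic (ztest-style) testing -- not ztest.
                   * Random writes and hole punches are mirrored into an in-memory
                   * shadow copy; any divergence on read-back indicates a bug below. */
                  #define _GNU_SOURCE
                  #include <assert.h>
                  #include <fcntl.h>
                  #include <stdlib.h>
                  #include <string.h>
                  #include <unistd.h>

                  #define FILE_SIZE (1 << 20)

                  int main(void)
                  {
                      static char shadow[FILE_SIZE], readback[FILE_SIZE];
                      char path[] = "/tmp/stochastic_XXXXXX";
                      int fd = mkstemp(path);
                      assert(fd >= 0 && ftruncate(fd, FILE_SIZE) == 0);
                      srand(12345);                       /* reproducible run */

                      for (int i = 0; i < 10000; i++) {
                          size_t off = (size_t)rand() % (FILE_SIZE - 4096);
                          size_t len = (size_t)rand() % 4096 + 1;
                          if (rand() % 2) {               /* random write */
                              char data[4096];
                              memset(data, rand() & 0xff, len);
                              assert(pwrite(fd, data, len, (off_t)off) == (ssize_t)len);
                              memcpy(shadow + off, data, len);
                          } else {                        /* punch a hole (reads back as zeros) */
                              assert(fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                                               (off_t)off, (off_t)len) == 0);
                              memset(shadow + off, 0, len);
                          }
                          /* Periodically verify the file still matches the shadow copy. */
                          if (i % 1000 == 0) {
                              assert(pread(fd, readback, FILE_SIZE, 0) == FILE_SIZE);
                              assert(memcmp(readback, shadow, FILE_SIZE) == 0);
                          }
                      }
                      unlink(path);
                      return 0;
                  }
                  ```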
                  Last edited by ryao; 14 December 2023, 01:36 PM.



                  • #49
                    Originally posted by IntrusionCM View Post
                    Given the complexity of ZFS and the complexity of filesystems in general, especially with features like sparse files, compression, asynchronicity and so on... There are probably a few more silent killer bugs hidden. Not only in ZFS, but in all filesystems.
                    I agree. A number of people, myself included, have made various attempts over the years to find such bugs in ZFS, but finding even one is incredibly hard. Last year, I succeeded in finding a potential corruption bug:

                    Coverity static analysis found these. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Neal Gompa <[email protected]...


                    I had been reading through static analysis reports in the belief that serious bugs could be found through them, and I got lucky. In-memory B-trees are used internally for various things, including some things that are written to disk. A few bug reports were filed by people who had assertions trip that prevented this bug from doing anything really bad, but it was such a hard-to-hit issue that it is unclear whether it ever caused any actual on-disk corruption.
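
                    As a generic illustration of the kind of thing static analyzers flag and assertions catch (this is not the actual OpenZFS B-tree bug; remove_at and the array below are a made-up example of the pattern): an off-by-one when shifting elements reads past the valid range, and a bounds assertion trips before the bad copy can do any damage.

                    ```c
                    /* Generic illustration (NOT the actual OpenZFS B-tree bug): an
                     * off-by-one in element shifting that a static analyzer flags
                     * and a well-placed assertion catches before anything is copied. */
                    #include <assert.h>
                    #include <stdio.h>
                    #include <string.h>

                    #define NELEMS 8

                    static int elems[NELEMS] = { 1, 2, 3, 4, 5, 6, 7, 8 };
                    static int count = NELEMS;

                    /* Remove element at index idx by shifting the tail left. */
                    static void remove_at(int idx, int buggy)
                    {
                        assert(idx >= 0 && idx < count);
                        /* Correct count of elements to move is (count - idx - 1);
                         * the "buggy" variant moves one too many and would read
                         * past the initialized tail. */
                        int nmove = count - idx - 1 + (buggy ? 1 : 0);
                        assert(idx + 1 + nmove <= count);   /* trips for the buggy variant */
                        memmove(&elems[idx], &elems[idx + 1], (size_t)nmove * sizeof(elems[0]));
                        count--;
                    }

                    int main(void)
                    {
                        remove_at(2, /*buggy=*/0);          /* fine */
                        for (int i = 0; i < count; i++)
                            printf("%d ", elems[i]);
                        printf("\n");
                        remove_at(0, /*buggy=*/1);          /* assertion fires here */
                        return 0;
                    }
                    ```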



                    • #50
                      Originally posted by Kjell View Post

                      That explains everything! Next time, before updating my system, I'll make sure to check the changelog of every package and cross-reference it against the ZFS compatibility list to make sure Unhandled Exceptions don't corrupt my data. You're clever, thanks!
                      There are no unhandled exceptions. If ZFS doesn't support those fancy syscalls, the utilities fall back to other syscalls instead. Most filesystems don't support them either.
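
                      A sketch of that fallback pattern (not coreutils' actual logic; copy_fd is a made-up helper): try the "fancy" syscall first and drop back to a plain read/write loop when the filesystem or kernel doesn't support it.

                      ```c
                      /* Sketch of the fallback pattern described above -- not coreutils'
                       * actual code: try copy_file_range(), and fall back to plain
                       * read/write when it is unsupported. */
                      #define _GNU_SOURCE
                      #include <errno.h>
                      #include <fcntl.h>
                      #include <stdio.h>
                      #include <unistd.h>

                      static int copy_fd(int in, int out)
                      {
                          for (;;) {
                              ssize_t n = copy_file_range(in, NULL, out, NULL, 1 << 20, 0);
                              if (n > 0)
                                  continue;                   /* copied a chunk, keep going    */
                              if (n == 0)
                                  return 0;                   /* reached EOF                   */
                              if (errno == EOPNOTSUPP || errno == ENOSYS || errno == EXDEV)
                                  break;                      /* not supported here: fall back */
                              return -1;                      /* genuine I/O error             */
                          }

                          char buf[65536];                    /* ordinary read/write fallback  */
                          for (;;) {
                              ssize_t n = read(in, buf, sizeof(buf));
                              if (n == 0)
                                  return 0;
                              if (n < 0)
                                  return -1;
                              for (ssize_t done = 0; done < n; ) {
                                  ssize_t w = write(out, buf + done, (size_t)(n - done));
                                  if (w < 0)
                                      return -1;
                                  done += w;
                              }
                          }
                      }

                      int main(int argc, char **argv)
                      {
                          if (argc != 3) { fprintf(stderr, "usage: %s SRC DST\n", argv[0]); return 1; }
                          int in = open(argv[1], O_RDONLY);
                          int out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
                          if (in < 0 || out < 0) { perror("open"); return 1; }
                          return copy_fd(in, out) == 0 ? 0 : 1;
                      }
                      ```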

