Bcachefs Lands More Bug Fixes In Linux 6.14

  • varikonniemi
    Senior Member
    • Jan 2012
    • 1102

    #21
    Originally posted by Developer12 View Post

    Give it time. Right now there are essentially zero users of bcachefs compared to any other filesystem in the kernel. Given this filesystem's complexity, I'm sure there will be plenty of data eating bugs tripped over during the next 2-3 years.
    bcachefs was used in production before it was merged into the kernel, and there were no reports of it eating data even back then. It is used in production currently, and there are no such reports now either.

    I don't think the second wave of desktop users that arrives once it's declared "stable" will stress it any more than the production and enthusiast users have over the past years.


    • Developer12
      Senior Member
      • Dec 2019
      • 1584

      #22
      Originally posted by varikonniemi View Post

      bcachefs was used in production before it was merged into the kernel, and there were no reports of it eating data even back then. It is used in production currently, and there are no such reports now either.

      I don't think the second wave of desktop users that arrives once it's declared "stable" will stress it any more than the production and enthusiast users have over the past years.
      "used in production" at worst could mean "one guy at a company of 5 people ran it for some non-critical data"

      Right now there are not even 1/10,000th as many people using bcachefs as there are using more popular filesystems like ext4, XFS, ZFS, BTRFS, or even the lowly F2FS. The fact of the matter is that those "enterprise" users who supposedly exist are effectively zero compared to the number of users every other filesystem has. Even ReiserFS has seen more use than bcachefs at this point, given all the years its variants spent in the kernel.


      • varikonniemi
        Senior Member
        • Jan 2012
        • 1102

        #23
        Originally posted by Developer12 View Post

        "used in production" at worst could mean "one guy at a company of 5 people ran it for some non-critical data"

        Right now there are not even 1/10,000th as many people using bcachefs as there are using more popular filesystems like ext4, XFS, ZFS, BTRFS, or even the lowly F2FS. The fact of the matter is that those "enterprise" users who supposedly exist are effectively zero compared to the number of users every other filesystem has. Even ReiserFS has seen more use than bcachefs at this point, given all the years its variants spent in the kernel.
        Those are pretty fair points, but it's not the whole truth. Kernel filesystem testing has gotten so much better in the past year that its coverage is probably better than a million monkeys using the filesystem normally for a decade.

        The problems don't pop up just because you add more users with the same usage pattern. It tends to be the rarest combinations that cause the last problems to escape into production. And for that you need propellerheads doing explicit testing, like injecting corruption (kernel CI only started doing even this recently), not just more mainstream users.
        Last edited by varikonniemi; 03 February 2025, 06:34 AM.


        • lyamc
          Senior Member
          • Jun 2020
          • 526

          #24
          Originally posted by Developer12 View Post

          "used in production" at worst could mean "one guy at a company of 5 people ran it for some non-critical data"
          It turns out that the people who get the most out of bcachefs's cache tiering... will have large amounts of storage. Here's mine:

          Code:
          [lyam@nixos:~]$ sudo bcachefs fs usage -h /media/bcfs
          Filesystem: 90e665fc-92a4-4209-a181-85863cb87ccb
          Size: 50.2 TiB
          Used: 42.7 TiB
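
          (For context, a tiered setup like that is usually created by grouping devices under labels and pointing the foreground/promote targets at the fast tier. The sketch below is illustrative only: the device names and labels are made up, and exact options may vary between bcachefs-tools versions.)

          Code:
          # Illustrative sketch of a tiered bcachefs format; device names and
          # labels are hypothetical, options as described in the bcachefs docs.
          bcachefs format \
              --label=ssd.ssd1 /dev/nvme0n1 \
              --label=hdd.hdd1 /dev/sda \
              --label=hdd.hdd2 /dev/sdb \
              --foreground_target=ssd \
              --promote_target=ssd \
              --background_target=hdd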


          • Developer12
            Senior Member
            • Dec 2019
            • 1584

            #25
            Originally posted by varikonniemi View Post

            Those are pretty fair points, but it's not the whole truth. Kernel filesystem testing has gotten so much better in the past year that its coverage is probably better than a million monkeys using the filesystem normally for a decade.

            The problems don't pop up just because you add more users with the same usage pattern. It tends to be the rarest combinations that cause the last problems to escape into production. And for that you need propellerheads doing explicit testing, like injecting corruption (kernel CI only started doing even this recently), not just more mainstream users.
            Tests are synthetic, and as a result only cover a small number of possible cases (whatever a given dev could think of). Often there is only one test per case. That's nothing compared to thousands of real users with 24/7 usage, many of whom try to do very silly things.


            • Developer12
              Senior Member
              • Dec 2019
              • 1584

              #26
              Originally posted by lyamc View Post

              It turns out that the people who get the most out of bcachefs's cache tiering... will have large amounts of storage. Here's mine:

              Code:
              [lyam@nixos:~]$ sudo bcachefs fs usage -h /media/bcfs
              Filesystem: 90e665fc-92a4-4209-a181-85863cb87ccb
              Size: 50.2 TiB
              Used: 42.7 TiB
              The amount of data is meaningless for finding bugs. You don't find more bugs with a 1 GB file than with a 10 KB file, all other things being equal. To find bugs you need variation in usage, which you only get by having *more users.*


              • varikonniemi
                Senior Member
                • Jan 2012
                • 1102

                #27
                Originally posted by Developer12 View Post

                Tests are synthetic, and as a result only cover a small number of possible cases (whatever a given dev could think of). Often there is only one test per case. That's nothing compared to thousands of real users with 24/7 usage, many of whom try to do very silly things.
                Injecting random corruption and expecting runtime fsck to fix it achieves astounding coverage in very little time, completely automatically, and it is a real-world test of what happens when physical data loss occurs. They even have tests that inject the corruption into metadata areas only, which can achieve coverage billions of times faster than any real-world testing, since no time is spent repairing the most probable corruption location, the actual data. (Why keep testing that once it is deemed completely stable?)

                Just imagine the speed of real-world results you get if you randomly inject corruption into a data area, then try to access said data, and then either re-replicate it from RAID or just mark it corrupted and move on. A fast drive can run this kind of loop a hundred times per second. Now multiply that by the average error rate of storage media, and calculate how much real-world, in-the-wild usage you would need to achieve the same coverage.
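
                As a rough sketch of what such a loop might look like (this is not the actual kernel CI harness; the image path, block size, and exact bcachefs invocations are just assumptions for illustration):

                Code:
                # Rough sketch only, not the real CI harness. Paths, sizes and
                # the exact bcachefs commands are illustrative assumptions.
                IMG=/tmp/bcachefs-test.img
                truncate -s 4G "$IMG"        # sparse backing file for a throwaway filesystem
                bcachefs format "$IMG"

                BLOCKS=$((4 * 1024 * 1024 * 1024 / 4096))
                for i in $(seq 1 100); do
                    # overwrite one random 4 KiB block to simulate media corruption
                    OFF=$(( (RANDOM * 32768 + RANDOM) % BLOCKS ))
                    dd if=/dev/urandom of="$IMG" bs=4096 seek="$OFF" count=1 conv=notrunc 2>/dev/null

                    # ask fsck to detect and repair the damage, flag anything unrepairable
                    bcachefs fsck "$IMG" || echo "iteration $i: fsck could not repair"
                done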
                Last edited by varikonniemi; 05 February 2025, 11:16 AM.


                • lyamc
                  Senior Member
                  • Jun 2020
                  • 526

                  #28
                  Originally posted by Developer12 View Post

                  The amount of data is meaningless for finding bugs. You don't find more bugs with a 1 GB file than with a 10 KB file, all other things being equal. To find bugs you need variation in usage, which you only get by having *more users.*
                  If all you have are 1 GB or larger files, then it's pointless to use bcachefs over something else unless you need the performance from caching. It makes even more sense to use bcachefs as file sizes get smaller.

                  Data without users is pointless in the first place. Lots of data implies lots of users, or at least a lot of access from servers/applications (clients). If that weren't the case, they wouldn't need bcachefs; they could use something else more established.

                  It's like someone releasing an HPC library and you going on about how it probably doesn't scale and how no one serious uses it. It's actually the other way around: it's designed to scale, and its unique features mean that it's mostly the people who are serious about it that are using it.

                  Every argument you've made is completely devoid of basic reason, so I have a tough time believing that you're a serious person.


                  • varikonniemi
                    Senior Member
                    • Jan 2012
                    • 1102

                    #29
                    Originally posted by Developer12 View Post

                    Tests are synthetic, and as a result only cover a small number of possible cases (whatever a given dev could think of). Often there is only one test per case. That's nothing compared to thousands of real users with 24/7 usage, many of whom try to do very silly things.
                    Modern tests are approaching the level of "give me all possible actions, and I will test all possible combinations."
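
                    As a toy illustration of that idea, a harness in that spirit would enumerate sequences of operations along these lines (the run_sequence helper is hypothetical; real generators like fstests or syzkaller are far more sophisticated):

                    Code:
                    # Toy illustration of "all actions x all combinations" testing.
                    # run_sequence is a hypothetical helper that would replay the ops
                    # against a scratch filesystem and fsck it afterwards.
                    OPS="write fsync rename unlink truncate remount"
                    for a in $OPS; do
                        for b in $OPS; do
                            for c in $OPS; do
                                run_sequence "$a" "$b" "$c"
                            done
                        done
                    done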


                    • Developer12
                      Senior Member
                      • Dec 2019
                      • 1584

                      #30
                      Originally posted by varikonniemi View Post

                      Modern tests are approaching the level of "give me all possible actions, and I will test all possible combinations."
                      If you believe that, you really ought to never be allowed near software deployment. CI is nice and all for finding out you've missed a semicolon, but anyone with real experience knows that users in the field turn up the bugs. If there's a data-destroying edge case, it's not going to be caught by running the same set of identical tests over and over again.
