Bcachefs Linux File-System Seeing Performance Improvements, Other Progress


  • #81
    Originally posted by intelfx View Post
    If RAM bitflips and the filesystem silently writes out corrupt data, will you claim that the filesystem driver should have somehow protected itself against faulty RAM?
    The world doesn't work this way.

    When you design something super-complicated, you break it into layers with well-defined interfaces. When a layer doesn't follow its own interface (for example, when a disk drive completes a FUA command and then loses the data) you cannot guarantee anything.
    Yup, and then corrupt data will be written. Which is, in fact, more likely to happen with, for example, ZFS than with other file systems, because of the way it uses RAM more extensively. In a way it's also a kind of design issue, counter-balanced by additional advantages. But it can be compensated for, and users do it by using ECC RAM. It's a rare scenario, but bit-flips do happen on occasion. Similar case with hard drives: manufacturers' fuck-ups happen, and one should try to minimize the use of questionable interfaces.

    When you design "something super-complicated", the first thought should be about possible points of failure. "What can go wrong, will go wrong." Why would you even try to rely on FUA if it's not really known whether SATA drives actually adhere to it? Do they?

    How does Windows handle the case? It does not use FUA. Instead it sends commands to flush the disk write cache after writes, unless it's dealing with SCSI or Fibre Channel drives.
    I thought Linux, when it comes to SATA, does not use FUA either - am I wrong? Or was it the btrfs devs' solo performance?
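
    For context, on Linux an application doesn't choose between FUA and cache flushes at all; it just asks for durability (fsync() or O_SYNC) and the filesystem and block layer decide how to get the data onto stable media. A minimal sketch in C (the file path is only an example):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        /* Open (or create) a scratch file; the path is just an example. */
        int fd = open("/tmp/durable-demo.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return EXIT_FAILURE; }

        const char msg[] = "this write should survive a power cut\n";
        if (write(fd, msg, sizeof(msg) - 1) < 0) { perror("write"); return EXIT_FAILURE; }

        /* fsync() asks the kernel to push the data to stable storage.
         * Whether that ends up as a FUA write or a FLUSH CACHE command
         * is decided further down, based on what the drive advertises -
         * userspace never issues FUA directly. */
        if (fsync(fd) < 0) { perror("fsync"); return EXIT_FAILURE; }

        close(fd);
        return EXIT_SUCCESS;
    }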



    • #82
      Originally posted by aht0 View Post
      Yup, and then corrupt data will be written. Which is, in fact, more likely to happen with, for example, ZFS than with other file systems, because of the way it uses RAM more extensively. In a way it's also a kind of design issue <...>
      Yeah, I see, the only flawlessly designed program is the one that uses 0 bytes of RAM and does not do anything. This is a stupid way of thinking.

      Originally posted by aht0 View Post
      But it can be compensated for, and users do it by using ECC RAM. It's a rare scenario, but bit-flips do happen on occasion. Similar case with hard drives: manufacturers' fuck-ups happen, and one should try to minimize the use of questionable interfaces.
      You contradict yourself. First you say that one should protect oneself from hardware fuck-ups by getting better hardware (ECC RAM), and then you say that one should protect oneself from hardware fuck-ups by minimizing the usage of said hardware.

      Originally posted by aht0 View Post
      When you design "something super-complicated", the first thought should be about possible points of failure. "What can go wrong, will go wrong." Why would you even try to rely on FUA if it's not really known whether SATA drives actually adhere to it? Do they?
      Why would you even try to rely on RAM if it's not really known it is perfectly reliable?
      Why would you even try to rely on CPU if it's not really known it does not malfunction?
      Again, stupid question and stupid way of thinking. You would rely on FUA because it's written in the standard. Everything else is bad hardware, period.

      Originally posted by aht0 View Post
      How does Windows handle the case? It does not use FUA. Instead it sends commands to flush the disk write cache after writes, unless it's dealing with SCSI or Fibre Channel drives.
      No idea. But if what you say is true, well, that's why Windows' IO sucks big time.

      Originally posted by aht0 View Post
      I thought Linux, when it comes to SATA, does not use FUA either - am I wrong?
      You thought wrong.



      • #83
        Originally posted by intelfx View Post
        ...
        Jesus. I'll skip the side-tracking shit since you don't even seem to get what I was trying to tell you.

        Basically:
        FUA was implemented in the SCSI (T10) specification but not in the original ATA (T13) specification. It was only added after 2001-2002 (feel free to skim through www.t13.org's PDFs). As such, it's not guaranteed to be implemented, or to work, on all SATA drives.

        Designing a file system driver to use a non-guaranteed feature of the ATA specification is the problem I am trying to point out to you. Windows does not do this and thus avoids this whole breakage-through-firmware problem. Linux does it and its users have to suffer through it, while the devs blame it on "faulty hardware". But end users still have to face the breakage. Had the devs chosen not to use FUA, there would be less breakage. Simple as that.
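
        For what it's worth, whether the kernel believes a given drive supports FUA is visible from userspace. A minimal sketch in C that reads the queue's FUA flag from sysfs ("sda" is only an example device name, and the attribute may be absent on older kernels):

        #include <stdio.h>

        int main(void)
        {
            /* "sda" is only an example - substitute the drive in question.
             * The attribute may not exist on older kernels. */
            FILE *f = fopen("/sys/block/sda/queue/fua", "r");
            if (!f) { perror("fopen"); return 1; }

            int fua = 0;
            if (fscanf(f, "%d", &fua) != 1) { fclose(f); return 1; }
            fclose(f);

            /* 1 means the kernel will issue FUA writes to this drive,
             * 0 means it falls back to a separate cache flush instead. */
            printf("FUA supported: %s\n", fua ? "yes" : "no");
            return 0;
        }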

        Originally posted by intelfx View Post
        No idea. But if what you say is true, well, that's why windows' IO sucks big time.
        Reliable and slightly slower IO is still working IO, compared to faster but breakage-prone(r) IO.
        Last edited by aht0; 02 July 2020, 03:01 PM.



        • #84
          Originally posted by aht0 View Post
          OP's hard drive (Western Digital WD20EZRX) being one of the "very bad firmware revisions" you linked against is at best questionable.
          That's why I wrote "is known to HAVE very bad firmware revisions". Maybe it was fixed, maybe not. But, as Zygo said, the surge in corruption cases CAN be linked to a particular combination of firmware/models of 2TB WD drives. Given that the sample is quite large (according to him), that there were reports from other users, and that it can be easily mitigated by just disabling write caching, I'm inclined to think that those drives are defective. He said his survey can't be considered scientifically solid (for obvious reasons), but it does present a sound case for further study. Bottom line: no, not questionable at best.

          Originally posted by aht0 View Post
          I own one of these "listed problematic" 1Tb WD Greens. (WDC WD10EZRX). Work well enough as a "holding-stuff"-drive. In Windows. Maybe I should throw Tumbleweed on it and see what will happen.
          Read the f**king link I posted. 1TB Green drives are NOT known to have bad firmware revisions (well, unless you provide another credible source).



          • #85
            Originally posted by useless View Post

            <...>

            Read the f**king link I posted. 1TB Green drives are NOT known to have bad firmware revisions (well, unless you provide another credible source).
            Why don't you "read the fucking link" you originally posted? It does list, among others:
            Model Family: Western Digital Green, Device Model: WDC WD10EZRX-00L4HB0, Firmware Version: 01.01A01
            Now go check what capacities drives designated WD10EZRX come in: it's the 1TB WD Green.



            • #86
              Intelfx, you're reading past a lot of what aht0 is saying and projecting what you think he's saying. Example:

              aht0:

              Design the whole thing accounting for the worst possible cases? When you design something super-complicated with the naive assumption that nothing it depends on will ever somehow break - you are literally asking Murphy to kick your ass and collect your scalp.
              and

              Yup, and then corrupt data will be written. Which is, in fact, more likely to happen with, for example, ZFS than with other file systems, because of the way it uses RAM more extensively. In a way it's also a kind of design issue, counter-balanced by additional advantages. But it can be compensated for, and users do it by using ECC RAM. It's a rare scenario, but bit-flips do happen on occasion. Similar case with hard drives: manufacturers' fuck-ups happen, and one should try to minimize the use of questionable interfaces.
              What you read is "Don't trust the hardware but trust the hardware, I'm an idiot."

              What he said was: "Where you have to trust hardware, add hardware checking (ECC) or filesystem sanity checks."
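
              In filesystem terms, "sanity checks" usually means checksumming data before it goes to disk and verifying the checksum on read, which is what ZFS, btrfs and bcachefs all do. A toy sketch in C of the idea (a made-up checksum, not any filesystem's actual on-disk format):

              #include <stdint.h>
              #include <stdio.h>
              #include <string.h>

              /* Toy checksum - real filesystems use crc32c, xxhash, etc. */
              static uint32_t toy_checksum(const void *buf, size_t len)
              {
                  const uint8_t *p = buf;
                  uint32_t sum = 5381;
                  for (size_t i = 0; i < len; i++)
                      sum = (sum << 5) + sum + p[i];   /* djb2-style mixing */
                  return sum;
              }

              struct block {
                  uint32_t csum;        /* stored alongside the data on write */
                  uint8_t  data[4096];
              };

              /* On read-back: recompute and compare. A mismatch means the data
               * was mangled somewhere below us (RAM, cable, firmware, platter). */
              static int block_is_sane(const struct block *b)
              {
                  return toy_checksum(b->data, sizeof(b->data)) == b->csum;
              }

              int main(void)
              {
                  struct block b = {0};
                  memcpy(b.data, "hello", 5);
                  b.csum = toy_checksum(b.data, sizeof(b.data));

                  b.data[100] ^= 0x01;   /* simulate a bit-flip between write and read */

                  printf("block sane: %s\n", block_is_sane(&b) ? "yes" : "no");
                  return 0;
              }

              Note that a checksum only detects the corruption; to repair it you still need a second good copy (RAID or duplicate profiles), which is the other half of what those filesystems provide.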



              • #87
                Originally posted by lyamc View Post
                Intelfx, you're reading past a lot of what aht0 is saying and projecting what you think he's saying.

                <...>

                What you read is "Don't trust the hardware but trust the hardware, I'm an idiot."

                What he said was: "Where you have to trust hardware, add hardware checking (ECC) or filesystem sanity checks."
                You cannot invent sanity checks for everything. You cannot protect against faulty RAM or CPU in software, you have to get better hardware (ECC RAM). Similarly, you cannot protect against disks that report "cache flushed, all is well" but do not actually do that. You have to get better hardware.



                • #88
                  Originally posted by aht0 View Post
                  FUA was implemented in the SCSI (T10) specification but not in the original ATA (T13) specification. <...> As such, it's not guaranteed to be implemented, or to work, on all SATA drives.

                  Designing a file system driver to use a non-guaranteed feature of the ATA specification is the problem I am trying to point out to you. <...> Had the devs chosen not to use FUA, there would be less breakage. Simple as that.
                  Your point is nonexistent, and you yourself don't know what you are trying to say.

                  The Linux kernel detects at runtime whether the drive in question claims to support FUA, and falls back to FLUSH CACHE if it doesn't.

                  I'm using the "FUA" nomenclature for brevity, assuming "either FUA or FLUSH CACHE depending on the drive".
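
                  To illustrate the fallback (a rough sketch of the idea only, not the actual kernel code, which lives in the blk-flush machinery): when a filesystem marks a write as "must be durable", the block layer either passes FUA through or emulates it with a flush after the write:

                  #include <stdio.h>

                  /* Stubs that just print what would be sent to the drive. */
                  static void issue_write(const char *what, int fua)
                  {
                      printf("WRITE %s%s\n", what, fua ? " (FUA bit set)" : "");
                  }

                  static void issue_flush(void)
                  {
                      printf("FLUSH CACHE\n");
                  }

                  /* A write the filesystem marked as "must be on stable media". */
                  static void submit_durable_write(const char *what, int drive_has_fua)
                  {
                      if (drive_has_fua) {
                          /* Drive advertises FUA: one write that bypasses the volatile cache. */
                          issue_write(what, 1);
                      } else {
                          /* No FUA: write normally, then flush the whole cache so this
                           * write (and everything before it) reaches stable media. */
                          issue_write(what, 0);
                          issue_flush();
                      }
                  }

                  int main(void)
                  {
                      submit_durable_write("journal commit block", 1);  /* drive with FUA */
                      submit_durable_write("journal commit block", 0);  /* drive without FUA */
                      return 0;
                  }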
                  Last edited by intelfx; 08 July 2020, 07:25 AM.



                  • #89
                    Originally posted by intelfx View Post

                    You cannot invent sanity checks for everything. You cannot protect against faulty RAM or CPU in software, you have to get better hardware (ECC RAM). Similarly, you cannot protect against disks that report "cache flushed, all is well" but do not actually do that. You have to get better hardware.
                    I don't know why you're repeating what was already said.

