Bcachefs Linux File-System Seeing Performance Improvements, Other Progress
-
Originally posted by aht0:
Jesus. I'll skip the side-tracking shit since you don't even seem to get what I was trying to tell you.
Basically:
FUA was implemented in the SCSI (T10) specification but not in the original ATA (T13) specification. It was only added after 2001-2002 (feel free to skim through www.t13.org's PDFs). As such, it is something that has never been guaranteed to be implemented and working on all SATA drives.
Designing a file system driver to use a non-guaranteed feature of the ATA specification is the problem I am trying to point out to you. Windows does not do this and thus avoids this whole breakage-through-firmware problem. Linux does it, and its users have to suffer through it while devs blame it on "faulty hardware". But end users are still having to face the breakage. Had the devs chosen not to use FUA, there would be less breakage. Simple as that.
Reliable and slightly slower IO is still working IO, compared to faster but more breakage-prone IO.
The Linux kernel performs runtime detection of whether the drive in question claims to support FUA, and falls back to FLUSH CACHE otherwise.
I'm using the "FUA" nomenclature for brevity, meaning "either FUA or FLUSH CACHE, depending on the drive".
Last edited by intelfx; 08 July 2020, 07:25 AM.
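The fallback intelfx describes can be sketched as a toy model. This is illustrative only, not kernel code; the `Drive` class and `durable_write` function are hypothetical names standing in for the block layer's behavior:

```python
class Drive:
    """Toy stand-in for a disk: records the commands submitted to it."""
    def __init__(self, supports_fua: bool):
        self.supports_fua = supports_fua
        self.log = []

    def submit(self, cmd: str):
        self.log.append(cmd)


def durable_write(drive: Drive, data: bytes):
    """Mimic the strategy described above: if the drive advertises FUA,
    issue a single write with the FUA bit set; otherwise issue a normal
    write followed by an explicit FLUSH CACHE command."""
    if drive.supports_fua:
        drive.submit("WRITE_FUA")
    else:
        drive.submit("WRITE")
        drive.submit("FLUSH_CACHE")
```

Either path is supposed to leave the data on stable media; the non-FUA path just costs an extra command round trip. The whole dispute above is about drives that accept either command and then lose the data anyway.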
-
Originally posted by lyamc:
Intelfx, you're reading past a lot of what aht0 is saying and projecting what you think he's saying.
<...>
What you read is "Don't trust the hardware but trust the hardware, I'm an idiot."
What he said was: "Where you have to trust hardware, add hardware checking (ECC) or filesystem sanity checks."
-
Intelfx, you're reading past a lot of what aht0 is saying and projecting what you think he's saying. Example:
aht0:
Design the whole thing accounting for the worst possible cases? When you design something super-complicated with the naive assumption that nothing it depends on would ever break, you are literally asking Murphy to kick your ass and collect your scalp.
Yup, and then corrupt data will be written. Which is, in fact, more likely to happen with, for example, ZFS than other file systems, because of the way it uses RAM more extensively. In a way it's also a kind of design issue, counter-balanced by additional advantages. But it can be compensated for, and users do it by using ECC RAM. It's a rare scenario, but bit-flips do happen on occasion. Similar case with hard drives: manufacturers' fuck-ups happen, and one should try to minimize the use of questionable interfaces.
What he said was: "Where you have to trust hardware, add hardware checking (ECC) or filesystem sanity checks."
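The "filesystem sanity checks" idea can be shown with a minimal sketch: store a checksum next to the payload on write and verify it on read-back, so a bit-flip is detected instead of silently propagated. The `pack`/`unpack` names are hypothetical; real filesystems like ZFS and btrfs do this per-block with stronger checksums than CRC32:

```python
import zlib


def pack(payload: bytes) -> bytes:
    """Prepend a CRC32 so corruption can be detected when reading back."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def unpack(blob: bytes) -> bytes:
    """Verify the stored checksum; raise instead of returning bad data."""
    stored = int.from_bytes(blob[:4], "big")
    payload = blob[4:]
    if zlib.crc32(payload) != stored:
        raise IOError("checksum mismatch: data corrupted in transit or at rest")
    return payload
```

The point of the argument above is exactly this: the check doesn't prevent the corruption, but it converts silent damage into a detectable error.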
-
Originally posted by useless:
That's why I wrote "is known to HAVE very bad firmware revisions". Maybe it was fixed, maybe not. But, as Zygo said, the surge in corruption cases CAN be linked to a particular combination of firmware/models of 2TB WD drives. Given that the sample is quite large (according to him), that there were reports from other users, and that it can be easily mitigated by just disabling write caching, I'm inclined to think that those drives are defective. He said his survey can't be considered scientifically solid (for obvious reasons), but it does present a sound case for further study. Bottom line: no, not questionable at best.
Read the f**king link I posted. 1TB Green drives are NOT known to have bad firmware revisions (well, unless you provide another credible source).
Model Family: Western Digital Green
Device Model: WDC WD10EZRX-00L4HB0
Firmware Version: 01.01A01
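For reference, the firmware check and the write-cache mitigation discussed above look roughly like this with stock tools (`/dev/sda` is a placeholder for the drive in question, and note that hdparm's `-W` toggle does not persist across reboots):

```shell
# Identify the drive: compare model and firmware revision against the
# problematic combinations in the linked report
sudo smartctl -i /dev/sda

# Mitigation: disable the drive's volatile write cache
sudo hdparm -W 0 /dev/sda

# Verify the current write-cache setting
sudo hdparm -W /dev/sda
```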
-
Originally posted by aht0:
OP's hard drive (Western Digital WD20EZRX) being one of the "very bad firmware revisions" you linked against is at best questionable.
Originally posted by aht0:
I own one of these "listed problematic" 1TB WD Greens (WDC WD10EZRX). It works well enough as a "holding stuff" drive. In Windows. Maybe I should throw Tumbleweed on it and see what will happen.
-
Originally posted by intelfx:
...
Basically:
FUA was implemented in the SCSI (T10) specification but not in the original ATA (T13) specification. It was only added after 2001-2002 (feel free to skim through www.t13.org's PDFs). As such, it is something that has never been guaranteed to be implemented and working on all SATA drives.
Designing a file system driver to use a non-guaranteed feature of the ATA specification is the problem I am trying to point out to you. Windows does not do this and thus avoids this whole breakage-through-firmware problem. Linux does it, and its users have to suffer through it while devs blame it on "faulty hardware". But end users are still having to face the breakage. Had the devs chosen not to use FUA, there would be less breakage. Simple as that.
Originally posted by intelfx:
No idea. But if what you say is true, well, that's why Windows' IO sucks big time.
Last edited by aht0; 02 July 2020, 03:01 PM.
-
-
Originally posted by aht0:
Yup, and then corrupt data will be written. Which is, in fact, more likely to happen with, for example, ZFS than other file systems, because of the way it uses RAM more extensively. In a way it's also a kind of design issue <...>
Originally posted by aht0:
But it can be compensated for, and users do it by using ECC RAM. It's a rare scenario, but bit-flips do happen on occasion. Similar case with hard drives: manufacturers' fuck-ups happen; one should try to minimize the use of questionable interfaces.
Originally posted by aht0:
When you design "something super-complicated", then the first thought should be on possible points of failure. "What can go wrong, will go wrong". Why would you even try to rely on FUA if it's not really known whether SATA drives would actually adhere to it? Do they?
Why would you even try to rely on the CPU if it's not really known that it does not malfunction?
Again, a stupid question and a stupid way of thinking. You would rely on FUA because it's written in the standard. Everything else is bad hardware, period.
Originally posted by aht0:
How does Windows handle the case? It does not use FUA. It will instead send commands to flush the disk write cache after writes, unless it's dealing with SCSI or Fibre Channel drives.
Originally posted by aht0:
I thought Linux, when it comes to SATA, does not use FUA either - am I wrong?
-
Originally posted by intelfx:
If RAM bitflips and the filesystem silently writes out corrupt data, will you claim that the filesystem driver should have somehow protected itself against faulty RAM?
The world doesn't work this way.
When you design something super-complicated, you break it into layers with well-defined interfaces. When a layer doesn't follow its own interface (for example, when a disk drive completes a FUA command and then loses the data) you cannot guarantee anything.
When you design "something super-complicated", then the first thought should be on possible points of failure. "What can go wrong, will go wrong". Why would you even try to rely on FUA if it's not really known whether SATA drives would actually adhere to it? Do they?
How does Windows handle the case? It does not use FUA. It will instead send commands to flush the disk write cache after writes, unless it's dealing with SCSI or Fibre Channel drives.
I thought Linux, when it comes to SATA, does not use FUA either - am I wrong? Or was it the btrfs devs' solo performance?
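Whether the kernel actually issues FUA writes for a given SATA drive can be checked at runtime. Replace `sda` with the device in question; the `queue/fua` sysfs attribute is present on reasonably recent kernels:

```shell
# 1 if the block layer issues FUA writes to this device,
# 0 if it falls back to a post-write cache flush
cat /sys/block/sda/queue/fua

# Does the drive itself advertise the FUA write commands?
sudo hdparm -I /dev/sda | grep -i fua
```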
-
-
Originally posted by useless:
Funny thing about that thread: the OP's hard drive is known to have very bad firmware revisions. Check https://lore.kernel.org/linux-btrfs/[email protected]. Sadly, btrfs depends on correct fsync behavior. Luckily, you can easily deal with them by disabling the write cache.
WDC WD20EZRX-00DC0B0
Firmware Version: 80.00A80
Originally posted by starshipeleven:
That's not an answer.
You, the great programmer expert in filesystems, how do you deal with a drive that just lies when you call fsync and returns "action completed" even if in fact it did not?
When a machine happens to have a "known problematic drive", the btrfs driver should squirt regular warnings into the system log and turn off write caching whether the user likes it or not. That's what could be added to the code, at minimum.
Originally posted by starshipeleven:
But if something basic and crucial like fsync is broken, there is really no workaround; that drive is trash.
I own one of these "listed problematic" 1TB WD Greens (WDC WD10EZRX). It works well enough as a "holding stuff" drive. In Windows. Maybe I should throw Tumbleweed on it and see what will happen.
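The "correct fsync behavior" that useless says btrfs depends on is the same contract every application-level durable write relies on. A minimal sketch of the standard pattern (`durable_save` is a hypothetical helper, not btrfs code): write a temp file, fsync it, rename into place, fsync the directory. A drive that acknowledges the flush without actually persisting defeats every step of this:

```python
import os


def durable_save(path: str, data: bytes):
    """Atomically replace `path` with `data` and push it to stable storage.

    The file fsync makes the bytes durable; the directory fsync makes the
    rename (i.e. the new name) durable. Both depend on the drive honoring
    its cache-flush commands."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # data + metadata of the temp file
    os.rename(tmp, path)              # atomic swap into place
    dfd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dfd)                 # the directory entry itself
    finally:
        os.close(dfd)
```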
-