
Btrfs With Linux 5.14 Has More Performance Tuning, Other Improvements


  • #31
    Originally posted by S.Pam View Post
    What do you mean with end-to-end data protection?
    If you search for the phrase, it's a feature you'll find some storage vendors advertising. I had forgotten the technical name, but it's "T10 Protection Information (T10 PI), previously known as Data Integrity Field (DIF)". It's basically a host-supplied checksum that gets persisted with the data, so the host can verify data integrity on a per-block basis.

    The T10 PI model defines the contents of an additional 8 bytes of information, increasing the sector size to 520 bytes. The additional bytes are used to store tags that can be used to verify the 512 bytes of data in the sector.


    That's only one reference. You can find more.
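
    If you're curious whether a given SAS/SCSI drive supports it, sg3_utils can tell you. This is just a sketch, and /dev/sdX is a placeholder:

    # Extended INQUIRY VPD page lists which protection types the drive supports.
    sg_vpd --page=ei /dev/sdX
    # READ CAPACITY(16) shows whether protection is currently enabled (prot_en)
    # and which type the drive was formatted with (p_type).
    sg_readcap --long /dev/sdX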

    Originally posted by S.Pam View Post
    I have myself had Samsung SSD give corrupt data on single sectors.
    I never trusted Samsung consumer SSDs, because they don't provide any detailed technical information about them. Their main selling point is performance, and I worry they could be making bad tradeoffs to keep that edge.

    I've always gone for Intel and Micron. Intel has really fallen off, but earlier generations of their SSDs had the most detailed specs I've ever seen for a storage product.

    Originally posted by S.Pam View Post
    Just the other day someone had silent corruption on some SanDisk SSD. https://lore.kernel.org/linux-btrfs/[email protected]/ this type of problem was confirmed by other people with similar experiences too.
    SanDisk is mostly a bottom-feeder. Also, that post says nothing about using ECC memory, so it could actually be the host's fault.

    Originally posted by S.Pam View Post
    Facebook switched to btrfs because (in part) they realised that their hw raid systems created silent corruption.
    I don't use HW RAID, because it's just another point of failure. Where I have used it, we went with Dell-branded controllers that Dell independently validated and supports. I've never had any trouble with those.

    Originally posted by S.Pam View Post
    But of course you are free to use whatever you want, everyone is. Just don't claim corruptions don't happen because of some theoretical thing.
    If you use buggy hardware (and no end-to-end data protection), then it goes without saying that you can have data corruption! You can also get data corruption if you use bleeding edge kernels, because software sometimes has bugs, too!

    The standard I hold is that a user who cares enough about their data to use enterprise-grade drives, ECC memory, and mature software should have some reasonable degree of certainty that they're not getting silent data corruption. If that's not true (i.e. if there are links in the chain with no ECC or parity), then that would be worth knowing about. Maybe I'll actually enable T10 PI, since most of my hardware supports it (requires a reformat, though).
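
    On SAS drives that reformat would be a low-level format with sg_format, roughly like the sketch below. The right --fmtpinfo value for the protection type you want is worth double-checking in the man page before wiping anything, and /dev/sdX is a placeholder:

    # DESTRUCTIVE: reformat with protection information (512+8 byte sectors).
    sg_format --format --fmtpinfo=2 /dev/sdX
    # Confirm protection is now enabled.
    sg_readcap --long /dev/sdX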



    • #32
      Originally posted by waxhead View Post
      Let me see if I can change your mind a bit - see this: https://www.spinics.net/lists/raid/msg32816.html where a guy asks about validating parity on read; Neil Brown himself answers, if that helps you.
      What belief, on my part, is this supposed to change?

      Originally posted by waxhead View Post
      I may be wrong about this, but I am pretty sure that some hardware RAID1 solutions include an integrity block which helps the RAID controller figure out which disk was corrupt.
      If they're talking about T10 PI, then sure.

      Originally posted by waxhead View Post
      Regarding your second point - I disagree. Others do as well - that is why you find papers on silent data corruption. Here you go!

      This supports my point, as they're actually relying on T10 PI to detect checksum errors! That's exactly what I meant by "end-to-end data protection", although it's my bad for failing to reference the precise mechanism to which I was referring.

      The authors even endorse it with the statement:

      "We find that the protection offered by the data integrity segment is well-worth the extra space needed to store them."

      Also, they don't appear to have looked at SMART statistics and whether the drives with checksum errors actually reported any uncorrectable reads. If I see uncorrectable reads or even a high number of correctable ones (though the two tend to correlate), I replace that drive!
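
      For the record, the kind of check I mean is just smartmontools; the attribute names and IDs below are typical examples rather than a universal rule, and the device paths are placeholders:

      # SATA/SAS: dump vendor SMART attributes and watch the error counters,
      # e.g. 5 Reallocated_Sector_Ct, 187 Reported_Uncorrect,
      # 197 Current_Pending_Sector, 198 Offline_Uncorrectable.
      smartctl -A /dev/sdX
      # NVMe: the health log's "Media and Data Integrity Errors" field is the
      # closest equivalent.
      smartctl -A /dev/nvme0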

      While it seems like a decent paper, it appears to be 12 or 13 years old, and I mention that because every level of the technology stack has evolved since then. I'm a little more concerned about its relevance, though, because of the lack of any correlation with failed reads or SMART statistics, which could be important predictive indicators.

      Anyway, maybe I can de-escalate what seems to be an increasingly pitched battle by agreeing that some filesystem-level checksum makes sense, at least in a scenario where you don't have drives supporting T10 PI and are relying on commodity storage and computing hardware of unknown quality and integrity. Thanks for taking the time & effort to hash this out.



      • #33
        Originally posted by coder View Post
        Anyway, maybe I can de-escalate what seems to be an increasingly pitched battle by agreeing that some filesystem-level checksum makes sense, at least in a scenario where you don't have drives supporting T10 PI and are relying on commodity storage and computing hardware of unknown quality and integrity. Thanks for taking the time & effort to hash this out.
        No need to de-escalate anything, the way I see it at least. This has not reached ugly flamewar status (yet). It is simply a debate between people who are enthusiastic about certain topics... and if I may, the main thing to take away from this talk is: if you care about your data, do something to ensure its integrity. Enterprise-class hardware, filesystems, sha1 scripts to check your files, etc. are all nice, but tested, working backups are a must for the stuff you really care about anyway!

        http://www.dirtcellar.net



        • #34
          Originally posted by waxhead View Post
           tested, working backups are a must for the stuff you really care about anyway!
          Backups aren't a complete answer to silent data corruption, however. If you have a data corruption problem, it could stretch back through many generations of backups, before someone finally notices. And most backups aren't kept indefinitely.

          BTRFS' checksum feature is nice, even when it's not essential. At the least, it should give one more peace of mind. With scrubbing, it can help you find silent errors promptly (i.e. hopefully, before they do much damage or the most recent backup without the error is lost). It's a different approach than RAID-6, which is aimed more at fixing the live filesystem (with scrubbing), without having to resort to backups.
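
          Concretely, that scrubbing amounts to the commands below (the mount point is a placeholder):

          # Re-read every block and verify data/metadata checksums.
          btrfs scrub start /mnt/data
          btrfs scrub status /mnt/data
          # Per-device counters of corruption/read/write errors found so far.
          btrfs device stats /mnt/data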

          The fact that I'm running BTRFS on a separate RAID-6 layer gives me both benefits, without having to rely on some of BTRFS' darker corners. When I started doing this, there were strongly-worded warnings against relying on BTRFS' built-in RAID capability. Running it atop mdraid has worked so well for me that I don't ever plan on revisiting that decision. At my job, we're using it atop enterprise hardware RAID controllers.
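
          The layering itself is nothing exotic; a minimal sketch of that setup (device names and counts are placeholders) looks like:

          # md-managed RAID-6 across four disks.
          mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sd[b-e]
          # Single-device btrfs on top; DUP metadata gives btrfs a second copy
          # to fall back on if a metadata checksum fails.
          mkfs.btrfs -d single -m dup /dev/md0
          mount /dev/md0 /mnt/data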



          • #35
            Originally posted by coder View Post
            Backups aren't a complete answer to silent data corruption, however. If you have a data corruption problem, it could stretch back through many generations of backups, before someone finally notices. And most backups aren't kept indefinitely.

            BTRFS' checksum feature is nice, even when it's not essential. At the least, it should give one more peace of mind. With scrubbing, it can help you find silent errors promptly (i.e. hopefully, before they do much damage or the most recent backup without the error is lost). It's a different approach than RAID-6, which is aimed more at fixing the live filesystem (with scrubbing), without having to resort to backups.

            The fact that I'm running BTRFS on a separate RAID-6 layer gives me both benefits, without having to rely on some of BTRFS' darker corners. When I started doing this, there were strongly-worded warnings against relying on BTRFS' built-in RAID capability. Running it atop mdraid has worked so well for me that I don't ever plan on revisiting that decision. At my job, we're using it atop enterprise hardware RAID controllers.
            Backups are indeed not the complete answer. That is why I said tested, working backups, e.g. verified against a checksum list:

            find . -type f -exec sha1sum {} \; > checksums.txt
            and to verify:
            sha1sum -c checksums.txt | grep 'FAILED' > checksums_failed.txt

            I was also considering running BTRFS on top of RAID6. With DUP metadata it would be superior to mdraid6+ext4. However, I went the other direction and run plain BTRFS "RAID"1 with metadata in RAID1c4. Yes, I can only tolerate one disk failure for data, but more for metadata. I am willing to accept that drawback as long as I can be reasonably sure of finding out which files are missing. BTRFS with data in RAID5/6 and metadata in RAID1c3/c4 is usable as long as you scrub after any unclean shutdown, but I personally would prefer to wait until it is better tested. Maybe one day the write hole will be closed as well.
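
            For reference, that layout is roughly the following (placeholder devices; raid1c4 needs at least four drives and was added in kernel 5.5):

            # Data mirrored twice, metadata kept as four copies.
            mkfs.btrfs -d raid1 -m raid1c4 /dev/sd[b-e]
            # An existing filesystem can be converted in place with balance:
            btrfs balance start -dconvert=raid1 -mconvert=raid1c4 /mnt/data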

            http://www.dirtcellar.net

