Zstd 1.5.1 Released With Even More Performance Improvements

  • #21
    Originally posted by caligula View Post
    FB didn't do this. Zstd was invented before FB came along.
    Is that sarcasm?
    Last edited by timofonic; 21 December 2021, 11:49 AM.

    Comment


    • #22
      Originally posted by timofonic View Post
      I never use Facebook or Instagram, nor any of their products, and consider them very evil (and fatal for teenagers), but Zstd is their only good product.

      Btrfs is a total failure.

      Congratulations, Facebook = META. You did something good, finally.
      BTRFS allows me to run a fully encrypted system without needing a separate /boot.

      I can also boot btrfs root snapshots from GRUB, which is very useful - no more booting from USB to fix issues.

      The only issue I've had in 6-7 years on btrfs was Dropbox messing up a /home subvolume and filling up the disk (the only time I ever did not have a separate /home partition).

      Comment


      • #23
        Originally posted by useless View Post
        If anything, having automatic and trouble-free checksumming is an understatement.
        Is it though?

        Firstly, if you need a filesystem to do checksums, it is only because the drives and the bus fail at doing so. This is not impossible, but a drive failing to report a corruption that the filesystem then detects is extremely rare. It is more likely for the filesystem to have a bug, or to have lost its checksum to a RAM corruption, and as a result report a false positive. Do keep in mind that a checksum is not evidence of corrupted data, only of a mismatch between checksum and data, and thus only of a possible corruption. An event where a drive's error detection fails to catch a corruption but the filesystem does is the extreme case. It does not matter whether this is BTRFS, EXT4, F2FS or any other filesystem. It is more likely that the data on the drive is still correct.
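        A minimal sketch of that point in Python (nothing filesystem-specific, just hashlib; data and values are made up): here the stored digest gets corrupted while the data stays intact, yet verification still reports a mismatch.

```python
import hashlib

data = b"the data itself, perfectly intact"
stored = bytearray(hashlib.sha256(data).digest())

stored[0] ^= 0x01  # a single bit flip in the stored checksum, not in the data

if hashlib.sha256(data).digest() != bytes(stored):
    print("mismatch reported, yet the data was never corrupted")
```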

        Secondly, putting a checksum on all your data means giving everything the same priority, from the most useless bits in some long-forgotten file to the contents of /etc/shadow, with the result that a single, meaningless bit flip makes you halt your system, reinstall the file, and question your entire hardware. This is not what every user wants. Many are happy to let bits flip until they actually cause a problem, and not merely a checksum error. Many users do not wish to give in to the paranoia that a data corruption always leads to the worst-case scenario. It is not always the most important bits on your system that get corrupted. Nor do these people suffer false positives. Only a few people need this much checksumming, and they know what they are doing.

        Thirdly, of course I run backups, but again I have no use for checksums here, only for the actual backup data. No checksum is going to replace the data, and even if I did use checksums here, I would still look at the data itself to see whether it actually is corrupt. Again, checksums do not fix data losses, only redundancy does, and whether data is still usable is not determined by a checksum but by the software that uses the data, which I will have to test anyway, with or without a checksum.

        For the rare cases where I do want a checksum for a file, e.g. for security reasons, I use SHA256 or better, though I admit this is based more on paranoia, and perhaps on a need to be an over-achiever, than on actually having a good reason for it. To me a checksum literally only adds a check to my workflow, thereby only adding to my work, not removing any. Thus I prefer redundancy and only enable checksums where they are useful to me. To say checksumming is "automatic and trouble-free" makes little sense to me. It had better be automatic, because I sure do not want to calculate the checksums myself, and trouble-free does not exist: once you have a mismatch, you also have trouble.
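        For what that manual workflow looks like, here is a rough sketch (the file name and the recorded digest are made-up placeholders): hash the file in chunks and compare against a digest noted down when the file was known to be good.

```python
import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

recorded = "..."  # placeholder: digest recorded when the file was known-good
if sha256_of("important.bin") != recorded:  # hypothetical file
    print("mismatch: and now you have trouble")
```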
        Last edited by sdack; 21 December 2021, 03:12 PM.

        Comment


        • #24
          Originally posted by sdack View Post
          Is it though?

          Firstly, if you need a filesystem to do checksums, it is only because the drives and the bus fail at doing so. This is not impossible, but a drive failing to report a corruption that the filesystem then detects is extremely rare.
          No, it's not: cheap controllers, bad firmware, and tons of other examples if you do a quick search in the btrfs mailing list archive. There are potentially more data points in the IRC channel, but I don't follow it. Even if it's rare, don't you want to know?

          Originally posted by sdack View Post
          It is more likely for the filesystem to have a bug, or to have lost its checksum to a RAM corruption, and as a result report a false positive.
          Which is funny, since btrfs already does some checking at write time for that precise scenario.

          Originally posted by sdack View Post
          Do keep in mind that a checksum is not evidence of corrupted data, only of a mismatch between checksum and data, and thus only of a possible corruption.
          Well, yes? For all practical purposes, does it matter?

          Originally posted by sdack View Post
          An event where a drive's error detection fails to catch a corruption but the filesystem does is the extreme case. It does not matter whether this is BTRFS, EXT4, F2FS or any other filesystem. It is more likely that the data on the drive is still correct.
          I dare ask you for a source on this claim; it is a bold one.

          Originally posted by sdack View Post
          Secondly, putting a checksum on all your data means giving everything the same priority, from the most useless bits in some long-forgotten file to the contents of /etc/shadow, with the result that a single, meaningless bit flip makes you halt your system, reinstall the file, and question your entire hardware.
          Really? SATA has CRC, hard drives have some form of ECC, RAM has ECC (well, it should), and somehow you ended up with undetected corruption. How do you trust that hardware from now on? At least that warrants some troubleshooting.
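          To make that layering concrete, a toy check in Python (zlib.crc32 standing in for the link-layer CRC; not SATA's actual polynomial, just the same principle): one bit flips in transit and the receiver notices.

```python
import zlib

payload = bytearray(b"a block of data crossing the bus")
sent_crc = zlib.crc32(payload)

payload[3] ^= 0x04  # simulate a single bit flip in transit

if zlib.crc32(payload) != sent_crc:
    print("CRC mismatch: corruption detected, the transfer can be retried")
```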

          Originally posted by sdack View Post
          This is not what every user wants. Many are happy to let bits flip until they actually cause a problem, and not merely a checksum error. Many users do not wish to give in to the paranoia that a data corruption always leads to the worst-case scenario. It is not always the most important bits on your system that get corrupted. Nor do these people suffer false positives. Only a few people need this much checksumming, and they know what they are doing.
          Which users? The same naive users who can't tell what a checksum is? What could possibly be more user-friendly than a filesystem that errors out at the first sign of corruption? Anyway, it's kind of funny you brought up paranoia, since there are essentially two types of users: those who have lost data and those who haven't yet. In the context of bad hardware, btrfs (and ZFS) gives the first an easy way to check for corruption and teaches the second the importance of resilience.

          Originally posted by sdack View Post
          Thirdly, of course I run backups, but again I have no use for checksums here, only for the actual backup data. No checksum is going to replace the data, and even if I did use checksums here, I would still look at the data itself to see whether it actually is corrupt.
          I never stated that a checksum gives you back your correct data.

          Originally posted by sdack View Post
          Again, checksums do not fix data losses, only redundancy does, and whether data is still usable is not determined by a checksum but by the software that uses the data, which I will have to test anyway, with or without a checksum.
          Of course. Anyone who cares about their data regularly checks their backups. But a checksum mismatch already tells you one thing: your backup strategy has holes. What if you have bad hardware that is corrupting your backups, and it goes on for a while because the data still passes your app tests? Can you be sure it won't bite you in the future? Does your backup strategy work for all kinds of data?

          Originally posted by sdack View Post
          For the rare cases where I do want a checksum for a file, e.g. for security reasons, I use SHA256 or better, though I admit this is based more on paranoia, and perhaps on a need to be an over-achiever, than on actually having a good reason for it. To me a checksum literally only adds a check to my workflow, thereby only adding to my work, not removing any.
          Well, while you're at it: why do error checking at the link layer? Why use ECC? Why do hard drives have erasure coding? Why do all of that if you can test directly in your app?

          Originally posted by sdack View Post
          Thus I prefer redundancy and only enable checksums where they are useful to me.
          Aha! You shouldn't generalize the usefulness of a feature from your own specific workflow.

          Comment


          • #25
            Originally posted by DrYak View Post
            I would like to point out that it is also possible to periodically run both short (I do nightly) and long (used to be weekly on all drives, but now I only run them monthly on mechanical HDDs, given that I have a weekly btrfs scrub) SMART self-tests using the smartd service that comes with smarttools.
            Hey that's a good idea!
            Thanks!

            Comment


            • #26
              Originally posted by useless View Post
              Aha! You shouldn't generalize the usefulness of a feature from your own specific workflow.
              I did not say it was useless. I said one should not overrate it. I still think you overrate its usefulness, because you are not really arguing for why you would need "automatic and trouble-free checksumming"; instead you are now only interested in what I wrote. This tells me you want to be right, not that you are. Frankly, I think you like checksumming simply because it exists.

              Comment


              • #27
                Originally posted by DrYak View Post
                Although I too run "btrfs scrub" periodically to detect such problems (and correct them if a usable RAID1 copy exists), I would like to point out that it is also possible to periodically run both short (I do nightly) and long (used to be weekly on all drives, but now I only run them monthly on mechanical HDDs, given that I have a weekly btrfs scrub) SMART self-tests using the smartd service that comes with smarttools.
                Thank you sincerely for your suggestion about smartd. I searched for smarttools but couldn't find it, though I found many links to smartmontools. Is that what you use, or is there another package called smarttools?

                Comment


                • #28
                  Originally posted by sdack View Post
                  Thirdly, of course I run backups, but again I have no use for checksums here, only for the actual backup data. No checksum is going to replace the data, and even if I did use checksums here, I would still look at the data itself to see whether it actually is corrupt. Again, checksums do not fix data losses, only redundancy does, and whether data is still usable is not determined by a checksum but by the software that uses the data, which I will have to test anyway, with or without a checksum.
                  I appreciate that you take the time to explain your meaning in your posts; I find it helpful. As a sincere question from someone who is still relatively ignorant (and not a criticism): how do you check the data to make sure that the backups you make are not corrupted? That is, I can easily imagine backing up data for redundancy and finding out later that bit rot had crept up even on my backups, so that I no longer have any good data from which to restore.

                  Comment


                  • #29
                    useless and sdack, I wonder if you two are arguing past each other somewhat. You seem to disagree about whether a checksum error is evidence of bad hardware and bit rot, and beyond that point you seem to be talking past each other rather than to each other. I really don't mean to criticize; I appreciate and am learning from both of your responses.

                    Comment


                    • #30
                      Originally posted by sdack View Post
                      I did not say it was useless. I said one should not overrate it. I still think you overrate its usefulness, because you are not really arguing for why you would need "automatic and trouble-free checksumming"; instead you are now only interested in what I wrote. This tells me you want to be right, not that you are. Frankly, I think you like checksumming simply because it exists.
                      Economics. I thought that was pretty clear from my first reply.

                      One simple example would be personal videos and photos: snapshot, backup, test once, scrub frequently. Any change will be evident without going photo after photo, video after video (you can't automate this process much more than that). New data? Repeat. You can test at longer intervals.
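                      Roughly what that amounts to if you had to do it in userspace instead of letting the filesystem handle it (paths and file names below are made up): build a manifest of digests once while the data is known-good, then re-verify later without opening a single photo by hand. A scrub automates exactly this, block by block.

```python
import hashlib, json, os

def digest(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root, out="manifest.json"):
    entries = {os.path.join(d, n): digest(os.path.join(d, n))
               for d, _, files in os.walk(root) for n in files}
    with open(out, "w") as f:
        json.dump(entries, f, indent=2)

def verify(manifest="manifest.json"):
    with open(manifest) as f:
        for path, want in json.load(f).items():
            if digest(path) != want:
                print("changed or corrupted:", path)

# build_manifest("/photos")  # once, when the data is known-good
# verify()                   # at every later check
```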

                      Cold storage is another example. You can sample and test it far less frequently than you can scrub.

                      Another one would be some system daemon misbehaving because of corruption. With btrfs you will get EIO, which prevents all kinds of quirks (important if that daemon touches important data).

                      Then there is the fact that drives have bugs. If your tests are throwing errors, you will search for the cause. What is it? Did your hardware crap itself, or did your app corrupt the data somehow? If you add filesystem checksumming, you know whether you need to replace your drive or controller, and you can rule out bugs in your app (your app has bugs too). What if there has been a change in your app and it can no longer open your backups for whatever reason? Do you discard your 'corrupted' backups?

                      In my last reply I gave you one scenario in which having checksums is not only helpful but necessary: your app can fail too. But you decided to ignore it. I don't have a need to be right; you made some statements in that reply that were worth arguing about, nothing more.

                      All in all, trouble-free checksumming allows regular users to notice corruption and take action. If you have automatic scrubs (which you should), you can catch damage soon enough and lose less recently generated data.

                      Comment
