OpenZFS Is Still Battling A Data Corruption Issue


  • jrssnet
    replied
    Originally posted by phoronix_anon View Post
    Eagerly awaiting a Jim Salter article about how this is actually a btrfs issue.
    Actually I was just today saying in the OpenZFS Production Users conference call "I am glad that Phoronix is bringing attention to this, yes these bugs are relatively difficult to hit but they are data corruption bugs, and the last thing I want to do is try to sweep them under the rug."

    Admittedly, I may have mentioned a certain filesystem as an example of "sweeping the problems under the rug" when I said that...



  • grahamperrin
    replied
    Originally posted by sam_vde View Post
    … a comment from the man who wrote the patch to fix it. …
    Thanks. There's more to come from Rob Norris. Stay tuned.



  • grahamperrin
    replied
    Originally posted by Almindor View Post

    After losing my data? No thanks :P
    Without knowing your use case, we can't tell whether you were at risk.

    There's much to suggest that if you were at risk of openzfs/zfs issue 15526, you would have been at far greater risk of data corruption from other causes.

    You can thank those other causes, not ZFS, for the detectable corruption in your case. You have detected corruption, haven't you?



  • stiiixy
    replied
    Way to completely misrepresent, hey.

    Very black and white, or binary, considering your name.



  • Developer12
    replied
    Originally posted by stiiixy View Post

    I don't recall Michael being bound to any rule, code or law that impedes or prevents him from posting similar news.

    He's not a journalist who's bound to a nerdy version of the Hippocratic oath. He's a guy super hyped on Linux who did what a lot of people would never consider: he went out on his own to write about it. Fast forward twenty years and his primary source of income has dwindled immensely, despite his payers making billions, and all you have is:

    'reposting a filesystem issue twice blahblihblooblah'

    No offence (you're going to be offended anyway), but he's got to eat. If it means I have to revise an existing article (and I personally like filesystem stuff anyway, so I am biased), I daresay my eyes are not going to bleed.
    In other words, you're saying Michael is not bound by any integrity and is free to be a shill. That's not the strong argument you think it is.



  • sam_vde
    replied
    This is a comment from the man who wrote the patch to fix it. Bottom line: don't panic. Most people don't understand how unlikely they are to be affected.



  • Almindor
    replied
    Originally posted by grahamperrin View Post

    except the experts who understand, and have fixed things.

    You're welcome.
    After losing my data? No thanks :P



  • billyswong
    replied
    Originally posted by whatever78 View Post

    Detect silent data corruption under Linux using sha256 stored in extended attributes - rfjakob/cshatag

    Interesting tool. I feel uneasy about filesystems that claim the capacity to detect silent corruption. When bit-flip corruption occurs, throwing the whole file away is not always the best solution. But all the praise of those new filesystems sounds like it's the only thing one shall, or must, do.



  • whatever78
    replied
    Originally posted by muncrief View Post

    I have around 12.5 TB of data, with the majority 11 TB residing on my media server. And though I have local and cloud backups going back years, the problem with ext4 was that I couldn't detect bit rot. And by the time I discovered a bad file I would have no idea when it became corrupted. My hope was that with a monthly ZFS scrub I could detect bit rot and restore good files from my backups, rather than buying two or three times the amount of storage I need and creating complex RAID systems that had also failed me so many times over the decades.
    I've got 300TB on ext4. It's really 100TB, but 3 copies in total (1st: main file server, 2nd: local backup, 3rd: remote file server).

    I will start by saying that, not counting hardware failures, I have about 1 silent bit rot event every 2 years. This is where the data was good on disk and somehow has been corrupted since. Cosmic ray? Who knows. With only 12.5TB of data you may experience this once every 50 years.

    I tried testing btrfs multiple times and I've had issues with it that are a longer discussion. I don't like ZFS being outside of the kernel and its drive expansion method isn't flexible enough for me. So I've stuck with ext4.

    I really like the concept of the data integrity verification method being SEPARATE from the filesystem code itself. With this recent ZFS bug zeroing out files, what happens to the checksums? Does a ZFS scrub show everything as fine even though the data has been corrupted?

    I started out running md5sum recursively (md5deep -r does this more easily) and then diffing the output against the previous results from 6 months earlier. About 10 years ago I switched to this program, which stores SHA256 checksums and timestamps as ext4 extended attribute metadata. Run it again and it recalculates, compares, and reports.
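    For anyone who wants to try the older "hash everything, diff against last run" routine, it can be sketched with plain coreutils (sha256sum in place of md5deep; the /tmp paths are just examples):

    ```shell
    # Build a demo tree and record checksums for every file in it.
    mkdir -p /tmp/rotcheck && echo "payload" > /tmp/rotcheck/a.bin
    ( cd /tmp/rotcheck && find . -type f -print0 | sort -z | xargs -0 sha256sum ) > /tmp/sums.old

    # ...six months later, recompute and compare; any changed file shows up in the diff.
    ( cd /tmp/rotcheck && find . -type f -print0 | sort -z | xargs -0 sha256sum ) > /tmp/sums.new
    diff /tmp/sums.old /tmp/sums.new && echo "no rot detected"
    ```

    The sort keeps the two listings in the same order, so the diff only flags real checksum changes, not reordered output.
    
    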

    Detect silent data corruption under Linux using sha256 stored in extended attributes - rfjakob/cshatag
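    The check cshatag performs boils down to: recompute the file's hash and compare it with the one stored on the previous run. A minimal Python sketch of that idea (the path and the in-memory dict are placeholders; as I understand it, the real tool keeps the hash and timestamp in user.shatag.* extended attributes on the file itself):

    ```python
    import hashlib

    def sha256_of(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    stored = {}  # path -> last known-good hash (stands in for the xattr)

    def record(path: str, data: bytes) -> None:
        """First pass: remember the file's checksum."""
        stored[path] = sha256_of(data)

    def verify(path: str, data: bytes) -> bool:
        """Later pass: True if the file still matches its recorded checksum."""
        return stored.get(path) == sha256_of(data)

    good = b"some archived media data"
    record("/data/file.bin", good)         # initial run stores the checksum
    assert verify("/data/file.bin", good)  # unchanged file passes

    # Simulate silent bit rot: a single flipped bit is caught.
    rotten = bytes([good[0] ^ 0x01]) + good[1:]
    assert not verify("/data/file.bin", rotten)
    ```

    Because the stored hash travels with the file (as an xattr), rsync -X carries it to the backups, and the same comparison can then be rerun independently on each copy.
    
    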


    I also run snapraid against dual parity drives once a night. It also has a built-in scrub feature. Then I create my backups with rsync -X, which transfers the extended attributes. So basically every 6 months I run a snapraid scrub, verify the local copy is correct, run cshatag locally, then rsync -X to the local and remote backups, then run cshatag on the local and remote backups. This takes about 3 days total, but it's just run time in the background and slightly louder fans.

    Again, I consider it a FEATURE to have this NOT unified into one comprehensive filesystem. I like a second independent tool verifying the correctness of the first tool.




  • stiiixy
    replied
    Originally posted by Developer12 View Post

    Should Michael write an article about every issue opened on their GitHub about this bug? There have been over five of them and counting. How about for every comment? Every time a ZFS developer weighs in?

    He's just double-dipping on the same ZFS bug twice in the span of a week because ZFS/BTRFS flamewars generate tons of engagement, and thus ad revenue.
    I don't recall Michael being bound to any rule, code or law that impedes or prevents him from posting similar news.

    He's not a journalist who's bound to a nerdy version of the Hippocratic oath. He's a guy super hyped on Linux who did what a lot of people would never consider: he went out on his own to write about it. Fast forward twenty years and his primary source of income has dwindled immensely, despite his payers making billions, and all you have is:

    'reposting a filesystem issue twice blahblihblooblah'

    No offence (you're going to be offended anyway), but he's got to eat. If it means I have to revise an existing article (and I personally like filesystem stuff anyway, so I am biased), I daresay my eyes are not going to bleed.

