Originally posted by lyamc
I've had all my home storage on ZFS since 2008, and after reading about actual disk corruption rates I thought that checksumming everything was overkill.
However, over all these years I've had three cases of silent corruption on different computers, each detected by ZFS and reported to me by email, and scrubs would consistently turn up bad data.
As I was away for a few months I was unable to troubleshoot the problem, but ZFS mirroring and checksumming kept everything working. When I finally got around to troubleshooting, the issue was fixed by replacing the SATA cables. No errors in the kernel logs.
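For reference, on OpenZFS roughly this kind of setup does the detecting and emailing (the pool name "tank" and the email address are just placeholders, not my actual setup):

    # kick off a scrub of the hypothetical pool "tank" (distros usually ship a cron job/timer for this)
    zpool scrub tank

    # after the scrub, show checksum error counts and any affected files
    zpool status -v tank

    # ZED, the OpenZFS event daemon, can email scrub and checksum events;
    # in /etc/zfs/zed.d/zed.rc set something like:
    #   ZED_EMAIL_ADDR="admin@example.com"
    #   ZED_NOTIFY_VERBOSE=1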
At work, we used to have a backup on our NTFS SAN for "corruption prevention". I always asked the tech team how we would detect corruption, and the answer was "if some client reports wrong data". At the end of the project, while archiving, I found two XML files of 12 and 33 MB that should have been a few KB each. The files were corrupt and unreadable, and yet nothing in the SAN, NTFS or Windows had detected anything.
For me the situation is clear: even if the disks work fine, there's always the possibility of a failure somewhere else, which is more common and which checksumming detects. So *my* data will always live on a checksummed file system.
I really look forward to bcachefs being production-ready and well tested in a few years' time.