Originally posted by carewolf
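A vibration can knock the head onto the wrong track just as a write begins, so the new data cleanly overwrites whatever was stored there. In that case, the hardware ECC will look fine, because the drive computed fresh ECC for the sector it actually wrote. The data will still be wrong: the intended location keeps its stale contents and the neighboring location has been clobbered. This has been observed in production in the past, and it is one of the reasons ZFS’ end-to-end design is necessary to ensure data integrity. Because ZFS stores each block’s checksum in the parent block pointer rather than next to the data, both the stale block and the clobbered one fail verification when they are read back.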
There are other causes of corruption that the drive’s ECC likewise cannot catch. One is a disk controller sending corrupted but well-formed write commands to the disks: the drive faithfully stores the bad data and computes valid ECC over it. Corruption from controllers has also been observed in the past, and ZFS is able to detect it. The drive’s ECC is oblivious to it.
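To make the difference between per-sector ECC and an end-to-end checksum concrete, here is a toy Python model. It is a sketch, not ZFS code. Assumptions: drive ECC is stood in for by a CRC32 stored with each sector, the filesystem checksum is SHA-256 kept in a separate "block pointer" (ZFS offers sha256 as a checksum option, though its default is fletcher4), sectors stand in for tracks, and the names `Disk`, `fs_write`, `fs_read`, `head_offset` and `corrupt` are hypothetical.

```python
import hashlib
import zlib

SECTOR = 512


class Disk:
    """Toy disk: each sector holds (data, crc32), the CRC standing in for drive ECC."""

    def __init__(self, nsectors):
        blank = bytes(SECTOR)
        self.sectors = [(blank, zlib.crc32(blank)) for _ in range(nsectors)]

    def write(self, lba, data):
        # The drive computes ECC over whatever bytes arrive, wherever they land.
        self.sectors[lba] = (data, zlib.crc32(data))

    def read(self, lba):
        data, ecc = self.sectors[lba]
        if zlib.crc32(data) != ecc:
            raise IOError(f"ECC error at LBA {lba}")
        return data


def checksum(data):
    return hashlib.sha256(data).digest()


def fs_write(disk, lba, data, head_offset=0, corrupt=None):
    """Write a block and return the block pointer (lba, checksum) kept by the parent.

    head_offset simulates a misdirected write; corrupt simulates a controller
    mangling the data after the filesystem has checksummed it.
    """
    bp = (lba, checksum(data))  # checksum taken before data leaves host memory
    payload = corrupt(data) if corrupt else data
    disk.write(lba + head_offset, payload)
    return bp


def fs_read(disk, bp):
    lba, expected = bp
    data = disk.read(lba)  # drive ECC passes in every case below
    if checksum(data) != expected:
        raise IOError(f"checksum mismatch at LBA {lba}")
    return data


if __name__ == "__main__":
    disk = Disk(8)
    a = fs_write(disk, 2, b"A" * SECTOR)
    # Head drifts: the write meant for LBA 3 lands on LBA 2.
    b = fs_write(disk, 3, b"B" * SECTOR, head_offset=-1)
    # Controller flips one byte in flight; the drive stores it with valid ECC.
    c = fs_write(disk, 5, b"C" * SECTOR, corrupt=lambda d: d[:100] + b"X" + d[101:])
    for name, bp in (("a", a), ("b", b), ("c", c)):
        try:
            fs_read(disk, bp)
            print(name, "ok")
        except IOError as e:
            print(name, e)
```

All three reads pass the simulated drive ECC, and all three are caught by the parent-held checksum. With redundancy (a mirror or RAID-Z), ZFS would go on to repair the block from a good copy; the sketch stops at detection.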
If it counts for anything, I have more than a hundred commits in the ZFSOnLinux source tree and a few commits in Linus’ tree. I am speaking from that experience.