The Linux 4.0 EXT4 RAID Corruption Bug Has Been Uncovered

  • #21
    The fix was pushed to Arch's 4.0.4 and 3.14 LTS kernels earlier today.

    Comment


    • #22
      Originally posted by duby229 View Post
      I can't say I fully understand it. It has something to do with limitations WD imposes at the firmware level. One problem Green drives have is that they can trigger RAID controllers to mark the drive as bad because they lack the TLER error-recovery feature. Another problem is that they don't have the logic to keep seeking synchronized between drives, which adds a lot of wear and tear. These are issues that only occur in RAID setups.
      Huh, I haven't heard about that (not that I ever used my WD Green on RAID, though).

      Comment


      • #23
        Originally posted by GreatEmerald View Post

        Huh, I haven't heard about that (not that I ever used my WD Green on RAID, though).
        Physically, Green and Red drives are identical; the only difference is the firmware.

        Comment


        • #24
          Originally posted by duby229 View Post
          I don't understand how BTRFS does RAID, but I don't think it works in the traditional way. So, for all I know, the hardware limitations of the Green drives' firmware may not apply.
          Indeed, it's a bit different.
          Classic RAID (Linux's MD, DMRAID/device-mapper/LVM, hardware RAID cards, BIOS RAID) uses a separate intermediate layer that works with block devices: it takes several block devices, operates internally at the block level, and exposes a single block device to the file system. EXT4 sees only one (virtual) drive/partition.
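A minimal sketch of the classic MD case (device names are placeholders, and this assumes disposable disks and root privileges):

```shell
# Combine two partitions into a single RAID1 block device.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1

# The file system only ever sees the one virtual device:
mkfs.ext4 /dev/md0
mount /dev/md0 /mnt
```

EXT4 here has no idea that two physical drives are involved; MD hides them completely.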

          BTRFS and ZFS have their own internal handling: they see several block devices and work at the file-system level (extents are replicated across the block devices in a pool). Because they are file systems and see the drives directly, they might be less likely to be affected by the firmware quirk in WD drives.
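For contrast, the BTRFS multi-device case (again with placeholder device names) creates no intermediate block device at all:

```shell
# BTRFS manages the member devices itself, mirroring both data (-d)
# and metadata (-m) across them:
mkfs.btrfs -d raid1 -m raid1 /dev/sdb1 /dev/sdc1

# Mounting any one member mounts the whole pool:
mount /dev/sdb1 /mnt
btrfs filesystem show /mnt
```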

          Originally posted by duby229 View Post
          It has something to do with limitations WD imposes at the firmware level. One problem green drives have is they can trigger RAID controllers to mark the drive bad because of missing error correction feature TLER.
          Originally posted by duby229 View Post
          Physically, Green and Red drives are identical; the only difference is the firmware.
          As far as I've understood, TLER (time-limited error recovery), aka ERC (error recovery control), controls how long a drive will spend trying to recover from a read/write error.
          - WD drives not optimized for RAID are set to retry for as long as it takes (you want to keep hoping that your data will eventually be recovered).
          - WD drives optimized for RAID have TLER set to 7 seconds.
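On drives whose firmware exposes it, this setting can be inspected and changed with smartctl (SCT ERC values are in tenths of a second; /dev/sda is a placeholder):

```shell
# Query the current SCT Error Recovery Control setting:
smartctl -l scterc /dev/sda

# Set read and write recovery limits to 7.0 seconds (70 tenths of a second),
# the value RAID-oriented firmware ships with:
smartctl -l scterc,70,70 /dev/sda
```

Note that on many desktop drives (WD Greens included) the firmware simply refuses the command, which is exactly the product segmentation being discussed.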
          In a RAID, if a drive doesn't respond for some time, the controller (or the driver, if it's software RAID) might decide that the whole drive has failed and drop it from the array. In RAID 0, which has no redundancy at all, this is fatal: the whole array is dead and unusable.
          (As explained in the Wikipedia article: on Linux in particular, MD will keep waiting. It's the SCSI/SATA layer that decides to consider the drive problematic after 30 seconds.)
          Also, in any RAID level above 0, it doesn't make sense to insist for more than a couple of seconds trying to access the data. After that, it's better to give up and use the redundancy to rebuild it.
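That 30-second timeout lives in the kernel's per-device sysfs entry, and a common workaround for desktop drives without TLER is to raise it above the drive's worst-case recovery time (sda is a placeholder; needs root):

```shell
# The kernel's per-device command timeout, in seconds (default 30):
cat /sys/block/sda/device/timeout

# Raise it so the SCSI layer outlasts a Green drive's long error recovery:
echo 180 > /sys/block/sda/device/timeout
```

The trade-off: the array stalls for up to three minutes instead of dropping the drive.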

          Since they don't think in terms of whole drives but in terms of file systems and extents, BTRFS and ZFS probably won't fail the whole drive after such a long error recovery.
          (On the other hand, since on Linux it's the SCSI layer that fails the drive, BTRFS might still be affected if SCSI decides to mark the drive failed at the SCSI level after the reset.)

          The overall effect is that WD Greens tend to randomly drop out of the RAID, and since many people use RAID 0, the results tend to be catastrophic.

          I haven't had failures in the few desktops where I've played around with RAID on WD drives.
          My brother got a WD drive that was DOA, so he didn't even get to try RAID either.

          I tend to go with HGST UltraStar drives for anything more serious.

          Comment
