Btrfs Will Finally "Strongly Discourage" You When Creating RAID5 / RAID6 Arrays

  • #31
    Originally posted by waxhead View Post
    You should try RAID1 on two devices once and try to inject random data into one of the devices and see how btrfs fixes it. You might be surprised at how well it works.
    I'm not the person you're responding to but:

    I was dealing with one WD Red spitting out UNCs (uncorrectable read errors) in a RAID10 btrfs setup. It seems the firmware does not immediately try to relocate them, so I just overwrote the offending LBA addresses. After a successful SMART self-test, a btrfs scrub corrected the files touched by the sectors I had overwritten.

    Before I figured out what was happening (I need to stop procrastinating and set up an alert mail server on this box), btrfs would simply restore the files from the mirror whenever it hit an IO error. Hassle-free.
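
    For the curious, a rough sketch of that procedure; the device name and LBA below are placeholders, take the real values from the SMART self-test log:

    Code:
    # Find the failing LBA reported by the SMART self-test
    smartctl -l selftest /dev/sdb

    # Overwrite the pending sector so the firmware reallocates it
    # (LBA 123456789 is a placeholder, 512-byte logical sectors assumed)
    dd if=/dev/zero of=/dev/sdb bs=512 count=1 seek=123456789 oflag=direct

    # Re-run the self-test, then let a scrub repair the affected files
    # from the other mirror copy
    smartctl -t long /dev/sdb
    btrfs scrub start -Bd /mnt/array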

    • #32
      Originally posted by waxhead View Post

      I think that the "problems" you are mentioning are about files marked as no copy-on-write. I follow the mailing list too and cannot remember anything like you describe. Running BTRFS in single mode is only useful for detecting errors. You should try RAID1 on two devices once and try to inject random data into one of the devices and see how btrfs fixes it. You might be surprised at how well it works.
      If only more motherboards had more than one M.2 slot; that is the only reason I run BTRFS in single mode on my home machine. That said, there is some bug in either btrfs, udev or the initramfs script in at least Ubuntu 20.04, because new servers that I've set up with BTRFS in RAID1 on the boot drive very often end up in the initramfs console during boot, unable to find the root device.

      dmesg shows btrfs whining that it cannot find the UUID_SUB of the other drive. So far I have "fixed" it by adding this to "/usr/share/initramfs-tools/scripts/local":

      Code:
      local_mount_root()
      {
              ...

              /usr/bin/sleep 5s
              modprobe btrfs
              /bin/btrfs device scan

              # Mount root
              # shellcheck disable=SC2086
              mount ${roflag} ${FSTYPE:+-t "${FSTYPE}"} ${ROOTFLAGS} "${ROOT}" "${rootmnt?}"
      This allows 5 seconds for both drives to spin up before we scan for devices. Googling the same error turned up other people who had hit it because "/usr/lib/udev/rules.d/64-btrfs.rules" was missing (that rule tells udev to wait for all member devices before the BTRFS filesystem is treated as ready), but that was not the cause for me; I did check that the udev rule had made it into the initramfs.

      Never ever had this problem when using BTRFS in RAID1 for partitions mounted after boot, only when used as the root partition.
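
      For what it's worth, the same workaround can also live in a local-premount hook so it survives initramfs-tools upgrades. A minimal sketch, assuming the 5-second delay is enough for your drives:

      Code:
      #!/bin/sh
      # /etc/initramfs-tools/scripts/local-premount/btrfs-scan
      PREREQ=""
      prereqs() { echo "$PREREQ"; }
      case "$1" in
          prereqs) prereqs; exit 0 ;;
      esac

      # Give both drives time to spin up, then register all btrfs
      # member devices with the kernel before root is mounted
      modprobe btrfs
      sleep 5
      btrfs device scan
      Remember to run "update-initramfs -u" after adding the file.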

      • #33
        Originally posted by flower View Post

        Those bugs are btrfs-specific, so any HW RAID controller is unaffected.
        But if you don't use any RAID level with btrfs, you lose its bitrot-correction abilities.

        Hardware RAID controllers are not a thing any more. Software RAID (ZFS / mdadm, not btrfs) has way more benefits. Just use a UPS though.
        Benefits like what?
        I have 4 SSDs set up with mdadm RAID10 and, after hours of tuning and testing, I am still not impressed with the IOPS, and the 99th-percentile latency is significantly higher than on a single disk.
        BTRFS is right to generally recommend RAID1 (or RAID1c3, c4, ...).
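
        For anyone who wants to reproduce that comparison, a fio sketch with placeholder device names; the "clat percentiles" section of the output contains the 99.00th value:

        Code:
        # Random-read latency test: run once against the md array...
        fio --name=randread --filename=/dev/md0 --ioengine=libaio --direct=1 \
            --rw=randread --bs=4k --iodepth=32 --numjobs=4 \
            --runtime=60 --time_based --group_reporting
        # ...then repeat with --filename pointed at a single member disk and
        # compare the "clat percentiles (99.00th)" lines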

        • #34
          Originally posted by kiffmet View Post
          Is BTRFS RAID 5/6 even fixable without changing the specification? All I know is that it has been an issue for at least 5 years now.
          Fixable, yes, but there is apparently limited corporate interest in doing it, even though both Facebook and Synology are heavy BTRFS users.

          • #35
            Originally posted by F.Ultra View Post

            If only more motherboards had more than one M.2 slot ... Never ever had this problem when using BTRFS in RAID1 for partitions mounted after boot, only when used as the root partition.
            Well... I doubt this is a problem in the sense that it is a bug. I know there has been some shitshow on the BTRFS mailing list between the udev/systemd and btrfs people, where one side argues that systemd should wait for something while the systemd guys argue that BTRFS is not returning a proper state... or something like that.
            I have run BTRFS as root on Debian for years and my only problem is GRUB doing apparently random things without any explanation, as it only fails during an update-grub. The fix is usually to chroot in and run update-grub once again, or to reinstall and update GRUB. And before someone asks: yes, I have BTRFS on a partition layout that leaves a nice gap for GRUB to live in.

            I have not played around with Ubuntu for years, but every time I tried it something broke horribly. Debian, on the other hand, works flawlessly; the only exception is this GRUB nonsense.
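
            For anyone who hits the same thing, the chroot fix looks roughly like this from a live environment; device names and the EFI partition are assumptions:

            Code:
            # Mount the installed system and regenerate the GRUB configuration
            mount /dev/sda2 /mnt                # btrfs root partition (placeholder)
            mount /dev/sda1 /mnt/boot/efi       # EFI system partition, if booting via UEFI
            for d in dev proc sys; do mount --bind /$d /mnt/$d; done
            chroot /mnt update-grub             # or grub-install /dev/sda followed by update-grub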

            http://www.dirtcellar.net

            • #36
              I remember a time when insecure code in file systems would have been removed, or at the very least disabled. We must be living in liberal times...

              • #37
                Originally posted by sdack View Post
                I remember a time when insecure code in file systems would have been removed, or at the very least disabled. We must be living in liberal times...
                Do you mean the btrfs RAID-5 code? It isn't "insecure"; it is buggy and can lose data.

                And no, this kind of thing has never simply been removed from the kernel. XFS had data-loss bugs for years; whether you hit them depended on the settings used, and it was mostly put down to users not calling fsync properly. MD RAID had write-hole bugs for years as well. I once had to run fsck for hours because MD RAID-5 just lost a piece of the filesystem.

                They were eventually fixed, not removed from the kernel.
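
                For completeness, the md write hole can nowadays be mitigated at creation time with a write journal. A sketch with placeholder devices, assuming a reasonably recent mdadm and kernel:

                Code:
                # RAID-5 array with a dedicated journal device to close the write hole
                mdadm --create /dev/md0 --level=5 --raid-devices=4 \
                      --write-journal /dev/nvme0n1 /dev/sd[b-e]
                # For an existing array, a write-intent bitmap at least shortens resyncs
                # after an unclean shutdown (it does not close the write hole itself)
                mdadm --grow --bitmap=internal /dev/md0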

                • #38
                  Originally posted by Space Heater View Post
                  I think it's hard to say, because none of the companies that actually employ btrfs developers seem interested in improving the RAID situation.
                  The hyperscalers (and their ilk) generally do not try to repair a broken server/filesystem; they blow the instance away and reinstall/rebuild from scratch, since copies of the data exist in other domains/zones/physical locations to allow a concurrent rebuild. Only when a specific server (or array) repeatedly fails is it taken out of service and left for some future physical replacement (which may involve waiting for the rack itself to be wheeled out of the DC).

                  • #39
                    Originally posted by vladpetric View Post
                    Honesty is good!

                    ...
                    They could just as well warn the user to avoid using the file system at all.

                    • #40
                      Originally posted by flower View Post

                      RAID with SSDs is always problematic, as normal (and ZFS) RAID implementations tend to write the same amount of data to every device. If you start a new array with identical new devices, they will tend to fail together.

                      That's not true for SnapRAID or, to some extent, btrfs.
                      SnapRAID is a little-known gem of a program, and tremendously flexible when you want some resilient bulk storage. Combined with a pooling layer like mergerfs (or DrivePool on Windows), it is hard to beat for a media server.
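
                      A minimal sketch of that combo, with made-up mount points, to show how little configuration it needs:

                      Code:
                      # /etc/snapraid.conf
                      parity /mnt/parity1/snapraid.parity
                      content /var/snapraid.content
                      content /mnt/disk1/snapraid.content
                      data d1 /mnt/disk1/
                      data d2 /mnt/disk2/
                      exclude *.tmp

                      # /etc/fstab entry pooling the data disks with mergerfs
                      /mnt/disk*  /mnt/pool  fuse.mergerfs  allow_other,category.create=mfs  0  0
                      Afterwards, "snapraid sync" updates parity after adding media and "snapraid scrub" verifies the checksums.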
