Making Use Of Btrfs 3-Copy/4-Copy Support For RAID1 With Linux 5.5+


  • Making Use Of Btrfs 3-Copy/4-Copy Support For RAID1 With Linux 5.5+

    Phoronix: Making Use Of Btrfs 3-Copy/4-Copy Support For RAID1 With Linux 5.5+

    With the recently released Linux 5.5 and its new features, one of the prominent changes on the storage front was the Btrfs file-system picking up new "RAID1C3" and "RAID1C4" modes for allowing either three or four copies of RAID1 data across more drives to potentially allow up to three of four drives to fail in a RAID1 array while still being able to recover that data for this file-system with its native RAID capabilities...

    http://www.phoronix.com/scan.php?pag...AID1C3-RAID1C4

  • #2
    How does the feature work with differing device storage capacities? Are you capped to the lowest device capacity? Can that be worked around if you have an additional device to bring storage capacity up to meet the higher device capacities?

    eg: 1TB, 1TB, 500GB, 500GB

    RAID1C3 should be able to handle 1TB with 3 copies with 4 devices like that? Whereas with only one 500GB device, only 500GB is available? (putting the extra 500GB of the two 1TB devices to waste if allocated to BTRFS?)

    • #3
      It'll be nice to see the better load balancing in action once that's done. The current balancing scheme doesn't really work as well as it could when you have many long-running tasks that all do heavy disk access.
      Though I guess it wouldn't really matter for the average user, as it seems to balance short-lived tasks just fine.

      • #4
        Originally posted by polarathene
        How does the feature work with differing device storage capacities? Are you capped to the lowest device capacity? Can that be worked around if you have an additional device to bring storage capacity up to meet the higher device capacities?

        eg: 1TB, 1TB, 500GB, 500GB

        RAID1C3 should be able to handle 1TB with 3 copies with 4 devices like that? Whereas with only one 500GB device, only 500GB is available? (putting the extra 500GB of the two 1TB devices to waste if allocated to BTRFS?)
        That works with btrfs. It'll keep allocating "blocks" of say 1GB (size configurable IIRC), each mapped to the N disks with the most free space at the time. So your scenario will work until there aren't enough disks empty enough for a new allocation.
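
The allocation rule described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the function name `simulate_allocation`, the fixed 1GB chunk size, and the tie-breaking order are made up for the example and are not taken from the btrfs source.

```python
def simulate_allocation(device_sizes_gb, copies, chunk_gb=1):
    """Return how many GB of data fit when every chunk is written to the
    `copies` devices that currently have the most free space.

    Hypothetical helper for illustration; not the kernel's allocator.
    """
    free = list(device_sizes_gb)
    stored_gb = 0
    while True:
        # Rank device indices by remaining free space, largest first.
        ranked = sorted(range(len(free)), key=lambda i: free[i], reverse=True)
        targets = ranked[:copies]
        # Stop when there are too few devices, or the emptiest-enough
        # candidate can no longer hold one more chunk.
        if len(targets) < copies or free[targets[-1]] < chunk_gb:
            break
        for i in targets:
            free[i] -= chunk_gb
        stored_gb += chunk_gb
    return stored_gb


if __name__ == "__main__":
    # Sizes in GB, taken from the question quoted above.
    print(simulate_allocation([1000, 1000, 500, 500], copies=3))
    print(simulate_allocation([1000, 1000, 500], copies=3))
```

With the sizes from the question, the four-disk pool stores 1000GB with three copies, while the pool with only one 500GB disk stops at 500GB because a third distinct device is no longer available.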

        IMO this is btrfs's greatest advantage over ZFS and block device based RAID.

        • #5
          What happens currently when one device fails in RAID1?

          • #6
            Originally posted by scineram
            What happens currently when one device fails in RAID1?
            From the article:

            Following with testing, I tried simple mkfs and conversions, that worked well. Then scrub, overwrite some blocks and let the auto-repair do the work. No hiccups. The remaining and important part was the device replace, as the expected use case was to substitute RAID6, replacing a missing or damaged disk. I wrote the test script, replace 1 missing, replace 2 missing. And it did not work. While the filesystem was mounted, everything seemed OK. Unmount, check again and the devices were still missing. Not cool, right. Due to lack of time before the upcoming merge window (a code freeze before next development cycle), I had to declare it not ready and put it aside. This was in late 2018. For a highly requested feature this was not an easy decision. Should it be something less important, the development cycle between rc1 and final release provides enough time to fix things up. But due to the maintainer role with its demands I was not confident that I could find enough time to debug and fix the remaining problem. Also nobody offered help to continue the work, but that’s how it goes. At the late 2019 I had some spare time and looked at the pending work again. Enhanced the test script with more debugging messages and more checks. The code worked well, the test script was subtly broken. Oh well, what a blunder. That cost a year, but on the other hand releasing a highly requested feature that lacks an important part was not an appealing option. The patchset was added to 5.5 development queue at about the last time before freeze, final 5.5 release happened a week ago.

            • #7
              Originally posted by scineram
              What happens currently when one device fails in RAID1?
              I think it is very important to remember that what BTRFS calls "RAID" is not really RAID in the traditional sense, where you have a mirror of the entire disk. In BTRFS terms, RAID1 is really RAID1c2: one replica of the data, or two copies in total. The new modes RAID1c3 and RAID1c4 mean two or three replicas, for a total of 3 or 4 copies of your data. The way BTRFS works is in principle just as skandalfo said: as it needs more space, it allocates a chunk, or "mini partition" if you like, on your disk. With "RAID1", for example, it allocates two chunks on two different devices and makes sure they contain the same data.

              With RAID1 you will lose one replica. With BTRFS "RAID1" you *may* lose one replica, depending on the number of disks you have in your "RAID1" pool and whether BTRFS has allocated anything on that disk.

              So to answer your question: what happens? Well, when one device fails in RAID1, it fails, and you still have your data available on another disk. You will know if the data is bad or corrupt since it is still checksummed. More likely than not you will need to mount with the 'degraded' mount option to either rebalance (if you have enough disks / space) to restore redundancy, or add a disk to your pool to increase available space before doing so. In older kernels there was a bug/weakness that could make a BTRFS RAID1 pool with only two disks go into irreversible read-only mode if you mounted it degraded once and did not add another disk to it.

              With the new modes you should in theory be able to lose two or three disks, depending on whether it's RAID1c3 or RAID1c4, before you have to change your underwear.
              This is great for metadata, which does not usually consume that much space. It would be even greater if BTRFS had per-subvolume "RAID" levels implemented, but I suspect that will be getting a higher priority now that RAID1c3 and RAID1c4 are in place.

              Now the biggest benefit with the new modes is that parallelized reads should be much faster!

              http://www.dirtcellar.net

              • #8
                Originally posted by polarathene
                How does the feature work with differing device storage capacities? Are you capped to the lowest device capacity? Can that be worked around if you have an additional device to bring storage capacity up to meet the higher device capacities?

                eg: 1TB, 1TB, 500GB, 500GB

                RAID1C3 should be able to handle 1TB with 3 copies with 4 devices like that? Whereas with only one 500GB device, only 500GB is available? (putting the extra 500GB of the two 1TB devices to waste if allocated to BTRFS?)
                You should check out https://carfax.org.uk/btrfs-usage/

                Code:
                [1TB         ]  
                [1TB ] [500GB][500GB]
                Last edited by Falcon1; 02-11-2020, 05:30 AM.

                • #9
                  Originally posted by skandalfo
                  So your scenario will work until there aren't enough disks empty enough for a new allocation.
                  It should be fine with the 4 device setup and RAID1C3 (2x 1TB, 2x 500GB), as 3 copies across N devices is possible for the 1TB storage made available.

                  If one of those 500GB drives was dropped, then my understanding is that the 1TB drives are treated as 500GB as well. That is because 3 copies on separate devices are no longer feasible after 500GB is filled. No one device is permitted to carry an extra copy, is it?

                  I believe RAID1 did allow for such with metadata by default on HDDs? (2 copies on a single device) But with SSDs that's not the case and RAID1 isn't used for metadata, as SSDs are known to internally dedupe their blocks as an optimization, so data going bad for either copy on the same SSD would corrupt them both (supposedly; it's not easy to know as a consumer which SSDs dedupe internally).

                  Once the number of storage devices increases and 3 copies can be distributed across them, the additional storage can be leveraged afaik.

                  eg 8TB, 4TB, 4TB, 2TB, 2TB, 2TB, 1TB, 1TB == 8TB RAID1C3 (3 copies need 24TB in total if you want to utilize the full capacity of the 8TB disk, with the 2 other copies on other disks)

                  1TB, 1TB, 1TB, 500GB == 1.166TB RAID1C3 - I guess in this setup the 500GB disk can hold 3 different 166GB copies of data, one from each 1TB disk, substituting for a 1TB disk for that portion of data, which frees up that 166GB, giving the extra capacity while still keeping 3 copies spread across different devices/disks.
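
The figures above can be cross-checked with a simple capacity bound. The sketch below is an assumption on my part rather than anything taken from btrfs itself: `max_usable_tb` is a hypothetical helper, and it only encodes the rule that each copy of a chunk must sit on a different device. If the k largest devices take part in every chunk, the remaining (copies - k) replicas have to fit on the other devices, and the tightest of those limits wins.

```python
def max_usable_tb(device_sizes_tb, copies):
    """Upper bound on usable space when each chunk needs `copies`
    replicas on distinct devices. Hypothetical helper for illustration.
    """
    ordered = sorted(device_sizes_tb, reverse=True)
    if len(ordered) < copies:
        return 0.0  # not enough devices to place every replica
    # For k = 0..copies-1: assume the k largest devices hold one replica of
    # every chunk; the other devices must then hold the remaining replicas.
    return min(sum(ordered[k:]) / (copies - k) for k in range(copies))


if __name__ == "__main__":
    print(max_usable_tb([8, 4, 4, 2, 2, 2, 1, 1], copies=3))
    print(max_usable_tb([1, 1, 1, 0.5], copies=3))
    print(max_usable_tb([1, 1, 0.5], copies=3))
```

Under that assumption the eight-disk pool comes out at 8TB, the 3x 1TB + 500GB pool at 3.5/3 TB (about 1.166TB), and 2x 1TB + 500GB at 0.5TB, which matches the numbers discussed in this thread.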
                  Last edited by polarathene; 02-11-2020, 06:14 AM.

                  • #10
                    Originally posted by Falcon1

                    You should check out https://carfax.org.uk/btrfs-usage/

                    Code:
                    [1TB ] 
                    [1TB ] [500GB][500GB]
                    Oh, cool tool, thanks for that link

                    So 1TB in that case, and if the initial disk setup was only 2x 1TB and 1x 500GB, then the two 1TB disks would lose half of their available storage capacity, right? (only 500GB available in RAID1C3) At least until more capacity/devices are added to the device pool.
