Btrfs Adds Degenerate RAID Support, Performance Improvements With Linux 5.15

  • #21
    Originally posted by billyswong View Post
    Would someone ever create a RAID5/6-like filesystem/array for SSDs? In my understanding, the point of RAID5/6 is to maximize the available storage volume while keeping things transparent and operational through a one- or two-drive loss. If SSDs are put into an old-style RAID1/10/5/6 array, all drives will likely wear out at the same rate and fail within an interval too short for even a hot spare to save the game. I don't understand why people talk as if RAID6 is outdated for SSDs but RAID1/10 are still okay. From what I have learnt so far, they are all the same in the face of the new properties of SSDs.
    SSDs are not 'that' predictable in failure. If they were, SMART would tell you the date and time when they will fail.
    I use RAID 1 or 10 with SSDs because it provides uptime if something goes wrong with a drive, but I replace them below a certain "health" status or TBW.
    RAID 5/6 is problematic because you put a huge load on the degraded array after a disk failure. It has been shown that this can act as a catalyst for a second drive failure. This is true for SSDs and for large HDDs.

    Comment


    • #22
      Originally posted by mppix View Post

      SSDs are not 'that' predictable in failure. If they were, SMART would tell you the date and time when they will fail.
      I use RAID 1 or 10 with SSDs because it provides uptime if something goes wrong with a drive, but I replace them below a certain "health" status or TBW.
      RAID 5/6 is problematic because you put a huge load on the degraded array after a disk failure. It has been shown that this can act as a catalyst for a second drive failure. This is true for SSDs and for large HDDs.
      But don't SSDs wear out from writing while being fine with reading? At least that's what I thought. The "huge load" you mention seems the same between RAID10 and RAID5/6 if my math is correct. While RAID10 only asks for a full read-through of the one corresponding drive when rebuilding, and RAID5/6 asks for a full read-through of all the other drives, the actual stress on any particular drive involved is the same. I haven't completed the calculation, but it is hard for me to believe that RAID6 is significantly more dangerous than RAID10 for SSDs. If the chance of a pure read triggering a failure is close to 0 for SSDs, RAID6 should be safer than RAID10 for SSDs. (For high-drive-count HDD arrays, one may go for RAID60 + hot spare.)
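      Roughly, the rebuild read load I have in mind looks like this (made-up drive size and array width, counting only full-capacity rebuild reads; Python just for the arithmetic):

# Back-of-envelope rebuild read load (illustrative numbers only).
DRIVE_TB = 4          # hypothetical drive size
N = 8                 # hypothetical array width

# RAID10: only the failed drive's mirror partner is read in full.
raid10_per_drive_tb = DRIVE_TB        # on the one mirror partner
raid10_total_tb     = DRIVE_TB

# RAID5/6: every surviving drive is read in full to reconstruct the lost member.
raid56_per_drive_tb = DRIVE_TB        # on each of the N-1 survivors
raid56_total_tb     = DRIVE_TB * (N - 1)

print("per-drive read during rebuild:", raid10_per_drive_tb, "TB (RAID10) vs",
      raid56_per_drive_tb, "TB (RAID5/6)")
print("total data read during rebuild:", raid10_total_tb, "TB (RAID10) vs",
      raid56_total_tb, "TB (RAID5/6)")

      So the per-drive stress is one full read-through in both cases; what differs is only how many drives are exposed to it at once.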

      Some Googling gave me a paper on Differential RAID https://www.microsoft.com/en-us/rese...d-reliability/ but there doesn't seem to be any mainstream implementation in production.

      Comment


      • #23
        Originally posted by xfcemint View Post
        I think that the "degenerate RAID" feature is very useful for home users of RAID1 arrays (small and medium desktops), where it assists in replacing failed disk drives (assuming a user needs to go buy a new drive on drive failure). Unfortunately, the btrfs developers seem to have concentrated their efforts on RAID0 arrays first.

        AFAIK, on btrfs RAID1 arrays, the "degenerate" mode still allows only read-only access. This is unlike most hardware RAID controllers and unlike mdadm software RAID. So, when a drive in a btrfs raid1 fails, a user needs to run to the nearest hardware store to buy a replacement (an OS cannot function well in read-only mode).
        That's degraded mode, not degenerate. Different thing.
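        For reference, the degraded-mode replacement flow looks roughly like the sketch below (wrapped in Python for lack of a better place; the device paths, the devid and the mount point are placeholders, and whether the degraded mount ends up writable depends on your kernel version and chunk layout):

import subprocess

def run(cmd):
    # Echo and execute; needs root, and this is a sketch, not a tested tool.
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Placeholders: /dev/sdb = surviving raid1 member, /dev/sdc = new drive,
# devid "1" = the id of the missing device (check "btrfs filesystem show").
SURVIVOR, REPLACEMENT, MNT = "/dev/sdb", "/dev/sdc", "/mnt/data"

run(["mount", "-o", "degraded", SURVIVOR, MNT])                  # tolerate the missing member
run(["btrfs", "replace", "start", "-B", "1", REPLACEMENT, MNT])  # rebuild onto the new drive
run(["btrfs", "balance", "start", MNT])                          # optional: redistribute chunks written while degraded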

        Comment


        • #24
          Originally posted by billyswong View Post

          But don't SSDs wear out from writing while being fine with reading? At least that's what I thought. The "huge load" you mention seems the same between RAID10 and RAID5/6 if my math is correct. While RAID10 only asks for a full read-through of the one corresponding drive when rebuilding, and RAID5/6 asks for a full read-through of all the other drives, the actual stress on any particular drive involved is the same. I haven't completed the calculation, but it is hard for me to believe that RAID6 is significantly more dangerous than RAID10 for SSDs. If the chance of a pure read triggering a failure is close to 0 for SSDs, RAID6 should be safer than RAID10 for SSDs. (For high-drive-count HDD arrays, one may go for RAID60 + hot spare.)

          Some Googling gave me a paper on Differential RAID https://www.microsoft.com/en-us/rese...d-reliability/ but there doesn't seem to be any mainstream implementation in production.
          We don't have to rehash the entire "is RAID5 really bad?" conversation. There is enough information out there. You can use it, but I don't know of any company that recommends RAID5 anymore (more often than not they issue warnings).
          https://www.reddit.com/r/sysadmin/co...ended_for_any/

          Also, for SSDs specifically, if you have PCIe 4 (and future PCIe 5) arrays, you can get reasonably close to memory bandwidth. Then software RAID, especially the RAID 5/6 variety, can become a bottleneck (and there isn't really such a thing as PCIe hardware RAID).
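          To illustrate where the overhead comes from, the classic RAID5 small-write (read-modify-write) path looks roughly like this; a toy sketch only, since real implementations batch writes and use SIMD for the XOR:

# RAID5 small-write path: update one data block plus its parity block.
# new_parity = old_parity XOR old_data XOR new_data
def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

BLOCK = 4096
old_data   = bytes(BLOCK)               # current contents of the data block
old_parity = bytes(BLOCK)               # current contents of the parity block
new_data   = bytes([0xAB]) * BLOCK      # block being written by the application

new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)

# Cost of this single 4 KiB write: 2 reads + 2 writes on the drives, plus two
# XOR passes on the CPU -- the part that starts to matter once NVMe arrays
# approach memory bandwidth.
print("recomputed", len(new_parity), "bytes of parity for one 4 KiB write")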
          Last edited by mppix; 01 September 2021, 03:31 PM.

          Comment


          • #25
            Originally posted by mppix View Post

            We don't have to rehash the entire "is RAID5 really bad?" conversation. There is enough information out there. You can use it, but I don't know of any company that recommends RAID5 anymore (more often than not they issue warnings).
            https://www.reddit.com/r/sysadmin/co...ended_for_any/
            That's because the premise is misleading. Since hard drives (at least until the whole Chia thing) were getting so cheap, you were typically better off doing RAID10, since it also has better performance.

            However, if you are optimizing for hard drive space at the cost of some performance (i.e. you are doing some sort of cold-storage solution), then raidz1/2/3 is still superior (depending, of course, on vdev size).

            You can go to the TrueNAS/iXsystems forums and find people who have been running these setups for over a decade without any problems.

            Also, a lot of the information recommending against RAID5/6 is quite outdated, and hard drives today are different from what they were back then. Furthermore, using systems like ARC/L2ARC reduces the wear and tear on the hard drives (sometimes quite significantly if properly optimized).

            Evidently you didn't read the link you posted earlier, because it's not a clear-cut no. As was stated there, if you are using SAS (rather than SATA) and you have drives with a good URE rate, it's a very different story.
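            The URE point is usually argued with a back-of-envelope like the following (the 1-in-10^14 and 1-in-10^15 figures are the commonly quoted datasheet rates for consumer SATA and enterprise SAS drives, the array size is made up, and the independent-bit-error model is a crude simplification):

# Probability of finishing a rebuild without hitting a single unrecoverable
# read error, assuming independent bit errors at the datasheet URE rate.
def clean_rebuild_probability(data_read_tb, ure_per_bit):
    bits_read = data_read_tb * 1e12 * 8
    return (1.0 - ure_per_bit) ** bits_read

rebuild_tb = 7 * 4    # e.g. 8 x 4 TB RAID5: seven surviving drives read in full

for label, rate in [("consumer SATA (1 per 1e14 bits)", 1e-14),
                    ("enterprise SAS (1 per 1e15 bits)", 1e-15)]:
    p = clean_rebuild_probability(rebuild_tb, rate)
    print(f"{label}: ~{p:.0%} chance of a clean {rebuild_tb} TB rebuild")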

            Originally posted by mppix View Post
            Also, for SSDs specifically, if you have PCIe 4 (and future PCIe 5) arrays, you can get reasonably close to memory bandwidth. Then software RAID, especially the RAID 5/6 variety, can become a bottleneck (and there isn't really such a thing as PCIe hardware RAID).
            RAID5/6 is not for SSDs because you are getting the worst of both worlds. The point of RAID 5/6 is to optimize for storage while at least having some redundancy. If you are already using SSDs, the implication is that you are not optimizing for storage space but for something else.

            P.S. If you have issues with cascading SSD (or hard drive) failures, then mix your batches, i.e. don't put all disks from the same batch into the same system. Ideally you would mix and match batches from different companies (assuming they fulfill your specifications) to prevent these issues.
            Last edited by mdedetrich; 01 September 2021, 07:12 PM.

            Comment


            • #26
              Originally posted by mdedetrich View Post
              That's because the premise is misleading. Since hard drives (at least until the whole Chia thing) were getting so cheap, you were typically better off doing RAID10, since it also has better performance.

              However, if you are optimizing for hard drive space at the cost of some performance (i.e. you are doing some sort of cold-storage solution), then raidz1/2/3 is still superior (depending, of course, on vdev size).

              You can go to the TrueNAS/iXsystems forums and find people who have been running these setups for over a decade without any problems.

              Also, a lot of the information recommending against RAID5/6 is quite outdated, and hard drives today are different from what they were back then. Furthermore, using systems like ARC/L2ARC reduces the wear and tear on the hard drives (sometimes quite significantly if properly optimized).

              Evidently you didn't read the link you posted earlier, because it's not a clear-cut no. As was stated there, if you are using SAS (rather than SATA) and you have drives with a good URE rate, it's a very different story.
              I think we are saying similar things with different arguments. I am looking at it from the angle that RAID is for high(er) availability, i.e. RAID is not a backup.
              I read the link, of course, and I liked it _because_ it includes the discussion (you can find many others).
              The point is that RAID 5 (and increasingly 6) is slowly falling out of favor in the server domain, and I don't know of a storage vendor that would describe it as a technology for the future (or even recommend it for a new installation).
              For home users, I am also not sure if RAID 5/6 brings much to the table nowadays. You need a backup anyway. If you have a backup, say in a cloud, it can take less time to download the backup than to resilver an array (but you have no access to data vs. slow access to data).
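              Rough arithmetic for that comparison, with the link speed, drive size and rebuild rate all being assumed numbers:

# Restoring from a cloud backup vs. resilvering a failed member (all figures assumed).
dataset_tb      = 4      # data you actually need back
download_mbit_s = 1000   # hypothetical 1 Gbit/s internet link
drive_tb        = 8      # size of the failed member
resilver_mb_s   = 120    # sustained rebuild rate of a busy SATA HDD

download_hours = dataset_tb * 1e12 * 8 / (download_mbit_s * 1e6) / 3600
resilver_hours = drive_tb * 1e12 / (resilver_mb_s * 1e6) / 3600

print(f"cloud restore: ~{download_hours:.0f} h (no access to the data until it finishes)")
print(f"resilver:      ~{resilver_hours:.0f} h (slow, degraded access in the meantime)")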
              I think everyone will need to make up their mind if RAID 5/6 is the correct solution for them.

              Originally posted by mdedetrich View Post
              RAID5/6 is not for SSDs because you are getting the worst of both worlds. The point of RAID 5/6 is to optimize for storage while at least having some redundancy. If you are already using SSDs, the implication is that you are not optimizing for storage space but for something else.

              P.S. If you have issues with cascading SSD (or hard drive) failures, then mix your batches, i.e. don't put all disks from the same batch into the same system. Ideally you would mix and match batches from different companies (assuming they fulfill your specifications) to prevent these issues.
              I agree with this.

              Comment


              • #27
                Originally posted by mppix View Post
                I think we are saying similar things with different arguments. I am looking at it from the angle that RAID is for high(er) availability, i.e. RAID is not a backup.
                I read the link, of course, and I liked it _because_ it includes the discussion (you can find many others).
                The point is that RAID 5 (and increasingly 6) is slowly falling out of favor in the server domain, and I don't know of a storage vendor that would describe it as a technology for the future (or even recommend it for a new installation).
                For home users, I am also not sure if RAID 5/6 brings much to the table nowadays. You need a backup anyway. If you have a backup, say in a cloud, it can take less time to download the backup than to resilver an array (but you have no access to data vs. slow access to data).
                I think everyone will need to make up their mind if RAID 5/6 is the correct solution for them.
                Well, I will just finish off with these points:
                • If RAID 5/6 were pointless in enterprise server storage, then ZFS wouldn't even have bothered with it (and by far the biggest demographic for ZFS is high-end enterprise storage)
                • I would only ever advocate RAID 5/6 when using ZFS (where it is called raidz1/2/3). It's the only RAID 5/6 implementation that solves the write hole. With raidz1/2/3 you will not have these problems, especially if you use an LSI HBA (which you should have in general for software RAID solutions, including BTRFS).
                • Given how complicated it is to implement RAID 5/6 (in addition to the previous point about the write hole), it's not surprising that certain vendors such as Dell advise against it; tbh, most of these machines use hardware RAID HBAs, which historically have been terrible at RAID 5/6 (so I can understand where all of the pain comes from). This is compounded by the fact that hardware RAID ties the hard drives to the HBA controller.
                As a home user I would actually argue that ZFS raidz1/2/3 is a perfect fit, more so than for enterprise deployments, which tend to hyper-specialize. Home users are typically more budget-conscious, and space is limited in the typical home NAS setup. If you buy or build a 6-bay NAS, losing half of it to redundancy is massive overkill for home users; raidz1 is perfect for this use case.
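                To put the space argument in numbers for a hypothetical 6-bay box with 4 TB drives (raw arithmetic only; real pools lose a bit more to metadata and reserved space):

bays, drive_tb = 6, 4
layouts = {
    "RAID10 / striped mirrors": (bays // 2) * drive_tb,
    "raidz1 (single parity)":   (bays - 1) * drive_tb,
    "raidz2 (double parity)":   (bays - 2) * drive_tb,
}
for name, usable in layouts.items():
    print(f"{name:25s}: {usable} TB usable of {bays * drive_tb} TB raw")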

                It's really a shame that BTRFS didn't solve this issue, and it's unlikely they will, because it requires a change to the on-disk format. In all honesty it's not surprising though: contrary to ZFS, which was meticulously planned and designed to be correct before it was released (and it shows; at the time ZFS was released it was like a future technology from aliens), BTRFS was rushed because of "reasons" (Linux needed CoW really badly? Losing market share/competition with ZFS/OpenZFS?).

                I would even say that if BTRFS doesn't end up solving the RAID 5/6 write hole, they should just remove it as a RAID option, although from what I have heard it now displays a big red warning, which is better than nothing.
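                For anyone unfamiliar with the write hole, a toy model of it: a partial-stripe update interrupted between the data write and the parity write leaves the parity stale, so reconstructing a lost member later silently returns garbage. This is a deliberately simplified sketch, not how any real implementation lays out stripes; copy-on-write schemes such as raidz avoid it by never updating a live stripe in place.

# Toy 3-disk RAID5 stripe: d0 and d1 are data blocks, p is their parity.
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

d0, d1 = b"\x11" * 4, b"\x22" * 4
p = xor(d0, d1)                 # consistent stripe: p = d0 XOR d1

# Partial-stripe update that "crashes" after writing d0 but before updating p:
d0 = b"\x99" * 4                # new data block reaches the disk
# p = xor(d0, d1)               # ...the parity update is lost in the crash

# Later the disk holding d1 dies and we reconstruct it from d0 and the stale p:
d1_reconstructed = xor(d0, p)
print("original d1:     ", d1.hex())               # 22222222
print("reconstructed d1:", d1_reconstructed.hex()) # aaaaaaaa -> silent corruption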

                Comment


                • #28
                  Originally posted by mdedetrich View Post
                  Well, I will just finish off with these points:
                  • If RAID 5/6 were pointless in enterprise server storage, then ZFS wouldn't even have bothered with it (and by far the biggest demographic for ZFS is high-end enterprise storage)
                  • I would only ever advocate RAID 5/6 when using ZFS (where it is called raidz1/2/3). It's the only RAID 5/6 implementation that solves the write hole. With raidz1/2/3 you will not have these problems, especially if you use an LSI HBA (which you should have in general for software RAID solutions, including BTRFS).
                  • Given how complicated it is to implement RAID 5/6 (in addition to the previous point about the write hole), it's not surprising that certain vendors such as Dell advise against it; tbh, most of these machines use hardware RAID HBAs, which historically have been terrible at RAID 5/6 (so I can understand where all of the pain comes from). This is compounded by the fact that hardware RAID ties the hard drives to the HBA controller.
                  I would argue that this thread is not about ZFS.
                  However, for context: ZFS is old and was designed in the HDD era, when RAID 5/6 was more relevant. Today, if you load up an AMD EPYC with 24 NVMe drives, ZFS itself is the bottleneck.
                  ZFS is also _by_far_ not the only enterprise storage solution, especially for bulk storage that goes beyond one server, when we start talking about scale-out network filesystems.

                  Originally posted by mdedetrich View Post
                  As a home user I would actually argue that ZFS raidz1/2/3 is a perfect fit, more so than for enterprise deployments, which tend to hyper-specialize. Home users are typically more budget-conscious, and space is limited in the typical home NAS setup. If you buy or build a 6-bay NAS, losing half of it to redundancy is massive overkill for home users; raidz1 is perfect for this use case.
                  I believe the most common home NASes are 2- or 4-bay, where RAID 5/6 does not really make sense.
                  For 6- and 8-bay home NASes, it is a bit of a different story. You may prefer capacity. However, my take is that with the disk sizes of 2021, you may prefer RAID 10, especially for SATA HDDs. Then you at least have a chance of saturating a 1 GbE line (considering also the mediocre computing power of today's home NAS boxes).
                  Also, I don't know if ZFS is that common for home NAS, with Synology (primarily) using BTRFS and QNAP using ext4.

                  Originally posted by mdedetrich View Post
                  It's really a shame that BTRFS didn't solve this issue, and it's unlikely they will, because it requires a change to the on-disk format. In all honesty it's not surprising though: contrary to ZFS, which was meticulously planned and designed to be correct before it was released (and it shows; at the time ZFS was released it was like a future technology from aliens), BTRFS was rushed because of "reasons" (Linux needed CoW really badly? Losing market share/competition with ZFS/OpenZFS?).
                  ZFS started as a product of a large company.
                  BTRFS is an open-source project with free contributions. Crowdsourcing a project implies less "direction", and development is done publicly for everyone to see.
                  I don't think Linux "desperately needs" either, because you can get largely the same functionality with an mdadm+LVM+ext4/xfs stack that tends to outperform both of them.

                  Originally posted by mdedetrich View Post
                  I would even say that if BTRFS doesn't end up solving the RAID 5/6 write hole, they should just remove it as a RAID option, although from what I have heard it now displays a big red warning, which is better than nothing.
                  I agree with this. I assume there is some hope that someone steps up to solve it, but it does not look like there is enough interest.
                  Last edited by mppix; 02 September 2021, 04:56 PM.

                  Comment


                  • #29
                    I heard from the developers only yesterday that they're going to work on the raid56 issue after the current set of zoned storage patches. I really hope it's true.

                    Comment


                    • #30
                      Originally posted by mppix View Post
                      I would argue that this thread is not about ZFS.
                      However, for context: ZFS is old and was designed in the HDD era, when RAID 5/6 was more relevant. Today, if you load up an AMD EPYC with 24 NVMe drives, ZFS itself is the bottleneck.
                      ZFS is also _by_far_ not the only enterprise storage solution, especially for bulk storage that goes beyond one server, when we start talking about scale-out network filesystems.
                      Not sure about the bottleneck bit, but you are right that it was designed for the HDD era.

                      Originally posted by mppix View Post
                      I believe the most common home NASes are 2- or 4-bay, where RAID 5/6 does not really make sense.
                      For 6- and 8-bay home NASes, it is a bit of a different story. You may prefer capacity. However, my take is that with the disk sizes of 2021, you may prefer RAID 10, especially for SATA HDDs. Then you at least have a chance of saturating a 1 GbE line (considering also the mediocre computing power of today's home NAS boxes).
                      Also, I don't know if ZFS is that common for home NAS, with Synology (primarily) using BTRFS and QNAP using ext4.
                      TBH, if you have this 6-8 bay setup, RAID 5 (or raidz1) with a 1-2 TB NVMe SSD set up as L2ARC, along with 32 GB of RAM, is going to give you much better performance, hard drive space and reliability compared to a RAID 10 setup.

                      With such a low pool size, you are not going to get that much faster read speeds from a RAID 10 setup (if we are talking hard drives with random reads) compared to raidz1; you are just better off using an SSD cache (i.e. L2ARC), which is going to be fine for a typical home media setup.
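                      As a rough feel for what the cache buys you (the hit ratio and device speeds are assumed figures):

# Effective read rate of an HDD raidz pool fronted by an NVMe L2ARC (all figures assumed).
hit_ratio     = 0.8    # fraction of reads served from the NVMe cache
nvme_mb_s     = 3000   # cached reads
hdd_pool_mb_s = 150    # random-ish reads that miss and hit the spinning pool

time_per_mb = hit_ratio / nvme_mb_s + (1 - hit_ratio) / hdd_pool_mb_s
print(f"effective read rate ~ {1 / time_per_mb:.0f} MB/s")   # ~625 MB/s with these guesses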

                      Regarding home setups, TrueNAS Core has come a long way if you have commodity hardware; otherwise you have stuff like https://www.truenas.com/truenas-mini/

                      Originally posted by mppix View Post
                      ZFS started as a product of a large company.
                      BTRFS is an open-source project with free contributions. Crowdsourcing a project implies less "direction", and development is done publicly for everyone to see.
                      I don't think Linux "desperately needs" either, because you can get largely the same functionality with an mdadm+LVM+ext4/xfs stack that tends to outperform both of them.
                      In context, by "desperate" I meant that I don't know why they rushed RAID5/6 into BTRFS so fast knowing that it was broken. You are dealing with a *filesystem*, which is one of the few things you *don't* want to fail, especially a filesystem that is designed for resiliency. If you are going to design a filesystem that by design does its best to keep your data safe, then you should only push it into the Linux tree once it has achieved that goal.

                      tl;dr: filesystems like this should never be rushed. If you are using something like ext2-4 then you can maybe expect some data loss in extreme circumstances, but btrfs was blatantly advertised as Linux's answer to ZFS; in other words, it was not meant to be released with the glaring issues it has historically had.
                      Last edited by mdedetrich; 03 September 2021, 06:09 AM.

                      Comment
