OpenZFS Lands Exciting RAIDZ Expansion Feature

  • OpenZFS Lands Exciting RAIDZ Expansion Feature

    Phoronix: OpenZFS Lands Exciting RAIDZ Expansion Feature

    In addition to the OpenZFS code this week landing sync parallelism to improve write performance scalability, another shiny new feature was also merged: RAIDZ expansion...
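
    For context, the expansion works by attaching an extra disk to an existing raidz vdev with zpool attach, roughly along these lines (a minimal sketch with illustrative pool/device names; exact syntax may differ by release):

        # "tank" has an existing 4-disk raidz1 vdev named raidz1-0
        zpool attach tank raidz1-0 /dev/sdf   # kick off expansion onto the new disk
        zpool status tank                     # reports expansion/reflow progress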


  • #2
    Ok, now ZFS is actually a contender even for my home NAS array(s)...

    The need to temporarily have double the storage to add just one drive was the dealbreaker.

    The second item on the list is RAIDZ using the smallest drive as the maximum size for the other drives.



    • #3
      Nice, just when my favorite Linux distribution finally added OpenZFS support a few days ago! https://www.youtube.com/watch?v=boWzHXNZlCs



      • #4
        Originally posted by Serafean View Post
        Ok, now ZFS is actually a contender even for my home NAS array(s)... The need to temporarily have double the storage to add just one drive was the dealbreaker. The second item on the list is RAIDZ using the smallest drive as the maximum size for the other drives.
        Erasure coding is the solution.
        What would be the possibility of doing this within ZFS? http://www.networkcomputing.com/deduplication/229500204?pgno=2 If the data is erasure-coded into N shares distributed across at least H distin...


        Reed-Solomon, Erasure Resilient Systematic Code, etc.



        • #5
          Here's one thing I can't figure out, and I may have missed it in the comments somewhere: say a person adds an expansion disk, feature@raidz_expansion gets set, and everything is all well and good. Later on down the line, how can the expansion disk be safely removed?

          Do enough disks have to be added to make the expansion disk unnecessary by increasing the pool's size (via a higher raidz level), or can the starting disks be upgraded in size and the expansion disk removed?

          If it's detached or the expansion disk fails before being safely removed, what happens to the extra data?

          And if it is removed, does feature@raidz_expansion get unset and can older ZFS versions mount the file system again?



          • #6
            Originally posted by Serafean View Post
            Ok, now ZFS is actually a contender even for my home NAS array(s)...

            The need to temporarily have double the storage to add just one drive was the dealbreaker.

            The second item on the list is RAIDZ using the smallest drive as the maximum size for the other drives.
            It's not optimal, and I'd only do this with SSDs (or faster) and a long-term strategy planned out, but for the short term you can use partitions that are the same size as your smallest disk and, sometime down the line, use zpool replace to swap the small disk and the partitions out for larger disks.
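
            A rough sketch of that workaround, with made-up device names (sdb is a 4 TB small disk, sdc and sdd are larger disks partitioned down to match):

                # carve a 4 TB partition out of each larger disk so all vdev members match
                sgdisk -n 1:0:+4T /dev/sdc
                sgdisk -n 1:0:+4T /dev/sdd

                # build the raidz1 vdev from the whole small disk plus the matching partitions
                zpool create tank raidz1 /dev/sdb /dev/sdc1 /dev/sdd1

                # later, swap the small disk for a bigger one (sde); once every member has been
                # replaced or regrown, the vdev can grow (see the autoexpand pool property)
                zpool replace tank /dev/sdb /dev/sde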



            • #7
              Originally posted by skeevy420 View Post
              Here's one thing I can't figure out, and I may have missed it in the comments somewhere: say a person adds an expansion disk, feature@raidz_expansion gets set, and everything is all well and good. Later on down the line, how can the expansion disk be safely removed?

              Do enough disks have to be added to make the expansion disk unnecessary by increasing the pool's size (via a higher raidz level), or can the starting disks be upgraded in size and the expansion disk removed?

              If it's detached or the expansion disk fails before being safely removed, what happens to the extra data?

              And if it is removed, does feature@raidz_expansion get unset and can older ZFS versions mount the file system again?
              Having read the merge request and discussions, this is a one-way deal. Example:
              You have a 5-disk raidz-1 vdev (4 data + 1 parity).
              You want to add another disk, making this a 6-disk raidz-1 vdev (5 data + 1 parity) using this feature.
              If you perform this action, your vdev is forever a 6-disk raidz-1 vdev (5 data + 1 parity). You cannot go back to a 5-disk raidz-1 vdev (4 data + 1 parity). You also cannot revert to an older version of OpenZFS. The vdev has a permanent structure change and a flag that earlier versions will not understand.
              According to the merge request comments, if a disk fails during expansion the process suspends gracefully. You must replace the faulty disk, resilver, and then the expansion can continue.
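
              If I'm reading that right, recovery from a failure mid-expansion would look roughly like this (hypothetical pool/device names):

                  zpool status tank                      # shows the faulted disk and the paused expansion
                  zpool replace tank /dev/sdd /dev/sdf   # swap in a replacement disk
                  # once the resilver completes, the expansion resumes where it left off
                  zpool status tank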

              I hope that was helpful.



              • #8
                Originally posted by skeevy420 View Post
                Here's one thing I can't figure out, and I may have missed it in the comments somewhere: say a person adds an expansion disk, feature@raidz_expansion gets set, and everything is all well and good. Later on down the line, how can the expansion disk be safely removed?

                Do enough disks have to be added to make the expansion disk unnecessary by increasing the pool's size (via a higher raidz level), or can the starting disks be upgraded in size and the expansion disk removed?

                If it's detached or the expansion disk fails before being safely removed, what happens to the extra data?

                And if it is removed, does feature@raidz_expansion get unset and can older ZFS versions mount the file system again?
                You cannot remove a disk that has been expanded to, any more than you could remove a disk before. If a disk fails while it's being expanded into, then you need to replace that disk. The disk's contents will then be reconstructed (just like any failed disk) from the rest of the raidz up to the point it left off at, and then expansion will continue.

                ZFS does allow you to remove a whole vdev, meaning you can pull out a whole raidz array, if there's enough room elsewhere on other vdevs in the pool to move the allocated data to. I'm not 100% sure whether that has some small performance impact later when accessing that data, at least until the blocks have been rewritten. It's been a hot minute since I last read about it, and there has been work in this area over the years.

                EDIT: according to [1], no, it turns out you can't remove data-carrying vdevs from a pool that contains at least one raidz vdev; see the correction post below.

                You can't unset the expansion flag on a raidz because you can't remove the expanded disk, but in theory if you remove the expanded vdev from the pool then it might go back to a form that an older version of ZFS can read. Can't set a flag on something that doesn't exist anymore. It depends on the implementation though and would be a good question for the developers.

                [1] https://manpages.ubuntu.com/manpages...-remove.8.html
                Last edited by Developer12; 09 November 2023, 08:08 PM.
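
                Either way, you can check whether a pool has the flag active before trying it with older tools, roughly like this (hypothetical pool name):

                    zpool get feature@raidz_expansion tank   # "active" once an expansion has been performed
                    zpool upgrade -v                         # lists the features this OpenZFS build supports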



                • #9
                  steve007 Developer12

                  What y'all said is basically what I was assuming: that this is a permanent change with no way to revert outside of creating a new pool and moving your data there. It's a double-edged sword of a feature because of that. Great if you're limited on ports, but it seems like you can expand yourself into a corner if you don't use it wisely.



                  • #10
                    Originally posted by timofonic View Post

                    Erasure coding is the solution.
                    What would be the possibility of doing this within ZFS? http://www.networkcomputing.com/deduplication/229500204?pgno=2 If the data is erasure-coded into N shares distributed across at least H distin...


                    Reed-Solomon, Erasure Resilient Systematic Code, etc.
                    Very interesting. I wasn't aware someone had ever suggested adding another vdev type (in addition to mirror and raidz) based on erasure coding.

                    Looks like that issue was closed in favour of work that eventually became ZFS's (now shipped) distributed RAID (dRAID) implementation? I don't know if it's as finely configurable as something like Ceph's erasure coding, but it definitely blurs the line into something that looks pretty similar.
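
                    For comparison, a dRAID vdev is declared with a parity/data/children/spares spec at pool creation; a rough example with made-up disk names (2 parity, 4 data per group, 13 children, 1 distributed spare):

                        zpool create tank draid2:4d:13c:1s sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm

                    The distributed spare capacity is what lets a rebuild fan out across all the member disks instead of hammering a single hot spare.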

