RAID 5/6 Continues Being Improved For Btrfs With Linux 3.20


  • gilboa
    replied
    Originally posted by SystemCrasher View Post
    As far as I understand, ZFS uses a more or less old-school, boring RAID approach with a more or less fixed striping scheme. Poor flexibility, long rebuild times, difficult reconfiguration, and a ton of limitations are the hallmarks of this approach. The same goes for md-raid, by the way.

    The btrfs design is light years ahead. It does not mind an arbitrary mix of RAID levels. It can convert, say, a set of chunks from RAID1 to RAID6 if you used RAID1 at first and then decided to go for RAID6 instead. Technically, it is meant to make allocation decisions about striping/parity per subvolume or even per file; the internal design inherently assumes there can be various chunks using various redundancy schemes. There are devices, there is free space on those devices, and there is the scheme the user has requested for a particular allocation. As long as there are enough devices and enough free space, the allocation succeeds. This means that if you request, say, RAID1, it does not have to be 2 disks; it can be, say, 3 disks. If those were 3 x 1 TiB drives, you would get 3/2 = 1.5 TiB of space. Every chunk will have a copy on at least 2 devices, but those can be different devices and different places on those devices, unlike classic RAID's one-to-one block mapping.

    There is a funny catch in this filesystem-aware RAID design, though. It is all about free space: btrfs knows how many devices there are and how much free space each one has. Yet since the pool can hold an arbitrary mix of RAID levels, btrfs cannot predict which allocation schemes will be requested in the future. That is where it gets interesting. Ask it to store new blocks as RAID1 and one amount of data will fit; ask for RAID0 and the amount will be different; ask for RAID6 and it will be different again. Since btrfs cannot foresee the future, "fair" reporting of free space becomes a tricky business: there is no longer a single number that represents free space, only "conditional" figures. Everything has drawbacks, and this is one of the few drawbacks of the futuristic and flexible btrfs design.
    Two (plus) comments:
    1. I doubt that any production deployment will seriously consider converting live data from RAID1 to RAID-N.
    1a. ... As such, I doubt that btrfs's mixed RAID levels will see much real-world use and, as a result, much testing.
    2. Spanning RAID1 across more than two drives (RAID 1E) is supported by most new RAID controllers (and, AFAIR, by MD-RAID1); this, of course, doesn't change the fact that RAID 1E is only guaranteed to survive one drive failure.



  • SystemCrasher
    replied
    Originally posted by sligor View Post
    That's why btrfs only improves slowly and has difficulty competing with ZFS: btrfs has far too much ambition in its feature set.
    As far as I understand, ZFS uses a more or less old-school, boring RAID approach with a more or less fixed striping scheme. Poor flexibility, long rebuild times, difficult reconfiguration, and a ton of limitations are the hallmarks of this approach. The same goes for md-raid, by the way.

    The btrfs design is light years ahead. It does not mind an arbitrary mix of RAID levels. It can convert, say, a set of chunks from RAID1 to RAID6 if you used RAID1 at first and then decided to go for RAID6 instead. Technically, it is meant to make allocation decisions about striping/parity per subvolume or even per file; the internal design inherently assumes there can be various chunks using various redundancy schemes. There are devices, there is free space on those devices, and there is the scheme the user has requested for a particular allocation. As long as there are enough devices and enough free space, the allocation succeeds. This means that if you request, say, RAID1, it does not have to be 2 disks; it can be, say, 3 disks. If those were 3 x 1 TiB drives, you would get 3/2 = 1.5 TiB of space. Every chunk will have a copy on at least 2 devices, but those can be different devices and different places on those devices, unlike classic RAID's one-to-one block mapping.

    There is a funny catch in this filesystem-aware RAID design, though. It is all about free space: btrfs knows how many devices there are and how much free space each one has. Yet since the pool can hold an arbitrary mix of RAID levels, btrfs cannot predict which allocation schemes will be requested in the future. That is where it gets interesting. Ask it to store new blocks as RAID1 and one amount of data will fit; ask for RAID0 and the amount will be different; ask for RAID6 and it will be different again. Since btrfs cannot foresee the future, "fair" reporting of free space becomes a tricky business: there is no longer a single number that represents free space, only "conditional" figures. Everything has drawbacks, and this is one of the few drawbacks of the futuristic and flexible btrfs design.
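
    A minimal sketch of the allocation idea (this is not btrfs's actual allocator; the fixed chunk size and the greedy policy are simplifications of mine) showing why the same disks give a different "free space" answer per profile:

    # Toy chunk allocator: a RAID1 chunk needs a copy on each of two
    # different devices, so usable space depends on the profile requested.
    CHUNK = 1  # allocation unit in GiB; arbitrary for this sketch

    def usable_space(device_free_gib, profile):
        """Greedily allocate chunks until nothing fits; return usable GiB."""
        free = list(device_free_gib)
        usable = 0
        while True:
            free.sort(reverse=True)  # devices with the most free space first
            if profile == "raid1":
                if len(free) < 2 or free[1] < CHUNK:
                    return usable
                free[0] -= CHUNK  # first copy
                free[1] -= CHUNK  # second copy, on a different device
            else:  # "single": one copy on any device
                if free[0] < CHUNK:
                    return usable
                free[0] -= CHUNK
            usable += CHUNK

    drives = [1024, 1024, 1024]            # 3 x 1 TiB
    print(usable_space(drives, "raid1"))   # 1536 GiB, i.e. 3/2 = 1.5 TiB
    print(usable_space(drives, "single"))  # 3072 GiB: same disks, other answer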



  • gilboa
    replied
    Originally posted by JS987 View Post
    RAID10 is much faster than RAID6 in some cases, such as random writes, which means multithreaded mixed random read/write will also be faster.
    http://louwrentius.com/benchmark-res...id-levels.html
    Again, your point may be valid for a home desktop w/ 4 drives or a low-end server with 8 drives.
    Building a 100 or 200TB RAID10 setup is simply not practical, not to mention that RAID10 is less *resilient*.

    E.g., I'm currently building a 4U 4S server w/ 24 x 6TB (or 8TB) drives.
    RAID6 (10% spares): 24 drives, 120TB (6TB drives, 2 parity, 2 spares). Can survive the loss of *any* two drives.
    RAID7/Z (10% spares): 24 drives, 114TB (6TB drives, 3 parity, 2 spares). Can survive the loss of *any* three drives. (ZFS's RAID-Z3 today, future MD/btrfs w/ triple parity, etc.)
    RAID10.1 (10% spares, same disk count): 24 drives, 66TB (6TB drives, 2 spares). Can survive the loss of *any* one drive; a second simultaneous failure is fatal if it hits the first drive's mirror partner.
    RAID10.2 (10% spares, same storage space): 44 (!) drives, 120TB (6TB drives, 4 spares). Same story: *any* one drive, but not both halves of a mirror pair.

    All in all, in order to use RAID10 I would either have to write off 45% of the disk space or pay a 30-40% price premium plus a 100% rack-space premium (external 4S storage box).
    Neither RAID10 option is guaranteed to survive a two-drive failure.
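
    For anyone who wants to check the arithmetic, here is a back-of-the-envelope calculator (the usable_tb helper is just an illustration of mine; it ignores rebuild windows and filesystem overhead):

    # Sanity check of the capacity figures above. Parameters mirror the post:
    # 6TB drives, ~10% spares; parity count and mirroring vary per layout.
    def usable_tb(drives, size_tb, parity, spares, mirrored=False):
        data = drives - spares - parity
        if mirrored:
            data //= 2  # RAID10 stores every block twice
        return data * size_tb

    print(usable_tb(24, 6, parity=2, spares=2))                 # RAID6:    120
    print(usable_tb(24, 6, parity=3, spares=2))                 # RAID7/Z:  114
    print(usable_tb(24, 6, parity=0, spares=2, mirrored=True))  # RAID10.1:  66
    print(usable_tb(44, 6, parity=0, spares=4, mirrored=True))  # RAID10.2: 120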

    In short, RAID10 cannot be used in any serious high-volume deployment.

    - Gilboa



  • JS987
    replied
    Originally posted by gilboa View Post
    Sure.
    Let's see you build a 200TB RAID10 setup and explain to the people who are paying for it that you are spending twice as much on hardware (actually more than twice, depending on the actual hardware configuration) just "because you don't like RAID6".
    RAID10 is much faster than RAID6 in some cases, such as random writes, which means multithreaded mixed random read/write will also be faster.
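
    The usual back-of-the-envelope explanation is the small-write penalty. A sketch (the per-disk IOPS figure is my assumption, and controller caches or full-stripe writes change the picture):

    # Physical disk operations needed per small random write.
    io_per_small_write = {
        "raid10": 2,  # write the data block to both mirrors
        "raid5": 4,   # read old data + parity, write new data + parity
        "raid6": 6,   # read old data + P + Q, write new data + P + Q
    }

    iops_per_disk = 150  # rough figure for a 7200RPM drive (assumption)

    for level, cost in io_per_small_write.items():
        # Aggregate random-write IOPS of an 8-disk array, ignoring caching.
        print(level, 8 * iops_per_disk // cost)  # raid10: 600, raid6: 200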



  • gilboa
    replied
    Originally posted by JS987 View Post
    There is also RAID 10, which works with any number of drives.
    http://en.wikipedia.org/wiki/Non-sta...nux_MD_RAID_10
    Sure.
    Let's see you build a 200TB RAID10 setup and explain to the people who are paying for it that you are spending twice as much on hardware (actually more than twice, depending on the actual hardware configuration) just "because you don't like RAID6".



  • JS987
    replied
    Originally posted by pgoetz View Post
    Sure. I have servers that require 30TB partitions. How exactly are you going to implement that using RAID1? LVM?
    There is also RAID 10, which works with any number of drives.
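
    A sketch of the idea behind the MD "near" layout with 2 replicas (this shows the placement pattern only, not the md driver's exact chunk arithmetic):

    # Copies of each block go to consecutive drives, wrapping around,
    # so no drive pairing is needed and any drive count >= 2 works.
    def near_layout(n_drives, n_blocks, replicas=2):
        """Map each logical block to the drives holding its copies."""
        layout, slot = {}, 0
        for block in range(n_blocks):
            layout[block] = [(slot + r) % n_drives for r in range(replicas)]
            slot = (slot + replicas) % n_drives
        return layout

    for block, drives in near_layout(3, 6).items():
        print(block, drives)  # 0 [0, 1], 1 [2, 0], 2 [1, 2], ...

    Usable capacity is still drive count x size / 2, and, as with plain RAID1, losing both drives that hold some block's two copies loses data.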



  • pgoetz
    replied
    Originally posted by pal666 View Post
    Why? RAID1 is faster and more reliable. RAID5 is for poor people who don't have enough drive space.
    Sure. I have servers that require 30TB partitions. How exactly are you going to implement that using RAID1? LVM?



  • gilboa
    replied
    Originally posted by pal666 View Post
    You can't afford fast and reliable storage? Then you are a cheap moron.
    Care to share the combined RAID space you have?
    Since you are so keen to call everybody a cheap moron, I'd like to know how many TB (or PB) your lordship owns / manages...

    - Gilboa



  • gilboa
    replied
    Originally posted by pal666 View Post
    Why? RAID1 is faster and more reliable. RAID5 is for poor people who don't have enough drive space.
    1. RAID1 is *not* faster. Under many loads (e.g., large sequential reads/writes) a large RAID5 (or RAID6) array will run circles around RAID1.
    E.g., on a server w/ 8 x 4TB 7200RPM drives I have two Linux MD RAIDs:
    RAID1 for OS + boot.
    RAID6 for data.
    Large-block sequential r/w on the RAID1 usually maxes out at ~100MB/s.
    Large-block sequential r/w on the RAID6 usually maxes out at ~500MB/s. (A rough model of why is sketched after this list.)

    2. RAID1 is just as *unreliable* as RAID5, if not more so. Both can only survive a single disk failure. Only RAID6 (or above) can be considered semi-safe.
    3. No offense intended, but your arrogance blinds you (and your comment is very arrogant). If you need to build a 100+ or 200+ TB RAID, RAID1 or even RAID10 is simply not an option. In that case you either use RAID6 w/ a large number of spares, or some non-standard RAID format (e.g., ZFS's RAID-Z).
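
    The rough model behind those throughput numbers (the per-disk rate is my assumption; a single sequential stream on MD RAID1 is served by essentially one spindle, while RAID6 stripes across all data spindles):

    disk_mb_s = 90  # assumed sustained rate of one 4TB 7200RPM drive
    n_drives = 8

    raid1_mb_s = disk_mb_s                   # ~one spindle:    ~90 MB/s
    raid6_mb_s = (n_drives - 2) * disk_mb_s  # 6 data spindles: ~540 MB/s

    print(f"RAID1 ~{raid1_mb_s} MB/s, RAID6 ~{raid6_mb_s} MB/s")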

    - Gilboa
    Last edited by gilboa; 24 February 2015, 08:09 AM.



  • pal666
    replied
    Originally posted by Swiftpaw View Post
    You love wasting electricity and drive space?
    You can't afford fast and reliable storage? Then you are a cheap moron.

