RAID 5/6 Continues Being Improved For Btrfs With Linux 3.20


  • #21
    Originally posted by pgoetz View Post
    Sure. I have servers which require 30TB partitions. How exactly are you going to implement that using RAID 1? LVM?
    There is also RAID 10, which works with any number of drives.



    • #22
      Originally posted by JS987 View Post
      There is also RAID 10, which works with any number of drives.
      http://en.wikipedia.org/wiki/Non-sta...nux_MD_RAID_10
      Sure.
      Let's see you build a 200TB RAID 10 setup and explain to the people who are paying for it that you are spending twice as much on hardware (actually more than twice, depending on the actual hardware configuration) just "because you don't like RAID6".
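      To put rough numbers on the "twice as much hardware" point - a back-of-the-envelope sketch only, where the 6TB drive size and the single wide RAID6 group are just my illustrative assumptions:

import math

# Illustrative drive counts for ~200TB usable (assumed 6TB drives, spares ignored).
TARGET_TB, DRIVE_TB = 200, 6

raid10_drives = math.ceil(TARGET_TB * 2 / DRIVE_TB)              # every block is stored twice
raid6_drives = math.ceil((TARGET_TB + 2 * DRIVE_TB) / DRIVE_TB)  # data plus 2 parity drives
print(raid10_drives, raid6_drives)                               # 67 vs. 36 drives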



      • #23
        Originally posted by gilboa View Post
        Sure.
        Let's see you build a 200TB RAID 10 setup and explain to the people who are paying for it that you are spending twice as much on hardware (actually more than twice, depending on the actual hardware configuration) just "because you don't like RAID6".
        RAID10 is much faster than RAID6 in some cases, such as random writes, which means multithreaded mixed random read/write workloads will also be faster.
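        A back-of-the-envelope way to see why, using the textbook small-write penalty rather than any particular benchmark (the per-drive IOPS figure below is an assumption, not a measurement):

# RAID10 costs 2 disk I/Os per random write (one per mirror copy).
# RAID6 read-modify-write costs 6 (read data + both parities, write data + both parities).
per_drive_iops = 150   # assumed random IOPS for a single spinning drive
drives = 24
print("RAID10 random-write IOPS ~", per_drive_iops * drives // 2)  # ~1800
print("RAID6  random-write IOPS ~", per_drive_iops * drives // 6)  # ~600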



        • #24
          Originally posted by JS987 View Post
          RAID10 is much faster than RAID6 in some cases, such as random writes, which means multithreaded mixed random read/write workloads will also be faster.
          http://louwrentius.com/benchmark-res...id-levels.html
          Again, your point may be valid for a home desktop w/ 4 drives or a low-end server with 8 drives.
          Building a 100 or 200TB RAID10 setup is simply not practical, not to mention the fact that RAID10 is less *resilient*.

          E.g. I'm currently building a 4U 4S server w/ 24 x 6TB (or 8TB) drives.
          RAID6 (10% spares): 24 drives, 120TB (6TB, 2 parity, 2 spares). Can survive the loss of *any* two drives.
          RAID7/Z (10% spares): 24 drives, 114TB (6TB, 3 parity, 2 spares). Can survive the loss of *any* 3 drives. (ZFS, future MD/btrfs w/ triple parity, etc.)
          RAID10.1 (10% spares, same disk count): 24 drives, 66TB (6TB, 2 spares). Can survive the loss of *any* one drive and has a 50% chance of surviving the loss of a second drive.
          RAID10.2 (10% spares, same storage space): 44 (!) drives, 120TB (6TB, 4 spares). Can survive the loss of *any* one drive and has a 50% chance of surviving the loss of a second drive.

          All in all, in order to use RAID10, I would either have to write off 45% of the disk space or pay a 30-40% price premium and a 100% rack-space premium (external 4S storage box).
          Neither option can survive a two-drive failure.
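          For what it's worth, the capacity arithmetic above can be sketched like this - an illustration only, using the drive size, spare and parity counts quoted in this post (the usable_tb helper is just for the example):

DRIVE_TB = 6  # drive size used throughout this post

def usable_tb(drives, spares, parity=0, mirrored=False):
    """Usable capacity after removing spares and redundancy overhead."""
    if mirrored:
        data_drives = (drives - spares) // 2  # RAID10: half the non-spare drives hold copies
    else:
        data_drives = drives - spares - parity
    return data_drives * DRIVE_TB

print("RAID6   :", usable_tb(24, spares=2, parity=2), "TB")       # 120 TB
print("RAID7/Z3:", usable_tb(24, spares=2, parity=3), "TB")       # 114 TB
print("RAID10.1:", usable_tb(24, spares=2, mirrored=True), "TB")  # 66 TB
print("RAID10.2:", usable_tb(44, spares=4, mirrored=True), "TB")  # 120 TB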

          In short, RAID10 cannot be used in any serious high-volume deployment.

          - Gilboa



          • #25
            Originally posted by sligor View Post
            That's why btrfs only improves slowly and has difficulty competing with ZFS; btrfs is far too ambitious with its features.
            As far as I understand, ZFS uses a more or less old, boring RAID approach with a more or less "fixed" striping scheme. Poor flexibility, long rebuild times, difficulty reconfiguring, and a ton of limitations are hallmarks of this approach. The same goes for the "md raid" thing, btw.

            The btrfs design is light years ahead. It does not mind an arbitrary mix of RAID levels. It can convert, say, a set of chunks from RAID1 to RAID6 if you used RAID1 and then decided to go for RAID6 instead. Technically, it is meant to be able to make allocation decisions in terms of striping/parity for a separate subvolume or even for separate files - the internal design inherently assumes there can be various chunks using various redundancy schemes. There are devices. There is free space on these devices. There is the scheme the user has requested for a particular allocation. As long as there are enough devices and enough free space, the allocation succeeds. This means that if you request, say, RAID 1, it does not have to be 2 disks. It can be, say, 3 disks. If these were, say, 3 x 1 TiB drives, you would be able to get 3/2 = 1.5 TiB of space. All chunks will have a copy on at least 2 devices, but these can be different devices and different places on those devices, unlike classic RAID, which maps blocks one-to-one.

            There is a funny catch in this filesystem-aware-RAID design, though. It is all about free space: btrfs knows how many devices are there and how much free space they have. Yet since there can be an arbitrary mix of RAID levels, btrfs cannot predict what allocation schemes will be requested in the future. That's where it gets interesting. Ask it to put new blocks as RAID 1 and one amount of data will fit. Ask it for RAID 0 and the amount will be different. Ask for RAID6 and it will be yet another amount. Since btrfs can't foresee what will happen in the future, "fair" reporting of free space becomes a tricky business. There is no longer a single number that represents free space, but rather "conditional" figures. Everything has drawbacks, and this is one of the few drawbacks of the futuristic and flexible btrfs design.
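            A rough sketch of the allocation idea described above - this is not btrfs' actual allocator, just an illustration (with a made-up raid1_capacity_gib helper) of how RAID1 chunks can be spread over an odd number of devices:

# Estimate how much data fits as RAID1 chunks on devices of unequal size by
# repeatedly placing a 1 GiB chunk copy on the two devices with the most free space.
def raid1_capacity_gib(device_sizes_gib):
    free = list(device_sizes_gib)
    placed = 0
    while True:
        free.sort()
        if free[-2] < 1:      # need room for two copies on two different devices
            return placed
        free[-1] -= 1
        free[-2] -= 1
        placed += 1

print(raid1_capacity_gib([1024, 1024, 1024]))  # 1536 GiB, i.e. the 3/2 = 1.5 TiB described above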



            • #26
              Originally posted by SystemCrasher View Post
              As far as I understand, ZFS uses a more or less old, boring RAID approach with a more or less "fixed" striping scheme. Poor flexibility, long rebuild times, difficulty reconfiguring, and a ton of limitations are hallmarks of this approach. The same goes for the "md raid" thing, btw.

              The btrfs design is light years ahead. It does not mind an arbitrary mix of RAID levels. It can convert, say, a set of chunks from RAID1 to RAID6 if you used RAID1 and then decided to go for RAID6 instead. Technically, it is meant to be able to make allocation decisions in terms of striping/parity for a separate subvolume or even for separate files - the internal design inherently assumes there can be various chunks using various redundancy schemes. There are devices. There is free space on these devices. There is the scheme the user has requested for a particular allocation. As long as there are enough devices and enough free space, the allocation succeeds. This means that if you request, say, RAID 1, it does not have to be 2 disks. It can be, say, 3 disks. If these were, say, 3 x 1 TiB drives, you would be able to get 3/2 = 1.5 TiB of space. All chunks will have a copy on at least 2 devices, but these can be different devices and different places on those devices, unlike classic RAID, which maps blocks one-to-one.

              There is a funny catch in this filesystem-aware-RAID design, though. It is all about free space: btrfs knows how many devices are there and how much free space they have. Yet since there can be an arbitrary mix of RAID levels, btrfs cannot predict what allocation schemes will be requested in the future. That's where it gets interesting. Ask it to put new blocks as RAID 1 and one amount of data will fit. Ask it for RAID 0 and the amount will be different. Ask for RAID6 and it will be yet another amount. Since btrfs can't foresee what will happen in the future, "fair" reporting of free space becomes a tricky business. There is no longer a single number that represents free space, but rather "conditional" figures. Everything has drawbacks, and this is one of the few drawbacks of the futuristic and flexible btrfs design.
              Two (plus) comments:
              1. I doubt that any type of production deployment will seriously consider converting live data from RAID1 into RAID-N.
              1a. ... As such, I doubt that btrfs' mixed RAID levels will see much real-world use, and as a result, much testing.
              2. Spanning RAID1 across multiple drives (RAID 1E) is supported by most new RAID controllers (and, AFAIR, by MD-RAID1); this, of course, doesn't change the fact that RAID 1E can only survive one drive failure.

