4-Disk Btrfs Native RAID Performance On Linux 4.10

starshipeleven replied

30 January 2017, 12:18 PM
Originally posted by nomadewolf

Is it me, or do these benchmarks only serve to illustrate how poorly RAID has been implemented?

To decide whether its implementation is poor, you should compare it with the only other filesystem that offers similar features, ZFS.

Of course a dumb block-level RAID is faster; it does not have to deal with all the stuff btrfs must offer.
starshipeleven replied

30 January 2017, 12:15 PM
Originally posted by jacob

That's a different thing entirely. With a traditional journaling filesystem (not log-structured), each update operation (file write, creation, deletion, etc.) involves writing metadata into the journal, which is at a fixed location on the disk. That means there are a couple of blocks that are constantly written over and over, which reduces the lifetime of an SSD quite a bit.

Nope, SSDs (and even SD cards nowadays) have wear leveling, so while it is the same block as far as the block layer is concerned, it is not the same physical flash cell.
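
To make the wear-leveling point concrete, here is a toy sketch in Python. It is not how any real SSD firmware works: the `ToyFTL` class, its block count and its "erase the stale copy immediately" behaviour are all assumptions made up for illustration. It only shows the idea that repeated writes to one logical block can land on different physical blocks.

```python
class ToyFTL:
    """Hypothetical, heavily simplified flash translation layer.

    Every write to a logical block is redirected to the least-worn
    free physical block; the stale copy is erased and recycled.
    """

    def __init__(self, physical_blocks):
        self.erase_counts = [0] * physical_blocks  # wear per physical block
        self.mapping = {}                          # logical -> physical
        self.free = set(range(physical_blocks))    # blocks ready to be written

    def write(self, logical_block):
        # Choose the free block with the fewest erases (ties: lowest index).
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        old = self.mapping.get(logical_block)
        if old is not None:
            # The previous physical copy is stale: erase it and free it.
            self.erase_counts[old] += 1
            self.free.add(old)
        self.mapping[logical_block] = target
        return target


ftl = ToyFTL(physical_blocks=8)
for _ in range(800):
    ftl.write(0)  # hammer the same logical "journal" block

print(ftl.erase_counts)
```

In this toy model the 800 writes to logical block 0 end up spread almost evenly over all eight physical blocks instead of wearing out a single one.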
starshipeleven replied

30 January 2017, 12:12 PM
Originally posted by Zucca

I don't think you can do RAID-1E on btrfs.

Also, for those wondering about btrfs RAID-1 speed: RAID-1 in btrfs is basically all disks smashed together as one JBOD and then split into two with mirroring. So theoretically you'd get a maximum of 2x single-disk speed from btrfs RAID-1, no matter how many disks you have in the pool. At least that's how I've understood it. Anyone is welcome to correct me if I'm wrong.

Yeah, that's the theoretical maximum.

The reason there is a performance hit here is that, afaik, btrfs does not have the logic to figure out how and where the data is split and to balance reads across the two drives; it just fetches the first copy it finds, regardless of which drive in the array holds it, period.

Reading from all drives at the same time in a block-level RAID1 is a piece of cake, as each drive is a block-level copy of the others. On btrfs... it's not.
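
A rough illustration of why the read policy matters, as a Python sketch. This is not btrfs code; the two policy functions and the device names `sda`/`sdb` are invented for the example, and it simply assumes the "always take the first copy" behaviour described above versus a naive alternating policy.

```python
def first_copy(copies, _request_id):
    """Always read from the first listed copy."""
    return copies[0]


def round_robin(copies, request_id):
    """Alternate between the copies based on the request number."""
    return copies[request_id % len(copies)]


def device_load(policy, copies, n_requests):
    """Count how many reads each device serves under the given policy."""
    load = {dev: 0 for dev in copies}
    for request_id in range(n_requests):
        load[policy(copies, request_id)] += 1
    return load


mirror = ["sda", "sdb"]  # hypothetical two-copy mirror
print(device_load(first_copy, mirror, 1000))
print(device_load(round_robin, mirror, 1000))
```

With the first policy one drive serves every read while the other sits idle, which is why a mirror does not automatically give you 2x read speed.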
Zucca replied

30 January 2017, 11:22 AM
Originally posted by AndyChow

What?!? ZFS is a LOT slower than btrfs, on every metric.

What I've heard is the opposite, but that was a while ago. At least ZFS uses much more RAM, which suggests heavy caching... that could yield faster performance in certain situations.

As there seems to be contradictory information among us, I'd really like to see a ZFS vs. btrfs comparison that includes memory usage, both with and without a cache disk (some fast NVMe storage).
AndyChow replied

30 January 2017, 11:03 AM
Originally posted by Spacefish

ZFS could be a nice comparison too, as it offers most of the features btrfs offers without being that slow!
And let's face it, btrfs is slow compared to almost any other fs.

What?!? ZFS is a LOT slower than btrfs, on every metric.
nomadewolf replied

30 January 2017, 10:59 AM
Is it me, or do these benchmarks only serve to illustrate how poorly RAID has been implemented?
lbalbalba replied

30 January 2017, 07:59 AM
Originally posted by Geopirate

Wait, did they already fix the raid 5/6 issues?

My thoughts exactly. Last I heard, there were still some serious (data corruption?) issues and the developers strongly advised against using it.
Zucca replied

30 January 2017, 05:28 AM
Originally posted by waxhead

There seems to be some confusion about what BTRFS' *native* RAID actually is... and just for the record, BTRFS RAID is not the same as MD RAID or any hardware RAID for that matter. So allow me to explain (in somewhat simplified terms) which modes BTRFS supports and how they work. Let's pretend you have 4 disks in a pool, all part of a single filesystem.

SINGLE:
Will keep ONE copy of the data on ANY disk, regardless of how many disks exist in the pool.

DUP:
Will keep TWO copies of the data on THE SAME disk, regardless of how many disks exist in the pool.

RAID0:
Will spread the data over ALL disks in the pool. So for simplicity's sake: think of this as having a 4 MB file, where 1 MB will be stored per disk.

RAID1:
Will keep TWO copies of the data on DIFFERENT disks (you can lose one disk), regardless of how many disks exist in the pool.

RAID10:
Will spread the data over HALF the number of disks in the pool and duplicate it, i.e. keep a copy on the other half of the disks in the pool.

RAID5:
Will spread the data over ALL the disks in the pool; however, 1/4th of the total space (4 disks, remember) is used for special data called parity, which can be used to reconstruct any of the data on the other disks. If the parity is lost, it can simply be recalculated from the working disks. And since BTRFS checksums data, it can tell whether the data is reliable before reconstructing parity or lost data. You can lose one disk and still reconstruct data.

RAID6:
Same as RAID5, but will use 2/4th of the total space (again, 4 disks, remember) for this special data called parity. This means you can lose two disks and still reconstruct data.

Something like this should be on btrfs wiki. Simple and well explained.
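
For anyone wondering how parity can "reconstruct any of the data", here is a minimal Python sketch of single-parity reconstruction using XOR, which is the usual scheme for RAID5-style parity. It is a simplified illustration, not btrfs code: the four-byte blocks and the `xor_blocks` helper are made up for the example, and it models one stripe on 4 disks (3 data blocks plus 1 parity block).

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a 4-disk array: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the disk holding data[1] died: rebuild it from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

The same function rebuilds the parity itself if the parity block is the one that was lost, since it is just the XOR of the three data blocks.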
Serafean replied

30 January 2017, 05:02 AM
Originally posted by waxhead

RAID5:
Will spread the data over ALL the disks in the pool; however, 1/4th of the total space (4 disks, remember) is used for special data called parity, which can be used to reconstruct any of the data on the other disks. If the parity is lost, it can simply be recalculated from the working disks. And since BTRFS checksums data, it can tell whether the data is reliable before reconstructing parity or lost data. You can lose one disk and still reconstruct data.

RAID6:
Same as RAID5, but will use 2/4th of the total space (again, 4 disks, remember) for this special data called parity. This means you can lose two disks and still reconstruct data.

Word of advice: RAID5/6 in BTRFS is a brilliant way to test your backups. In other words, it is NOT production ready and is marked as UNSTABLE right now.
BTRFS allows different profiles for data and metadata (data about data).

In BTRFS terms, using the RAID name is a bit of a misnomer, since the redundancy is basically a mix of copies, stripes and parities.

For completeness' sake:
RAID5 will spread data across all the disks in the pool, and you get to use (TOTAL_CAPACITY - SIZE_OF_LARGEST_DISK).
With RAID6 it's (TOTAL_CAPACITY - 2*SIZE_OF_LARGEST_DISK).
In practice, I found it best to have two (three) disks of the largest size.
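
The two formulas above, written out as a small Python helper. This is only the rule of thumb from this post turned into code, not an exact model of the btrfs allocator; the function name `raid56_usable` and the disk sizes are invented for the example.

```python
def raid56_usable(disk_sizes, parity_disks):
    """Estimate usable space: total capacity minus parity_disks * largest disk.

    parity_disks is 1 for RAID5 and 2 for RAID6.
    """
    if len(disk_sizes) <= parity_disks:
        raise ValueError("need more disks than parity disks")
    return sum(disk_sizes) - parity_disks * max(disk_sizes)


disks_tb = [4, 4, 4, 4]  # four equal 4 TB disks (made-up sizes)
raid5 = raid56_usable(disks_tb, parity_disks=1)  # 16 - 4 = 12
raid6 = raid56_usable(disks_tb, parity_disks=2)  # 16 - 8 = 8
print(raid5, raid6)
```

With four equal disks this agrees with the 1/4th and 2/4th parity figures quoted above: 12 of 16 TB usable for RAID5 and 8 of 16 TB for RAID6.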

Agreed on the advice part.
ldo17 replied

30 January 2017, 04:54 AM
Originally posted by Zan Lynx

SSDs use a Flash Translation Layer to spread writes out over erase blocks.

That’s what I meant by “underlying SSD implementation”.