
4-Disk Btrfs Native RAID Performance On Linux 4.10


  • jacob
    replied
    Originally posted by starshipeleven View Post
    Nope. Flash chips are in the "RAM" category: they are "random access memory", so fragmentation does not affect performance (as long as there is enough free space for new writes; SSD controllers also do scans on their own to "defrag" and compact their flash level to leave enough free space without telling the OS anything, but fragmentation per se isn't an issue).
    Hard drives are of course sequential access memory; they're a very high-tech version of a gramophone, after all.

    The main limitation of flash is read/write speed: a flash chip isn't terribly fast on its own.
    SSD controllers give you far more speed than the average USB flash drive because they actively fragment your writes across different flash chips, making a "RAID0" of sorts (some also have caches and other stuff on different, faster chips and whatever).
    I think you missed my point. It's not true that fragmentation does not affect performance on flash chips. A buffer of contiguous blocks can be transferred very efficiently using a single DMA operation. If on the other hand the data are fragmented across several disjoint extents, the OS block layer will need to issue as many DMA requests and reassemble the results, which entails a significant performance penalty. That penalty is *somewhat* lower than on a rotating disk because there is no delay to move the head etc., but it's still very much there.

    So my question remains: does the fact that the device remaps logical block addresses to reduce wear (which is a good thing) prevent it from being able to transfer logically contiguous buffers in a single operation or not?
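    One way to see how much this matters in practice is to look at how many extents a file is actually split into; filefrag from e2fsprogs will list them (the path below is just a placeholder). Roughly speaking, each non-contiguous extent ends up needing its own request to the device:

        filefrag -v /path/to/some/large/file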



  • SystemCrasher
    replied
    Originally posted by stiiixy View Post
    Hardware replacement wasn't an option, despite being up to the task. Speed wasn't even an issue until BTRFS RAID 5/6, but it certainly was thereafter. The existing system is rock-solid, working, and BTRFS (R10) will be revisited again in 12 months when time and money permit and new hardware is deployed.
    You write it almost as if there were people from a marketing department trying really hard to sell you something. Yet btrfs devs do not sell storage solutions, unlike Sun. They are merely hired by companies using btrfs for their own deployments and so on. Btrfs probably works for them and their scenarios if they dare to deploy it, not to mention the devs would fix things if that weren't the case. Btw, RAID 5/6 in btrfs is considered experimental and has some shortcomings, so using it in production is probably not the best idea ever.

    I spent 6 months wanting BTRFS to be the answer.
    Waiting for [some time] on its own wouldn't do any magic, except getting you somewhat older, of course.



  • stiiixy
    replied
    Originally posted by starshipeleven View Post
    Wrong, there is an out-of-tree kernel module, and Ubuntu is using this. https://github.com/zfsonlinux/zfs

    Because btrfs RAID1 is not like that?

    Again, btrfs RAID1 is not like that: you can have a RAID1 with one 1 TiB and two 512 GiB drives and it will still have 1 TiB of capacity; as long as it can split the stuff across two different drives, it's happy.
    Also, btrfs RAID1 has no way to increase the amount of redundancy: all drives you add increase capacity, not redundancy.

    As said to others, btrfs's raid was not meant to compete with plain block-level RAID, but to allow btrfs to work safely on multiple drives.

    The fact that you post bullshit like "the benefits of btrfs simply weren't there compared to md" clearly shows you don't fucking need any of the many features btrfs offers over any other filesystem, so the issue is only on your side: you chose the wrong setup.

    Given the stated needs, btrfs will never be the fs for you. If you don't give a fuck about checksums, snapshots, CoW, deduplication, or array flexibility as said above (you can also convert array types live) and only need raw speed, you will never need to switch from ext4/xfs on mdadm RAID. It's unlikely that the raw speed of btrfs will ever match that of plain block-level RAID with far simpler filesystems.
    Someone's having their period. Maybe if you knew our use case, the future requirements of the site, etc., your opinion might matter. But seeing as you can't even get the interpretation of RAID1 right, well, time for egg sandwiches.

    BTRFS RAID 5/6 has a proven data-loss bug. You want to risk someone else's 40 years of data on a bug like that? Let me spell this out for you with regards to performance: SHIT. Hardware replacement wasn't an option, despite being up to the task. Speed wasn't even an issue until BTRFS RAID 5/6, but it certainly was thereafter. The existing system is rock-solid, working, and BTRFS (R10) will be revisited again in 12 months when time and money permit and new hardware is deployed.

    I spent 6 months wanting BTRFS to be the answer.



  • pal666
    replied
    Originally posted by stiiixy View Post
    Going by my testing the benefits of BTRFS simply weren't there compared to md on spinning rust (no SSDs).
    did your testing include adding/removing drives and changing the raid level of a mounted rootfs?
    try that with md
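    for reference, a rough sketch of what that looks like on btrfs (device names and mount point are made up, everything runs against a mounted filesystem):

        # grow the array while it is mounted
        btrfs device add /dev/sdc /mnt
        # convert data and metadata to the raid1 profile across the devices, live
        btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt
        # shrink it again later; btrfs migrates the data off the device first
        btrfs device remove /dev/sdb /mnt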



  • SystemCrasher
    replied
    Originally posted by Zucca View Post
    Hm. Obsolete? Which "branch" of ZFS? OpenZFS, OracleZFS? (Are there more?)
    I guess he refers to the fact that ZFS has its own memory management, not really integrated with the rest of the Linux kernel memory management. OTOH btrfs has been properly integrated with the Linux kernel from the very start. So btrfs behaves pretty much like any other Linux filesystem and does not hog much memory, nor does one have to fiddle with cache size tuning on low-memory systems. It can even be used on small Pi-sized SBC things; the linux-sunxi people are quite serious about using btrfs to create tiny NAS-like boxes, etc. Not sure if someone has managed to do something about this weird memory management in ZFS. I don't care anyway, since I'm not going to use foreign modules, especially for filesystems, and especially for the boot filesystem. Easy to guess why.
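    For what it's worth, the usual workaround with ZFS on Linux on low-memory boxes is to cap the ARC through a module parameter; btrfs needs nothing like this since it just uses the normal page cache. A minimal sketch (the 512 MiB figure is an arbitrary example):

        # /etc/modprobe.d/zfs.conf -- cap the ZFS ARC at 512 MiB (value is in bytes)
        options zfs zfs_arc_max=536870912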

    I also wonder how much (if any) performance gain/loss there is between those.
    Then I guess you could run benchmarks? Though who cares about e.g. proprietary Oracle Solaris? Given the recent news, who would use Oracle Solaris anyway? They would have to be really crazy.

    RAID6 too). I've heard that ZFS on the other hand is more picky (enterprise users don't really care about that)... But still flexible when compared to "regular raid".
    Btrfs does RAID in quite a smartass way. If you've got 5 storage devices and want a mirror, it comes down to "put 2 copies of each block on different devices". The interesting part is that the pairs of devices aren't set in stone; it's a runtime decision instead, and the only limit is that there should be at least 2 devices with some free space on them. They do not have to be of equal size, etc. So if one uses a mirror with 5 similar devices, one could expect roughly 2.5 x the capacity of a single device. It gets a bit more complicated to compute if the devices are unequal in size, which hopefully gives an idea why a question like "how much free space do I have left" isn't trivial to answer for btrfs. It will keep storing mirrored blocks as long as it can find 2 different devices with some free space, and these do not have to be the same pair of devices all the time.
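    A quick way to see how btrfs itself answers the free-space question on such an array (device names and mount point are placeholders): with five 2 TiB devices in raid1 you'd expect roughly (5 x 2 TiB) / 2 = 5 TiB usable, i.e. about 2.5 x a single device.

        # raid1 for both data and metadata over several devices, equal size not required
        mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
        # shows raw space, allocated chunks and the estimated free space under the raid1 constraint
        btrfs filesystem usage /mnt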
    Last edited by SystemCrasher; 31 January 2017, 10:15 AM.



  • pal666
    replied
    Originally posted by Zucca View Post
    Hm. Obsolete? Which "branch" of ZFS? OpenZFS, OracleZFS? (Are there more?)
    yes. all of them.
    zfs was designed before the invention of CoW B-trees, so the zfs designers sacrificed B-trees for CoW.
    Originally posted by Zucca View Post
    I've heard that ZFS on the other hand is more picky
    i've heard it can't change filesystem size
    Last edited by pal666; 31 January 2017, 10:08 AM.



  • pal666
    replied
    Originally posted by jacob View Post
    sudo apt install zfs
    sudo: apt: command not found

    btw, linux is https://git.kernel.org/cgit/linux/ke...inux.git/tree/



  • starshipeleven
    replied
    Originally posted by Zucca View Post
    Hm. Obsolete? Which "branch" of ZFS? OpenZFS, OracleZFS? (Are there more?)
    He is talking about the core ZFS design concept, not the slightly different branches.



  • starshipeleven
    replied
    Originally posted by stiiixy View Post
    As far as I am aware, ZFS on Linux was still a FUSE-based implementation, and would therefore not yield proper results compared to a native Solaris or at least BSD system.
    Wrong, there is an out-of-tree kernel module, and Ubuntu is using this. https://github.com/zfsonlinux/zfs

    And why has no one here mentioned the one critical point about RAID1?
    Because btrfs RAID1 is not like that?

    You also only have that smallest drive's capacity limit as well. But you'd have redundancy across 100 drives.
    Again, btrfs RAID1 is not like that: you can have a RAID1 with one 1 TiB and two 512 GiB drives and it will still have 1 TiB of capacity; as long as it can split the stuff across two different drives, it's happy.
    Also, btrfs RAID1 has no way to increase the amount of redundancy: all drives you add increase capacity, not redundancy.

    Going by my testing the benefits of BTRFS simply weren't there compared to md on spinning rust (no SSDs).
    As said to others, btrfs's raid was not meant to compete with plain block-level RAID, but to allow btrfs to work safely on multiple drives.

    The fact that you post bullshit like "the benefits of btrfs simply weren't there compared to md" clearly shows you don't fucking need any of the many features btrfs offers over any other filesystem, so the issue is only on your side: you chose the wrong setup.

    Just not yet for my needs.
    Given the stated needs, btrfs will never be the fs for you. If you don't give a fuck about checksums, snapshots, CoW, deduplication, or array flexibility as said above (you can also convert array types live) and only need raw speed, you will never need to switch from ext4/xfs on mdadm RAID. It's unlikely that the raw speed of btrfs will ever match that of plain block-level RAID with far simpler filesystems.
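    For context, this is roughly what those features look like in btrfs's own tooling; a minimal sketch, with the mount point and subvolume names made up:

        # verify all data and metadata against checksums, repairing from the good mirror copy
        btrfs scrub start /mnt
        btrfs scrub status /mnt
        # cheap read-only CoW snapshot of a subvolume
        btrfs subvolume snapshot -r /mnt/@data /mnt/@data-snap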



  • starshipeleven
    replied
    Originally posted by jacob View Post
    Let's say the FS wants to transfer a number of logically contiguous blocks (like an extent, for example). Normally it would occur as a single DMA operation, in burst mode. But if the physical blocks are scattered around, would that affect the transfer speed and/or max number of blocks transferred per request?
    Nope. Flash chips are in the "RAM" category: they are "random access memory", so fragmentation does not affect performance (as long as there is enough free space for new writes; SSD controllers also do scans on their own to "defrag" and compact their flash level to leave enough free space without telling the OS anything, but fragmentation per se isn't an issue).
    Hard drives are of course sequential access memory; they're a very high-tech version of a gramophone, after all.

    The main limitation of flash is read/write speed: a flash chip isn't terribly fast on its own.
    SSD controllers give you far more speed than the average USB flash drive because they actively fragment your writes across different flash chips, making a "RAID0" of sorts (some also have caches and other stuff on different, faster chips and whatever).
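    If anyone wants to put numbers on this on their own SSD, fio makes the comparison easy; something like the following (file name, size and block sizes are arbitrary) contrasts a large sequential read with a small scattered random read of the same data:

        fio --name=seqread --filename=testfile --size=1G --rw=read --bs=1M --direct=1
        fio --name=randread --filename=testfile --size=1G --rw=randread --bs=4k --direct=1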

