
4-Disk Btrfs Native RAID Performance On Linux 4.10


  • #51
    Originally posted by stiiixy View Post
    Going by my testing, the benefits of BTRFS simply weren't there compared to md on spinning rust (no SSDs).
    Did your testing include adding/removing drives and changing the RAID level of a mounted rootfs?
    Try that with md.
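
    (For reference, a hedged sketch of the btrfs-progs commands being alluded to here; the device names and the mount point are placeholders, and a rebalance can take a long time on a full array:)

        # add a disk to a mounted filesystem
        btrfs device add /dev/sdd /mnt
        # remove a disk (its data is migrated to the remaining devices first)
        btrfs device remove /dev/sdb /mnt
        # convert data and metadata profiles, e.g. to raid1, while mounted
        btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt
        # check the resulting layout
        btrfs filesystem usage /mnt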



    • #52
      Originally posted by starshipeleven View Post
      Wrong, there is an out-of-tree kernel module, and Ubuntu is using this. https://github.com/zfsonlinux/zfs

      Because btrfs RAID1 is not like that?

      Again, btrfs RAID1 is not like that: you can have a RAID1 with 1x 1TiB and 2x 512GiB drives and it will still have 1 TiB of capacity; as long as it can split the data across two different drives, it's happy.
      Also, btrfs RAID1 has no way to increase the amount of redundancy: every drive you add increases capacity, not redundancy.

      As said to others, btrfs's raid was not meant to compete with plain block-level RAID, but to allow btrfs to work safely on multiple drives.

      By posting bullshit like "the benefits of btrfs simply weren't there compared to md", you clearly show you don't fucking need any of the many features btrfs offers over any other filesystem, so the issue is on your side: you chose the wrong setup.

      Given the stated needs, btrfs will never be the fs for you. If you don't give a fuck about checksums, snapshots, CoW, deduplication, or the array flexibility described above (you can also convert array types live) and only need raw speed, you will never need to switch from ext4/xfs on mdadm RAID. It's unlikely that the raw speed of btrfs will ever match that of plain block-level RAID with far simpler filesystems.
      Someone's having their period. Maybe if you knew our use case, the future requirements of the site, etc., your opinion might matter. But seeing as you can't even get the interpretation of RAID1 right, well, time for egg sandwiches.

      BTRFS RAID 5/6 has a proven data loss bug. You want to risk someone else's 40 years of data on a bug like that? Let me spell this out for you with regard to performance: SHIT. Hardware replacement wasn't an option, despite being up to the task. Speed wasn't even an issue until BTRFS RAID 5/6, but it certainly was thereafter. The existing system is rock-solid and working, and BTRFS (RAID10) will be revisited in 12 months, when time and money permit and new hardware is deployed.

      I spent 6 months wanting BTRFS to be the answer.
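
      (To make the quoted point about mixed drive sizes concrete, here is a rough model, an editorial sketch rather than code from the thread, of how btrfs "raid1" allocation behaves: each chunk is mirrored on the two devices with the most free space. For 1x 1TiB plus 2x 512GiB it arrives at roughly 1 TiB of usable capacity, as the quoted post claims.)

          /* Rough model of btrfs "raid1" chunk allocation over mixed-size drives.
           * Assumptions: 1 GiB chunks, each mirrored on the two devices with the
           * most free space, which approximates the real allocator's behaviour. */
          #include <stdio.h>

          int main(void)
          {
              long free_gib[] = {1024, 512, 512};    /* 1x 1TiB + 2x 512GiB */
              int n = 3;
              long usable = 0;

              for (;;) {
                  int a = -1, b = -1;
                  for (int i = 0; i < n; i++) {       /* pick the two devices  */
                      if (free_gib[i] <= 0)           /* with the most space   */
                          continue;
                      if (a < 0 || free_gib[i] > free_gib[a]) { b = a; a = i; }
                      else if (b < 0 || free_gib[i] > free_gib[b]) { b = i; }
                  }
                  if (a < 0 || b < 0)                 /* cannot mirror further */
                      break;
                  free_gib[a]--;                      /* one mirrored 1 GiB    */
                  free_gib[b]--;                      /* chunk allocated       */
                  usable++;
              }
              printf("usable: %ld GiB\n", usable);    /* prints 1024 here      */
              return 0;
          }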



      • #53
        Originally posted by stiiixy View Post
        Hardware replacement wasn't an option, despite being up to the task. Speed wasn't even an issue until BTRFS RAID 5/6, but it certainly was thereafter. The existing system is rock-solid and working, and BTRFS (RAID10) will be revisited in 12 months, when time and money permit and new hardware is deployed.
        You write it almost as if there were people from a marketing department trying really hard to sell you something. Yet btrfs devs do not sell storage solutions, unlike Sun. They are merely hired by companies using btrfs for their own deployments and so on. Btrfs presumably works for them and their scenarios if they dare to deploy it, and the devs would fix things if that weren't the case. Btw, RAID 5/6 in btrfs is considered experimental and has some shortcomings, so using it in production is probably not the best idea ever.

        I spent 6 months wanting BTRFS to be the answer.
        Waiting for [some time] on its own wouldn't do any magic, except getting you somewhat older, of course.



        • #54
          Originally posted by starshipeleven View Post
          Nope. Flash chips are in the "RAM" category, they are "random access memory", so fragmentation does not affect performance (as long as there is enough free space for new writes; SSD controllers also do scans on their own to "defrag" and compact things at the flash level to keep enough free space, without telling the OS anything, but fragmentation per se isn't an issue).
          Hard drives are of course sequential access memory; they're a very high-tech version of a gramophone, after all.

          The main limitation of flash is read/write speed; a flash chip isn't terribly fast on its own.
          SSD controllers give you far more speed than the average USB flash drive because they actively fragment the writes you do across different flash chips, making a "RAID0" of sorts (some also have caches and other stuff on different, faster chips and whatever).
          I think you missed my point. It's not true that fragmentation does not affect performance on flash chips. A buffer of contiguous blocks can be transferred very efficiently using a single DMA operation. If on the other hand the data are fragmented across several disjoint extents, the OS block layer will need to issue as many DMA requests and reassemble the results, which entails a significant performance penalty. That penalty is *somewhat* lower than on a rotating disk because there is no delay to move the head etc., but it's still very much there.

          So my question remains: does the fact that the device remaps logical block addresses to reduce wear (which is a good thing) prevent it from being able to transfer logically contiguous buffers in a single operation, or not?
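
          (The fragmentation being discussed here, i.e. the extent layout the OS block layer actually sees, can be inspected with the FIEMAP ioctl; whatever the SSD's controller does below the logical block addresses is invisible at this level. A minimal editorial sketch, not code from the thread:)

              /* Count the extents the filesystem reports for a file via FIEMAP.
               * With fm_extent_count = 0 the kernel only counts extents, the
               * same trick filefrag uses. */
              #include <stdio.h>
              #include <string.h>
              #include <fcntl.h>
              #include <unistd.h>
              #include <sys/ioctl.h>
              #include <linux/fs.h>
              #include <linux/fiemap.h>

              int main(int argc, char **argv)
              {
                  if (argc != 2) {
                      fprintf(stderr, "usage: %s <file>\n", argv[0]);
                      return 1;
                  }
                  int fd = open(argv[1], O_RDONLY);
                  if (fd < 0) { perror("open"); return 1; }

                  struct fiemap fm;
                  memset(&fm, 0, sizeof(fm));
                  fm.fm_length = FIEMAP_MAX_OFFSET;   /* map the whole file */
                  if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0) { perror("FIEMAP"); return 1; }

                  printf("%s: %u extent(s)\n", argv[1], fm.fm_mapped_extents);
                  close(fd);
                  return 0;
              }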



          • #55
            Originally posted by SystemCrasher View Post
            You write it almost as if there were people from a marketing department trying really hard to sell you something. Yet btrfs devs do not sell storage solutions, unlike Sun. They are merely hired by companies using btrfs for their own deployments and so on. Btrfs presumably works for them and their scenarios if they dare to deploy it, and the devs would fix things if that weren't the case. Btw, RAID 5/6 in btrfs is considered experimental and has some shortcomings, so using it in production is probably not the best idea ever.


            Waiting for [some time] on its own wouldn't do any magic, except getting you somewhat older, of course.
            Yes, that sales engine is called 'the Internet'. That's also supposed to be a joke.
            I waited years for the driver to mature before I tested on bigger iron than a home-job NAS. The simple management of BTRFS arrays is what initially sold me, as we could do away with all the legacy custom hardware stuff we've been relying on. The six months of 'waiting' was simply us pounding on the BTRFS server. Needless to say, it will likely be deployed when time permits. It just fell short of deployment FOR US in this instance because we would prefer RAID6, but the realities fell just short. No biggie. Not sure why others are getting their panties in a knot. I use BTRFS at home.



            • #56
              Originally posted by jacob View Post
              So my question remains: does the fact that the device remaps logical block addresses to reduce wear (which is a good thing) prevent it from being able to transfer logically contiguous buffers in a single operation, or not?
              No. The speed to fetch data from blocks anywhere on the SSD is the same, because the flash chips are random access memory, not sequential like hard drives.
              There isn't a "somewhat lower" penalty; there is no penalty at all, because the flash reads at the same speed and with the same latency from any cell.



              • #57
                Originally posted by stiiixy View Post
                Someone's having their period.
                Someone managed to post everything wrong.

                Maybe if you knew our use case
                You stated it. You want performance and you don't care about other features, because otherwise you would be talking about ZFS instead of mdadm RAID.

                But seeing as you can't even get the interpretation of RAID1 right
                Btrfs's "RAID1" is not actual RAID1, if you don't know basic info about btrfs it is not my problem.

                BTRFS RAID 5/6 has a proven data loss bug. You want to risk someone else's 40 years of data on a bug like that?
                And what issues does ZFS have that meant you could not use it?

                Let me spell this out for you with regard to performance: SHIT.
                See? You only want performance. Please don't use btrfs, as it's never going to be faster than the block-level RAID you use already.

                The existing system is rock-solid, working and...
                ... since it is using mdadm RAID, I still don't see why you want btrfs at all, as you seem to be fine with that.



                • #58
                  Originally posted by starshipeleven View Post
                  No. The speed to fetch data from blocks anywhere on the SSD is the same, because the flash chips are random access memory, not sequential like hard drives.
                  There isn't a "somewhat lower" penalty; there is no penalty at all, because the flash reads at the same speed and with the same latency from any cell.
                  The speed to fetch each individual block is the same on the SSD, but that's not what I'm talking about. Fetching 10 contiguous blocks over, say, SATA takes 1 DMA burst transfer. Fetching 10 blocks split into 2 extents takes 2 consecutive DMA transfers (SSD or not). So yes, there is a large performance penalty for fragmentation on SSDs; not as high as on rotating disks, but the difference is much smaller than what you seem to believe.
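
                  (The per-request overhead being argued about here can be measured directly. A hedged editorial sketch, not code from the thread: it reads the same 64 MiB from a device or large file once as a single contiguous read and once as 4 KiB reads at scattered offsets, with O_DIRECT so the page cache doesn't hide the requests. The path and sizes are placeholders; the target needs to be at least 1 GiB.)

                      #define _GNU_SOURCE                    /* for O_DIRECT */
                      #include <stdio.h>
                      #include <stdlib.h>
                      #include <fcntl.h>
                      #include <unistd.h>
                      #include <time.h>

                      #define TOTAL (64UL << 20)             /* 64 MiB read per test        */
                      #define CHUNK 4096UL                   /* size of each scattered read */

                      static double now(void)
                      {
                          struct timespec ts;
                          clock_gettime(CLOCK_MONOTONIC, &ts);
                          return ts.tv_sec + ts.tv_nsec / 1e9;
                      }

                      int main(int argc, char **argv)
                      {
                          const char *path = argc > 1 ? argv[1] : "/dev/sda";  /* placeholder */
                          int fd = open(path, O_RDONLY | O_DIRECT);
                          if (fd < 0) { perror("open"); return 1; }

                          void *buf;
                          if (posix_memalign(&buf, 4096, TOTAL)) {   /* O_DIRECT needs an   */
                              fprintf(stderr, "allocation failed\n"); /* aligned buffer      */
                              return 1;
                          }

                          double t0 = now();
                          if (pread(fd, buf, TOTAL, 0) < 0) { perror("pread"); return 1; }
                          double t1 = now();                 /* one large contiguous read   */

                          for (unsigned long i = 0; i < TOTAL / CHUNK; i++) {
                              /* same amount of data, but as 16384 small reads scattered
                               * within the first 1 GiB of the target */
                              off_t off = (off_t)(rand() % (256 * 1024)) * CHUNK;
                              if (pread(fd, buf, CHUNK, off) < 0) { perror("pread"); return 1; }
                          }
                          double t2 = now();

                          printf("contiguous: %.3f s, scattered 4 KiB reads: %.3f s\n",
                                 t1 - t0, t2 - t1);
                          close(fd);
                          return 0;
                      }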



                  • #59
                    Originally posted by jacob View Post
                    The speed to fetch each individual block is the same on the SSD, but that's not what I'm talking about. Fetching 10 contiguous blocks over, say, SATA takes 1 DMA burst transfer. Fetching 10 blocks split into 2 extents takes 2 consecutive DMA transfers (SSD or not). So yes, there is a large performance penalty for fragmentation on SSDs; not as high as on rotating disks, but the difference is much smaller than what you seem to believe.
                    I'm not talking about blocks, I'm talking about the physical layer. SSDs have fragmentation at the physical layer; they can also get fragmented at the block layer if the filesystem allows it (ext4 usually does not), but that is a filesystem issue that would be the same on any other block device.

                    The block layer has no fucking idea how the physical layer is organised or how it works. What appears as a fully contiguous file at the block layer may be all over the disk at the physical layer, but that's completely irrelevant, or even beneficial, as it gets more read speed thanks to the "RAID0"-like implementation at the physical layer.
                    Last edited by starshipeleven; 31 January 2017, 07:41 PM.



                    • #60
                      Originally posted by starshipeleven View Post
                      The block layer has no fucking idea how the physical layer is organised or how it works. What appears as a fully contiguous file at the block layer may be all over the disk at the physical layer, but that's completely irrelevant, or even beneficial, as it gets more read speed thanks to the "RAID0"-like implementation at the physical layer.
                      That's exactly my point. The block layer knows nothing about this, so it issues DMA requests as if this mechanism didn't exist. Hence back to the question: does the SSD accept single, burst-mode operations for series of blocks that are logically contiguous but physically scattered around, and does it perform them with the same performance as if the blocks were physically contiguous? Or is the maximum DMA transfer it will accept limited to the size of a physically contiguous extent? In the first case it's all great; in the second, there is a performance penalty.

