Learning More About Red Hat's Stratis Project To Offer Btrfs/ZFS-Like Functionality


  • #41
    Originally posted by RahulSundaram View Post
    I see where your confusion comes from now. Commercial organizations are not your "friends".
    No, you don't, because I'm not confused. You wish I was, maybe because you want to be my friend and help me, but it's really just you who doesn't have friends. And it also explains why you don't see any organization, commercial or not, as your friend. That's sad.

    I know you want to reply now, but I am not going to care about your friendless world view. Really, I don't.

    Or ask yourself what a sad world it would be if Michael here couldn't also be friends with the people who report news to him, the people who read his news, and anyone else involved, just because he is running a business. And yes, I am fully aware that some people live in a world without any friends. That sure doesn't mean people who have friends are confused.
    Last edited by sdack; 28 April 2018, 08:17 AM.

    Comment


    • #42
      Originally posted by waxhead View Post
      I like BTRFS, but I have always wondered if not a virtual block layer that maps "virtual offsets" to real physical offsets (almost like a MMU) would not have been better in some ways. That would allow for just about any filesystem on top while still allowing for some of the BTRFS/ZFS features.
      As the subject highlighted, we have such a thing in device mapper, but it is slow.
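
For anyone curious what such a mapping layer looks like conceptually, here is a minimal Python sketch of a virtual-to-physical block table, roughly the idea waxhead describes (and roughly what dm-thin keeps per volume). The class name, block size and methods are all invented for illustration; this is not device-mapper code.

```python
# Toy model of a "virtual block layer": virtual block numbers are mapped to
# physical blocks on demand, much like an MMU maps virtual pages to frames.
# All names and the 4 KiB block size are illustrative, not real DM code.

BLOCK_SIZE = 4096

class VirtualBlockLayer:
    def __init__(self, physical_blocks):
        self.free = list(range(physical_blocks))  # unallocated physical blocks
        self.table = {}                           # virtual block -> physical block

    def write(self, vblock):
        """Allocate a physical block the first time a virtual block is written."""
        if vblock not in self.table:
            self.table[vblock] = self.free.pop(0)
        return self.table[vblock]

    def read(self, vblock):
        """Return the backing physical block, or None for never-written blocks."""
        return self.table.get(vblock)

    def discard(self, vblock):
        """fstrim-style hint: the block is unused, so release the physical space."""
        phys = self.table.pop(vblock, None)
        if phys is not None:
            self.free.append(phys)

layer = VirtualBlockLayer(physical_blocks=1024)
print(layer.write(42))   # first write allocates a physical block
print(layer.read(7))     # never written -> None (reads back as zeroes)
layer.discard(42)        # discard returns the block to the free pool
```

The MMU analogy holds: never-written blocks need no backing space and a discard simply unmaps the entry, but every request now pays for an extra table lookup plus metadata I/O, which is presumably where the "it is slow" part comes from.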

      Comment


      • #43
        Originally posted by RahulSundaram View Post
        Commercial organizations are not your "friends".
        You are building a community based on hate?

        Comment


        • #44
          Originally posted by pal666 View Post
          You are building a community based on hate?
          Red Hat has decided it doesn't want to support Btrfs for its commercial product. It isn't an emotional decision. Friendship or hate doesn't factor into that decision whatsoever.

          Comment


          • #45
            Originally posted by RahulSundaram View Post
            Friendship or hate doesn't factor into that decision whatsoever
            Exactly, for Red Hat it obviously doesn't, but in reality it certainly does: companies often hire people not just because they have a nice CV, but because they already know them and are friends with them. It's then not really surprising to read about Btrfs developers leaving prior to this decision. There is obviously more going on than meets the eye.

            Just don't think friendships don't matter. Sure, money rules the world, but when the whole world runs on commerce and everybody knows it, then this fact becomes irrelevant and friendships certainly matter.

            Or would you say people cannot be friends because everybody needs to breathe air and air is more important than friendships? For your sake I hope not...
            Last edited by sdack; 29 April 2018, 09:31 AM.

            Comment


            • #46
              Originally posted by sdack View Post
              Exactly, for Red Hat it obviously doesn't, but in reality it certainly does: companies often hire people not just because they have a nice CV, but because they already know them and are friends with them.
              The topic is not hiring. In reality, every commercial organization decides what features it is going to support as a business decision. It is a pretty simple obvious fact.

              Comment


              • #47
                Originally posted by RahulSundaram View Post
                It is a pretty simple obvious fact.
                Which makes you Captain Obvious.

                Comment


                • #48
                  Originally posted by kreijack View Post
                  Having the two layers in the same subsystem allows a lot of optimizations:
                  - in case of RAID1/5/6, it is possible to recover the data on the basis of the checksum
                  Note that:
                  - RAID1 has been considered stable for quite some time and is used in production.
                  - RAID5/6 on btrfs is only beginning to fully implement this feature now (see the "let raid56 try harder to rebuild damaged data, reading from all stripes if necessary / fix scrub to repair raid56 in a similar way as in the case above" patch that 4.16 gained).
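
As a toy illustration of what that patch enables (a hedged sketch only: invented function names, CRC32 standing in for the filesystem's real checksums, a single RAID5 stripe with XOR parity), a checksummed block that reads back corrupted can be rebuilt from the remaining blocks and verified before being returned:

```python
# Sketch of the RAID5 "try harder" idea: if a data block fails its checksum,
# rebuild it by XORing the remaining data blocks with the parity block, then
# verify the rebuilt block against the checksum. Toy single-stripe example.
import zlib

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

def read_with_rebuild(stripe, idx, expected_csum):
    """stripe: list of data blocks plus the parity block as the last element."""
    data = stripe[idx]
    if zlib.crc32(data) == expected_csum:
        return data
    # Checksum mismatch: reconstruct from every other block in the stripe.
    rebuilt = xor_blocks([b for i, b in enumerate(stripe) if i != idx])
    if zlib.crc32(rebuilt) != expected_csum:
        raise IOError("rebuilt block still fails its checksum")
    return rebuilt

d0, d1 = b"AAAA", b"BBBB"
parity = xor_blocks([d0, d1])
stripe = [b"AAXX", d1, parity]                        # d0 silently corrupted
print(read_with_rebuild(stripe, 0, zlib.crc32(d0)))   # b'AAAA'
```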

                  Originally posted by kreijack View Post
                  - in case of RAID1/5/6 the resync of the RAID is performed by checking only the allocated data (not the full disks)
                  Which is actually doable in LVM too, for the exact same reason:
                  - LVM can implement RAID1 directly (using the kernel DM facility) without a separate MDADM layer.
                  - LVM allows "thin provisioning", only using extents to hold written data and not allocating extents for "empty" space.
                  AND
                  - Filesystems implement "fstrim" to signal which zones are empty and not holding data.
                  - LVM can understand these fstrim/discard commands and de-allocate unused extents from thin pools.

                  In short:
                  - RAID (DM's RAID1), volume management (LVM) and filesystems are able to communicate with each other (passing "discard" instructions) and are thus able to perform a resync on used blocks only.
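
To make that chain a little more concrete, here is a small conceptual Python sketch (all names invented; nothing here is real LVM/dm-thin code) of a thin pool that drops extents on discard, so that a later resync only has to walk what is still mapped:

```python
# Conceptual sketch of a thin pool whose resync only touches mapped extents.
# The flow is fstrim -> discard -> fewer mapped extents -> shorter resync.
# Invented names; real LVM/dm-thin works at a much lower level.

class ThinPool:
    def __init__(self):
        self.mapped = {}                    # logical extent -> data

    def write(self, extent, data):
        self.mapped[extent] = data          # extent is allocated on first write

    def discard(self, extent):
        self.mapped.pop(extent, None)       # fstrim told us this extent is unused

    def resync(self):
        # Only mapped extents need to be copied or checked, not the whole device.
        return sorted(self.mapped)

pool = ThinPool()
for e in range(100):
    pool.write(e, b"...")
for e in range(50, 100):                    # files deleted, fstrim runs
    pool.discard(e)
print(len(pool.resync()), "extents to resync instead of the whole device")
```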

                  This is the same kind of integration that BTRFS has been doing from the beginning (down to *also* sharing bits with DM), with the exact same effect.

                  The advantage of an integrated full stack like BTRFS (and ZFS) compared to separate discrete components (LVM + filesystem) is that it can do *a lot more*, for which there is currently no API.

                  There are only APIs (fstrim/discard) to tell which extents are empty, and that's about all you can do currently.

                  There are currently no APIs regarding filesystem checksums. A hypothetical filesystem that did data checksumming (currently there are none besides BTRFS and ZFS) would have no way to communicate with either LVM or RAID and say: "No, sorry, the returned block doesn't match its checksum, it's probably corrupted. Please try harder/differently."
                  BTRFS, on the other hand, offers a way to do exactly that (and actually does it in production with RAID1, switching to the alternate copy if a read fails its checksum; it is working toward doing the same with RAID5/6, see the patches mentioned above).
                  ZFS can do it too, being its own monolithic full stack (no code shared with DM, it uses its own re-implementation, hence the "layer violation" complaints it gets, unlike BTRFS).
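
To make the missing-API point concrete, here is a hedged Python sketch of a checksum-aware RAID1 read, the kind of thing BTRFS does internally but for which no generic filesystem-to-RAID interface exists. The function names are invented and CRC32 merely stands in for the filesystem's real checksums:

```python
# Sketch of a checksum-aware RAID1 read: try one mirror, verify against the
# stored checksum, fall back to the other copy and repair the bad one.
# Invented names; CRC32 stands in for the real data checksums.
import zlib

def read_verified(mirrors, block, expected_csum):
    """mirrors: list of dicts mapping block number -> bytes (two copies in RAID1)."""
    for i, mirror in enumerate(mirrors):
        data = mirror[block]
        if zlib.crc32(data) == expected_csum:
            # Repair any stale copies from the known-good one.
            for other in mirrors:
                if zlib.crc32(other[block]) != expected_csum:
                    other[block] = data
            return data
        print(f"mirror {i}: checksum mismatch, trying the other copy")
    raise IOError("no mirror holds a copy matching the checksum")

good = b"hello"
mirrors = [{0: b"hellx"}, {0: good}]                  # copy 0 silently corrupted
print(read_verified(mirrors, 0, zlib.crc32(good)))    # b'hello'
```

A plain dm/LVM RAID1 read cannot do this, because the checksum lives a layer above and there is no call with which the filesystem could pass it down.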




                  Originally posted by kreijack View Post
                  BTRFS allows the user to experiment with lots of features (RAID, snapshots, quotas, multi-device management), which creates a huge number of combinations of use cases; in some of these the performance is not good.
                  However, very few filesystems (ZFS comes to mind) expose these functionalities, so there was no prior experience of how complex this development could be.
                  And again, ZFS is cheating by using its own giant monolithic stack instead of sharing code with DM.

                  This is also why I'm not holding my breath until XFS finishes implementing the CoW/snapshotting/etc. features it has promised.
                  It's tremendously long and complicated work. Be prepared for those features to stay in a beta state for quite a long time, similar to what BTRFS is undergoing.

                  This is also why I'm not as vocal as others regarding BCacheFS.
                  Don't get me wrong, Kent *is* doing tremendous work. But CoW/snapshotting are far from ready yet (in fact, according to Patreon, he has only recently begun working on them, and RAID5/6 hasn't even been started).
                  There's still a lot of work ahead. It is going to take a long time until BCacheFS is stable *with* features similar to BTRFS/ZFS.


                  That's why I consider BTRFS to be the current best solution. It's pretty much stable and production ready *AS LONG AS* you stick to the feature set that is currently considered stable (so don't use RAID5/6 yet; wait for them to be greenlit and marked production ready on the BTRFS status wiki page).

                  BCacheFS is probably going to be the next best thing afterward, *provided you're patient and wait a couple more years*. It's not there yet.


                  Comment


                  • #49
                    Originally posted by sdack View Post
                    Which makes you Captain Obvious.
                    Good that you aren't denying the obvious fact anymore. That served the purpose.

                    Comment


                    • #50
                      Originally posted by DrYak View Post

                      Which is actually doable in LVM too, for the exact same reason:
                      - LVM can implement RAID1 directly (using the kernel DM facility) without a separate MDADM layer.
                      - LVM allows "thin provisioning", only using extents to hold written data and not allocating extents for "empty" space.
                      AND
                      - Filesystems implement "fstrim" to signal which zones are empty and not holding data.
                      - LVM can understand these fstrim/discard commands and de-allocate unused extents from thin pools.
                      I am not sure about what you reported. My understanding of what you wrote is that, if I have e.g. a RAID5 composed of 6 disks of 100GB each, a rebuild requires reading all 100GB of each disk; what is allocated above doesn't matter.
                      In BTRFS, the check covers only the files stored on the disks (i.e. if the filesystem is half full, you need to read only 6x100/2 GB ...)

                      The point is not only how the layers communicate, but also how the metadata is stored and handled. In BTRFS, the check is made on the basis of the allocated extents. This is doable because the scrub process is able to access the allocated extents.

                      In MD (which DM uses for the RAID implementation) there is no concept of "allocated extents". The fact that the layers above can know what is allocated and where doesn't matter, because the resync is performed by the lower MD layer.
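
Put as plain arithmetic (a trivial sketch of the example above, nothing more):

```python
# Rough arithmetic for the example: 6 disks of 100 GB, filesystem half full.
disks, size_gb, fill = 6, 100, 0.5

md_resync_gb = disks * size_gb            # MD/DM resync walks every block
btrfs_scrub_gb = disks * size_gb * fill   # BTRFS scrub walks allocated extents only

print(md_resync_gb, "GB read by an MD-style resync")     # 600 GB
print(btrfs_scrub_gb, "GB read by a BTRFS-style scrub")  # 300 GB
```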

                      Comment
