Bcachefs File-System Plans To Try Again To Land In Linux 6.6
Originally posted by pWe00Iri3e7Z9lHOX2Qx View Post
I think the biggest problem with the RAID10 example I gave is that nobody who is familiar with RAID10 from other systems would expect a write pattern like this to be possible.
Code:
| SDA | SDB | SDC | SDD |
|-----|-----|-----|-----|
| A1  | A2  | A1  | A2  |
| B1  | B1  | B2  | B2  |
| C1  | D1  | D1  | C1  |
| D2  | C2  | C2  | D2  |
Thank you!
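For anyone who wants to see this for themselves: btrfs allocates raid10 stripes and mirrors per chunk rather than with a fixed disk pairing, which is how layouts like the table above can happen. A rough sketch of how to reproduce and inspect it; the device names and mount point are placeholders, not anything from the posts above:
Code:
# Hypothetical 4-disk btrfs raid10; /dev/sd[a-d] and /mnt are made up.
mkfs.btrfs -d raid10 -m raid10 /dev/sd[a-d]
mount /dev/sda /mnt
# The chunk tree records which devices back each stripe of every chunk,
# so you can see that the mirror pairing is decided chunk by chunk.
btrfs inspect-internal dump-tree -t chunk /dev/sda | grep -A8 CHUNK_ITEM
btrfs filesystem usage /mnt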
Originally posted by EphemeralEft View Post
It's actually BTRFS | LVM | DM-Crypt | BCache | DM-RAID, where DM-Crypt is managed by cryptsetup and DM-RAID is managed by LVM. The top-most LVM layer is split into different filesystems for different purposes.
Although I'm not using it, LVM actually has the option to layer DM-Integrity over each RAID member for per-member corruption detection. Because DM-Integrity treats corruption as read errors, the other RAID members are automatically used if the data on one member is corrupt. The RAID layout is a 6x4TB “raid6_ls_6”, which is a non-standard layout: left-symmetric RAID5 (distributed parity), but with the last disk dedicated to Q syndrome parity. This has the benefit that I can switch between RAID5 and "RAID6" without reshaping, at the expense of losing one disk's worth (1/6) of read performance. In theory RAID6 should also be able to tell which member is invalid in the case of a mismatch (without per-member DM-Integrity), but DM-RAID/LVM doesn't currently have that feature.
BCache is used in write-through mode, so the SSD can fail without data loss. My boot partition is a RAID1 at the beginning of all RAID members (thanks to GRUB), so truly any 2 drives could fail without losing any data. I use the integrity checking of BTRFS as a sanity check of the RAID, BCache, and the SSD; it also functions as a janky method of "authenticated encryption". Besides the BTRFS RAID56 issues, at-rest encryption is important to me, so until BTRFS supports encryption natively I'd need to encrypt all RAID members individually anyway.
I honestly prefer having separate layers that I can manage myself. I can (and eventually will) switch from BCache to DM-Cache, and move integrity checking from BTRFS to DM-Crypt for AEAD. A while ago I switched from MDAdm to DM-RAID. I couldn't mix and match implementations like that with an all-in-one solution, and I probably couldn't tweak as many settings either.
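For readers who want a concrete picture of a stack like this, here is a rough sketch of how the layers might be assembled, bottom to top. The device names, VG/LV names, and sizes are all made up, and exact flags can differ between LVM and bcache versions; it's an illustration of the layering, not the commands actually used for the setup above:
Code:
# Bottom layer: LVM-managed RAID. Start as left-symmetric RAID5 across
# five disks, then take over to raid6_ls_6 (dedicated Q parity on a sixth
# disk), which avoids a full reshape when moving between RAID5 and "RAID6".
pvcreate /dev/sd[a-f]
vgcreate vg_raid /dev/sd[a-f]
lvcreate --type raid5_ls --stripes 4 -L 10T -n lv_data vg_raid
lvconvert --type raid6_ls_6 vg_raid/lv_data
# Optional: dm-integrity under each RAID member, so corruption surfaces
# as a read error and the remaining members are used instead.
lvconvert --raidintegrity y vg_raid/lv_data

# bcache in write-through mode, so the SSD can fail without data loss.
make-bcache -C /dev/nvme0n1 -B /dev/vg_raid/lv_data
echo writethrough > /sys/block/bcache0/bcache/cache_mode

# dm-crypt (LUKS) over the cached device, then a second VG split into
# per-purpose filesystems, with btrfs on top.
cryptsetup luksFormat /dev/bcache0
cryptsetup open /dev/bcache0 crypt0
pvcreate /dev/mapper/crypt0
vgcreate vg_top /dev/mapper/crypt0
lvcreate -L 1T -n home vg_top
mkfs.btrfs /dev/vg_top/home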
Originally posted by Mark Rose View Post
Sure, that's for ZFS and bcachefs, but I've not heard of btrfs having tiered storage. I've actually been thinking it would be a fun project to get my feet wet with kernel development (no commitment yet).
Originally posted by pWe00Iri3e7Z9lHOX2Qx View Post
This is like the poster child for why so many of us whine about wanting ZFS to get merged. You can do very powerful things with all these layers, but it is extremely complicated, especially when something goes wrong. You obviously know what you are doing, but I've seen plenty of posts online from people attempting similar setups where they don't even order the layers correctly and end up negating some benefit they think they are getting. Having the volume management / encryption / file system / verification / etc. all baked into one thing is so much easier to grok and work with for most people.
Just an anecdote:
I used btrfs for 2 years in a raid10 configuration and experienced data loss, and a particularly bad kind of data loss at that: a lot of files each picked up a small amount of corruption.
Turns out it was some combination of raid10, autodefrag, and compression.
I used bcachefs for 3 years with the same drives, and the only issues were the filesystem updates when I upgraded versions.
Now I'm using btrfs raid10 with Proxmox, but I won't use any of the other features for fear of hitting corruption again.
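For context, the feature combination mentioned above is just a set of btrfs mkfs/mount options, roughly like the following. This is purely illustrative (made-up device names, not the poster's exact setup):
Code:
# btrfs raid10 with autodefrag and transparent compression enabled;
# /dev/sd[a-d] and /mnt are placeholders.
mkfs.btrfs -d raid10 -m raid10 /dev/sd[a-d]
mount -o autodefrag,compress=zstd /dev/sda /mnt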
Originally posted by woddy View Post
I really don't understand... you criticize the alleged instability of Btrfs, and then you are looking forward to using an experimental fs, which as of today hasn't even been accepted into the kernel tree.
Strange, isn't it?
On the other hand, with both ZFS and bcachefs the development teams/developers have a very strict attitude when it comes to merging changes: they will only accept changes they think are properly designed and stable. The difference between ZFS and bcachefs is that, as is well known, ZFS is never going to be officially accepted into the Linux kernel tree, which brings its own set of problems.
That's why people are looking forward to bcachefs: it's developed by someone who has the same quality-control/attention-to-detail/stability-and-design mindset as ZFS, but it is a Linux-first filesystem. Being a filesystem, it will take years at least for that experimental flag to be lifted, but unlike BTRFS the flag will likely only be removed when it actually is stable.
Originally posted by stormcrow View Post
Personal note: I think the problem with BTRFS is that the only features that get enough attention to be stable and performant are the use cases the maintainers (mainly Facebook & Oracle?) utilize. For everyone else, we have to use ZFS, which has a lot of big companies working on it, so there's a more diverse user and developer base.
Originally posted by EphemeralEft View Post
All Linux filesystems (including BTRFS) have tiered storage if you put them on top of BCache (what BCacheFS is based on) or DM-Cache. Same with encryption if you put it on top of DM-Crypt. It's honestly a better solution than duplicating that work for every filesystem. And it works for non-filesystem block devices, too.
In fact, that was the killer feature of ZFS in the first place: it combined the filesystem and volume management. That was supposedly very anti-Linux/Unix in design, but it also provided massive advantages, because making ZFS aware of both layers lets it do things, both performance-wise and data-integrity-wise, that are not possible with mdadm + filesystem.
I've personally experienced how annoying Linux's separation of these abstractions is compared to ZFS from a usability standpoint when dealing with BTRFS. I tried experimenting with BTRFS on one project, and when I initially created the filesystem I didn't set up an SSD read cache using bcache. Unfortunately I didn't realize at the time that if you want this functionality you have to create the BTRFS filesystem a specific way from the start, which means that, officially speaking, you would need to reformat your BTRFS RAID setup if you want bcache (IIRC when I was looking into this there was a way around it, but it's some random script on the internet).
With ZFS this is a non-issue: you can create a filesystem and, at any point in the future, add or remove an L2ARC (i.e. SSD read cache) without having to reformat. Same with compression and other settings.
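For comparison, the ZFS side of this really is just a couple of live pool operations. A quick sketch, with made-up pool, dataset, and device names:
Code:
# Add an L2ARC (SSD read cache) to an existing pool, drop it again later,
# and toggle compression, all without reformatting anything.
zpool add tank cache /dev/nvme0n1
zpool remove tank /dev/nvme0n1
zfs set compression=lz4 tank/data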
Originally posted by EphemeralEft View Post
I get where you're coming from, and I agree that most people can't (or at least shouldn't) make setups like mine. But that's not a kernel issue; the technology is already there and it just needs a simple userspace tool to manage everything. There are only a few features that you would gain with an all-in-one solution, while duplicating existing functionality is almost always a bad idea.