
**timofonic** · 10 July 2023, 06:31 PM

Originally posted by vladpetric

The RAID 5/6 write hole is something that a filesystem needs to have a solution for ahead of time. I simply don't think BTRFS can introduce a fix to the write hole issue at this time.

A bit circuitous, but the main comments explain the problem:

https://www.reddit.com/r/btrfs/comments/p111o9/raid56_what_is_the_write_hole_failure_mode/

ZFS takes care of the write hole problem through its transactions, pretty much/AFAICT.

https://www.klennet.com/notes/2019-07-04-raid5-vs-raidz.aspx#:~:text=ZFS%20works%20around%20the%20write,the%20write%20hole%20goes%20away.
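To make the write hole concrete, here is a minimal Python sketch, assuming a toy stripe of two data blocks plus one XOR parity block. The `xor_blocks` helper, the block contents, and the dictionaries standing in for disks are all made up for illustration; real RAID 5/6 and RAID-Z layouts are far more involved. It shows a crash between the data write and the parity write leaving stale parity behind, and then a rough copy-on-write variant where the old stripe stays intact until the new one is committed.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks (RAID 5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A consistent stripe: two data blocks plus their parity.
d0, d1 = b"AAAA", b"BBBB"
parity = xor_blocks(d0, d1)

# An in-place update of d0 needs two writes: new data, then new parity.
# Simulate a crash after the data write but before the parity write.
d0_new = b"CCCC"
on_disk = {"d0": d0_new, "d1": d1, "parity": parity}  # parity is now stale

# Later the disk holding d1 fails and is rebuilt from the survivors.
rebuilt_d1 = xor_blocks(on_disk["d0"], on_disk["parity"])
write_hole_hit = rebuilt_d1 != d1  # the rebuilt block is silently wrong

# Copy-on-write variant (hypothetical model): the new stripe is written to
# fresh space and only becomes live once a pointer is switched over.
stripes = {"old": (d0, d1, xor_blocks(d0, d1))}
committed = "old"
stripes["new"] = (d0_new, d1, None)  # crash before parity write and commit

# After the crash the committed pointer still names the old, consistent stripe.
live_d0, _, live_parity = stripes[committed]
cow_rebuilt_d1 = xor_blocks(live_d0, live_parity)

print("in-place rebuild correct:", not write_hole_hit)
print("copy-on-write rebuild correct:", cow_rebuilt_d1 == d1)
```

In the in-place case nothing tells the array that the parity is stale, which is why the damage only shows up once a disk is lost.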

Thanks a lot! Very interesting read!

It seems Bcachefs still doesn't support RAID. Am I right? I hope Bcachefs implements something similar to RAIDZ someday.

It seems Bcachefs is still missing scrub and the ability to easily replace devices.

In the latest benchmarks from Phoronix, even Btrfs still outperforms Bcachefs in a lot of cases. That is saying a lot, because Btrfs is no performance king. I hope they manage to reach EXT4 or XFS performance someday.

Bcachefs doesn't even have plans to have a "vdev" like offering, unless you count layering with MD. It seems using it safely with a lot of disks is going to be complicated too. I wonder if this will change.

Is it possible to shrink partitions in Bcachefs?

Maybe I'm wrong about all this..

**vladpetric** · 10 July 2023, 07:03 PM

Originally posted by timofonic

Thanks a lot! Very interesting read!

It seems Bcachefs still doesn't support RAID. Am I right? I hope Bcachefs implements something similar to RAIDZ someday.

It seems Bcachefs is still missing scrub and the ability to easily replace devices.

In the latest benchmarks from Phoronix, even Btrfs still outperforms Bcachefs in a lot of cases. That is saying a lot, because Btrfs is no performance king. I hope they manage to reach EXT4 or XFS performance someday.

Bcachefs doesn't even have plans to have a "vdev" like offering, unless you count layering with MD. It seems using it safely with a lot of disks is going to be complicated too. I wonder if this will change.

Is it possible to shrink partitions in Bcachefs?

Maybe I'm wrong about all this..

The manual for bcachefs does say that using replicas is equivalent to RAID 1 or RAID 10. https://wiki.archlinux.org/title/Bcachefs

How well that works in practice is an open question, of course. And without adoption, the bugs are ironed out at a much, much slower pace.

Yeah, btrfs is atrocious if you try to put a database on it. The COW part doesn't mesh well with the way databases set their files up. Now, ZFS by default will be slow with a database (real db, such as mysql or pg), but there are reasonable tunables for the DB's ZFS dataset that make the performance totally alright. So, you create a dataset (or two) for the DB, set the tunables appropriately, and you're good to go actually ...

Oh, can't you also turn off COW on btrfs for a subdirectory? Yes, you can (it improves the performance somewhat, though nothing spectacular), but that also disables checksums immediately ... Uhmmm ...

WRT the second question - I don't know.

**lowflyer** · 11 July 2023, 01:27 AM

Originally posted by skeevy420

My ignorant ass had to read that three times to get the joke. Dafuq is a B.C.A. chef?

I can't help it. Whenever I see that word "bcachefs" my inner eye sees Jean-Pièrre with his "onyon soup":

chef.jpg

https://youtu.be/JqpwEpMFoVc

**timofonic** · 11 July 2023, 06:01 AM

vladpetric

I asked Alexey V. Gubin from Klennet about Bcachefs in general and from a recovery point of view and he replied the following:

Hello,

Unfortunately, there is no way I will be doing a proper evaluation now. It takes several weeks for every new filesystem, just to figure out how things mesh together and then to plan for the recovery algorithms. Then, the practical implementation, not even a particularly good one, is something to the tune of six months.

However, after a quick look, my first impression is "looks expensive". Basically, bcachefs combines all the difficult parts of F2FS and BTRFS. Device maps and encryption keys are in superblocks only. So, you format the disk set, superblocks are overwritten with new ones, all copies of device maps and all copies of the encryption key are poof-gone. If the filesystem was encrypted, it is now unrecoverable. If it was not, there is a typical problem of figuring out which device is what. Some place in the documentation mentions 8-bit generation numbers; I am pretty sure it will turn out to be a problem at some point. On the plus side, at least it does not have an equivalent of BTRFS chunk tree. In BTRFS, chunk tree reconstruction is rather challenging (I mean, nobody has a reliable method yet). Then again, I think bcachefs does not have a RAID5/6 either, yet, so it is too early to celebrate.

Some kind of endian shenanigans, meaning most likely no commercially viable BE version recovery, so big endian guys will be out of luck. Not many of those around anyway, so no problem.

All in all, expensive.

Regards,
Alexey
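The email does not say why 8-bit generation numbers would become a problem. One plausible reading is wraparound, and the sketch below illustrates only that guess. It assumes a simplified model in which a pointer is considered valid when its recorded generation equals the bucket's current generation; the `Bucket` class and `pointer_looks_valid` function are hypothetical and are not bcachefs code, which may well have safeguards against this.

```python
GEN_BITS = 8
GEN_MOD = 1 << GEN_BITS  # 256 possible generation values


class Bucket:
    """Toy storage bucket with a small wrapping generation counter."""

    def __init__(self) -> None:
        self.gen = 0

    def reuse(self) -> None:
        # Every reuse invalidates old pointers by bumping the generation.
        self.gen = (self.gen + 1) % GEN_MOD


def pointer_looks_valid(pointer_gen: int, bucket: Bucket) -> bool:
    """A pointer is trusted only if its generation matches the bucket's."""
    return pointer_gen == bucket.gen


bucket = Bucket()
stale_pointer_gen = bucket.gen  # pointer created while the bucket was at gen 0

bucket.reuse()
valid_after_one_reuse = pointer_looks_valid(stale_pointer_gen, bucket)

# Reuse the bucket until the counter has gone all the way around.
for _ in range(GEN_MOD - 1):
    bucket.reuse()
valid_after_wraparound = pointer_looks_valid(stale_pointer_gen, bucket)

print("stale pointer accepted after 1 reuse:", valid_after_one_reuse)
print("stale pointer accepted after 256 reuses:", valid_after_wraparound)
```

For a recovery tool scanning raw disks, a match like the second one would be hard to tell apart from a genuinely current pointer.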

What do you and others think about his reply? Although I don't understand everything, I appreciate his opinion. It would be amazing if others here could explain a bit about it.

I really hope Kent gets opinions and analysis from the recovery community too. Despite not being filesystem designers per se, they can provide very valuable feedback about data recovery (something I consider very important for a filesystem) and other considerations.

I also found the following LWN article and its comments to be a crap show. Go and read it at the following link:

The rest of the 6.5 merge window [LWN.net]

https://lwn.net/SubscriberLink/937006/b5905509cad38258/

**Quackdoc** · 11 July 2023, 06:21 AM

Originally posted by timofonic

vladpetric

I asked Alexey V. Gubin from Klennet about Bcachefs in general and from a recovery point of view and he replied the following:

What do you and others think about his reply? Although I don't understand everything, I appreciate his opinion. It would be amazing if others here could explain a bit about it.

I really hope Kent gets opinions and analysis from the recovery community too. Despite not being filesystem designers per se, they can provide very valuable feedback about data recovery (something I consider very important for a filesystem) and other considerations.

>If the filesystem was encrypted, it is now unrecoverable
This is a good assumption anyway. FDE should only be used when you would rather lose the data than have someone else get it. No pain here IMO.

The only somewhat interesting bit here is that bcachefs has the same problems as other filesystems, while those other similar filesystems have more mature tooling to recover data. Nothing groundbreaking here, but if you want more comment, go and ask about it on the IRC channel #bcache. They are quite active there, and I'm sure if you ask politely enough you will get a respectable answer.

**vladpetric** · 11 July 2023, 09:05 AM

Originally posted by timofonic

vladpetric

I asked Alexey V. Gubin from Klennet about Bcachefs in general and from a recovery point of view and he replied the following:

What do you and others think about his reply? Although I don't understand everything, I appreciate his opinion. It would be amazing if others here could explain a bit about it.

I really hope Kent gets opinions and analysis from the recovery community too. Despite not being filesystem designers per se, they can provide very valuable feedback about data recovery (something I consider very important for a filesystem) and other considerations.

I also found the following LWN article and its comments to be a crap show. Go and read it at the following link:

The rest of the 6.5 merge window [LWN.net]

https://lwn.net/SubscriberLink/937006/b5905509cad38258/

OK, my only opinion about the above is a rant ...

A filesystem that is meant to be fault-tolerant should always be aggressively tested in an environment with injected errors. As in, one should have a test bench where errors are injected into the FS, and recovery process run, etc. Will that catch all bugs? No, it won't, but it's a good start.

The ostrich approach is not a strategy (no offense to ostriches ... they actually stick their heads in the ground to find stuff to eat btw).
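As a rough idea of what such an error-injection test bench looks like in miniature, here is a Python sketch. It assumes a toy in-memory "disk" of checksummed blocks; `write_blocks`, `inject_bit_flips`, and `scrub` are hypothetical helpers, and CRC32 is just a convenient stand-in for whatever checksum a real filesystem uses. The test flips single bits in known blocks and then checks that a scrub pass reports exactly those blocks.

```python
import random
import zlib


def write_blocks(payloads):
    """Store each block together with the CRC32 of its original contents."""
    return [(bytearray(p), zlib.crc32(p)) for p in payloads]


def inject_bit_flips(store, count, rng):
    """Flip one random bit in `count` distinct blocks; return their indices."""
    victims = rng.sample(range(len(store)), count)
    for index in victims:
        data, _ = store[index]
        position = rng.randrange(len(data))
        data[position] ^= 1 << rng.randrange(8)
    return sorted(victims)


def scrub(store):
    """Return indices of blocks whose data no longer matches the checksum."""
    return [i for i, (data, crc) in enumerate(store) if zlib.crc32(data) != crc]


rng = random.Random(42)  # fixed seed so a failing run can be reproduced
store = write_blocks([bytes([i]) * 4096 for i in range(16)])

injected = inject_bit_flips(store, 3, rng)
detected = scrub(store)

print("injected:", injected)
print("detected:", detected)
```

A real test bench would go further and run the repair path afterwards, then compare the repaired data against a known-good copy.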

**geearf** · 15 July 2023, 09:28 AM

Originally posted by skeevy420

I hope it gets tunable compression levels some day since that's the one feature I'm missing from using it. AFAICT, Zstd defaults to level 3 and you have to patch the code to change that. Call me crazy, but I don't mind limiting some mount points to ~75mbps write speed for zstd-19, but at the same time I don't want to use that as default everywhere because it's slow as hell. ~70-80mbps is what Steam reports writing when I'm downloading games onto my Steam game storage ZFS dataset that uses Zstd-19 on a 3x8TB RAID. If anyone is curious: ~200mbps write with LZ4 and it reads at ~400-500 regardless of LZ4 or Zstd (the only ones I use).

I'm interested in seeing how LZ4 foreground compression with Zstd-19 background compression works with Bcachefs in the future whenever that becomes possible. I hope OpenZFS picks up foreground/background compression. That's a really kickass feature.

Foreground and background compression is something I really want! It makes so much sense, I don't get why btrfs does not have it. Actually, it should even support tiered compression, i.e. write fast when going to the first caching device, but take your time for the final one. Then again, caching writes was not a good idea with bcache, so I'm not sure it will be with the fs...
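For anyone unsure what foreground/background compression means in practice, here is a minimal Python sketch of the idea. It assumes zlib level 1 as a stand-in for a fast codec like LZ4 and zlib level 9 as a stand-in for something like Zstd-19; the `foreground_write`, `background_recompress`, and `read` helpers and the dictionary "store" are invented for illustration and have nothing to do with how Bcachefs actually implements it.

```python
import zlib

FAST_LEVEL = 1  # stand-in for a fast foreground codec
SLOW_LEVEL = 9  # stand-in for a slow, dense background codec


def foreground_write(store, key, data):
    """Compress cheaply so the writer is not held up."""
    store[key] = (FAST_LEVEL, zlib.compress(data, FAST_LEVEL))


def background_recompress(store):
    """Later, when idle, recompress fast-level entries at the slow level."""
    for key, (level, blob) in list(store.items()):
        if level == SLOW_LEVEL:
            continue
        denser = zlib.compress(zlib.decompress(blob), SLOW_LEVEL)
        # Only swap if it actually saves space; otherwise keep the old blob.
        if len(denser) < len(blob):
            store[key] = (SLOW_LEVEL, denser)


def read(store, key):
    """Reads do not care which level was used to compress the data."""
    return zlib.decompress(store[key][1])


store = {}
payload = b"".join(f"asset-{i % 97}:{i * i}\n".encode() for i in range(20000))

foreground_write(store, "game.bin", payload)
size_fast = len(store["game.bin"][1])

background_recompress(store)
size_after = len(store["game.bin"][1])

print("after foreground write:", size_fast, "bytes")
print("after background pass: ", size_after, "bytes")
```

The point of the split is that the write path only ever pays for the cheap level, while the expensive pass runs whenever the system has spare time.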

Aside, am I the only one reminded of the Reiser4 merge here?


It's Looking Like Bcachefs Won't Be Merged For Linux 6.5
