Bcachefs Submitted For Review - Next-Gen CoW File-System Aims For Mainline


  • #21
    Originally posted by stormcrow View Post
    A bug list that's "too long" to enumerate is by no means "stable". Just because it gets included in Linus' tree doesn't make it "stable". Linus' tree itself is not stable. It regularly has data corrupting bugs and sometimes subtle errors that throw off calculations.


    All that said, I'm more curious what bcachefs brings to the table in the form of minimizing write amplification on SSDs. We already have filesystems that provide the listed features such as CoW, integrity hashing, etc., but only two that I'm aware of specifically address write amplification on flash media. Of those two, one highly recommends battery-backed storage because it doesn't cover power-loss scenarios, since it was designed for smartphones and tablets.
    It's based on bcache, which has no known bugs and is very stable. The author is being humble, but so far there has never been a bcachefs bug that lost data.

    As for write amplification on SSDs, that's a device issue, i.e. it lives in controller land. It hasn't been a host-side problem for a decade, and nobody expects to have an actual view of the drive's real topology anymore. That's why deadline is the best I/O scheduler for NVMe: we delegate the actual writes to the controller, and write amplification is its problem.
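    For anyone who wants to check that on their own box, here is a minimal Python sketch; the device name nvme0n1 is an assumption, and the kernel marks the currently active scheduler with brackets in the sysfs file.

    ```python
    # Minimal sketch: show which I/O scheduler the kernel uses for an NVMe drive.
    # The device name nvme0n1 is assumed; adjust the path for your system.
    from pathlib import Path

    entry = Path("/sys/block/nvme0n1/queue/scheduler").read_text().strip()
    # Example content: "[none] mq-deadline kyber bfq" -- the active one is bracketed.
    active = entry[entry.index("[") + 1 : entry.index("]")]
    print(f"active: {active}  (available: {entry})")
    ```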

    Comment


    • #22
      Finally!

      Does it support multiple mountable virtual volumes within one partition similar to btrfs?
      Last edited by shmerl; 10 May 2023, 09:00 PM.

      Comment


      • #23
        Originally posted by illwieckz View Post

        You mean a core design problem in RAID5.

        What makes btrfs different is that btrfs developers actually loudly warn users about this RAID5 design flaw that is not specific to btrfs but to RAID5.

        mdadm RAID5 also has the same issue, because it's an issue specific to the RAID5 design, not the implementation. The workaround provided by mdadm for that RAID5 design flaw is named ppl:



        And that workaround is incredibly slow, defeating all the benefits of doing RAID5. If RAID5 requires ppl to be reliable, then no one has any use for RAID5.

        I don't know why this myth of “btrfs having flawed RAID5” is still so alive when it's RAID5 itself that is flawed to begin with; that's not btrfs' fault.

        If one does a RAID5 without using btrfs purposely to avoid the flaws that btrfs RAID5 suffers from, that person is very likely running a RAID5 that is just as flawed, without knowing it.

        An illusion of security is worse than a lack of security.
        The ZFS implementation of RAID 5 and 6 works just fine.
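        For anyone who hasn't met the write hole and the ppl workaround the quote refers to, here is a purely illustrative Python sketch (a made-up three-disk stripe, one byte per disk): parity is only correct if data and parity are updated atomically, and a crash between the two writes means a later rebuild silently reconstructs wrong data. Roughly speaking, mdadm's ppl journals partial parity before each update so untouched data in the stripe can no longer be corrupted this way, and those extra journal writes are where the slowdown comes from.

        ```python
        # Illustrative only: a 3-"disk" RAID5 stripe, one byte per disk.
        d0, d1 = 0x11, 0x22
        p = d0 ^ d1                # parity as originally written

        new_d0 = 0x33              # updating d0 in place needs two writes: d0 and p
        d0 = new_d0                # write #1 reaches the disk...
        # ...power is lost here, before write #2 (the updated parity) lands.

        # Later the disk holding d1 dies and the array "rebuilds" it from d0 and stale p:
        rebuilt_d1 = d0 ^ p        # 0x33 ^ (0x11 ^ 0x22) == 0x00, not 0x22
        assert rebuilt_d1 != 0x22  # no error is reported -- just silently wrong data
        ```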

        Comment


        • #24
          I doubt the patch series posted will land soon. It's trying to compile functions to memory pages in the kernel and execute them (!!). Apparently individual b-trees in the filesystem have their own packing functions.

          And those functions are stored in the same memory pages as the tree data. It seems this was done with the thinking that the function code would benefit from data locality. That's true for the L2 cache, but it will result in the same memory being cached twice in L1: once in the L1 instruction cache and once in the L1 data cache.

          Hopefully there is a limited number of packing functions, so that they can be merged into a single page of instructions.

          That's going to take some refactoring to sort out.
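          To make "per-btree packing functions" a bit more concrete, here is a rough Python sketch of the general idea only (the field names and bit widths are invented; this is not bcachefs code): keys are bit-packed against a per-node format, and an unpacker specialized to that exact format avoids re-reading the format description for every key, which appears to be the locality argument for keeping the generated code next to the tree data.

          ```python
          # Hypothetical packed-key layout: three fields with per-node bit widths.
          # A real filesystem would emit machine code; a closure is the Python analogue.
          from typing import Callable

          def make_unpacker(widths: dict[str, int]) -> Callable[[int], dict[str, int]]:
              # Precompute shifts and masks once per format, not once per key.
              plan, shift = [], 0
              for name, bits in widths.items():
                  plan.append((name, shift, (1 << bits) - 1))
                  shift += bits

              def unpack(packed: int) -> dict[str, int]:
                  return {name: (packed >> s) & mask for name, s, mask in plan}

              return unpack

          unpack = make_unpacker({"inode": 20, "offset": 32, "snapshot": 12})  # made-up format
          key = 5 | (4096 << 20) | (1 << 52)
          print(unpack(key))  # {'inode': 5, 'offset': 4096, 'snapshot': 1}
          ```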

          Comment


          • #25
            Originally posted by illwieckz View Post

            You mean a core design problem in RAID5.

            What makes btrfs different is that btrfs developers actually loudly warn users about this RAID5 design flaw that is not specific to btrfs but to RAID5.

            mdadm RAID5 also has the same issue, because it's an issue specific to the RAID5 design, not the implementation. The workaround provided by mdadm for that RAID5 design flaw is named ppl:



            And that workaround is incredibly slow, defeating all the benefits of doing RAID5. If RAID5 requires ppl to be reliable, then no one has any use for RAID5.

            I don't know why this myth of “btrfs having flawed RAID5” is still so alive when it's RAID5 itself that is flawed to begin with; that's not btrfs' fault.

            If one does a RAID5 without using btrfs purposely to avoid the flaws that btrfs RAID5 suffers from, that person is very likely running a RAID5 that is just as flawed, without knowing it.

            An illusion of security is worse than a lack of security.
            RAID 5 has certain edge cases like all things, but it is not inherently flawed and has been implemented successfully in thousands of products. I have personally run 3ware hardware RAID 5 arrays for years without the slightest issue, even when rebuilding from a failed drive.

            The implementation in BTRFS is flawed, and in the many years they have known this, the feature has never been removed or disabled, just warned about in their wiki...

            Comment


            • #26
              Originally posted by Chugworth View Post
              The ZFS implementation of RAID 5 and 6 works just fine.
              ZFS does not in fact implement RAID5 or 6. RAIDZ1 and RAIDZ2 are ZFS's closest equivalents to RAID 5 and 6, but RAIDZ1 and RAIDZ2 do many things that are not in the RAID5 or RAID6 specification, and that is why they avoid those particular problems.



              There is a growing problem for anything based around the RAID5 and RAID6 idea: as drives grow in size, rebuild times go up. Simpler duplicate-the-file solutions have shorter rebuild times.
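              As a back-of-the-envelope illustration of the rebuild-time point (the figures below are assumptions, not measurements): a parity rebuild has to read essentially the whole surviving array and rewrite the entire replacement drive, so the time spent degraded scales with drive capacity.

              ```python
              # Rough rebuild-time estimate; the numbers are illustrative assumptions.
              capacity_tb = 20         # size of the replaced drive
              throughput_mb_s = 150    # sustained rebuild rate under real workload

              hours = capacity_tb * 1e12 / (throughput_mb_s * 1e6) / 3600
              print(f"~{hours:.0f} hours of degraded operation")  # ~37 hours here
              ```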

              Comment


              • #27
                Originally posted by zexelon View Post

                RAID 5 has certain edge cases like all things, but it is not inherently flawed and has been implemented successfully in thousands of products. I have personally run 3ware hardware RAID 5 arrays for years without the slightest issue, even when rebuilding from a failed drive.

                The implementation in BTRFS is flawed, and in the many years they have known this, the feature has never been removed or disabled, just warned about in their wiki...
                It is also warned about (very clearly) when trying to create a raid5/6 filesystem. It is also important to realize that BTRFS' "RAID" is not really RAID in the traditional sense, and yes, BTRFS' "RAID5/6"-like functionality was until recently a complete piece of junk in my personal opinion (yes, I have tested it).

                Regular RAID5 works perfectly well, as you say; the biggest problem is that people forget that RAID5 protects against *disk failure* and not against data corruption. The same goes for RAID6. (And yes, some top-end hardware vendors, 3ware most probably, apparently do add some checksum protection at the block level.)

                BTRFS' implementation of RAID is quite different from regular RAID, and that is why BTRFS' "RAID10" profile is more fragile than regular RAID10, for example, but at the same time potentially easier to recover from. Not to mention the fact that you can have different "RAID" profiles for metadata, which really helps.


                http://www.dirtcellar.net

                Comment


                • #28
                  Originally posted by zexelon View Post
                  RAID 5 has certain edge cases like all things, but it is not inherently flawed and has been implemented successfully in thousands of products. I have personally run 3ware hardware RAID 5 arrays for years without the slightest issue, even when rebuilding from a failed drive.

                  The implementation in BTRFS is flawed, and in the many years they have known this, the feature has never been removed or disabled, just warned about in their wiki...
                  https://www.attingo.com/raid/raid5/
                  RAID 5 rebuild cancelled
                  One of the most frequent causes of data loss from RAID 5 systems is a cancelled rebuild. All of the sectors of every hard drive are accessed, and even one read error can lead to the affected hard drive being ejected from the RAID 5 array.
                  3ware hardware RAID 5 will do this to you. One of the things about ZFS RAIDZ1 is that a single dead sector on any of the other drives you are rebuilding from does not cancel the rebuild. Old-school mirror RAIDs don't have this evil trait either.

                  The write hole issue that various RAID5 implementations can suffer from after a power outage can also be fatal; yes, 3ware hardware RAID controllers have that as well. No better than using btrfs RAID5, in fact.

                  In most cases hardware RAID doesn't really offer much. RAID 5 is in lots of cases flawed. The problem with parity is that you need to read at least two sectors to reproduce one, which at minimum doubles your odds of finding a defective sector during a rebuild; then add that a lot of these RAID5 solutions will abort the rebuild and leave you without data access if that happens.

                  Yes, it's a hard problem. Mirror RAIDs have a simple rebuild: dead sectors can be written off as lost and the rebuild can still complete with partial data loss, compared to many RAID5 and RAID6 solutions where the rebuild fails and you then need specialist handling to access the undamaged data.

                  The one thing I like about RAIDZ1 is the built-in ability to complete a rebuild with partial data loss in case any of the other drives is failing or had a write issue. We need better than RAID5/6. "Rebuild cancelled" is the absolute curse of RAID5/6; write hole issues after a power outage are another.

                  ZFS is not 100 percent immune to write hole issues; the difference is the best-effort rebuild, so the amount of user-accessible data that is lost is minimized without needing specialist help.
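                  To put a rough number on the odds of hitting a defective sector mid-rebuild, here is a small sketch using a datasheet-style URE figure of one error per 1e14 bits read; both that rate and the array size are assumptions, and real drives often do better.

                  ```python
                  import math

                  # Chance of at least one unrecoverable read error (URE) while
                  # reading the surviving drives during a RAID5 rebuild.
                  ure_per_bit = 1e-14     # assumed datasheet-style error rate
                  surviving_drives = 3    # a 4-drive RAID5 with one drive failed
                  drive_tb = 10

                  bits_read = surviving_drives * drive_tb * 1e12 * 8
                  p_hit = 1 - math.exp(-ure_per_bit * bits_read)  # Poisson approximation
                  print(f"chance of at least one URE: {p_hit:.0%}")  # ~91% here
                  ```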

                  Comment


                  • #29
                    Personally I am a BTRFS evangelist, and I love and use the filesystem on several boxes including my desktop (Debian testing). I have never run into any issues with it except a nasty bug in a non-LTS kernel version (I believe it was 5.2). I was however able to recover almost all my files without much trouble, and BTRFS has saved the day by transparently correcting several silent data corruptions on several boxes, so as long as you RTFM and use LTS kernels, my experience has been essentially flawless.

                    That being said, I have several issues with BTRFS, such as the fact that it is not possible (yet) to have multiple data storage profiles on several subvolumes. For that to work you would have to be able to group storage devices and assign or weight subvolumes to them. Bcachefs seems to support this based on what I have read, and in fact, as far as I am concerned, Bcachefs seems to (at least in theory) get a lot of things right.

                    I see a lot of people complaining that Bcachefs is essentially a one-man project and not backed by companies. Well, I actually consider that a good thing. With one man you do not have the politics that happen when you have a committee deciding what to do. Decisions get made and the project moves forward, unlike BTRFS, where "everyone" is very careful not to make mistakes or too-radical changes, and therefore the project moves slowly, but steadily, forward.

                    What really is good for the project depends heavily on where you are in the development phase, and if you think about it, it is just perfect that Bcachefs is where it is. I am not going to trust Bcachefs with any data myself (yet), but I hope for the best for this filesystem. It will really be interesting to see where it is in another 5 years or so, but then again BTRFS might have progressed to extent tree v2 by then and addressed some of the issues that Bcachefs tries to do better anyway.

                    In any case, best of luck Mr. Overstreet - hoping your filesystem will reach mainline some day; I for sure will look forward to testing it!


                    http://www.dirtcellar.net

                    Comment


                    • #30
                      Originally posted by waxhead View Post
                      Regular RAID5 works perfectly well, as you say; the biggest problem is that people forget that RAID5 protects against *disk failure* and not against data corruption. The same goes for RAID6. (And yes, some top-end hardware vendors, 3ware most probably, apparently do add some checksum protection at the block level.)
                      The evil part: current hardware RAID controllers normally only process those extra written checksums when they are rebuilding, and when the checksums don't match, hello "RAID rebuild cancelled".



                      Parity computations are the same. The controller computes them and writes them to disk, then never checks whether the data and parity are still correct, or worse, notices that data has changed and just silently updates the parity.

                      The problem is that data corruption means you cannot rebuild your RAID5/6.

                      Historic RAID 5/6 did detect data corruption and was designed to deal with disk failure. Modern hardware RAID 5/6 controllers are optimized for speed, not for data integrity or for the means to recover from disk failure correctly.

                      Data corruption and failure to rebuild RAID5/6 after a disk failure are the same thing. This is the problem: a RAID 5/6 controller really should not be sold as providing protection against disk failure if it does not protect against data corruption, because data corruption means you cannot recover from disk failure.
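                      The "computes, writes, never checks" complaint is basically the absence of scrubbing. Here is a tiny Python illustration (stripe contents invented): a scrub pass recomputes parity and flags the mismatch, while a controller that trusts stale parity will happily rebuild wrong data. Note that plain XOR parity can only say that something in the stripe is wrong, not which block, which is where per-block checksums of the ZFS/btrfs kind come in.

                      ```python
                      # One stripe: three data "sectors" plus their XOR parity.
                      data = [0x10, 0x20, 0x30]
                      parity = data[0] ^ data[1] ^ data[2]

                      data[1] = 0x21  # silent corruption underneath the controller

                      # A scrub recomputes parity from the data and notices the mismatch...
                      assert (data[0] ^ data[1] ^ data[2]) != parity

                      # ...while a controller that never verifies will "rebuild" a failed
                      # data[0] from the corrupted stripe and return wrong data, no error:
                      rebuilt_d0 = parity ^ data[1] ^ data[2]
                      print(hex(rebuilt_d0))  # 0x11, not the original 0x10
                      ```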

                      Comment
