Bcachefs Merges Support For Btrfs-Like Snapshots


  • #61
    Originally posted by useless View Post

    He explicitly stated that he used bcache underneath. That setup tends to end up in disaster.
    And you were blaming his HBA/controller, which is not going to be a problem if he is not using one.


    Originally posted by F.Ultra View Post

    Or perhaps Dalvik ;-)
    Again, completely different situation and Oracle was just finishing what Sun wanted to do. Remember the case with .net implementing Java?


    Originally posted by Khrundel View Post
    And please, stop posting this nonsense.
    "Facebook uses X" is not a valid argument. Facebook is a big company; they can use anything. They could use some DOS machines for something.
    And their usage of btrfs doesn't mean btrfs is in good shape. They may know it is faulty and still use it with care. For example, they can use multi-device btrfs for quick and painless addition of space to VMs, while using some external solution to ensure data integrity.

    Btrfs deserves its bad reputation.
    From what I understand, they basically only use BTRFS in a couple of configurations (e.g. RAID 10) and nothing else. It's the other configurations (like RAID 5), which btrfs pushed into the kernel tree as stable when they evidently were not, that were broken (and in some cases even caused data corruption).

    Since they are using btrfs in data centers, they also likely never use the same btrfs volume that stores their data as a boot drive (no one in server environments does this), which was also a historical cause of problems.

    People need to understand that just because company X uses something doesn't mean the product is stable for all use cases; especially with companies like Oracle or Facebook, when they use such a filesystem it's for an extremely narrow use case.
    Last edited by mdedetrich; 28 September 2021, 03:42 AM.

    Comment


    • #62
      I think btrfs fanboys should recognize this: no matter how many more features and better functions they claim, btrfs causes more corrupted, unrecoverable systems than good old ext4. It may be the fault of faulty hardware. It may be that btrfs developers assumed some standard behaviour when a computer faces power failure, while that "faulty" hardware does whatever it likes. But the truth is that ext4 survives these events at a far, far higher rate than btrfs. If btrfs is designed to be safe only on enterprise-grade hardware, label it so. Otherwise, adapt to the quirks and bugs of consumer hardware in general. Or, if its design can't be fixed without breaking backward compatibility, accept its failure.

      Let us all hope bcachefs learns from btrfs's failures and succeeds in preventing all of this from ever happening.

      Comment


      • #63
        Originally posted by birdie View Post
        Parent transid errors are almost always a buggy storage stack that is not respecting barriers.

        What it means is that the storage stack did not keep the write order according to the barrier commands, so stuff written before the barrier ends up written after the barrier. If there is then a power outage, bus reset or a hardware reset, the stuff written before the barrier is lost. This is why the parent transid errors happen.

        Barriers are like fsync. And in these cases the storage stack ignored them!

        That said, Btrfs could handle this better.

        PS. The storage stack is everything between btrfs and the drive, including the SATA controller, USB bridges, LVM layers, LUKS, etc.
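
The ordering described above can be sketched in a few lines (a minimal illustration, not btrfs code; the file names and the `commit` helper are made up). A CoW filesystem must make the new tree nodes durable *before* the pointer to them; `fsync` is the userspace analogue of that barrier, and if the storage stack ignores it, a crash leaves the pointer referencing data that never hit the disk:

```python
import os
import tempfile

# Illustrative sketch: make the payload durable before the pointer to it,
# the same ordering a barrier enforces. If the storage stack reorders these
# writes and power is lost, the "superblock" can point at data that never
# reached the disk -- the situation behind parent transid errors.
def commit(data_path, super_path, payload, generation):
    with open(data_path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())   # barrier: payload must reach disk first
    with open(super_path, "wb") as f:
        f.write(generation.to_bytes(8, "little"))
        f.flush()
        os.fsync(f.fileno())   # only now is the new generation durable

tmp = tempfile.mkdtemp()
data_file = os.path.join(tmp, "tree_node")
super_file = os.path.join(tmp, "superblock")
commit(data_file, super_file, b"new tree nodes", 42)
```
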
        Last edited by S.Pam; 28 September 2021, 09:26 AM.

        Comment


        • #64
          Originally posted by flower View Post

          The last update on that site was Tue Dec 4 08:07:29 2018
          It's hard to know what's going on and what's been done when the developer hasn't updated the site for almost 3 years.

          Comment


          • #65
            Originally posted by Mario Junior View Post

            It's hard to know what's going on and what's been done when the developer hasn't updated the site for almost 3 years.
            Well, it's not mainlined; he prefers to communicate through Patreon (publicly) and Reddit.
            It's a one-man show with no corporate backing. What do you expect? I'd rather he spend the time coding.

            Comment


            • #66
              Originally posted by S.Pam View Post

              Parent transid errors are almost always a buggy storage stack that is not respecting barriers.

              What it means is that the storage stack did not keep the write order according to the barrier commands, so stuff written before the barrier ends up written after the barrier. If there is then a power outage, bus reset or a hardware reset, the stuff written before the barrier is lost. This is why the parent transid errors happen.

              Barriers are like fsync. And in these cases the storage stack ignored them!

              That said, Btrfs could handle this better.

              PS. The storage stack is everything between btrfs and the drive, including the SATA controller, USB bridges, LVM layers, LUKS, etc.
              The buggy storage stack problem described in birdie's link has already happened many times on my current desktop. Very similar. Probably triggered by Firefox's write commands not having finished when the computer is shut down. The risk is lower if I remember to wait some time between closing Firefox and shutting down the computer. Rescued every time by fsck on the ext4 drive.

              The computer is running an ASUS B450 motherboard + Crucial SATA SSD. Not sure which one is the culprit, maybe both.

              Comment


              • #67
                Originally posted by mdedetrich View Post

                While this is true, the unfortunate issue with btrfs is that it was rushed before it was designed properly (I guess for market reasons) and has a history of putting breaking changes into the Linux kernel tree.

                This has caused real problems; rushing a filesystem is not a good thing, because you really need to make sure you get the on-disk format correct from the get-go.
                This is also what Kent Overstreet said. However, he was unable to do better (snapshots arrive only now, and it is from snapshots that the complexity increases).

                Anyway, could you kindly list these "breaking changes"? It seems there are a lot of them, but (as a BTRFS user since 2009) I don't remember any. The non-backward-compatible changes are marked with incompatibility flags, which is far from being a "breaking change".

                Originally posted by mdedetrich View Post
                Btrfs didn't do this well and hence it has a raid5/6 write hole which it cannot fix without breaking current btrfs volumes.
                This is not true. The problem with raid5/6 is that it is really complex to couple a COW filesystem with a RMW block layer. The error was to suppose that RAID5/6 could be an extension of RAID1/10... But this is not a problem of the internal btrfs structure.

                In fact, the only sane way to implement raid5/6 is the way zfs does it, using tiered storage: put the data temporarily on an SSD using a mirror profile, then write the data to a parity profile protected by the cache. Moreover, ZFS also uses variable stripe length, which has its downsides. There were patches which implemented a log for raid5/6 in btrfs.

                This obviously increases the complexity.

                Ironically, bcachefs (being derived from bcache) should be ahead of btrfs from this point of view. However, it is still missing the parity profile (which is called erasure coding in bcachefs slang).
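
For readers unfamiliar with the write hole mentioned above, here is a toy model (pure Python with made-up names, nothing to do with the actual btrfs code): parity is the XOR of the data blocks, and a partial-stripe update that crashes between the data write and the parity write leaves parity stale.

```python
# Toy RAID5 write-hole model: parity = XOR of the data blocks.
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

stripe = [b"\x01" * 4, b"\x02" * 4]          # two data blocks on two disks
parity = xor(stripe[0], stripe[1])           # parity on a third disk

stripe[0] = b"\xff" * 4                      # in-place update of block 0...
# ...power is lost here, before parity is rewritten: the "write hole".
assert xor(stripe[0], stripe[1]) != parity   # parity is now stale

# If the disk holding stripe[1] then fails, rebuilding it from the stale
# parity silently produces garbage:
rebuilt = xor(parity, stripe[0])
assert rebuilt != stripe[1]
```

A CoW filesystem never overwrites live data, but the parity update is still a read-modify-write on the stripe, which is why coupling the two is hard, as the post says.
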



                Originally posted by mdedetrich View Post
                bcachefs is a one-man thing, but what's important here is that that one man is doing his best to make sure things are designed correctly before pushing them out to the public. If you read Kent's blogs, you will see that it's a deliberate decision of his to get things right first before pushing it onto users.

                Btrfs being rushed the way it was means that people like myself won't touch it with a 10-foot pole, at least not any time soon (personally, they have eroded my trust). For a general desktop OS, if my distro switches to btrfs as a default I probably won't care that much, but for any serious data storage I am sticking with zfs; I am keeping an eye on bcachefs because at least the developer is treating it seriously.
                I don't know if bcachefs will be better than btrfs. I hope so, because btrfs is quite good, so having an even better filesystem would be better still. However, history says that developing a filesystem requires a lot of effort, and a one-man project is unable to provide that.

                Comment


                • #68
                  Originally posted by kreijack View Post

                  In fact, the only sane way to implement raid5/6 is the way zfs does it, using tiered storage: put the data temporarily on an SSD using a mirror profile, then write the data to a parity profile protected by the cache. Moreover, ZFS also uses variable stripe length, which has its downsides. There were patches which implemented a log for raid5/6 in btrfs.
                  raidz works well without an SSD or any kind of tiered storage. It doesn't write raidz data twice.

                  EDIT: it's not even possible to do that. You can only cache sync writes; normal writes never use any kind of caching.
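
The full-stripe point can be sketched too (a toy model with made-up names, not ZFS code): because data and parity for every write are computed together and land in freshly allocated locations, there is no read-modify-write and thus no window in which an existing stripe carries stale parity.

```python
# Toy model of CoW full-stripe writes (raidz style): every write carries
# its own parity and goes to a new stripe, so old stripes are never
# partially overwritten.
def xor_all(blocks):
    parity = blocks[0]
    for b in blocks[1:]:
        parity = bytes(x ^ y for x, y in zip(parity, b))
    return parity

def full_stripe_write(storage, data_blocks):
    stripe = list(data_blocks) + [xor_all(data_blocks)]  # data + parity
    storage.append(stripe)        # fresh allocation; nothing overwritten
    return len(storage) - 1

disk = []
sid = full_stripe_write(disk, [b"\x01\x01", b"\x02\x02"])
# Parity was written atomically with the data it protects:
assert xor_all(disk[sid][:-1]) == disk[sid][-1]
```
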

                  Originally posted by kreijack View Post

                  However, it is still missing the parity profile (which is called erasure coding in bcachefs slang).
                  Erasure coding is merged and works. I'm not sure how well tested it is or how stable it is, though.
                  Last edited by flower; 28 September 2021, 02:23 PM.

                  Comment


                  • #69
                    Originally posted by billyswong View Post
                    I think btrfs fanboys should recognize this: no matter how many more features and better functions they claim, btrfs causes more corrupted, unrecoverable systems than good old ext4. It may be the fault of faulty hardware. It may be that btrfs developers assumed some standard behaviour when a computer faces power failure, while that "faulty" hardware does whatever it likes. But the truth is that ext4 survives these events at a far, far higher rate than btrfs. If btrfs is designed to be safe only on enterprise-grade hardware, label it so. Otherwise, adapt to the quirks and bugs of consumer hardware in general. Or, if its design can't be fixed without breaking backward compatibility, accept its failure.
                    My experience is quite different from what you wrote. I used BTRFS on very bad hardware: the power supply was sometimes unable to sustain the hard disk, so the HD stalled.
                    I never lost a filesystem. Sometimes a file was corrupted, but it was quite easy to detect thanks to the checksums.

                    I can't say if ext4 would have had a better outcome; however, ext4 is not capable of detecting corruption.

                    I remember that in the beginning ZFS was considered unreliable because it was able to find corruption on non-enterprise-grade HDs. The reality is that disks became bigger, cheaper and less reliable, so the likelihood of corruption increased to the point that it is no longer such a remote possibility. The likelihood is even higher on a non-enterprise-grade HD.

                    ZFS, btrfs and bcachefs are the only filesystems that can detect this kind of problem.
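
The detection point is easy to illustrate (a minimal sketch; `write_block`/`read_block` are made-up names, not any real filesystem API): store a checksum next to each block and verify it on every read, so silent corruption surfaces as an error instead of being returned as valid data.

```python
import zlib

def write_block(data):
    return data, zlib.crc32(data)          # store data + its checksum

def read_block(data, stored_crc):
    if zlib.crc32(data) != stored_crc:
        raise IOError("checksum mismatch: corruption detected")
    return data

block, crc = write_block(b"important data")
read_block(block, crc)                     # intact data passes

flipped = bytes([block[0] ^ 0x01]) + block[1:]   # simulate silent bit rot
detected = False
try:
    read_block(flipped, crc)               # a plain read would return this
except IOError:
    detected = True                        # the checksum catches it
```

A non-checksumming filesystem like ext4 would hand the flipped bytes back without complaint, which is the point the post is making.
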


                    Comment


                    • #70
                      Originally posted by kreijack View Post

                      ZFS, btrfs and bcachefs are the only filesystems that can detect this kind of problem.

                      I use integritysetup in a raid6 mdadm setup at the moment. This can also detect and repair such failures.

                      Comment
