Announcement

**CommunityMember** · 15 January 2024, 06:43 PM

Originally posted by coder

I'm surprised to hear folios reducing performance. I thought the underlying motivation was to improve it!

With almost any change, there are edge cases that degrade performance. Sometimes those can be addressed later, and sometimes the edge cases are acknowledged to be just edge cases (and rare in the real world) and accepted in order to gain the improvements in other (more common) cases.

**coder** · 15 January 2024, 06:46 PM

Originally posted by CommunityMember

With almost any change, there are edge cases that degrade performance. Sometimes those can be addressed later, and sometimes the edge cases are acknowledged to be just edge cases (and rare in the real world) and accepted in order to gain the improvements in other (more common) cases.

I understand that much, but are you saying the performance regressions are only in edge cases?

I'm just curious whether my understanding was incorrect, because I had been expecting folios to be a significant net win for performance.

**waxhead** · 15 January 2024, 07:27 PM

Originally posted by varikonniemi

Why was this merged with a few percent performance degradation? The old model was not even deprecated!

Somehow I get the feeling that this change is needed to push forward their work of finally fixing the raid hole in the future. But if so, it should not be merged ahead of time to hide the nastiness that the raid hole fix necessitates. Alternatively, if the upcoming "large folio" work will fix the performance issue, it should have been merged all at once.

Just speculating, but maybe it was merged because it would have to be done at some point in the future anyway and that would create even more work.
I doubt folios would be much help fixing the "raid"5/6 write hole. Folios are (as far as I understand) "just" fancy pages, which are at minimum PAGE_SIZE large, a power of two in size, aligned to that power of two, with all data contiguous, plus some other magic witchcraft that makes them useful, like LRU, refcount and usecount.
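To make the layout properties above concrete, here is a minimal Python sketch that checks whether a byte range could be a folio as described in this post. It is only an illustration, not kernel code: `PAGE_SIZE = 4096` is an assumed value (the real one depends on the architecture), and the helper names `looks_like_folio` and `folio_order` are made up for this example.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages; the real value is architecture-dependent


def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0


def looks_like_folio(start: int, size: int) -> bool:
    """Check the layout rules described above for the byte range [start, start + size)."""
    if size < PAGE_SIZE or size % PAGE_SIZE != 0:
        return False  # must cover at least one whole page
    if not is_power_of_two(size // PAGE_SIZE):
        return False  # page count must be a power of two
    return start % size == 0  # naturally aligned to its own size


def folio_order(size: int) -> int:
    """Order n means the range spans 2**n pages (hypothetical helper)."""
    return (size // PAGE_SIZE).bit_length() - 1
```

For example, a 16 KiB range starting at offset 16384 passes the check, while the same size starting at offset 4096 fails because it is not aligned to its own size.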

As for fixing the write hole, the RAID_STRIPE_TREE is the key to that. Last time I checked, "RAID"5/6 in BTRFS is actually not (contrary to popular belief) copy-on-write. It is read-modify-write. This was apparently done for performance reasons, and I believe something was improved in kernel 6.2 to make the read-modify-write non- or less-destructive.
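For anyone unfamiliar with read-modify-write, here is a rough Python sketch of the generic single-parity (XOR) case. It is not BTRFS code and the function names are invented for illustration. The point is that a partial update can compute the new parity from the old parity, the old data and the new data, without reading the rest of the stripe.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_stripe_parity(blocks: list) -> bytes:
    """Parity computed from scratch over every data block in the stripe."""
    parity = bytes(len(blocks[0]))
    for block in blocks:
        parity = xor_blocks(parity, block)
    return parity


def rmw_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write: cancel the old data out of the parity, then fold in the new data."""
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)
```

Both routes give the same parity, but the read-modify-write route overwrites data and parity in place, which is where the trouble starts if the two writes do not both complete.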

The ultimate fix for the "RAID"5/6 may actually require implementing something horrible, something so vile that even the most dedicated BTRFS fanboys (like me) should have nightmares about it even in full daylight. Yes, it is nothing less disturbing than a GARBAGE COLLECTOR. In fact Josef Bacik has in his extent-tree v2 plans created a garbage collection tree. His blog is from 2021, so I sincerely hope he has had bad dreams about this too, and luckily he dislikes garbage collection as much as any other sane being. So in any case it will hopefully not be implemented like your average slow, latency-hungry garbage collector, but as a much smarter concept that traverses the garbage collection tree and collects little by little unless the filesystem is nearly full, in which case it would have to empty out the garbage in larger batches.

I also agree with you that all the folio patches should ideally have been merged at once, but I can also understand that they want to tread carefully with such a change.

**coder** · 15 January 2024, 09:43 PM

Originally posted by waxhead

... a GARBAGE COLLECTOR. In fact Josef Bacik has in his extent-tree v2 plans created a garbage collection tree.

I kinda thought BTRFS already had something like that. At least, I've seen background processes churning away which seem vaguely similar.

**varikonniemi** · 16 January 2024, 07:26 AM

Originally posted by waxhead

The ultimate fix for the "RAID"5/6 may actually require implementing something horrible, something so vile that even the most dedicated BTRFS fanboys (like me) should have nightmares about it even in full daylight. Yes, it is nothing less disturbing than a GARBAGE COLLECTOR. In fact Josef Bacik has in his extent-tree v2 plans created a garbage collection tree. His blog is from 2021, so I sincerely hope he has had bad dreams about this too, and luckily he dislikes garbage collection as much as any other sane being. So in any case it will hopefully not be implemented like your average slow, latency-hungry garbage collector, but as a much smarter concept that traverses the garbage collection tree and collects little by little unless the filesystem is nearly full, in which case it would have to empty out the garbage in larger batches.

That is what bcachefs started out with, but AFAIK the first in-kernel version had already switched to journal-style "postprocessing", where it does not just blindly scan the FS for work.

edit:

Before, copygc had to periodically walk the entire extents + reflink btrees; now
it just picks the next-most-empty bucket and moves all the extents it contains.
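As a toy illustration of the "pick the next-most-empty bucket" idea in that quote, here is a small Python sketch. It is not how bcachefs implements copygc; the bucket layout (a dict mapping bucket id to a list of `(extent_name, live_sectors)` tuples) and the function name `copygc_step` are assumptions made purely for this example.

```python
def copygc_step(buckets: dict):
    """Evacuate the non-empty bucket holding the least live data.

    buckets maps bucket id -> list of (extent_name, live_sectors) tuples.
    Returns (bucket_id, moved_extents), or None when nothing is left to move.
    """
    candidates = {bid: extents for bid, extents in buckets.items() if extents}
    if not candidates:
        return None
    # The emptiest bucket is the cheapest one to evacuate.
    victim = min(candidates, key=lambda bid: sum(n for _, n in candidates[bid]))
    moved = buckets[victim]
    buckets[victim] = []  # the bucket is now free for reuse
    return victim, moved
```

Each step only looks at per-bucket fill levels and the extents in the chosen bucket, rather than walking every extent in the filesystem.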

**fguerraz** · 29 February 2024, 04:01 AM

Originally posted by waxhead

fixing the "raid"5/6 write hole

I know it's often quoted as a BTRFS problem, but it's just a software RAID problem, and it's not fixable.
This is a good read.

**waxhead** · 02 March 2024, 08:31 AM

Originally posted by fguerraz

I know it's often quoted as a BTRFS problem, but it's just a software RAID problem, and it's not fixable.
This is a good read.

Actually, the write hole is not just a software problem and it can be fixed, or perhaps I should say it can be handled gracefully. For example, by introducing a write-intent journal/bitmap, which will be awfully slow, or by relying on the raid stripe tree (to avoid RMW) and updating pointers once everything is written.
Incidentally, the write hole does in fact also exist on "RAID"1/"RAID"10 *if* the NOCOW attribute is set.
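To show what the write hole actually does, here is a tiny Python simulation of a generic two-data-disk, single-parity stripe. It is a sketch under simplified assumptions (XOR parity, one stripe, made-up function names), not a model of how BTRFS lays out or repairs data.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_stripe(d0: bytes, d1: bytes) -> dict:
    """A consistent stripe: two data blocks plus their XOR parity."""
    return {"d0": d0, "d1": d1, "parity": xor_blocks(d0, d1)}


def write_d0(stripe: dict, new_d0: bytes, crash_before_parity: bool) -> dict:
    """Overwrite d0 in place; optionally 'crash' before the parity is rewritten."""
    updated = dict(stripe)
    updated["d0"] = new_d0
    if not crash_before_parity:
        updated["parity"] = xor_blocks(new_d0, updated["d1"])
    return updated


def rebuild_d1(stripe: dict) -> bytes:
    """Reconstruct d1 from d0 and parity, as if the disk holding d1 had died."""
    return xor_blocks(stripe["d0"], stripe["parity"])
```

If the update finishes, `rebuild_d1` returns the original `d1`. If the crash lands between the data write and the parity write, rebuilding `d1` later yields garbage even though `d1` itself was never touched, and without checksums nothing would notice.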

And it is often quoted as a BTRFS problem because it is. BTRFS does not handle the write hole very well due to RMW (or at least it used to; some fixes were added for "RAID"5, but not yet "RAID"6). Remember that unlike other implementations that may ignore the write hole (and thereby may introduce corrupted data), BTRFS catches the problem and should attempt to fix it.

Regardless of what filesystem and hardware solution is being used, our good friend Murphy usually ruins it all, so tested, working backups are essential for data you really care about. And when you think about it, it all comes down to minimizing risk, as avoidance is usually not possible.

PS! Speaking of Murphy's law: if it is true that "Anything that can go wrong will go wrong", it actually means that the law itself will be wrong at some point, so the law is actually self-contradictory, which pleases me!

Btrfs In Linux 6.8 Transitions Metadata Processing To Using Folios