Btrfs "Reserve Flush Emergency" Feature Heading To Linux 6.2
Josef Bacik, who authored the change, explained:
Inside of [Facebook], as well as some user reports, we've had a consistent problem of occasional ENOSPC transaction aborts. Inside FB we were seeing ~100-200 ENOSPC aborts per day in the fleet, which is a really low occurrence rate given the size of our fleet, but it's not nothing.
So introduce a new flushing state, BTRFS_RESERVE_FLUSH_EMERGENCY. This gets used in the case that we've exhausted our reserve and the global reserve. It simply forces a reservation if we have enough actual space on disk to make the reservation, which is almost always the case. This keeps us from hitting ENOSPC aborts in these odd occurrences where we've not kept up with the delayed work.
Fixing this in a complete way is going to be relatively complicated and time consuming. This patch is what I discussed with Filipe earlier this year, and what I put into our kernels inside FB. With this patch we're down to 1-2 ENOSPC aborts per week, which is a significant reduction. This is a decent stop gap until we can work out a more wholistic solution to these two corner cases.
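To make the idea concrete, here is a minimal toy model of the behavior described in the commit message. It is not the kernel code: the `SpaceInfo` fields, the `FlushState` names, and the `try_reserve` helper are all hypothetical simplifications, and the real accounting in Btrfs tracks considerably more state. The sketch only shows the core decision: a normal reservation fails once used space plus outstanding promises exceeds capacity, while an emergency reservation is allowed as long as the space actually used on disk still leaves room.

```python
from dataclasses import dataclass
from enum import Enum, auto


class FlushState(Enum):
    # Hypothetical, simplified stand-ins for the kernel's reservation modes.
    FLUSH_ALL = auto()
    FLUSH_EMERGENCY = auto()


@dataclass
class SpaceInfo:
    # Assumed toy accounting; the real structure has many more counters.
    total_bytes: int     # writable capacity
    bytes_used: int      # space actually consumed on disk
    bytes_promised: int  # outstanding reservations for delayed work


def try_reserve(space: SpaceInfo, num_bytes: int, flush: FlushState) -> bool:
    """Return True and record the reservation if it can be granted."""
    committed = space.bytes_used + space.bytes_promised
    if committed + num_bytes <= space.total_bytes:
        # Normal path: the request fits even counting every promise.
        space.bytes_promised += num_bytes
        return True
    if (flush is FlushState.FLUSH_EMERGENCY
            and space.bytes_used + num_bytes <= space.total_bytes):
        # Emergency path: ignore outstanding promises and check only
        # against space that is really in use, then force the reservation.
        space.bytes_promised += num_bytes
        return True
    return False


# Reservations are fully promised away, yet the disk is only 40% used.
space = SpaceInfo(total_bytes=100, bytes_used=40, bytes_promised=60)
normal_ok = try_reserve(space, 10, FlushState.FLUSH_ALL)
emergency_ok = try_reserve(space, 10, FlushState.FLUSH_EMERGENCY)
```

In this toy run the normal request is refused while the emergency request succeeds, which mirrors the commit message's point that there is almost always enough actual space on disk even when the reserves look exhausted.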
The two corner cases where they were hitting these issues involve delayed allocation and the delayed refs reserve. More details are in the patch, which is now part of the Btrfs "for-next" branch ahead of the Linux 6.2 merge window. Long story short, it's good news if you have been hit by Btrfs out-of-space (ENOSPC) transaction aborts.