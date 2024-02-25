
Bcachefs Publishes Patches For Disk Accounting Rewrite
Kent Overstreet has been working toward this Bcachefs disk accounting rewrite for a while, and this weekend he published the initial patch series. He explained the rewrite as follows:
The old disk accounting scheme was fast, but had some limitations:
- lack of scalability: it was based on percpu counters additionally sharded by outstanding journal buffer, and then just prior to journal write we'd roll up the counters and add them to the journal entry. But this meant that all counters were added to every journal write, which meant it'd never be able to support per-snapshot counters.
- it was a pain to extend: this was why, until now, we didn't have proper compressed accounting, and getting compression ratio required a full btree scan
In the new scheme:
- every set of counters is a bkey, a key in a btree (BTREE_ID_accounting). This means they aren't pinned in the journal
- the key has structure, and is extensible: disk_accounting_key is a tagged union, and it's just union'd over bpos
- counters are deltas, until flushed to the underlying btree. This means counter updates are normal btree updates; the btree write buffer makes counter updates efficient.
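To make the three points above concrete, here is a minimal user-space sketch in C. It is not the Bcachefs code: `struct pos`, `struct acct_key`, `struct acct_store` and every function name are hypothetical stand-ins, the "btree" is a plain array, and the "write buffer" is a list of pending deltas. It only illustrates the idea of a tagged key whose bytes are overlaid on a btree position, with counter updates queued as deltas and folded in on flush.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a btree position (the real struct bpos differs). */
struct pos { uint64_t hi, mid, lo; };

enum acct_type { ACCT_COMPRESSION = 1, ACCT_SNAPSHOT = 2 };

/* Tagged union: 'type' says which member of the union is meaningful. */
struct acct_key {
    uint64_t type;
    union {
        uint64_t compression_type; /* valid when type == ACCT_COMPRESSION */
        uint64_t snapshot_id;      /* valid when type == ACCT_SNAPSHOT */
    };
};

_Static_assert(sizeof(struct acct_key) <= sizeof(struct pos),
               "key must fit inside a position");

/* Overlay the key's bytes on a zeroed position so it can index the tree. */
static struct pos acct_key_to_pos(struct acct_key k)
{
    struct pos p = {0};
    memcpy(&p, &k, sizeof(k));
    return p;
}

static int pos_cmp(struct pos a, struct pos b)
{
    if (a.hi != b.hi)   return a.hi < b.hi ? -1 : 1;
    if (a.mid != b.mid) return a.mid < b.mid ? -1 : 1;
    if (a.lo != b.lo)   return a.lo < b.lo ? -1 : 1;
    return 0;
}

/* One accounting entry: a position plus a signed counter (or delta). */
struct acct_entry { struct pos pos; int64_t v; };

#define MAX_ENTRIES 16

/* Toy "btree" (an unsorted array) and toy write buffer of pending deltas. */
struct acct_store {
    struct acct_entry tree[MAX_ENTRIES]; size_t nr_tree;
    struct acct_entry buf[MAX_ENTRIES];  size_t nr_buf;
};

/* A counter update only queues a delta. Returns -1 if the buffer is full. */
static int acct_update(struct acct_store *s, struct acct_key k, int64_t delta)
{
    if (s->nr_buf == MAX_ENTRIES)
        return -1;
    s->buf[s->nr_buf++] = (struct acct_entry){ acct_key_to_pos(k), delta };
    return 0;
}

/* Flush: fold each queued delta into the matching tree entry. */
static int acct_flush(struct acct_store *s)
{
    for (size_t i = 0; i < s->nr_buf; i++) {
        size_t j;
        for (j = 0; j < s->nr_tree; j++)
            if (!pos_cmp(s->tree[j].pos, s->buf[i].pos))
                break;
        if (j == s->nr_tree) {
            if (s->nr_tree == MAX_ENTRIES) {
                /* Tree is full: keep the unapplied deltas queued. */
                memmove(s->buf, s->buf + i, (s->nr_buf - i) * sizeof(s->buf[0]));
                s->nr_buf -= i;
                return -1;
            }
            s->tree[s->nr_tree++] = (struct acct_entry){ s->buf[i].pos, 0 };
        }
        s->tree[j].v += s->buf[i].v;
    }
    s->nr_buf = 0;
    return 0;
}

/* Read the flushed value for a key; queued deltas are not visible here. */
static int64_t acct_read(const struct acct_store *s, struct acct_key k)
{
    struct pos p = acct_key_to_pos(k);
    for (size_t i = 0; i < s->nr_tree; i++)
        if (!pos_cmp(s->tree[i].pos, p))
            return s->tree[i].v;
    return 0;
}
```

In this sketch, a value read from the toy tree is stale until `acct_flush()` runs, which mirrors why the quoted description treats reading counters straight from the btree as expensive.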
Since reading counters from the btree would be expensive - it'd require a write buffer flush to get up-to-date counters - we also maintain a parallel set of accounting in memory, a bit like the old scheme but without the per-journal-buffer sharding. The in-memory accounters are indexed in an eytzinger tree by disk_accounting_key/bpos, with the counters themselves being percpu u64s.
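For readers unfamiliar with the layout, an Eytzinger tree stores a sorted set in an array in breadth-first order, so node i has its children at 2i and 2i+1 and a lookup walks down without pointers. The sketch below shows that layout together with sharded counters that are summed on read. It is illustrative only: plain `uint64_t` keys stand in for disk_accounting_key/bpos, a fixed `NR_SHARDS` array stands in for real per-CPU data, and all names are hypothetical rather than taken from the patch series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NR_SHARDS 4   /* hypothetical stand-in for the number of CPUs */
#define MAX_KEYS  15

struct mem_acct {
    size_t   nr;
    uint64_t keys[MAX_KEYS + 1];                /* 1-based; slot 0 unused */
    uint64_t counters[MAX_KEYS + 1][NR_SHARDS]; /* one shard per "CPU" */
};

/*
 * In-order walk of the implicit tree rooted at node i: visiting nodes in
 * order while consuming the sorted input yields the Eytzinger layout.
 */
static size_t eytz_fill(struct mem_acct *a, const uint64_t *sorted,
                        size_t next, size_t i)
{
    if (i > a->nr)
        return next;
    next = eytz_fill(a, sorted, next, 2 * i);
    a->keys[i] = sorted[next++];
    return eytz_fill(a, sorted, next, 2 * i + 1);
}

/* Build from keys sorted ascending. Returns -1 if there are too many. */
static int mem_acct_init(struct mem_acct *a, const uint64_t *sorted, size_t nr)
{
    if (nr > MAX_KEYS)
        return -1;
    memset(a, 0, sizeof(*a));
    a->nr = nr;
    eytz_fill(a, sorted, 0, 1);
    return 0;
}

/* Returns the 1-based slot holding key, or 0 if the key is absent. */
static size_t eytz_find(const struct mem_acct *a, uint64_t key)
{
    size_t i = 1;
    while (i <= a->nr) {
        if (a->keys[i] == key)
            return i;
        i = 2 * i + (key > a->keys[i]); /* left child if smaller, else right */
    }
    return 0;
}

/* Add a signed delta to one shard; unsigned wraparound makes this safe. */
static int mem_acct_add(struct mem_acct *a, unsigned shard,
                        uint64_t key, int64_t delta)
{
    size_t i = eytz_find(a, key);
    if (!i || shard >= NR_SHARDS)
        return -1;
    a->counters[i][shard] += (uint64_t)delta;
    return 0;
}

/* Reading a counter sums all shards (0 if the key is absent). */
static uint64_t mem_acct_read(const struct mem_acct *a, uint64_t key)
{
    size_t i = eytz_find(a, key);
    uint64_t sum = 0;
    if (i)
        for (unsigned s = 0; s < NR_SHARDS; s++)
            sum += a->counters[i][s];
    return sum;
}
```

An individual shard may hold a wrapped "negative" value; only the sum across shards is meaningful, which is the usual trade-off of sharded counters: cheap uncontended updates in exchange for a slightly more expensive read.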
The rewrite breaks the file-system's on-disk format, so accounting must be regenerated when upgrading (or downgrading) past this new version. This should happen transparently via kernel fsck, but there are some known limitations for Linux 6.7 users at the moment. The hope is to have this disk accounting rewrite ready for Bcachefs in Linux 6.9.
More details on this new code via the patch series.