Bcachefs Merged Into The Linux 6.7 Kernel


  • #71
    According to Kent Overstreet in Bcachefs matrix chatroom:

    Something is way off with Phoronix's setup. In my testing bcachefs comes out faster than btrfs, and both are quite a bit faster than Phoronix's numbers.

    Oh, I think a lot of why the Phoronix numbers were so low is because CONFIG_BCACHEFS_DEBUG_TRANSACTIONS is on by default. It is a brilliant idea for how we can make all our btree_trans objects accessible for debugfs without them being on a single contended list.
    Michael
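    For anyone who wants to check whether their own kernel ships with this debug option enabled, the kernel build config can be inspected. A minimal sketch (config file locations vary by distro; the paths below are common conventions, not guaranteed):

    ```shell
    #!/bin/sh
    # Check whether a kernel was built with bcachefs transaction debugging.
    # $1 = config option name, $2 = path to a kernel config file.
    check_option() {
        grep "^$1=" "$2" 2>/dev/null
    }

    # Typical sources: /boot/config-$(uname -r), or /proc/config.gz if
    # CONFIG_IKCONFIG_PROC is enabled (use zgrep for the compressed one).
    if check_option CONFIG_BCACHEFS_DEBUG_TRANSACTIONS "/boot/config-$(uname -r)"; then
        echo "debug transactions: enabled"
    else
        echo "debug transactions: not set or config not found"
    fi
    ```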
    Last edited by timofonic; 04 November 2023, 02:50 PM.

    Comment


    • #72
      Originally posted by timofonic View Post
      According to Kent Overstreet in Bcachefs matrix chatroom:



      Michael
      I didn't enable any debug options myself. If they are shipping with debug flags on by default, that may explain some of the differences; as mentioned, everything was at the default / out-of-box value. Kent was emailing me yesterday but didn't bring it up then. If that's the case, I'll check with him to see what his plans are for shipping debug flags by default in the long term.
      Michael Larabel
      https://www.michaellarabel.com/

      Comment


      • #73
        Originally posted by Michael View Post

        I didn't enable any debug options myself. If they are shipping with debug flags on by default, that may explain some of the differences; as mentioned, everything was at the default / out-of-box value. Kent was emailing me yesterday but didn't bring it up then. If that's the case, I'll check with him to see what his plans are for shipping debug flags by default in the long term.
        Nice! Thanks! It would be amazing if you interview him too!

        Comment


        • #74
          Originally posted by skeevy420 View Post
          I wish I wasn't so invested in OpenZFS so I could feel a little happier about this. Practically every single thing that everyone in this thread is excited to get is something that's been available in ZFS for a long time. That makes me a bit jaded here.

          BTRFS users outta be creaming their pants so bad today that there'll be a noticeable increase in jean sales. Market analysts will think it has to do with holidays and sales, but us Linux users will know what's really up.

          Since I didn't see it mentioned, Bcachefs does have one big feature that OpenZFS doesn't -- background compression. Being able to set up the one-two combination of LZ4 and Zstd:15 is a kick-ass feature to have and something I'm a bit jealous of not having.
          RTFM and stop being ignorant / spreading misinformation.

          Bcachefs is a tiered filesystem, and ZFS has never supported that. OpenZFS added the Special Allocation Class a long time after bcachefs added tiering. It allows storing metadata and small blocks on dedicated faster storage, but it's a far cry from what bcachefs offers (full tiering, including background and foreground caching).

          ZFS also doesn't allow offline deduplication, defragmentation, restripe, etc., all of which have been supported for a long time even on Btrfs.

          ZFS is more stable, but it's also an older design, not as flexible as more modern CoW filesystems.

          Comment


          • #75
            I never really understood bcachefs. As its name and description suggest, it's basically "yet another B-tree filesystem with 'special device caching'".
            Why not add patches to Btrfs for 'special device caching'? Do the currently 'written in stone' parts/principles of Btrfs make that extremely difficult / virtually impossible?
            Or, as an alternative, make a kernel patch that allows caching filesystem metadata in either regular swap or on a special swap (e.g. a "special device"), so all filesystems can benefit.

            I tried forcing metadata caching for HDD-based Btrfs by upgrading the RAM from 8 GB to 64 GB, setting vfs_cache_pressure to 0, and running a nightly script that forces reading all filesystem metadata. It was a visible improvement.

            (But I am currently back to 8 GB because the cheap RAM I bought turned out to be faulty; I got frequent ECC corrected errors in dmesg. Even then, the faulty 64 GB of RAM was still more expensive than a second-hand 80 GB M.2 Optane drive would cost. And I guess even regular NAND would do just fine here, caching metadata for HDD-based Btrfs, so you could just have a partition reserved on a single NAND drive serving as the root filesystem as well...)
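            That approach can be sketched in two steps (the mount point is illustrative, and note that vfs_cache_pressure=0 means the kernel will never reclaim dentry/inode caches, which can starve the page cache under memory pressure):

            ```shell
            #!/bin/sh
            # Step 1 (run as root): keep fs metadata caches resident.
            # 0 = never reclaim dentry/inode caches; use with care.
            # sysctl vm.vfs_cache_pressure=0

            # Step 2 (nightly cron job): walk the whole filesystem, stating
            # every file, so Btrfs metadata gets pulled into RAM.
            # -xdev stays on one filesystem; -printf "" produces no output.
            warm_metadata() {
                find "$1" -xdev -printf "" 2>/dev/null
                return 0
            }

            # /mnt/hdd-btrfs is an illustrative mount point.
            warm_metadata /mnt/hdd-btrfs
            ```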

            Originally posted by evil_core View Post
            ZFS also doesn't allow offline deduplication, defragmentation, restripe, etc., all of which have been supported for a long time even on Btrfs.

            I tried and discarded 'OpenZFS on Linux' for my home server mainly because it never implemented online defragmentation. As I gathered, this would theoretically be possible but would require far too much effort, at least without breaking something else, and it always had lower priority than competing features, which made defragmentation and restriping harder and harder to implement on top.

            But I also can't understand why Btrfs doesn't allow the RAID stripe size to be changed (at least at compile time via kernel menuconfig).
            For God's sake, even Microsoft Storage Spaces allows the user to configure the stripe size to, say, 4*16 KB for 5 4Kn disks in RAID-5 and use 64 KB as the ReFS filesystem allocation unit size, to avoid read-modify-write and going through a slow single-disk-speed "intent log", thus working around the "write hole" issue while getting the read/write speed of 4 raw disks from a 5-disk RAID-5 array.
            Instead, Btrfs just marks RAID5/6 as "don't use" and leaves people free to disregard Btrfs RAID5/6 as "not write-hole safe". I think somebody came up with patches for Btrfs RAID5/6 that essentially mimicked what Microsoft Storage Spaces and/or potentially ZFS (depending on its settings) does for RAID5/6 in most cases (i.e. the default configuration) and decided it was way too slow ("Who would have thought...?"). However, an "intent log" is not necessary in the first place if you avoid read-modify-write on the stripes by decreasing the stripe size and/or increasing the filesystem allocation unit size (and feal-size where applicable); writes just become regular copy-on-write.
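            The arithmetic behind that Storage Spaces configuration is easy to check. With 5 disks in RAID-5 (4 data + 1 parity) and a 16 KB per-disk chunk, one full stripe holds exactly 64 KB of data, so a 64 KB allocation unit always writes whole stripes and never needs read-modify-write:

            ```shell
            #!/bin/sh
            # RAID-5 stripe math from the example above (illustrative numbers).
            disks=5
            data_disks=$((disks - 1))   # one disk's worth of each stripe is parity
            chunk_kb=16                  # per-disk stripe chunk
            alloc_unit_kb=64             # filesystem allocation unit

            full_stripe_kb=$((data_disks * chunk_kb))
            echo "full stripe data: ${full_stripe_kb} KB"

            # If the allocation unit is a whole number of full stripes, every
            # write is a full-stripe write: no read-modify-write, and parity can
            # be computed from data already in hand, so no intent log is needed.
            if [ $((alloc_unit_kb % full_stripe_kb)) -eq 0 ]; then
                echo "no read-modify-write"
            fi
            ```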

            Now that I've written the above... Would the following work for Btrfs?
            - rebalance the array to RAID-0 [for data; keep RAID-1 for metadata and system] (after taking a fresh backup of the important things)
            - manually patch the kernel code to set the Btrfs stripe size to 16k (I think the current fixed value is 128k)
            - rebalance the 5-disk array to RAID-5 (still keeping RAID-1 for metadata and system, just in case)
            Or would it need some extra modifications to bypass read-modify-write of stripes? (I remember Microsoft had to make some changes to Storage Spaces to automatically trigger the "intent log bypass" mode in this case. But if Microsoft did it, why couldn't Linux implement the same idea...?)

            Comment


            • #76
              Originally posted by pWe00Iri3e7Z9lHOX2Qx View Post

              It can scale down pretty well...

              Is this feasible? Yes it is! OpenZFS on Linux compiled from source for RISC-V will work with less than 250 MB of RAM for a nearly full 2 TB ZFS volume with 225,000 files! Even stressing the system and running in parallel (via WiFi): a…


              If the kernel needs it for something else it will take it.
              In reality, even on Solaris, if the memory used for the ZFS cache becomes needed by something else, for example a database server, the kernel does not release it back in time, and the server gets a memory allocation failure.

              Comment


              • #77
                Originally posted by timofonic View Post
                Why so much offtopic about Btrfs and ZFS? Is it some kind of conspiracy?
                Because they are the filesystems against which people form their expectations of a modern filesystem today, and they are what the most interesting uses of advanced FS functionality are built on.

                Comment
