Bcachefs Merges Support For Btrfs-Like Snapshots


  • #41
Originally posted by useless
You mean: recreate missing data/metadata from thin air?
Well, I would be pretty happy if btrfs check hadn't removed my files. It wasn't the failure that removed them, it was btrfs check. It even printed tons of lines like "removing dirent N".
Originally posted by useless
The thing didn't write to safe storage what it was supposed to. You were using bcache which, to the best of my memory, can have disastrous consequences if things go south.
Maybe. Maybe not. Hardware can always fail, especially when a filesystem can span several devices, as btrfs can. Especially for a filesystem that was advertised as designed to preserve old data and able to duplicate it. What is the point of having snapshots and RAID when the checking utility just removes a large part of the data and still can't restore a valid state?

Originally posted by useless
By the way, if it doesn't have a backup it's not important.
Luckily, I didn't lose anything really important: several one-liner scripts written years ago and almost all of my Steam library (some of those games don't support cloud saves). But imagine, for example, a local source code copy that wasn't pushed for a week. Is it OK to lose it? Or should everybody do daily backups? Backups are meant to be the last line of defense; you shouldn't have to restore from them every time the power goes down.

So, btrfs is just garbage. Red Hat was right; it's a pity I didn't listen to them.

    • #42
Originally posted by pkese
The first version of btrfs that got merged into the kernel was v0.17, and that was kernel 2.6.29, back in January 2009.
The log goes back to April 2008, when btrfs v0.13 already had: COW, checksumming, transactions, snapshots, subvolumes...
So, until bcachefs is merged into mainline, you can't compare its feature readiness with btrfs?
Originally posted by pkese
But a wider answer regarding the "day 1" issue is that in a copy-on-write b-tree system, snapshotting is part of the architecture.
First you start with snapshots for each transaction. Then you add the code to delete all the snapshots that weren't meant to stay around.
Maybe btrfs did it that way, but does that mean it is the only way? Or even the correct way? Btrfs is still broken; it is extremely fragile. Have they at least fixed the ext4->btrfs conversion?
I think bcachefs got its name for a reason. I assume it was designed to fix problems with running btrfs over bcache. Bcache doesn't understand what a given block holds, so for a COW filesystem it can decide to cache some hot block while in reality that file has been overwritten and the space now belongs to snapshotted data. No wonder Kent started with tiered storage support.

      • #43
Originally posted by kreijack
But I think that this has to be read in the right context: IIRC, BTRFS was sponsored by Oracle from the beginning; now it is supported by Red Hat, SUSE, WD, and Facebook, whereas bcachefs is still without a sponsor. I think that Kent Overstreet is a very good developer, but he is only one person, and that is not enough. Today's standards mean a filesystem cannot be developed by only one man.
While this is true, the unfortunate issue with btrfs is that it was rushed out before it was designed properly (I guess for market reasons), and it has a history of putting breaking changes into the Linux kernel tree.

This has caused real problems. Rushing a filesystem is a bad idea because you really need to make sure you get the on-disk format correct from the get-go. Btrfs did not do this well, and hence it has a RAID 5/6 write hole which it cannot fix without breaking existing btrfs volumes.

bcachefs is a one-man thing, but what's important here is that that one man is doing his best to make sure things are designed correctly before pushing them out to the public. If you read Kent's blogs, you will see that it's a deliberate decision on his part to get things right before pushing them onto users.

Btrfs being rushed the way it was means that people like me won't touch it with a ten-foot pole, at least not any time soon (personally, they have eroded my trust). For a general desktop OS, if my distro switches to btrfs as the default I probably won't care that much, but for any serious data storage I am sticking with ZFS, while keeping an eye on bcachefs because at least its developer is treating it seriously.

        • #44
Instead of adding snapshots to bcachefs, I would prefer to finally see 'hot relocate' added to btrfs. This was long advertised as a future feature for btrfs and is potentially much more efficient than caching, as files get written to the correct disk (SSD) right away and get moved out to rotating disks when needed (large or seldom-used files). Whatever happened to IBM's patches for this feature?

          • #45
Originally posted by Tuxie
My home file server uses a combination of Bcache, LUKS, XFS, and mergerfs, with SnapRAID for asynchronous redundancy/parity. A single SSD provides read and writeback block cache for 10 HDDs, and SnapRAID will snapshot parity for those 10 disks onto 2 parity disks every 6 hours. The upside of this setup over conventional (including ZFS) RAID is that most HDDs will be spun down most of the time, meaning they are silent, use little power, and generate little heat. Thanks to Bcache, they don't even spin up for ls or find most of the time, because the metadata blocks are usually on the SSD. Another upside is that I can mix HDD sizes and make full use of all of them, given that the parity disks are at least as large as the largest data disk. For read-mostly filesystems that are mainly used for archival, it's a perfect compromise! I have a separate SSD-only filesystem for highly volatile data.
I'd love to see a diagram and some instructions for your setup!

            • #46
Originally posted by Khrundel
Well, I would be pretty happy if btrfs check hadn't removed my files. It wasn't the failure that removed them, it was btrfs check. It even printed tons of lines like "removing dirent N".
Uh, another case of RTFM? btrfs check, and btrfs restore.

Originally posted by Khrundel
Maybe. Maybe not. Hardware can always fail, especially when a filesystem can span several devices, as btrfs can. Especially for a filesystem that was advertised as designed to preserve old data and able to duplicate it. What is the point of having snapshots and RAID when the checking utility just removes a large part of the data and still can't restore a valid state?
I think we are going in circles here. It can't recover what's not there. Your storage stack did nasty things and you expect btrfs to recover magically? Jeez.

Originally posted by Khrundel
Luckily, I didn't lose anything really important: several one-liner scripts written years ago and almost all of my Steam library (some of those games don't support cloud saves). But imagine, for example, a local source code copy that wasn't pushed for a week. Is it OK to lose it? Or should everybody do daily backups? Backups are meant to be the last line of defense
If it's not backed up, it's not important. Simple as that.

Originally posted by Khrundel
you shouldn't have to restore from them every time the power goes down.
If you have a corrupted filesystem after a power loss, go yell at the drive/controller/motherboard manufacturer. This is the most battle-tested feature of btrfs: every transaction is atomic (assuming no faulty storage stack).

Originally posted by Khrundel
So, btrfs is just garbage. Red Hat was right; it's a pity I didn't listen to them.
Red Hat didn't invest in btrfs because they had already invested in another solution. They employed no btrfs developers.

              • #47
Originally posted by brucethemoose
That's quite a change, isn't it? BTRFS is a high-overhead, "everything and the kitchen sink" type FS, while F2FS and (probably) bcachefs are low-overhead, more barebones filesystems.
That being said, 5.15 F2FS has given me pretty much everything I would want from a simple flash FS.
Nah. Bcachefs does pretty much all of the things I use with ZFS and BTRFS. F2FS, OTOH, is just a simple FS with compression, which is great for both root and portable storage.

                • #48
Originally posted by maffblaster
I'd love to see a diagram and some instructions for your setup!
Originally posted by atmartens
You should publish a guide.
I don't have all the exact commands ready, but here is the gist of it. This is a very quick and dirty, hacked-together guide; you'll need to RTFM to fill in the gaps.

First, all disks including the cache SSD are formatted with cryptsetup luksFormat -d /etc/luks.pwd /dev/sdX, where /etc/luks.pwd is the passfile (a sha256 hash of some random data), and they are all added to /etc/crypttab like:
Code:
ssd01_crypt UUID=xxxxxxxxxxxx /etc/luks.pwd luks,discard
hdd01_crypt UUID=xxxxxxxxxxxx /etc/luks.pwd luks
hdd02_crypt UUID=xxxxxxxxxxxx /etc/luks.pwd luks
Start them with cryptdisks_start hdd01_crypt etc.
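
For illustration, a loop over the data disks might look like this sketch (the device names are hypothetical; cryptsetup normally asks for confirmation, which --batch-mode skips):
Code:
# Sketch only: format each data disk with the shared keyfile.
# Device names are hypothetical; check lsblk before running anything destructive.
for dev in /dev/sdb /dev/sdc /dev/sdd; do
    cryptsetup luksFormat --batch-mode -d /etc/luks.pwd "$dev"
done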

                  Then the cache SSD is formatted with make-bcache --cache --writeback --discard /dev/mapper/ssd01_crypt and the data HDDs with make-bcache --bdev /dev/mapper/hdd01_crypt etc. Attach each bdev to the cache by echoing the UUID of the cache dev to /sys/block/dm-*/bcache/attach. You'll find the UUID in /sys/fs/bcache/.
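
As a rough sketch of that attach step, assuming a single cache set (the UUID is the directory name that appears under /sys/fs/bcache/):
Code:
# Sketch: attach every bcache backing device to the one cache set (run as root).
cset=$(basename /sys/fs/bcache/*-*)   # the cache set's UUID directory
for f in /sys/block/dm-*/bcache/attach; do
    echo "$cset" > "$f"
done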

Format all the /dev/bcacheN devices with mkfs.xfs /dev/bcache0 (or your filesystem of choice) and add them by their UUID to /etc/fstab. I mount them as /raw/hdd01 etc. My fstab looks like this (for the first 3 disks):
Code:
UUID=191f1a4d-5885-49dd-8b68-ebdc234e1078 /raw/hdd01 xfs noatime 0 0
UUID=f4e1f378-1253-4dd6-a1f1-fdfa70b258f0 /raw/hdd02 xfs noatime 0 0
UUID=b9537fc9-1a68-41b5-af37-6e1e209522da /raw/hdd03 xfs noatime 0 0
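
A sketch of that step looped over all the cache-backed devices, printing each UUID for the fstab entries (same caveats as above):
Code:
# Sketch: make an XFS filesystem on each bcache device and print its UUID.
for dev in /dev/bcache[0-9]*; do
    mkfs.xfs "$dev"
    blkid -s UUID -o value "$dev"
done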


Between all steps, make sure to look at lsblk -f to see which disk belongs to which bdev and crypt dev, and to find the UUIDs. You may need to run udevadm trigger a few times to make changes visible without rebooting.

                  In every /raw/hdd*/ I have created a directory named "pool" which is the root of the mergerfs pool. After a few years of tweaking and debugging weird behaviours of some programs when using the pool, I have landed on this mergerfs config (in /etc/fstab), which is optimized for compatibility rather than performance. For me, this config Just Works for everything I use it for and the performance is Good Enough. YMMV.
Code:
/raw/hdd*/pool          /pool           fuse.mergerfs    allow_other,use_ino,noatime,moveonenospc=true,ignorepponrename=true,link_cow=true,cache.files=off,dropcacheonclose=true,category.search=newest,xattr=passthrough,noforget,security_capability=true,posix_acl=false,minfreespace=50G,category.create=msplfs,hard_remove,fsname=mergerfs,_netdev 0 0
Then finally create an /etc/snapraid.conf with your /raw disks like:
Code:
data hdd01 /raw/hdd01/pool/
data hdd02 /raw/hdd02/pool/
data hdd03 /raw/hdd03/pool/
parity /raw/hdd11/snapraid.parity
content /raw/hdd11/snapraid.content

and add some exclude rules for incomplete downloads, tempfiles, and the like.
Then schedule snapraid sync in crontab to run however often you want.
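
For example, a system crontab entry along these lines would sync every six hours to match the parity schedule mentioned above (the schedule and log path are just placeholders to adapt):
Code:
# /etc/crontab: recompute SnapRAID parity every 6 hours, logging the output
0 */6 * * * root snapraid sync >> /var/log/snapraid-sync.log 2>&1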

                  BTW, I use disk -> bcache -> luks -> xfs -> mergerfs myself, but this guide is for disk -> luks -> bcache -> xfs -> mergerfs, which is how I would do it if I were to start over. Also make sure that the root filesystem where you put /etc/luks.pwd is encrypted!
                  Last edited by Tuxie; 27 September 2021, 05:30 PM.

                  • #49
Originally posted by useless
Uh, another case of RTFM? btrfs check, and btrfs restore.
That is bullshit. You can't seriously suggest "just copy all the data to another drive, reformat, and restore" as a solution for minor FS corruption. Do you understand that a full filesystem copy of a modern HDD takes 2-10 hours?

Originally posted by useless
I think we are going in circles here. It can't recover what's not there. Your storage stack did nasty things and you expect btrfs to recover magically? Jeez.
I don't know why you are going in circles. As I've already written, the files were deleted by the repair tool. No magic required; just don't do additional harm.

Originally posted by useless
If it's not backed up, it's not important. Simple as that.
Yeah. Sure. "Four legs good, two legs baaad." I asked you a question: is it OK to lose 2-3 days of work due to filesystem corruption? How frequently do btrfs users have to back up all their important data? Once a day? Once an hour?

Originally posted by useless
If you have a corrupted filesystem after a power loss, go yell at the drive/controller/motherboard manufacturer. This is the most battle-tested feature of btrfs: every transaction is atomic (assuming no faulty storage stack).
Why yell at someone else if it is btrfs' fault? This system is garbage. Eight years of being "production ready" and it still has no decent repair tool that won't corrupt the system further. What will the next excuse be? "Just buy a UPS already"?

Originally posted by useless
Red Hat didn't invest in btrfs because they had already invested in another solution. They employed no btrfs developers.
Or maybe they just can't tell their customers "if it's not backed up, it's not important" every time btrfs fails. Something tells me that is a more probable cause than "they don't want to pay the salary of one additional programmer".

                    • #50
Originally posted by mdedetrich
Completely irrelevant with respect to ZFS; just like with Solaris and LibreOffice, the open source community has forked these libraries/applications, and they are now managed by the open source community without license issues pertaining to Oracle.

The issue with ZFS is that its license is incompatible with GPL 2; it has nothing to do with Oracle potentially suing over ZFS.
You are not in breach of any licensing terms when using LibreOffice.

You are when using ZFS. The licence was deliberately chosen to be incompatible with that of the Linux kernel. Remember, Oracle owns the code, so if they wanted to kill any ambiguity, it could be relicensed to be GPL-compatible tomorrow. Or they could choose to include the code in their fork of RHEL (Oracle Unbreakable Linux), which would hint (though not guarantee) that they don't intend to sue users.

                      It may not matter much to home users, but if you are a company that is profitable, it is a heck of a risk to take.

I get that people like the software, but if I needed to use it, I would use it on a BSD, where Oracle could not sue for licence infringement.
