Btrfs RAID 5/6 Sub-Page Support Readied For Linux 5.19

  • #21
    I use ZFS because it supports the features I want: a HUGE cache in memory, which gives it a 10-fold increase in some benchmarks and improves things in all cases, and also RAID5. That said, there are disadvantages. I have to be wary of which kernel versions I use, or stick with what openSUSE Leap uses with the corresponding OpenZFS module. BTRFS being built into the kernel tree would make life much simpler, since I like to build my own kernels and stay as new as the NVIDIA driver allows.
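    If you want to pin down how huge that in-memory cache is allowed to get, the OpenZFS ARC limit is exposed as the zfs_arc_max module parameter; a rough sketch below, where the 16 GiB value is purely illustrative:
    Code:
    # echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max
    # echo "options zfs zfs_arc_max=17179869184" >> /etc/modprobe.d/zfs.conf
    The first command adjusts the limit at runtime, the second makes it persistent across reboots.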

    Maybe some day btrfs will catch up. It's nice to see they're making real progress.



    • #22
      Originally posted by kreijack View Post

      Could you elaborate a bit? It seems to me that BTRFS handles RAID1 quite well:

      1) create a raid1 filesystem from scratch:
      Code:
      # mkfs.btrfs -d raid1 -m raid1 /dev/sda /dev/sdb
      2) transform an existing filesystem by adding a 2nd disk and converting the profile to RAID1:
      Code:
      # btrfs dev add /dev/sdb /mnt/btrfs-filesystem
      # btrfs bal start -dconvert=raid1 -mconvert=raid1 /mnt/btrfs-filesystem
      What is cumbersome is figuring out which profile is currently in use....
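      As a rough pointer, the current profiles can at least be read off with the stock tools, assuming the filesystem from the example above is mounted at /mnt/btrfs-filesystem:
      Code:
      # btrfs filesystem df /mnt/btrfs-filesystem
      This lists the data, metadata and system block groups together with their profile (single, RAID1, ...), but you do have to know to look for it there.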
      I find this way simpler (and I learned ZFS when I had way less knowledge about system administration):

      1) create a mirror (raid1) pool from scratch:
      Code:
      # zpool create mypool mirror /dev/sda /dev/sdb
      2) transform an existing pool by adding a 2nd disk:
      Code:
      # zpool attach mypool /dev/sda /dev/sdb
      To me the btrfs concepts seem like something "extra"/"on top" and ZFS seems way simpler. Of course, these are just 2 commands, there's a lot more to this.

      About RAID1 - it is confusing to me that when you list mounts on a system, only one of the disks is referenced. I never encountered a disk failure in my time using btrfs, but I've heard stories of how it just spams the console with errors or remounts the filesystem as read-only. I don't know which of these is true, or whether it's "only cosmetic".

      The mountpoints in btrfs were hard for me to understand - the whole subvolume structure and how it's numbered (but I guess I didn't need to know all of this anyway..)



      • #23
        Originally posted by plantroon View Post

        I find this way simpler (and I learned ZFS when I had way less knowledge about system administration):

        1) create a mirror (raid1) pool from scratch:
        Code:
        # zpool create mypool mirror /dev/sda /dev/sdb
        This command to me seems equivalent to the btrfs one

        Originally posted by plantroon View Post
        2) transform an existing pool by adding a 2nd disk:
        Code:
        # zpool attach mypool /dev/sda /dev/sdb
        Where do you define how to change the filesystem RAID profile? In btrfs, after you add a disk you can reshape the filesystem by changing the RAID profile: e.g. if you are adding a 2nd disk, you can switch the filesystem to RAID1 or leave it as a "simple" concatenation of disks.
        More options exist if you use more disks (e.g. raid10 or raid1c3/raid1c4....).

        My understanding is that the flexibility of btrfs (e.g. how simple it is to change the filesystem RAID profiles) requires a bit of know-how, because you can face complex cases: e.g. removing a disk from a RAID1 filesystem made of three disks of different sizes.

        This is a complex case that, depending on how full the disks are, may be "easy" to manage or may require several "balancing" steps. But this is not a btrfs failing; other filesystems don't have this kind of problem only because very few of them allow removing a disk and rebalancing the data with a profile reshape on the fly....
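        As a sketch of what such a reshape looks like in practice (the device name and the /mnt mount point are just placeholders), removing one disk from a three-disk RAID1 filesystem is a single command, and btrfs relocates the affected chunks onto the remaining disks as part of it:
        Code:
        # btrfs device remove /dev/sdc /mnt
        Whether this succeeds in one pass depends on how full the remaining disks are - which is exactly the "several balancing steps" case described above.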

        Originally posted by plantroon View Post
        To me the btrfs concepts seem like something "extra"/"on top" and ZFS seems way simpler. Of course, these are just 2 commands, there's a lot more to this.

        About RAID1 - it is confusing to me that when you list mounts on a system, only one of the disks is referenced. I never encountered a disk failure in my time using btrfs, but I've heard stories of how it just spams the console with errors or remounts the filesystem as read-only. I don't know which of these is true, or whether it's "only cosmetic".
        These are two different issues:
        - if you use the "btrfs dev us <mntpnt>" or "btrfs fi us <mntpnt>" commands, you can see which devices are involved.
        - regarding how to manage a RAIDx failure, we would need to understand the specific case better.
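        For completeness, these are the long forms of those commands, with /mnt standing in for the mount point:
        Code:
        # btrfs device usage /mnt
        # btrfs filesystem usage /mnt
        # btrfs filesystem show /mnt
        All three list every device backing the filesystem, even though the mount table only shows one of them.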

        Originally posted by plantroon View Post
        The mountpoints in btrfs were hard for me to understand - the whole subvolume structure and how it's numbered (but I guess I didn't need to know all of this anyway..)
        On this I agree that sometimes the btrfs commands are more a raw dump of the data behind the filesystem than a summary of its status (and here I have my share of the blame as the original author of the btrfs command :-) ).
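        For anyone puzzled by the same thing, the subvolume layout (including the IDs that the numbering refers to) can at least be dumped with the following, again assuming a mount point of /mnt:
        Code:
        # btrfs subvolume list -p /mnt
        The -p flag also prints each subvolume's parent ID, which makes the tree structure a bit easier to follow.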



        • #24
          What seems to be missed again and again is that BTRFS "RAID" is not really RAID in the traditional sense. That is why BTRFS allows using storage devices of different sizes.

          Here is an analogy for you all (no, this is not intended to be spot-on accurate - I am just trying to explain the concepts).

          Think of your storage device as a loaf of bread, and butter (-fs) as the data. BTRFS can span several loaves of bread (storage devices) of different sizes:
          As BTRFS needs more surface to store its data it cuts the bread into pieces (allocates a chunk) and smears the data (butter) across that piece.
          If you have more butter than can fit on one piece you cut a new piece of bread to make room for all the butter you have.

          As data is deleted you remove butter from the pieces of bread that contain the butter (data). The result is that the bread pieces get spotty with butter.
          There might not be room for a large slab of butter anywhere on your pieces of bread without covering existing butter.
          To consolidate all the butter into one smooth surface you can scrape off the butter and try to cover the other pieces fully (that is balancing). Now you may have room to fit that pesky slab of butter on one of the clean pieces of bread.
          (Yes, this is similar to defragmenting, but it happens at the chunk level - not the file level - so it is not really the same.)
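          In command terms, that scrape-and-consolidate step is a filtered balance; a minimal sketch, with /mnt as a placeholder mount point (the usage filters only rewrite chunks that are less than half full):
          Code:
          # btrfs balance start -dusage=50 -musage=50 /mnt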

          Roughly speaking - that is how it works. Now over to explaining what "RAID" means in BTRFS terms - and that is in essence only a description of how many instances (copies) of your data exist.
          And again - the explanation below is not 100% accurate, but it should be close enough to grasp the concepts (a few example mkfs.btrfs commands follow after the list).
          • SINGLE:
            • One type of butter. Can be put on any loaf of bread (storage device)
          • DUP:
            • Two types of butter containing the same data, but of different colors, for example blue and yellow (yay Ukraine).
            • Each loaf must contain both colors (on different pieces of bread).
            • 2x instances of your data on the SAME storage device.
          • RAID0:
            • One type of butter, smeared over pieces of bread from as many loaves of bread as possible, e.g. one piece of bread from each loaf spread out on the desk like a deck of cards. Then smear butter on top of all of them.
            • 1x instance of your data spread over as many storage devices as possible (for speed)
          • RAID1
            • Two types of butter (blue and yellow) containing the same data, but the two colors cannot be mixed on the same loaf of bread.
            • Only 2x instances of your data regardless of how many storage devices; each copy must live on a different storage device.
          • RAID1c3
            • Only 3x instances of your data, regardless of how many storage devices... Otherwise as RAID1
          • RAID1c4
            • Only 4x instances of your data, regardless of how many storage devices... Otherwise as RAID1
          • RAID10:
            • Same as RAID0 but with another instance/copy. Not as deterministic as regular RAID10, so having half of the storage devices on controller 1 and half on controller 2 does not imply that you can lose one controller and still have the thing running. BTRFS RAID10 can (most likely) *only* lose one device before redundancy is compromised. The same goes for regular RAID10 as well, but it can usually still work if you lose other storage devices on the same "side".
            • Only 2x instances of your data, regardless of how many devices.
          • RAID5
            • Same as RAID0, but with one piece of bread used for parity.
            • Only 1x instance of your data, but one parity block as well (located on any piece of bread on any loaf)
          • RAID6
            • Same as RAID5, but with another piece of bread used for parity.
            • Only 1x instance of your data, but with TWO parity blocks (each located on any piece of bread from two different loaves).
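          To tie the analogy back to actual commands, these profiles are simply what you pass to mkfs.btrfs (or convert to later with balance); the device names below are of course just placeholders:
          Code:
          # mkfs.btrfs -d raid1 -m raid1 /dev/sda /dev/sdb
          # mkfs.btrfs -d raid10 -m raid1c3 /dev/sda /dev/sdb /dev/sdc /dev/sdd
          # mkfs.btrfs -d raid5 -m raid1 /dev/sda /dev/sdb /dev/sdc
          Mixing data and metadata profiles like this is perfectly normal - keeping metadata on one of the RAID1 variants is the usual advice even when data uses RAID5/6.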

          Hope this helps those of you interested in BTRFS get a grip on the very basic fundamentals.

          The BTRFS documentation can be found here: https://btrfs.readthedocs.io/en/latest/
          And the BTRFS disk space calculator has improved quite a bit - so have a look here too for something fun to play with: https://carfax.org.uk/btrfs-usage/
          Last edited by waxhead; 07 May 2022, 05:53 AM. Reason: Fixed a typo (there are probably more)

          http://www.dirtcellar.net



          • #25
            Originally posted by ferry View Post
            As a long-time btrfs user I can say we don't care about RAID5/6, but we are still waiting for the promised hot-relocation support (mixed SSD/HDD with hot data automatically being relocated to the SSD). I still have IBM's patches for that lying around; what happened to that effort?
            I have a feeling bcachefs will be upstreamed and stable long before that ever happens. I've gotten very used to btrfs, but still can't wait to try out bcachefs.



            • #26
              Originally posted by BOYSSSSS View Post

              I have a feeling bcachefs will be upstreamed and stable long before that ever happens. I've gotten very used to btrfs, but still can't wait to try out bcachefs.
              A new filesystem takes decades to develop, so I doubt that it will happen.
              But Bcachefs sounds really interesting; I wouldn't mind if it stabilises earlier than expected.



              • #27
                BTRFS is lacking a robust and comprehensive test suite that also includes pseudorandom corruption / fuzzing-like tests. And any feature not passing this test suite should print a big fat warning to syslog and other places when it is used. Huge respect for Chris Mason and I am really, really thankful that such a project exists. I just want to report my experience.

                I imagine many people who were motivated to try out new filesystems got burnt over the last 10 years when some BTRFS ENOSPC situation or RAID5/6 setup blew up in their face. Especially for a filesystem that is trying to be a ZFS alternative (i.e. robust, never losing your data), I have seen way too many broken BTRFS filesystems. I still have some broken btrfs filesystem images on my home server.

                I love modern filesystems and have been playing with them for over ten years. I also had backups… but I still lost data more than once in those years, which was very frustrating (backups did not run multiple times a day). About 5 years ago I stopped using BTRFS for anything but occasional experiments. I have noticed it has become better (e.g. with ENOSPC situations), but I am not yet willing to trust my actual data to it (like on my home server / home lab).
                Even years ago it was a good filesystem 'when it worked', but miserable when something was off (I/O errors, crashes, etc.). And I would trade any fancy compression/dedup/snapshot feature for a filesystem that does everything it can to keep my data intact and also lets me read the still-intact data when there are problems elsewhere.

                Once I see some significant progress on the testing or btrfs-check side, I will re-evaluate.



                • #28
                  Originally posted by Draget View Post
                  BTRFS is lacking a robust and comprehensive test suite that also includes pseudorandom corruption / fuzzing-like tests.
                  You do know they use (x)fstests, which includes this?
                  It is often written (x)fstests because it was originally named xfstests (and the repo still has that name), having been created to test the xfs filesystem; but since so many other filesystems - including btrfs - use it, it has been renamed in docs and code.
                  The problem is that you cannot write test cases until you know there is a fault to look out for, or have identified a code path that needs to be tested by fuzzing/pseudorandom corruption.

                  If you are even semi-regularly on the mailing list you will notice that many identified problems result in new tests for (x)fstests - when the developers can pinpoint why things failed and what kind of test will pick it up - as well as patches that fix the identified problems.
                  If you know which code paths need more tests - pseudorandom, fuzzing or otherwise - I guess the btrfs project would love to hear about it, possibly also with a patch against (x)fstests to reliably execute the test in question.
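                  For anyone curious, pointing it at btrfs is not much work; a rough sketch (the device names and mount points are placeholders, and the quick group is only one of several):
                  Code:
                  # git clone git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
                  # cd xfstests-dev && make
                  # echo "export FSTYP=btrfs" > local.config
                  # echo "export TEST_DEV=/dev/vdb TEST_DIR=/mnt/test" >> local.config
                  # echo "export SCRATCH_DEV=/dev/vdc SCRATCH_MNT=/mnt/scratch" >> local.config
                  # ./check -g quick
                  Individual tests can also be run by name (e.g. ./check btrfs/001), which is handy when turning a reproducer into a regression test.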



                  • #29
                    Originally posted by Draget View Post
                    BTRFS is lacking a robust and comprehensive test suite that also includes pseudorandom corruption / fuzzing-like tests. And any feature not passing this test suite should print a big fat warning to syslog and other places when it is used. Huge respect for Chris Mason and I am really, really thankful that such a project exists. I just want to report my experience.

                    I imagine many people who were motivated to try out new filesystems got burnt over the last 10 years when some BTRFS ENOSPC situation or RAID5/6 setup blew up in their face. Especially for a filesystem that is trying to be a ZFS alternative (i.e. robust, never losing your data), I have seen way too many broken BTRFS filesystems. I still have some broken btrfs filesystem images on my home server.

                    I love modern filesystems and have been playing with them for over ten years. I also had backups… but I still lost data more than once in those years, which was very frustrating (backups did not run multiple times a day). About 5 years ago I stopped using BTRFS for anything but occasional experiments. I have noticed it has become better (e.g. with ENOSPC situations), but I am not yet willing to trust my actual data to it (like on my home server / home lab).
                    Even years ago it was a good filesystem 'when it worked', but miserable when something was off (I/O errors, crashes, etc.). And I would trade any fancy compression/dedup/snapshot feature for a filesystem that does everything it can to keep my data intact and also lets me read the still-intact data when there are problems elsewhere.

                    Once I see some significant progress on the testing or btrfs-check side, I will re-evaluate.
                    The BTRFS guys do run fuzzing on BTRFS images. There are reports from 2017, 2018, 2019, 2020 and 2022 on the mailing list. And as Xake pointed out, they also run xfstests.

                    The biggest problem with BTRFS is that people believed it was stable when it was adopted by the kernel. The second biggest problem is that nobody reads the warnings about, for example, BTRFS "RAID"5/6, or checks up on the status of the features they want to use. The third biggest problem was the screwup in kernel ... 5.2 I think - which was swiftly addressed. Apart from that, BTRFS has from my point of view been rock solid and I use it on many boxes.

                    10 years ago was 2012; BTRFS was not really safe to use until about kernel 4.4/4.14, which was 2016/2017.

                    Are you able to either prove or find any horror stories from 2017 onward that do not...
                    - Use any multilayered configuration (such as using bcache, lvm or even mdraid as a backing device for BTRFS)?
                    - Use "RAID"5/6 for metadata?
                    - Use SINGLE for metadata (or data)?
                    - Use zoned mode?
                    - Use quotas?
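                    For what it's worth, most of those points can be checked on a live system with a couple of commands (paths are placeholders): lsblk shows whether btrfs sits on top of bcache/lvm/mdraid, and qgroup show simply errors out if quotas were never enabled:
                    Code:
                    # lsblk -o NAME,TYPE,FSTYPE
                    # btrfs qgroup show /mnt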


                    http://www.dirtcellar.net



                    • #30
                      Originally posted by waxhead View Post

                      The BTRFS guys do run fuzzing on BTRFS images. There are reports from 2017, 2018, 2019, 2020 and 2022 on the mailing list. And as Xake pointed out, they also run xfstests.

                      The biggest problem with BTRFS is that people believed it was stable when it was adopted by the kernel. The second biggest problem is that nobody reads the warnings about, for example, BTRFS "RAID"5/6, or checks up on the status of the features they want to use. The third biggest problem was the screwup in kernel ... 5.2 I think - which was swiftly addressed. Apart from that, BTRFS has from my point of view been rock solid and I use it on many boxes.

                      10 years ago was 2012; BTRFS was not really safe to use until about kernel 4.4/4.14, which was 2016/2017.

                      Are you able to either prove or find any horror stories from 2017 onward that do not...
                      - Use any multilayered configuration (such as using bcache, lvm or even mdraid as a backing device for BTRFS)?
                      - Use "RAID"5/6 for metadata?
                      - Use SINGLE for metadata (or data)?
                      - Use zoned mode?
                      - Use quotas?
                      No, quite the opposite. We have been using BTRFS in RAID10 mode since 2012 and have never had a data loss. We take snapshots 2x a day and back one of those up daily. We have never needed to restore a file other than to fix the occasional accidental delete/overwrite. Other than that we have had only minor annoyances (a few files with invalid metadata) that were easily solved with help from the ML.

                      I still believe hot-relocation (placing hot files on the SSD and just moving them to the HDD when needed) is a lot easier to implement than putting a caching filesystem underneath it.
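                      In case anyone wants to copy that setup, the snapshot-plus-daily-backup part boils down to a read-only snapshot followed by an incremental send; a rough sketch with made-up paths:
                      Code:
                      # btrfs subvolume snapshot -r /data /data/.snapshots/today
                      # btrfs send -p /data/.snapshots/yesterday /data/.snapshots/today | btrfs receive /backup
                      The -p parent makes the send incremental; the very first backup is simply done without it.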

