Announcement

**skierpage** · 11 May 2023, 01:20 PM

Originally posted by evert_mouw View Post

Chech out REDOX... encryption, CoW

6 years ago: https://github.com/redox-os/tfs
currently: https://github.com/redox-os/redoxfs

tfs was a passion project that had some great ideas but is abandoned.

**Chugworth** · 11 May 2023, 01:55 PM

Originally posted by oiaohm View Post

ZFS does not in fact implement raid5 or 6. RAIDZ1 and RAIDZ2 are ZFS closet to RAID 5 and 6 but there are many things RAIDZ1 and RAIDZ2 does that is not RAID5 or RAID6 specification this is why those avoid particular problems.

It achieves the goals of RAID 5 and 6 in a method that works better: It combines the space of a pool of disks and remains functional without data loss if one drive or two drives fail.

Originally posted by oiaohm View Post

There is a growing problem for anything based around RAID5 and RAID6 ideas as drives grow in size the rebuild time is going up. Simpler duplicate the file solutions have reduced rebuild times.

I don't really trust the idea of a system that tries to keep redundancy by copying files in the background. Especially if you're dealing with extremely large files like virtual machine images that can be over 100GB. I'd hope that it catches all the changes, and is not just comparing modification times. Rebuild times are no issue for me because it's rare that it needs to be done, and on RAID6 you're still allowing for one more drive failure during the rebuild.

**ryao** · 11 May 2023, 08:26 PM

Originally posted by Mitch View Post

Fair point. But the two aren't mutually exclusive. We'll have to see what "bugs" means. A hypothetical bug where the performance drops 10% when you have more than 12 CPUs isn't all that scary. Data corruption bugs are the worst news. This is a chicken and egg problem. Until we see more adoption, we may not see all the issues clearly. I know they've done a lot of testing.

I'm more than okay using BCacheFS for my Steam library, for example, which I wouldn't mind losing in the worst of cases. Seems like the ideal FS if you have a solid state disk backed by a huge hard disk, and you want the OS to dynamically manage your hot vs cold data. I've been using ZFS in this way to great success, but the fact that it's out of tree, and it's sometimes late in supporting the latest kernel can be a pain.

The release tagged to support Linux 6.2 had experimental Linux 6.3 support before Linux 6.3 was out. The reason we did not advertise it was because people wanted to gain confidence that we did not miss anything. I imagine officially advertised 6.3 support will be in a tagged release soon alongside experimental 6.4 support in it. It has been like this for a while. Most of the time, we do not find that we need to do anything else beyond what we did initially and we are getting better at catching things that slip past us when making initial support.

For example, not that long ago, we found that support for an optional VFS feature that Linux 3.10 did not have had broken due to a kernel API change. We had missed that in the first pass because the autotools check for it assumed turning off support was okay when the code could not build with it turned on, our regression tests supported running on systems that did not support it and the result was very easy to miss. Now the autotools check will only turn off support on Linux 3.10 or older, such that it will break the build if support for that feature is missing from a more recent kernel. This will let us know about such breakage extremely early, such that our patches to support newer kernels will not allow it to regress again.

**oiaohm** · 11 May 2023, 10:45 PM

Originally posted by Chugworth View Post

It achieves the goals of RAID 5 and 6 in a method that works better: It combines the space of a pool of disks and remains functional without data loss if one drive or two drives fail.

https://www.phoronix.com/forums/foru...29#post1386629
Modern hardware Raid 5/6 the ZFS RaidZ1 and RaidZ2 absolutely exceeds it. Linux kernel software raid 5/6 has the same issues.

Now you have rebuild problem.

Originally posted by Chugworth View Post

I don't really trust the idea of a system that tries to keep redundancy by copying files in the background. Especially if you're dealing with extremely large files like virtual machine images that can be over 100GB. I'd hope that it catches all the changes, and is not just comparing modification times. Rebuild times are no issue for me because it's rare that it needs to be done, and on RAID6 you're still allowing for one more drive failure during the rebuild.

File systems like btrfs ZFS... allows mirroring of files between drives. Now a file mirrored to 2 drives 1 drive can fail without it being lost. File mirrored to 3 drives 2 drives can fail without it being lost.

Btrfs RAID profiles | Forza's Ramblings

https://wiki.tnonline.net/w/Btrfs/Profiles

Btrfs supports many different profiles, generally called RAID modes. It is possible to convert between profiles on a mounted filesystem well as mixing devices of different sizes.

RAID 1 as horrible as it sounds basic mirroring has the same amount of drive failures before data loss.
RAID1c3 same amount of failures as Raid 6. then you have RADI1c4

Lot less rebuild processing on RAID1 or RAID1c3 btrfs is fully checksumed.

The parity of Raid5 and raid6 has it downsides the upside of party is needing less disk space. The downside more reads to rebuild and more reads more risk that rebuild will fail. More processing with party equals more processing load..

RAID1c3 in fact can have all drives party failed and in fact successfully rebuild. Think about it if all the lost sectors don't line up. rebuild is possible.

The problem you have in Raid5/6 is think how party works.

block A + party X generates block B
Block B + party X generates block A

What happens if all you have is party X and block B and block A are both damaged that right data loss. Raid66 you are crossing fingers that the party data is not just a straight duplication of the Raid5 parity data some implementations all Raid6 happens to be is they duplicated the Raid5 parity data..

Raid1c3 block A1 block A2 and block A3 the number being the copy is data lost. Most of the time no. Why you have checksum and comparing the 3 blocks you can normally filter out the damage.

This is the problem Raid5 and Raid6 is not your highest data protection Raid1c3/Raid1c4....are your higher protections but that have a storage space cost.

The idea that Raid6 allows another drive to fail is not always true case of Raid 6 having the same blocks pair of blocks in both of the party linked to each other results in 2 drives being Raid 6 being dead like Raid5. So raid6 implementation details is critical not all raid 6 implementations are better than raid 5 when it comes to drive failures.

R

**billyswong** · 12 May 2023, 06:54 AM

Whatever problem RAID5/6 had in conventional sense, bcachefs is planning to fix that by CoW. Together with atomic update of file checksum and metadata, a CoW filesystem shouldn't face "write hole" problem anymore when implementing RAID5/6-like feature. I think file and directory metadata etc don't need to "save space" and store in RAID5/6 style. Bcachefs can store them in RAID1c2/c3 manner and only file content in RAID5/6 style.

**Chugworth** · 12 May 2023, 07:45 AM

Originally posted by oiaohm View Post

https://www.phoronix.com/forums/foru...29#post1386629
Modern hardware Raid 5/6 the ZFS RaidZ1 and RaidZ2 absolutely exceeds it. Linux kernel software raid 5/6 has the same issues.

Now you have rebuild problem.

File systems like btrfs ZFS... allows mirroring of files between drives. Now a file mirrored to 2 drives 1 drive can fail without it being lost. File mirrored to 3 drives 2 drives can fail without it being lost.

Btrfs RAID profiles | Forza's Ramblings

https://wiki.tnonline.net/w/Btrfs/Profiles

Btrfs supports many different profiles, generally called RAID modes. It is possible to convert between profiles on a mounted filesystem well as mixing devices of different sizes.

RAID 1 as horrible as it sounds basic mirroring has the same amount of drive failures before data loss.
RAID1c3 same amount of failures as Raid 6. then you have RADI1c4

Lot less rebuild processing on RAID1 or RAID1c3 btrfs is fully checksumed.

The parity of Raid5 and raid6 has it downsides the upside of party is needing less disk space. The downside more reads to rebuild and more reads more risk that rebuild will fail. More processing with party equals more processing load..

RAID1c3 in fact can have all drives party failed and in fact successfully rebuild. Think about it if all the lost sectors don't line up. rebuild is possible.

The problem you have in Raid5/6 is think how party works.

block A + party X generates block B
Block B + party X generates block A

What happens if all you have is party X and block B and block A are both damaged that right data loss. Raid66 you are crossing fingers that the party data is not just a straight duplication of the Raid5 parity data some implementations all Raid6 happens to be is they duplicated the Raid5 parity data..

Raid1c3 block A1 block A2 and block A3 the number being the copy is data lost. Most of the time no. Why you have checksum and comparing the 3 blocks you can normally filter out the damage.

This is the problem Raid5 and Raid6 is not your highest data protection Raid1c3/Raid1c4....are your higher protections but that have a storage space cost.

The idea that Raid6 allows another drive to fail is not always true case of Raid 6 having the same blocks pair of blocks in both of the party linked to each other results in 2 drives being Raid 6 being dead like Raid5. So raid6 implementation details is critical not all raid 6 implementations are better than raid 5 when it comes to drive failures.

R

Well just to be clear, I am no fan of hardware RAID. I did use it for many years though on many servers. Early on, the strategy was RAID-5 with one hot spare, and in later years the strategy was RAID-6. I will say though, I never had an instance where RAID let me down. Even on the earlier RAID-5 stratgy, I never had an instance where a second drive failed on the rebuild.

These days I'm all about ZFS, with drives connected to a simple HBA controller. In fact I have done the firmware hack on a couple of servers to convert a Dell PERC to a simple HBA controller.

Here is a link that explains the methodology of the ZFS RAID-6 implementation:

Double-Parity RAID-Z – Adam Leventhal's blog

http://dtrace.org/blogs/ahl/2006/06/18/double-parity-raid-z/

The chances of having data loss here are extremely low. But even still, that's why we make backups with zfs send/receive.

**uid313** · 12 May 2023, 07:59 AM

Originally posted by zexelon View Post

Not an expert opinion here so take it with a grain of salt, but I don't think changing the language would have any affect on the "major bugs" that a file system could run into. Yes Rust is a "safer language" from a memory management side of things perhaps that c/c++ but that has no affect on logic bugs.

The write hole issue mentioned usually as soon as Btrfs is mentioned is a logic issue. You could use any language you wanted and this issue would still exist.

That said, it would be very interesting to see a FS built in Rust, even just to see what could be better/worse, you never know until you try!

Probably the simplest thing to start with would be a re-write of say the FAT32 file system in Rust and see what the benefits could be on a smaller scale first.

Rust doesn't only bring memory safety over C and C++ it also uses Option<T> to get rid of null values so it has safer null handling. It also has safer handling of concurrency which avoids race conditions. With all the safety improvements and it being easier to write concurrency code then the developers have more time to focus on logic bugs.

**dfyt** · 12 May 2023, 08:08 AM

Originally posted by woddy View Post

I'd wait to say it's super stable...just the fact that nobody uses it doesn't make it very stable, but when lots of users try it that's where you'll find out if it's that stable. This happens for any software and especially for file systems.

Agreed, however I test filesystems quite often and use them on my DD's. - migrating my os over each time. My most recent tests with BTRFS over the last few weeks ended as they normally do, with data loss of some sort. With bcachefs when I last tested about 3 months back no issues at all stability wise. Just the compression was single threaded so it was slow when enabled. I live in a country with regular power cuts so it's "easy" to test filesystems for reliability. At this point only 2 filesystems have given me grief in my environment - btrfs and f2fs. If they can multithread the compression workloads bcachefs will be a game changer.

I'm not a fan of btrfs's snapshot implementation though. I find it very limiting and cumbersome to use and prefer ZFS's approach esp when trying to copy a whole filesystem. I must test again with bcachefs.

**woddy** · 12 May 2023, 10:46 AM

Originally posted by dfyt View Post

Agreed, however I test filesystems quite often and use them on my DD's. - migrating my os over each time. My most recent tests with BTRFS over the last few weeks ended as they normally do, with data loss of some sort. With bcachefs when I last tested about 3 months back no issues at all stability wise. Just the compression was single threaded so it was slow when enabled. I live in a country with regular power cuts so it's "easy" to test filesystems for reliability. At this point only 2 filesystems have given me grief in my environment - btrfs and f2fs. If they can multithread the compression workloads bcachefs will be a game changer.

I'm not a fan of btrfs's snapshot implementation though. I find it very limiting and cumbersome to use and prefer ZFS's approach esp when trying to copy a whole filesystem. I must test again with bcachefs.

Sorry if I insist, your tests have zero value, each of us when using a software can have positive or negative experiences, many bugs concern specific hardware or specific configurations.
For example, on my multimedia workstation with a one tera SSD full of data, I have been using Btrfs for over 5 years and I have never had any data loss, in the company where I work we use SUSE on some workstations and the database is on the btrfs file system never had any data loss.
This doesn't make me say that Btrfs is perfect, but I can say that in my experience of several years so far it has been perfect.
I repeat the experiences may vary depending on various configurations, but this bcachefs at the moment has not yet been merged into the kernel, because it is not considered ready to use, this is objective and not subjective, after which you use it and you have never had any problems, but the litmus test will be when many users use it.

**S.Pam** · 12 May 2023, 11:46 AM

Originally posted by dfyt View Post

Agreed, however I test filesystems quite often and use them on my DD's. - migrating my os over each time. My most recent tests with BTRFS over the last few weeks ended as they normally do, with data loss of some sort. With bcachefs when I last tested about 3 months back no issues at all stability wise. Just the compression was single threaded so it was slow when enabled. I live in a country with regular power cuts so it's "easy" to test filesystems for reliability. At this point only 2 filesystems have given me grief in my environment - btrfs and f2fs. If they can multithread the compression workloads bcachefs will be a game changer.

I'm not a fan of btrfs's snapshot implementation though. I find it very limiting and cumbersome to use and prefer ZFS's approach esp when trying to copy a whole filesystem. I must test again with bcachefs.

I'm curious what about the Btrfs snapshot implementation that you don't like? I use it extensively and find it truly a game changer how I manage my systems.

Regarding your data loss. I'm sad to hear that. Can you give some more details on how it happened and how your filesystem was set up at the time?

Announcement

Bcachefs Submitted For Review - Next-Gen CoW File-System Aims For Mainline

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment