Bcachefs Submitted For Review - Next-Gen CoW File-System Aims For Mainline


  • #51
Phoronix got called out in the patch submission discussion.


File systems tend to violate all of these precepts: (a) people chase benchmark optimizations to the exclusion of all else, because people have an unhealthy obsession with Phoronix benchmark articles, (b) file systems tend to be inherently multi-threaded, with lots of locks, and (c) file systems are all about managing global state in the form of files, directories, etc.
    from https://lore.kernel.org/lkml/[email protected]/



    • #52
      Originally posted by Chugworth View Post
      Here is a link that explains the methodology of the ZFS RAID-6 implementation:


The chances of having data loss here are extremely low. But even still, that's why we make backups with zfs send/receive.
      A few days ago I decided to convert my home server from Ubuntu Linux to FreeBSD 8.2 so that I could use ZFS, for data integrity. I am aware that ZFS is available on Linux, but from what I've read the FreeBSD implementation is better. I installed FreeBSD onto a mirrored ZFS pool, and created a...


This was spotted in 2011: in raidz2, at times the parity data is not unique, and the result is that the loss of one disc drops the pool to degraded just like raidz1, and the loss of two fails it completely, just like raidz1. So raidz2 only sometimes tolerates two disc failures, even though the 2006 documentation says it always tolerates two.

Yes, documentation on how something should be implemented and what is actually implemented are unfortunately two different things. RAID 6 is very easy to get wrong with an algorithmic error, and a RAID 6 implementation that is wrong in particular cases is only marginally better than RAID 5 when you calculate failure tolerance. Even raidz2, ZFS's RAID 6 implementation, shows this problem of implementation errors.

The chances of data loss are not as low as you think. RAID 1 solutions are a simpler implementation that is harder to screw up, but they come at the price of needing more storage.

This starts to explain why there is no major effort to fix btrfs raid5 and raid6. The horrible reality is that no software implementation of RAID 6 is correct, and of course no modern hardware RAID gets RAID 6 correct either. Some historic PCI (yes, the slot before PCIe) hardware RAID cards do have RAID 6 implemented correctly, so it is possible to do it right.

      Originally posted by Chugworth View Post
      I did use it for many years though on many servers.
The when is important. Hardware RAID used to be good, with proper protections against all kinds of issues. Modern hardware RAID controllers are basically garbage because they have been cost-cut and optimized for performance to the point that the important features are gone. If the RAID card goes into a PCIe slot or newer, it's wrong.

RAID cards were competing against each other over which could do the fastest transfers, and the result has been cheating (skipping checksum processing) and algorithmic errors. The unfortunate part is that some of those algorithmic errors have made it into software RAID implementations, including the ZFS ones.

Think about it: safe RAID 1 is simple. Just make another copy and process checksums; there is no fancy algorithm to get right.

What makes the parity of RAID 6 tricky is that if the P and Q parities happen to be computed so they are not independent over the same two blocks, then when you lose those two blocks you are just as stuffed, with no means to rebuild. This is a common RAID 6 implementation error.
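To make that concrete, here is a minimal sketch (a toy model, not code from any shipping RAID) of textbook RAID 6 P/Q parity over GF(2^8), cut down to one byte per disc. The whole trick is that Q weights each data block by a distinct power of the generator g = 2; that is what makes the two parity equations independent, so any two lost blocks can be solved for. Lose that independence and you get exactly the failure described above.

#include <stdint.h>
#include <stdio.h>

/* Multiply by 2 in GF(2^8) with the usual RAID 6 polynomial 0x11d. */
static uint8_t gf_mul2(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
}

/* General GF(2^8) multiply, built from gf_mul2. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;
        a = gf_mul2(a);
        b >>= 1;
    }
    return r;
}

int main(void)
{
    uint8_t d[4] = { 0x11, 0x22, 0x33, 0x44 };  /* one byte per data disc */
    uint8_t p = 0, q = 0, gi = 1;               /* gi walks through g^i  */

    for (int i = 0; i < 4; i++) {
        p ^= d[i];                 /* P = d0 + d1 + ...        (plain XOR) */
        q ^= gf_mul(gi, d[i]);     /* Q = g^0*d0 + g^1*d1 + ... in GF(2^8) */
        gi = gf_mul2(gi);
    }

    /* Because each d[i] carries a distinct coefficient in Q, losing any
     * two data bytes leaves two independent linear equations, so both can
     * be recovered. If Q were computed with repeated coefficients, some
     * two-disc losses would be unrecoverable: the implementation error
     * described above. */
    printf("P=%02x Q=%02x\n", p, q);
    return 0;
}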

Chugworth, most of your support of RAID is based on "it worked for me." There are a lot of parties who have had to pay for data recovery because it did not work for them.

Another thing: on the old RAID controllers you would see beasts such as RAID 51 and 61 options. The old RAID controllers were willing to give up performance for data integrity.



      • #53
        Originally posted by ryao View Post

The release tagged to support Linux 6.2 had experimental Linux 6.3 support before Linux 6.3 was out. The reason we did not advertise it was that people wanted to gain confidence that we did not miss anything. I imagine officially advertised 6.3 support will be in a tagged release soon, alongside experimental 6.4 support. It has been like this for a while. Most of the time we find we do not need to do anything beyond what we did initially, and we are getting better at catching things that slip past us when adding initial support.

For example, not that long ago we found that support for an optional VFS feature that Linux 3.10 did not have had broken due to a kernel API change. We had missed it in the first pass because the autotools check assumed that turning the feature off was okay whenever the code could not build with it turned on, our regression tests could run on systems without the feature, and so the breakage was very easy to miss. Now the autotools check will only turn off support on Linux 3.10 or older, so the build breaks if support for that feature is missing from a more recent kernel. This lets us know about such breakage extremely early, so our patches supporting newer kernels will not let it regress again.
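A minimal sketch of that version-gating pattern, with hypothetical macro names (the real OpenZFS configure checks are more involved than this):

/* HAVE_SOME_VFS_HOOK is a hypothetical macro a configure-time probe
 * would define. The policy described above: only fall back silently on
 * kernels old enough to genuinely lack the feature; on newer kernels a
 * failed probe means a kernel API change, so fail the build loudly. */
#include <linux/version.h>

#ifdef HAVE_SOME_VFS_HOOK
#  define USE_SOME_VFS_HOOK 1          /* probe succeeded: use the hook */
#elif LINUX_VERSION_CODE <= KERNEL_VERSION(3, 10, 0)
#  define USE_SOME_VFS_HOOK 0          /* feature genuinely absent here */
#else
#  error "VFS hook probe failed on a kernel that should provide it"
#endif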
        I appreciate all the insight and information. Is this all in reference to ZFS, BCacheFS, or just the Kernel itself? I'm not bashing ZFS either, it's just that when I game, especially with a new GPU, I often have new stuff every kernel, so I try to keep things current. If we were talking about a server or casual PC, I probably wouldn't mind letting things all catch up together before upgrading.



        • #54
          Originally posted by Mitch View Post
          I appreciate all the insight and information. Is this all in reference to ZFS, BCacheFS, or just the Kernel itself? I'm not bashing ZFS either, it's just that when I game, especially with a new GPU, I often have new stuff every kernel, so I try to keep things current. If we were talking about a server or casual PC, I probably wouldn't mind letting things all catch up together before upgrading.
ryao is a ZFS person. There are ongoing issues with being out of tree due to how much the Linux kernel changes. The Microsoft developers at a recent conference were talking about altering Linux kernel memory permissions again, and that kind of change normally brings a new way for out-of-tree kernel modules to break.

It looks like another round of bcachefs patches will be required before mainline. Some showstopper faults for going mainline were found in the current set of patches. Peer review on the way to mainline normally does weed out some mistakes.



          • #55
            Originally posted by oiaohm View Post
            A few days ago I decided to convert my home server from Ubuntu Linux to FreeBSD 8.2 so that I could use ZFS, for data integrity. I am aware that ZFS is available on Linux, but from what I've read the FreeBSD implementation is better. I installed FreeBSD onto a mirrored ZFS pool, and created a...


This was spotted in 2011: in raidz2, at times the parity data is not unique, and the result is that the loss of one disc drops the pool to degraded just like raidz1, and the loss of two fails it completely, just like raidz1. So raidz2 only sometimes tolerates two disc failures, even though the 2006 documentation says it always tolerates two.

Yes, documentation on how something should be implemented and what is actually implemented are unfortunately two different things. RAID 6 is very easy to get wrong with an algorithmic error, and a RAID 6 implementation that is wrong in particular cases is only marginally better than RAID 5 when you calculate failure tolerance. Even raidz2, ZFS's RAID 6 implementation, shows this problem of implementation errors.

The chances of data loss are not as low as you think. RAID 1 solutions are a simpler implementation that is harder to screw up, but they come at the price of needing more storage.

This starts to explain why there is no major effort to fix btrfs raid5 and raid6. The horrible reality is that no software implementation of RAID 6 is correct, and of course no modern hardware RAID gets RAID 6 correct either. Some historic PCI (yes, the slot before PCIe) hardware RAID cards do have RAID 6 implemented correctly, so it is possible to do it right.

The when is important. Hardware RAID used to be good, with proper protections against all kinds of issues. Modern hardware RAID controllers are basically garbage because they have been cost-cut and optimized for performance to the point that the important features are gone. If the RAID card goes into a PCIe slot or newer, it's wrong.

RAID cards were competing against each other over which could do the fastest transfers, and the result has been cheating (skipping checksum processing) and algorithmic errors. The unfortunate part is that some of those algorithmic errors have made it into software RAID implementations, including the ZFS ones.

Think about it: safe RAID 1 is simple. Just make another copy and process checksums; there is no fancy algorithm to get right.

What makes the parity of RAID 6 tricky is that if the P and Q parities happen to be computed so they are not independent over the same two blocks, then when you lose those two blocks you are just as stuffed, with no means to rebuild. This is a common RAID 6 implementation error.

Chugworth, most of your support of RAID is based on "it worked for me." There are a lot of parties who have had to pay for data recovery because it did not work for them.

Another thing: on the old RAID controllers you would see beasts such as RAID 51 and 61 options. The old RAID controllers were willing to give up performance for data integrity.
            So no RAID controllers for PCIe slots have ever been made right? Do you have any idea how many PCIe RAID systems have been sold? If that was true, it would be widely known to avoid them. You're talking about some strange edge cases that few people ever encounter. Do you think that in any of the RAID-6 testing, no one ever yanks out two random drives to see if the pool continues working? If what you're saying is true, there should be frequent failures when you do that. And don't forget, with ZFS it's recommended to do monthly scrubs to proactively look for data errors.

            I agree that a pool of three mirrored drives would be a bit safer than a RAID-6 pool. But I wouldn't agree with a pool of two mirrored drives since you still have the risk of one failing during a rebuild. And that becomes an issue when you want to create a large amount of storage with SSD drives due to cost and availability.

When dealing with SSD drives I sometimes opt for RAID-5 instead of RAID-6 just to get the extra space. I trust an SSD drive more than a hard drive. And most importantly of all, I know that I have a backup.



            • #56
              Originally posted by Chugworth View Post
So no RAID controllers for PCIe slots have ever been made right? Do you have any idea how many PCIe RAID systems have been sold? If that was true, it would be widely known to avoid them.

Yes, it is becoming widely known to avoid hardware RAID controllers. But the downgrade started with the PCIe ones.
              Originally posted by Chugworth View Post
You're talking about some strange edge cases that few people ever encounter.
The problem is not as rare as one would like.

              Originally posted by Chugworth View Post
Do you think that in any of the RAID-6 testing, no one ever yanks out two random drives to see if the pool continues working? If what you're saying is true, there should be frequent failures when you do that.
That is a stupid test: it does not prove the parity is right, and it creates a tricked "works for me" problem where you think it is fine when it is not. Say the mistake is in unused areas of the RAID; the fault does not show itself because you have not yet touched the area with the incorrect parity alignment. Or worse, the controller is dynamically changing parity alignments and you simply have not hit the race condition yet.

Basically we have hardware RAID controllers doing the same kind of thing as those fake SSDs that claim to be 1TB drives and are not. Everything looks fine until it's not.

Data recovery firms make a lot of money repairing RAID 5 and RAID 6 arrays after single drive failures. Yes, RAID 6 should take the loss of three discs before needing specialist repair, but since the PCIe RAID controllers appeared, cases of failure from a single disc have been turning up. Yes, RAID 5 should take two disc failures before needing specialist repair, but failure from a single disc has been turning up there as well.

And yes, when RAID 6 hardware goes down from a single drive failure due to bad logic design, you cannot sue the RAID card makers. All PCIe RAID cards came with a disclaimer that they do not protect your data; as long as they can remake a RAID array they are classed as functional. PCIe RAID cards are not data protection, they state that in their manuals, so eating your data is allowed and they will do it.

The last cards without a disclaimer covering defective RAID design were the PCI ones.

Originally posted by Chugworth View Post
But I wouldn't agree with a pool of two mirrored drives since you still have the risk of one failing during a rebuild.
A RAID 5 rebuild has the same problem of a drive failing during it. In both cases you either send drives to a data recovery firm to bring them back from the dead or restore from backups.

It's shocking how long hardware RAID controllers have been made wrong while the companies making them have got away with it.

There is a test in that video that gives a better idea of whether a RAID controller is right: inject incorrect data and see if the controller detects and removes it. This does not promise that no parity mistake exists, but at least you are dealing with a RAID solution with better-designed logic.

It's fairly simple to validate RAID 1 logic. RAID 5 and RAID 6 are not that simple.
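Here is what that injection test looks like against the toy one-byte-per-disc GF(2^8) RAID 6 model from earlier in the thread (again a sketch, not production code): corrupt one data byte silently, then check that the P/Q syndromes both detect the error and point at the bad disc.

#include <stdint.h>
#include <stdio.h>

static uint8_t gf_mul2(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
}

static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;
        a = gf_mul2(a);
        b >>= 1;
    }
    return r;
}

int main(void)
{
    uint8_t d[4] = { 0x11, 0x22, 0x33, 0x44 };
    uint8_t p = 0, q = 0, gi = 1;

    for (int i = 0; i < 4; i++) {      /* compute good P and Q parity */
        p ^= d[i];
        q ^= gf_mul(gi, d[i]);
        gi = gf_mul2(gi);
    }

    d[2] ^= 0x5a;                      /* inject silent corruption on "disc" 2 */

    uint8_t sp = p, sq = q;            /* syndromes against the bad data */
    gi = 1;
    for (int i = 0; i < 4; i++) {
        sp ^= d[i];
        sq ^= gf_mul(gi, d[i]);
        gi = gf_mul2(gi);
    }

    if (sp == 0 && sq == 0) {
        puts("FAIL: corruption not detected");
        return 1;
    }
    /* For a single bad data disc i with error e: sp = e and sq = g^i * e.
     * Find i by testing which power of g maps sp onto sq. */
    gi = 1;
    for (int i = 0; i < 4; i++) {
        if (gf_mul(gi, sp) == sq) {
            printf("detected and located bad disc %d, error %02x\n", i, sp);
            d[i] ^= sp;                /* repair from the syndrome */
            return 0;
        }
        gi = gf_mul2(gi);
    }
    puts("detected, but could not locate (a parity block itself may be bad)");
    return 1;
}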
              Last edited by oiaohm; 13 May 2023, 03:28 PM.



              • #57
                Originally posted by oiaohm View Post


Yes, it is becoming widely known to avoid hardware RAID controllers. But the downgrade started with the PCIe ones.

The problem is not as rare as one would like.

That is a stupid test: it does not prove the parity is right, and it creates a tricked "works for me" problem where you think it is fine when it is not. Say the mistake is in unused areas of the RAID; the fault does not show itself because you have not yet touched the area with the incorrect parity alignment. Or worse, the controller is dynamically changing parity alignments and you simply have not hit the race condition yet.

Basically we have hardware RAID controllers doing the same kind of thing as those fake SSDs that claim to be 1TB drives and are not. Everything looks fine until it's not.

Data recovery firms make a lot of money repairing RAID 5 and RAID 6 arrays after single drive failures. Yes, RAID 6 should take the loss of three discs before needing specialist repair, but since the PCIe RAID controllers appeared, cases of failure from a single disc have been turning up. Yes, RAID 5 should take two disc failures before needing specialist repair, but failure from a single disc has been turning up there as well.

And yes, when RAID 6 hardware goes down from a single drive failure due to bad logic design, you cannot sue the RAID card makers. All PCIe RAID cards came with a disclaimer that they do not protect your data; as long as they can remake a RAID array they are classed as functional. PCIe RAID cards are not data protection, they state that in their manuals, so eating your data is allowed and they will do it.

The last cards without a disclaimer covering defective RAID design were the PCI ones.

A RAID 5 rebuild has the same problem of a drive failing during it. In both cases you either send drives to a data recovery firm to bring them back from the dead or restore from backups.

It's shocking how long hardware RAID controllers have been made wrong while the companies making them have got away with it.

There is a test in that video that gives a better idea of whether a RAID controller is right: inject incorrect data and see if the controller detects and removes it. This does not promise that no parity mistake exists, but at least you are dealing with a RAID solution with better-designed logic.

It's fairly simple to validate RAID 1 logic. RAID 5 and RAID 6 are not that simple.
                Take that video and jump to position 14:43:


                The keyword for that video is the very first word: "Hardware". And I did say a few posts back that I am no fan of hardware RAID. I agree though that a controller card that does no error checking is worthless. The thing about hardware is that if a feature is not actively accessed and used by the end users, you can't trust the manufacturer to implement it correctly.

                Level1Techs have been big proponents of ZFS though, and they have several videos about ZFS storage devices they have built. Truth be told, ZFS is probably about the only implementation of RAID that I would trust these days.



                • #58
                  Originally posted by Chugworth View Post
                  The keyword for that video is the very first word: "Hardware". And I did say a few posts back that I am no fan of hardware RAID. I agree though that a controller card that does no error checking is worthless. The thing about hardware is that if a feature is not actively accessed and used by the end users, you can't trust the manufacturer to implement it correctly.
Software implementations need to be treated carefully as well. Remember, that video also points out that mdraid, the Linux kernel's software RAID, has the mistake of: the data does not match, there is no drive error, so the parity must be wrong; just update the parity data and tell the user nothing. And yes, Windows' built-in software RAID has this mistake as well.
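The difference between the two repair policies is easy to sketch with a toy RAID 5 stripe (hypothetical types and names, not mdraid's actual code). Plain parity cannot say whether a data block or the parity block is wrong, so silently rewriting parity can cement corrupted data:

#include <stdbool.h>
#include <stdio.h>

/* Toy stripe: three data "discs" and one XOR parity byte. */
struct stripe { unsigned char data[3]; unsigned char parity; };

static unsigned char xor_of(const struct stripe *s)
{
    return s->data[0] ^ s->data[1] ^ s->data[2];
}

/* The silent policy described above: on mismatch, assume the parity
 * block is the bad one and rewrite it. If a data block was actually
 * corrupt, the corruption is now permanent and no one was told. */
static void repair_silently(struct stripe *s)
{
    if (xor_of(s) != s->parity)
        s->parity = xor_of(s);
}

/* The safer policy: report the mismatch and let checksums, scrubs or
 * backups decide which block is actually wrong. */
static bool scrub_with_alarm(const struct stripe *s)
{
    if (xor_of(s) != s->parity) {
        fprintf(stderr, "parity mismatch: data or parity is bad\n");
        return false;
    }
    return true;
}

int main(void)
{
    struct stripe s = { { 0x11, 0x22, 0x33 }, 0x11 ^ 0x22 ^ 0x33 };
    s.data[1] ^= 0xff;     /* silent corruption of a data disc        */
    repair_silently(&s);   /* mismatch is gone, corruption is kept    */
    /* Returns 0: the scrub now reports clean although data[1] is bad. */
    return scrub_with_alarm(&s) ? 0 : 1;
}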

This is the problem with both hardware and software RAID: many of them you cannot trust at all.

                  Originally posted by Chugworth View Post
                  Level1Techs have been big proponents of ZFS though, and they have several videos about ZFS storage devices they have built. Truth be told, ZFS is probably about the only implementation of RAID that I would trust these days.
                  A few days ago I decided to convert my home server from Ubuntu Linux to FreeBSD 8.2 so that I could use ZFS, for data integrity. I am aware that ZFS is available on Linux, but from what I've read the FreeBSD implementation is better. I installed FreeBSD onto a mirrored ZFS pool, and created a...


Do remember this link. It is from 2011, but if you look around there have been others. ZFS raidz2, the RAID 6 equal, may not be as stable as first believed. The RAID 1 equivalents in btrfs and ZFS do not have strange issues turning up.

Remember, it took over a decade for people to start admitting to themselves that modern hardware RAID was garbage, broken past all possibility of being usable.

Chugworth, the problem is the "works for me" factor. Lots of people can use RAID solutions with broken designs and get away with it, and because they got away with it they ignore the reports that the solution they are using is broken.

RAID 5 and RAID 6 are harder to validate, and an implementation can be very foolish in parity selection (this is what ZFS appears to suffer from) or very foolish in its handling of a parity mismatch (replacing parity without raising an alarm).

There is a problem that happened here. Most modern RAID designs, be they hardware or software, are optimized for performance at the cost of data integrity, and this leads to rebuild failures where RAID theory says you should not have failures.

Then for items like ZFS, btrfs... there is no really good system to look at the source code, extract the RAID method, and check it for logical sanity. This reality means the saying "RAID is not a backup" has to be absolutely obeyed. With anything RAID 6 it is really easy for logical insanity in parity selection to result in no extra failure tolerance.

Chugworth, with what I know I would not recommend generic RAID 5/6. I would recommend raidz1 and raidz2, with the warning that they may not be exactly right due to the issues reported, so backups are absolutely critical in case the raidz1/raidz2 does not rebuild even after a single disc failure. The filesystem-level RAID 1 style in bcachefs or btrfs is quite decent and should give you the drive-failure tolerance you count on.

I do wish we had better tools for working out how good RAID implementations are and what in an implementation is busted.



                  • #59
                    I don't know why JFS never caught on. I use it on my home bitcoind server.

