Linux 5.5 SSD RAID 0/1/5/6/10 Benchmarks Of Btrfs / EXT4 / F2FS / XFS

  • evil_core
    replied
    @profoundWHALE Even IronWolfs (both Pro and regular) in the 12-16TB range shipped with buggy firmware that caused corruption in Reed-Solomon based RAIDs. Fixed firmware has been released, but remember that these are enterprise-grade drives. You could say it was some rare first revision of the firmware, but it wasn't: I got 7 drives (mixed 12/14TB Pros and regulars from different batches) and half of them had the buggy firmware version on the sticker. Have you considered that the firmware in your HDDs or controller might have been buggy? It's not rare in enterprise gear and even more likely in consumer solutions. Remember the SATA power management issue that caused BTRFS corruption on many boards?

  • VasiliPupkin
    replied
    Something is wrong with these tests. F2FS is a log-structured file system: it always appends data to the end and writes sequentially. Because there are no random writes, F2FS on RAID5 should ALWAYS outperform a single-drive setup. That is not the case here, so I suggest checking the testing procedure for bugs and external factors: cache, CPU throttling, or whatever else is spoiling the results. As a first step, run the test twice and check the results for reproducibility, e.g. with something like the sketch below.
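
    A quick way to do that sanity check, assuming fio is available (the mount point is just a placeholder):

        # Run the same sequential-write job twice and compare the reported bandwidth.
        # /mnt/test stands for the filesystem under test.
        for run in 1 2; do
            fio --name=seqwrite --directory=/mnt/test \
                --rw=write --bs=1M --size=4G --direct=1 --numjobs=1 \
                --output=seqwrite-run$run.txt
        done
        grep -h 'WRITE:' seqwrite-run*.txt   # large run-to-run differences point at caching/throttling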

  • zyxxel
    replied
    Originally posted by DrYak View Post
    Normally scrub should take a couple of hours max, and is something that needs to be performed on a regular basis to guarantee data safety.
    (I tend to run it weekly; monthly is about the minimum recommendation.)
    This depends on the amount of data and on how fast the machine can read it.

    If you have 12-14 TB of data on a drive and the machine manages about 150 MB/second, you get a total runtime of around 24 hours. With the larger drives it's really important to make sure they are connected to controllers that can handle the full transfer speed the disk supports. And even on SATA-600, most drives are limited to 200-250 MB/second.

    So in the end, for really large drives it's often worthwhile to split the scrub into multiple runs using cancel/resume instead of doing one huge scrub every x days, along the lines of the sketch below.
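
    For example, splitting it across a few evenings can be done with the standard scrub subcommands (the mount point is a placeholder):

        # Evening 1: start the scrub in the background
        btrfs scrub start /mnt/storage
        # Next morning: stop it; progress is recorded and kept
        btrfs scrub cancel /mnt/storage

        # Evening 2: continue where it left off
        btrfs scrub resume /mnt/storage

        # Check progress and error counters at any time
        btrfs scrub status /mnt/storage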

  • zyxxel
    replied
    Originally posted by profoundWHALE View Post

    Hey genius, what good is a log of a scrub that fails if the tools for fixing the failure are failing?

    I get the saying of "a bad carpenter blames his toolbox" but seriously, what the heck are logs going to do?
    The logs tell you what is broken.

    This is the required input for making good decisions about what the next step should be.

    Without access to the logs, everything just ends up as black magic.

    Originally posted by profoundWHALE View Post
    I'm using the same drives, right now. I check the harddrives about twice a year for bad sectors.
    You shouldn't check your disks for bad sectors twice a year. Your system should check this continuously and mail you instantly when something starts to smell fishy, for instance with a small check like the one below.
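
    A minimal sketch of such a check, e.g. as a daily cron job (mount point and mail address are placeholders; --check needs a reasonably recent btrfs-progs, and a working local mailer is assumed):

        #!/bin/sh
        # Mail the admin if any btrfs per-device error counter is non-zero.
        # "btrfs device stats --check" exits non-zero when a counter is non-zero.
        if ! btrfs device stats --check /mnt/storage > /tmp/btrfs-stats.txt 2>&1; then
            mail -s "btrfs errors on $(hostname)" admin@example.com < /tmp/btrfs-stats.txt
        fi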

    Your story reminds me of way too many people who set up RAID-5 and forget about it. So one year after the first drive fails, a second drive fails and everything is lost. All because of the assumption that all is well as long as data can be read out. And then the user is angry with their RAID-5, or blames the drives for failing at the same time, even though one drive had been broken for a very long time.


    Originally posted by profoundWHALE View Post
    You can blame me for blaming btrfs all you want. What decided to just start corrupting my files? btrfs. Was regular scrubbing and defragging in place? Yes. I followed all the instructions that the btrfs developers had for how to do exactly what I was doing.
    It is your guess that it was BTRFS that started corrupting your files. It's an assumption, not backed by actual proof.

    Originally posted by profoundWHALE View Post
    So either every single person who did what I did ended up losing their data due to "bad practices" or btrfs has some serious bugs that result in massive corruption.
    I have hundreds of "terabyte-years" of data on BTRFS (multiple systems, configurations, drives, controllers, ...). For some strange reason, it works very well. So BTRFS can't be as bad as you claim. Somehow you have managed to find a "fuzz factor" that makes your experience different. If that "fuzz factor" is you, your system or your personal actions/assumptions, then it isn't relevant as a general tip for others to stay away from BTRFS.

  • profoundWHALE
    replied
    Originally posted by DrYak View Post

    Yup, you have shown disregard for multiple best practices (any logs of scrub, SMART, etc.?), but go on, blame btrfs for your failings.
    It's okay to decide BTRFS isn't for you. But blaming it when you're not even checking the logs seems misplaced.
    Hey genius, what good is a log of a scrub that fails if the tools for fixing the failure are failing?

    I get the saying of "a bad carpenter blames his toolbox" but seriously, what the heck are logs going to do?

    Originally posted by DrYak View Post
    PLEASE DO NOT TRY PLAYING AROUND WITH ZONEFS. IT'S EVEN MORE ALIEN. YOU WILL COMPLETELY TRASH VERY VALUABLE STUFF WITH IT.
    If you're talking about ZFS, I've already used it, multiple times. I wanted something with native Linux support. ZFS now has native Linux support, which is good, but in the meantime I've been content with bcachefs.

    Originally posted by DrYak View Post
    When they start dying, they all start dying at more or less the same time.
    I'm using the same drives, right now. I check the harddrives about twice a year for bad sectors.

    Originally posted by DrYak View Post
    You should have tried that MUCH earlier. Also, "btrfs restore" can also try to copy files with failed checksums.
    I tried that. It failed.

    What I had to do was force mount it and force-copy everything, telling it to ignore whatever transfers failed, but also to output whatever failed into a .txt file.

    The problem is that I'm dealing with terabytes of videos and if I lose 1% of a video, it's basically garbage now.

    You can blame me for blaming btrfs all you want. What decided to just start corrupting my files? btrfs. Was regular scrubbing and defragging in place? Yes. I followed all the instructions that the btrfs developers had for how to do exactly what I was doing.

    So either every single person who did what I did ended up losing their data due to "bad practices" or btrfs has some serious bugs that result in massive corruption.

    How come "you didn't think too much of it". You have proof of data corruption. On a system that should be able to maintain integrity. What went through your mind.
    How about you check your head before trying to jump to conclusions. Settle down. I downloaded some videos a while back for testing some encoders like the daala video codec but not every video reached 100%, meaning some would just be missing chunks. That's why I didn't think too much of it.

    After videos that I knew were good went bad, I knew something was wrong, and there's no need for me to repeat the issues and the steps taken to fix them.

    You've been jumping to conclusions about so many things and getting all worked up. Settle down. I don't know if btrfs is your God and I insulted him or something, but you gotta settle.
    Last edited by profoundWHALE; 04 February 2020, 01:57 PM.

  • S.Pam
    replied
    Originally posted by DrYak View Post

    A - you also have "btrfs defrag" running periodically in a systemd timer / cron job.
    This is nuts. This is not part of any "best practice" recommendations. It's not in the default settings of any automatic maintenance tools.
    You should NOT run it periodically, it makes no sense.
    Why so? Defragmenting helps in many workloads. It can inflate data usage by un-sharing deduplicated data (snapshots), but it is still a safe thing to use.

    The "btrfsmaintenance" scripts developed by one of the Btrfs developers does regular scrubs, trims, balance and defragment. https://github.com/kdave/btrfsmaintenance/

  • xinorom
    replied
    Originally posted by profoundWHALE View Post
    I'm also not looking for tech support. This was 4 years ago and I've stayed away from btrfs since. Also, if the software I'm using is so bad that I need tech support just to avoid data corruption, then it's bad software.
    tl;dr: you're a brainlet and these kinds of self-inflicted problems will happen to you again in the near future. I for one will laugh heartily when it happens. Please keep us updated.

  • profoundWHALE
    replied
    If you're getting confused by terms I'm using such as autodefrag, don't be. This was 4 years ago.

    You're asking a lot of questions as if I were running this for some company. I wasn't; it was for me, and I have my own job and my own life to attend to. Automatic scrubbing should be fine. Autodefrag was set because in my testing before actually using btrfs I noticed some dropped performance, and when I investigated it turned out that some portions had just become super fragmented.

    I had it configured as software RAID10 through btrfs' own tools, since I had some issues with hardware RAID. Besides, I know the filesystem works better when it is aware of what it is doing with regard to things like RAID.
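
    For reference, that kind of setup is created with btrfs' own tools along these lines (device names are placeholders):

        # 4-device btrfs RAID10, for both data and metadata
        mkfs.btrfs -d raid10 -m raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde
        mount /dev/sdb /mnt/storage

        # Check how data and metadata are spread across devices and profiles
        btrfs filesystem usage /mnt/storage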

    Now, to be clear, I've actually been appreciating your responses. They're very informative, but I'll continue to ignore the troll.

    ------

    In regards to the corruption, I cannot remember all the details, but btrfs would refuse to mount the drives. It refused to recover from a good drive. It failed scrubbing. It failed (or showed no change with) the non-fsck recovery options. As a last resort I tried fsck, and it either said it did something (but nothing changed) or it failed as well.
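
    For reference, the usual escalation in that situation looks roughly like this (device and paths are placeholders; the rescue= option group only exists on newer kernels, older ones have usebackuproot):

        # Try a read-only mount with the recovery-oriented options first
        mount -o ro,usebackuproot /dev/sdb /mnt/broken     # older kernels
        mount -o ro,rescue=all /dev/sdb /mnt/broken        # newer kernels

        # If it mounts, copy everything out and log what fails
        rsync -a /mnt/broken/ /mnt/spare/ 2> copy-errors.txt

        # Only once the data is off: read-only check for diagnostics
        btrfs check --readonly /dev/sdb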

    I am still using those same hard drives and I have no issues with them.

    This software is supposed to replace something like ZFS, and I have never had this type of corruption without hardware failure.

    I'm also not looking for tech support. This was 4 years ago and I've stayed away from btrfs since. Also, if the software I'm using is so bad that I need tech support just to avoid data corruption, then it's bad software.

  • xinorom
    replied
    Originally posted by DrYak View Post
    Had you been paying attention to the logs, you would have noticed something fishy was happening.
    But you didn't, until the whole situation became unbearable.
    ...
    don't blame btrfs for your own admin incompetence
    Self-accountability overload. Must. Find. Someone. Else. To. Blame.

    I can't wait until it happens again. I wish I could laugh right in his face at the exact moment it all fails. Hopefully he at least posts a "bcachefs is broken" thread on here.

  • DrYak
    replied
    Originally posted by profoundWHALE View Post
    Then one day, I try to open a file (such as a video) and notice that it's missing some frames and some audio. I didn't think too much of it.
    Okay, wait, what?

    How come "you didn't think too much of it". You have proof of data corruption. On a system that should be able to maintain integrity. What went through your mind.
    Also you're sure you've been running scrub periodically? Did you even pay attention to the result of the scrub? Did you had any mecanism in place for your server to alert you if something went wrong?

    There's no way that, under normal use, the first sign of corruption you notice is dropped frames.
    - If you've been running scrubs periodically, the scrub procedure should have returned warnings long before you serendipitously discovered the corruption.
    - The checksumming in btrfs is extent-based. If an extent is corrupted, the normal behaviour of btrfs is to declare the whole extent corrupt. You shouldn't just get a dropped frame; you should get a whole chunk of your video refusing to load. (Note: you can still recover the file using "btrfs restore", but at that point the damage is done.)
    - It's quite possible that at this point btrfs was still doing its job: reading the damaged extent, noticing the checksum failure, reconstructing the data from the other copy in the RAID1. But all this recovery is slow and has a hard time keeping up with real-time video. Data reaches the player with some delay, causing a few video frames to drop. The drop you experienced isn't the corruption itself; it's the latency of real-time playback while, behind the scenes, btrfs runs circles trying to recover a corrupted mess. The recovery is successful but doesn't keep up with the real-time video requirement.

    Originally posted by profoundWHALE View Post
    Continued use of the system and more and more files were showing the same problems, some even saying that the file doesn't actually exist.
    This is a telltale sign of hardware corruption in progress. The first occurrence of the dropped-frame problem is basically the latency of automatic repairs happening in the background. The "file doesn't exist" is how some higher-level software (e.g. your smbd server) reacts to the unrecoverable checksum errors (no sane copy of the extent is currently accessible).

    So have you been paying attention to the logs of the scrub?
    Have you been paying attention to the SMART messages of the drive?
    Do you even run smartd?

    Running smartd is important: running "short tests" nightly at a time of low IO, and "long tests" periodically (weekly or monthly, at a time of low use, and at a DIFFERENT time than the "btrfs scrub", otherwise IO will suffer and both procedures will take multiple days to finish because they compete for head seeks) is critical for anyone serious about storage. Using smartd to monitor a few critical SMART indicators is also a good idea (Backblaze and Google have publications detailing which measurements correlate best with impending doom; spoiler alert: since the end of the IBM Deathstar era, it isn't temperature anymore).
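
    A minimal smartd.conf along those lines might look like this (device, schedule and mail address are only an example; see smartd.conf(5) for the -s regexp syntax):

        # /etc/smartd.conf
        # -a            : monitor all standard attributes
        # -o on / -S on : enable automatic offline testing and attribute autosave
        # -s            : short self-test nightly at 02:00, long self-test Saturdays at 03:00
        # -m            : mail this address when something looks wrong
        /dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m admin@example.com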

    Once performance started going weird, didn't you think about having a look at the journal/dmesg/var/log/messages to see if there was something obvious that needed addressing?

    You have to realise that:
    - if your job relies on data management, you have to pay much more attention to the details;
    - especially if you insist on using cool new toys that work in surprising and unusual ways (BTRFS, ZFS and BCacheFS are rather revolutionary in the way they work; expect the principle of least astonishment to completely fly out of the window, especially with regard to old habits picked up with EXT4/3/2).

    Thus you should at least RTFM for all the tools involved and have a good idea of the idiosyncrasies involved. I'm not only speaking about btrfs, but about every other relevant tool along the way: smartctl/smartd, mdadm (if you use that for RAID5/6), LVM, etc.

    If you do not have the time/patience to pick up the above, it's okay to fall back on ready-made tool kits.

    - openSUSE has written very good scripts to help with btrfs maintenance (in general, the experience of newly emerging tech like btrfs, systemd, etc. tends to be much smoother on openSUSE, because they have engineers putting real effort into making the experience smooth), and these tools are even available in Debian.

    - It's even okay to rely on an appliance that has abstracted much of this work under the hood into a simpler high-level interface, where you can easily get synthesized information (a "Your HDD is going to die soon" red box in the interface, instead of digging through logs).

    Originally posted by profoundWHALE View Post
    So I manually run a scrub. When I say it takes a whole day I mean I start it in the morning and by the time I got back from work it should be done but it always failed at about 70%.
    Let me tell you what I think probably happened.

    (You mention using multiple drives in a RAID1+0 configuration. I suspect you followed the age-old mantra of buying them from the same batch (the shop probably even sold you drives with sequential serial numbers). While sysadmins have their reasons to do that - it's easier to manage them in pods - it's not necessarily a good tip for a home server. You see, drives from the same batch not only have very similar performance (useful for hardware RAID0), they also follow the same bathtub curve. When they start dying, they all start dying at more or less the same time. Again, this is useful for a datacenter sysadmin (if one drive fails, replace the whole pod, because the others will follow soon). But at home it means that you're going to see drive problems more or less together, making it more complicated to replace them one after the other.)

    What happened to you is that, if you had paid attention to the journal/dmesg of your server, you would have noticed that for the past month or so it had been filled with walls of DMA CRC errors, sector errors and other such messages.
    This is most likely caused by a hardware problem.
    It might have been a problem on the path between the server and the drives (actually very common on SBCs such as the Raspberry Pi: most "storage failures" there aren't failures of the actual drive, but power brown-outs, bad SATA-to-USB3 bridge chips, etc.).
    It might have been the hard drives starting to die of old age, all more or less simultaneously due to the aforementioned batch effect.
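
    Checking for that is a one-liner (the exact wording of the messages varies by driver, so the pattern is only an approximation):

        # Look for the typical signs of a dying disk or flaky link in the kernel log
        journalctl -k | grep -iE 'ata[0-9]+.*(error|failed)|I/O error|icrc'
        # or, without systemd:
        dmesg | grep -iE 'I/O error|ata.*error'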

    SMART, if you had been paying attention to it, would have been complaining about a slight increase in CRC errors. smartd would have notified you that the pool of reserved spare sectors was being depleted as they got allocated to replace dead sectors.
    The SMART self-tests, first the exhaustive long test and more recently even the quick test, would have reported read aborts, with logs full of unrecoverable CRC read errors.
    In short, your hard drives were starting to fail.
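
    The relevant counters can be read directly with smartctl (attribute names vary a bit between vendors; the ones below are the usual suspects):

        # Overall health verdict plus the attributes that correlate with impending failure
        smartctl -H /dev/sda
        smartctl -A /dev/sda | grep -Ei 'Reallocated_Sector|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error'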

    If your server is within earshot, you would even have noticed the typical clicking sounds of the hard drives.

    Meanwhile, above all that, btrfs has been dutifully doing its stuff: detecting corruption through checksums, then self-repairing corrupted data during scrubs by fetching the alternative copies from the RAID1 and rewriting them.

    This has vaguely kept the system afloat, except for the occasional dropped frame when the latency caused by all this became too much.

    Had you been paying attention to the logs, you would have noticed something fishy was happening.
    But you didn't, until the whole situation became unbearable.

    A scrub taking ages is a telltale sign.
    It probably has to fight upstream problems: hard drives failing, multiple read attempts, clicking sounds. Reading data to verify its checksums gets terribly slow and difficult.
    At multiple points along the scrub, recovery needs to happen.
    At some point the scrub just gives up, either because the problems cause too many retries and timeouts trying to get the data, or because both RAID1 copies sit on failed sectors of the drives.

    This is the point where any sane admin would realise:
    - you've been missing something huge for quite a long time;
    - it's time to check that you have all critical stuff backed up somewhere (secondary backup file server, optical media, tape, whatever you use);
    - it wouldn't be bad to reconsider your general strategy:
    RTFM is a possibility.
    Switching to an appliance where somebody else has done the work for you, like a Synology, is another valid strategy.
    Dropping the high-tech toys because you're unable to use them correctly IS a valid strategy. But don't blame btrfs for your own admin incompetence.

    Originally posted by profoundWHALE View Post
    I saw that there were some more things I could try with scrub: instead of running it on one drive, coming back, and then trying another one, I tried the several different commands on each drive. I found out when I got back home that they halted due to errors, you know, the errors that it's supposed to fix.
    At that point you're probably just bashing in random commands you've read on various StackExchange forums.
    Usually it quickly devolves into creatively using the experimental options, like zeroing the checksum tree, and shooting your filesystem in the head with FSCK.

    The description you're giving: "some more things", "tried several different commands", "*they* halted due to errors, you know the errors that *it* is supposed to fix", clearly demonstrates that you don't have a clue what the stuff you're copy-pasting from forums into your command line actually does.

    In general the procedure is simple.
    You run scrub. If the scrub says your system is sane, you may proceed further.
    If the scrub fails, that means things are already dire and the system is failing. Refreshing your backup at that point is a good idea. If the system doesn't mount or files aren't accessible, try extracting them manually using "btrfs restore", but pay attention to its logs with regard to CRC errors.
    Now is a good time to investigate WHAT caused the data loss. Funny surprises might be coming (an impending mass death of your drives).
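
    In concrete terms, the manual extraction step is roughly the following (device and target path are placeholders; as far as I recall, -i tells restore to keep going past errors):

        # Pull whatever is still readable off the unmountable filesystem onto a spare disk
        btrfs restore -v -i /dev/sdb /mnt/spare/restore/ 2> restore-errors.txt
        # restore-errors.txt then doubles as the list of files to re-fetch from backup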

    If the scrub says your data is clean, then your data is clean for all intents and purposes. There might be further problems that can be fixed by some careful rebalancing with the corresponding filters; e.g. ENOSPC errors can be fixed by rebalancing with usage filters to purge and compact old "swiss cheese" block groups.
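
    A typical invocation (the usage threshold is just a common starting point, not a magic number):

        # Repack block groups that are less than 50% used, freeing unallocated space
        btrfs balance start -dusage=50 -musage=50 /mnt/storage
        btrfs filesystem usage /mnt/storage   # verify that unallocated space came back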

    But if the scrub fails, it means the current state of the drives is very dire.
    - It's not "we have a minor problem, but we're too stupid to have a functional fsck to fix that minor problem".
    - It's "the kind of problems that are normally fixable by fsck have been getting fixed in the background by the filesystem. Now I can't even manage that anymore. The problems are too major, I'm giving up".

    That's a general problem with new filesystems that are a little bit too magic (like BTRFS, ZFS).

    They can automatically handle most problems for you and kind of "hide them from you", until it's not possible anymore, at which point the whole system is fucked up.

    Remember that demo where Sun smashed a RAID-Z2 ZFS filesystem with a hammer, and the whole thing just ran perfectly despite two drives being down?
    What would you expect to happen once a third drive got hammered?
    Yes, right.
    The whole thing failing critically, to the point that it's not recoverable anymore.
    A RAID-Z2 system with three drives dead just can't function; mathematically it doesn't contain enough information to go forward.
    It's just as dead as your BTRFS system.
    You just crossed the point at which the magic stops hiding the multiple problems and the (both literal and metaphorical) hammer blows to the drives.

    Originally posted by profoundWHALE View Post
    Eventually I managed to copy the files from one drive to an empty drive and whatever was totally corrupted was skipped automatically.
    You should have tried that MUCH earlier. Also, "btrfs restore" can also try to copy files with failed checksums.

    Originally posted by profoundWHALE View Post
    But then I found out that even if it copied, many files were missing chunks from them.
    Those are the extents with failed checksums. Alternatively, by using btrfs restore, you'd get chunks of noise in the middle instead.

    Originally posted by profoundWHALE View Post
    I had to use the list of corrupted files to know what it is that I needed to recover from backup, I checked around and I still had the original SD card for things like wedding videos.
    If it is critical, it should go on some write-once media like optical discs, or on a secondary backup server.
    Remember the "1 copy is no copy" mantra.

    Originally posted by profoundWHALE View Post
    So, like I said. For me, the person, I will never be able to trust btrfs.
    Yup, you have shown disregard for multiple best practices (any logs of scrub, SMART, etc.?), but go on, blame btrfs for your failings.
    It's okay to decide BTRFS isn't for you. But blaming it when you're not even checking the logs seems misplaced.

    Originally posted by profoundWHALE View Post
    I've never had any issues with corruption -yet- on bcachefs. The problems I'm referring to is stuff like a piece of the software isn't working quite right like a certain feature might not be functional yet or a girl update fails to build. The point was when there's a problem with bcachefs Kent is like oh I need to fix that.
    When there's a problem with btrfs it's just sort of a "quirk" which you should know about or else you'll lose your data or something fun like that.
    I haven't completely understood your "girl update fails" feature, whatever that is.

    Still, BCacheFS shares lots of features with other modern filesystems (checksumming, redundancy, self-repair); the main difference is its specific tiered approach to storage. That's the shtick it inherited from BCache.

    That also means that it will be able to cope with quite a mess underneath.
    It also means that if problems show up despite its self-healing capabilities, the situation is dire by the time that point is reached.
    The only difference is that the tool to extract whatever you can from the mess might indeed be called "fsck" in BCacheFS land.

    With regard to the quirks: well, what did you expect? btrfs is one of the new-gen "quasi-magic self-healing" filesystems, and one of the early ones at that. It is going to be weird.
    Especially space allocation was not very smart in early versions and very quickly came back to bite you.
    Things have improved recently (e.g. you don't need to be aware of btrfs' whole block-group allocation anymore), though you're still dealing with a new-gen beast that is both a filesystem layer AND a volume-manager layer at the same time. Getting informed about this kind of unusual new tech before using it should be expected.

    And after this whole long rant, let me give you one more piece of advice.
    You seem to be attracted to the shiny new toys that get mentioned in tech news.
    You don't spend much time documenting yourself about the intricacies of these new toys.

    PLEASE DO NOT TRY PLAYING AROUND WITH ZONEFS. IT'S EVEN MORE ALIEN. YOU WILL COMPLETELY TRASH VERY VALUABLE STUFF WITH IT.
