Linux 5.18 Looks Like It Will Finally Land Btrfs Encoded I/O
Originally posted by F.Ultra: One of my BTRFS Raid10 arrays is comprised of 24x14TB drives. A full scrub takes about 4h.
So, you're only reading the blocks with filesystem data, rather than mdraid's naive approach of reading all blocks. That's all well and good, until your array starts to near capacity. Then, you're going to approach the same scrub times as with mdraid or HW raid. Perhaps even far worse, if BTRFS doesn't scrub using completely contiguous reads.
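As a rough way to gauge this on one's own array: what a scrub has to read is the allocated data, which btrfs reports directly. A minimal sketch (the /mnt mount point is a placeholder):
Code:
# Used/allocated space is what a scrub must read; unallocated space is skipped.
# /mnt is a placeholder mount point.
btrfs filesystem usage /mnt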
Originally posted by F.Ultra: All 100% reads only.
Originally posted by onlyLinuxLuvUBack: 100% where? Everywhere, or just the 8-thread test?
Originally posted by pal666: Those users will never do heavy random writes. The OS buffers and coalesces writes; only reads can be random on a non-database workload.
One could use buffered reads/writes, but that's not how synthetic SSD benchmarks are normally done. Buffered operations with Btrfs do fare better, but they are still affected by high CPU overhead.
Here is a 2020 Phoronix fio benchmark that, curiously, saw Btrfs doing quite well with a Gen.4 NVMe SSD using buffered/non-direct operations: https://www.phoronix.com/scan.php?pa...esystems&num=3
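For reference, the buffered-versus-direct distinction maps onto fio's --direct flag. A minimal sketch of the two modes (file paths, sizes, and job counts are illustrative placeholders, not the settings Phoronix used):
Code:
# Direct I/O: bypasses the page cache; the usual mode for synthetic SSD tests.
fio --name=randwrite-direct --filename=/mnt/btrfs/testfile --size=4G \
    --direct=1 --rw=randwrite --bs=4k --ioengine=libaio --iodepth=32 \
    --numjobs=8 --runtime=60 --time_based --group_reporting

# Buffered I/O: goes through the page cache, so the OS can coalesce writes.
fio --name=randwrite-buffered --filename=/mnt/btrfs/testfile --size=4G \
    --direct=0 --rw=randwrite --bs=4k --ioengine=psync \
    --numjobs=8 --runtime=60 --time_based --group_reporting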
I don't care about RAID5/6 on Btrfs, as mdadm and LVM already offer FS-agnostic ways to achieve this. What would be the theoretical advantages (besides needing fewer maintenance tools) of having RAID in the FS layer instead of the block layer anyway? And is it known yet whether old volumes can be upgraded to the new on-disk format with some sort of conversion tool, or will a reformat followed by restoring the data from a backup be needed?
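For comparison, the two stacks being weighed look roughly like this (device names, RAID level, and drive counts are placeholders):
Code:
# Block-layer approach: mdadm provides the RAID, any filesystem sits on top.
mdadm --create /dev/md0 --level=6 --raid-devices=8 /dev/sd[b-i]
mkfs.ext4 /dev/md0

# Filesystem-layer approach: btrfs handles redundancy itself,
# with separate profiles for data and metadata.
mkfs.btrfs -d raid10 -m raid10 /dev/sd[b-i]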
Originally posted by coder: That's only because it's not full. No 14 TB HDD can be read in 4h, as that would amount to 972 MB/s, which is far outside the media transfer rate of any current HDD.
So, you're only reading the blocks with filesystem data, rather than mdraid's naive approach of reading all blocks. That's all well and good, until your array starts to near capacity. Then, you're going to approach the same scrub times as with mdraid or HW raid. Perhaps even far worse, if BTRFS doesn't scrub using completely contiguous reads.
Yes, because that's what scrubbing is.
Still, it's way faster than your RAID6 (which of course is not only due to this being reads only; having 24 SAS drives also helps quite a bit): you have 16TB of disk to go over in over 9h, while mine finished scrubbing 60TiB in 4:26:02.
Code:
root@fileserver-sto5:~# btrfs scrub status /opt
UUID:            d6cb5d55-729e-4b44-aee0-526b6fb82aed
Scrub started:   Fri Feb 11 21:54:31 2022
Status:          finished
Duration:        4:26:02
Total to scrub:  60.48TiB
Rate:            3.87GiB/s
Error summary:   no errors found
root@fileserver-sto5:~#
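As a sanity check, the reported rate is consistent with the totals: 60.48 TiB over 4 h 26 m 02 s works out to about 3.88 GiB/s.
Code:
# TiB converted to GiB, divided by the duration in seconds.
echo "60.48 * 1024 / (4*3600 + 26*60 + 2)" | bc -l
# => ~3.88, matching the reported 3.87 GiB/s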
Last edited by F.Ultra; 12 February 2022, 02:03 PM.
Originally posted by kiffmet: I don't care about RAID5/6 on Btrfs, as mdadm and LVM already offer FS-agnostic ways to achieve this. What would be the theoretical advantages (besides needing fewer maintenance tools) of having RAID in the FS layer instead of the block layer anyway? And is it known yet whether old volumes can be upgraded to the new on-disk format with some sort of conversion tool, or will a reformat followed by restoring the data from a backup be needed?
Another practical thing is that if a drive fails in the future, I will not risk having any of the working drives nuked by mdraid forcing a complete resync when replacing the failed one (a common problem among RAID users is that replacing a failed drive makes other drives fail during the rebuild phase), since btrfs will only read from the working drives and never write to them.
Another practical benefit is that btrfs RAID keeps checksums of the files, not just a parity of each chunk, which means that whenever I want to perform a check or a recovery I only have to read the actual amount of data stored, not the entire storage pool.
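As a concrete illustration of that read-mostly recovery path, replacing a failed device on btrfs looks something like this (the devid and device names are placeholders):
Code:
# Find the devid of the failed drive.
btrfs filesystem show /opt

# Rebuild onto the new drive: data is reconstructed by reading the
# surviving copies, and writes go only to the replacement device.
btrfs replace start 3 /dev/sdx /opt
btrfs replace status /opt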
Originally posted by kiffmet: What would be the theoretical advantages (besides needing fewer maintenance tools) of having RAID in the FS layer instead of the block layer anyway?
The main disadvantage of using a FS in this way is that it involves reading the entire stripe from all devices. mdraid has a read optimization, which I think most hardware RAID controllers also have: it reads each stripe from only N-1 or N-2 drives, depending on whether you're using RAID-5 or RAID-6. I'm not aware of an option to force mdraid to always read from all drives and check parity.
I think another disadvantage of mdraid is that a disk gets completely ejected from the array when it has any errors. Now, let's say you have an 8-drive RAID-6 and one drive fails. You pull it and rebuild with a new drive. However, during the rebuild, errors are encountered on two of the other drives, each at a different spot. What I think will happen is that the first error ejects that drive, and now you've lost all redundancy. Upon encountering the second error, you're faced with an array failure. Even though all the data could be reconstructed (i.e. by reading blocks from whichever drives don't have errors at that spot), I think mdraid won't do it.
Sadly, these functional gaps in mdraid aren't fundamental. Nothing prevents it from supporting both demand-scrubbing (checking parity on all reads) and flexible array rebuilds; I guess nobody cared enough to add those features. Somebody please correct me if I'm mistaken.
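For what it's worth, the closest thing mdraid does offer today is the whole-array check via sysfs, which reads every stripe and verifies parity in one batch (md0 is a placeholder):
Code:
# Trigger a full-array parity check (a scrub, not a per-read check).
echo check > /sys/block/md0/md/sync_action

# Number of sectors found inconsistent during the last check.
cat /sys/block/md0/md/mismatch_cnt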
Originally posted by F.Ultra: Trying to recover from a failed drive in a normal RAID might nuke your drives, while btrfs will not, since the recovery only reads from the old drives and all writes happen on the new drive, whereas in a normal RAID the entire array has to be rewritten.
Originally posted by F.Ultra: Every single time there is a non-clean shutdown (be it due to a power outage or a complete system hang [I'm a dev, so that happens]), I no longer have to wait hours for the machine to boot and be usable.
Originally posted by F.Ultra: Also, I don't miss having my machine completely bork out on me on the last Sunday of every month when mdraid decides to do a complete resync of the entire RAID stack.
Originally posted by F.Ultra: Another practical thing is that if a drive fails in the future I will not risk having any of the working drives nuked by mdraid forcing a complete resync when replacing the failed drive (a common problem among RAID users is that replacing a failed drive makes other drives fail during the rebuild phase).
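Regarding that scheduled monthly check: on Debian-derived systems it is typically driven by a cron entry along these lines (paraphrased from memory; the exact entry lives in /etc/cron.d/mdadm, and it can be disabled via AUTOCHECK=false in /etc/default/mdadm):
Code:
# Run mdadm's checkarray on a Sunday each month (paraphrased sketch).
57 0 * * 0 root [ -x /usr/share/mdadm/checkarray ] && /usr/share/mdadm/checkarray --cron --all --idle --quiet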