Btrfs Will Finally "Strongly Discourage" You When Creating RAID5 / RAID6 Arrays

  • #81
    Originally posted by gilboa View Post
    Ironically, just above this cluster sits a semi-identical cluster that used the on-board hardware RAID controller + XFS as opposed to OpenZFS pools, and it simply booted like nothing happened. (We checked checksums and found no errors.)
    How did you verify the integrity of the data if you were using XFS and a RAID controller? As far as I'm aware XFS does not checksum data (only metadata).



    • #82
      Originally posted by Space Heater View Post

      How did you verify the integrity of the data if you were using XFS and a RAID controller? As far as I'm aware XFS does not checksum data (only metadata).
      You can use dm-integrity on any device and build the array on top of it. With a scrub it's also self-healing.
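      For the curious, here is a minimal sketch of that layering in Python (device names and the RAID level are placeholders, nothing from this thread; it assumes integritysetup from cryptsetup and mdadm are available, and is an illustration rather than a tested recipe):

```python
#!/usr/bin/env python3
"""Rough sketch only: put dm-integrity under each member disk, then build an
md array on top of the integrity-mapped devices. Device names are placeholders;
read the integritysetup/mdadm man pages before trying anything like this."""
import subprocess

DISKS = ["/dev/sdb", "/dev/sdc", "/dev/sdd"]  # hypothetical member disks

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

mapped = []
for i, disk in enumerate(DISKS):
    name = f"int{i}"
    # Write a dm-integrity superblock to the disk (this wipes it and can take
    # a long time), then open it as /dev/mapper/<name>.
    run(["integritysetup", "format", disk])
    run(["integritysetup", "open", disk, name])
    mapped.append(f"/dev/mapper/{name}")

# Build the md RAID5 array on top of the integrity-mapped devices.
run(["mdadm", "--create", "/dev/md0", "--level=5",
     f"--raid-devices={len(mapped)}", *mapped])

# Periodic scrub: reads that fail the dm-integrity checksum surface as I/O
# errors, and md rewrites those sectors from parity / the remaining copies.
with open("/sys/block/md0/md/sync_action", "w") as f:
    f.write("check")
```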



      • #83
        Originally posted by flower View Post

        You can use dm-integrity on any device and build the array on top of it. With a scrub it's also self-healing.
        Who is using that in production for high-performance servers? It seems to be primarily targeted at LUKS devices. The documentation even says:
        dm-integrity has been around a while - version 1.0 was released with kernel 4.12 - but it is written and maintained by the crypto people as part of LUKS, so has a bunch of issues when used with other features such as raid. The good news is that this is simply because it hasn't been used and tested outside of LUKS, and any issues found will be treated as bugs and fixed.
        How would this work with hardware RAID that presents the OS with a single logical drive? The kernel option BLK_DEV_INTEGRITY requires that MD be enabled, so this seems to require multiple devices managed by the OS rather than hardware managing them behind the scenes.



        • #84
          Originally posted by Space Heater View Post

          Who is using that in production for high-performance servers? It seems to be primarily targeted at LUKS devices. The documentation even says


          How would this work with hardware RAID that presents the OS with a single logical drive? The kernel option BLK_DEV_INTEGRITY requires that MD be enabled, so this seems to require multiple devices managed by the OS rather than hardware managing them behind the scenes.
          It won't; it's only for SW RAID. I've never seen it in production, but I have used it at home on a smaller scale with great success. I'm currently in the process of switching everything over to ZFS though (mainly because of ZFS special vdevs; I love them).

          I'm not aware of any HW RAID that does self-healing of bitrot.

          BTW, any HW controller without a battery is a joke anyway (this is not directed at you, but at the person who loves his RAID controller).



          • #85
            Originally posted by flower View Post

            It won't; it's only for SW RAID.
            OK, then I still believe it's very unlikely that gilboa used dm-integrity to verify data integrity after that power loss.



            • #86
              Originally posted by flower View Post

              JBOD is not RAID. HBAs are still widely used and a necessity.
              Hardware RAID controllers just introduce an additional point of failure and make recovery more painful (e.g. you need to keep another one around).

              BTW, onboard RAID controllers are even worse. The ones found on consumer hardware are just software RAID anyway (with the drawbacks of HW RAID controllers).

              Real software RAID is just good enough (pure mdadm / ZFS at least). The only problem is that you need more bandwidth (RAID10 over 4 NVMe disks behind a HW RAID controller can get by with 8 lanes, while software RAID10 over the same 4 disks needs 16 lanes, 4 per drive, for full performance).

              Sure, there are still people using them, but I wouldn't advise anyone to build a new system with them.

              I can't comment on your scenario as it depends on too many factors. I wouldn't use one or two personal events to decide which is better.
              I agree with this; the worst part of hardware RAID controllers is the firmware. It's something you have no control over.

              HBAs are the way to go.



              • #87
                Originally posted by Space Heater View Post

                How did you verify the integrity of the data if you were using XFS and a RAID controller? As far as I'm aware XFS does not checksum data (only metadata).
                We verified the checksums of a couple of VMs against a last-known-good backup, and we fsck'ed all the VMs using their respective FS tool(s).
                Granted, this did not protect us against single-bit errors / bit flips, but given the fact that the XFS + HW RAID Gluster cluster had literally zero errors while the ZFS-pool Gluster cluster nearly crashed completely, I'd call it a win.
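                For illustration only, a rough sketch of that kind of check (hypothetical paths, not the actual scripts used): hash each live VM image and compare it against a SHA-256 manifest generated from the last-known-good backup.

```python
#!/usr/bin/env python3
"""Sketch: compare SHA-256 hashes of live VM images against a manifest
generated from the last-known-good backup. All paths are hypothetical."""
import hashlib
import pathlib

LIVE_DIR = pathlib.Path("/gluster/vmstore")          # hypothetical VM image directory
MANIFEST = pathlib.Path("/backup/last-good.sha256")  # lines of "<sha256>  <filename>"

def sha256(path, bufsize=1 << 20):
    """Stream the file so multi-GB images don't blow up memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

# Parse the manifest produced against the backup (e.g. with sha256sum).
expected = {}
for line in MANIFEST.read_text().splitlines():
    digest, name = line.split(maxsplit=1)
    expected[name] = digest

# Compare each live image with its last-known-good hash.
for name, digest in expected.items():
    live = LIVE_DIR / name
    status = "OK" if live.is_file() and sha256(live) == digest else "MISMATCH"
    print(f"{status}  {name}")
```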

                That said, please keep in mind that the original argument was not XFS vs. ZFS; it was ZFS SW RAID vs. HW RAID with a beefy battery-backed cache...

                - Gilboa
                Last edited by gilboa; 12 March 2021, 02:49 PM.
                oVirt-HV1: Intel S2600C0, 2xE5-2658V2, 128GB, 8x2TB, 4x480GB SSD, GTX1080 (to-VM), Dell U3219Q, U2415, U2412M.
                oVirt-HV2: Intel S2400GP2, 2xE5-2448L, 120GB, 8x2TB, 4x480GB SSD, GTX730 (to-VM).
                oVirt-HV3: Gigabyte B85M-HD3, E3-1245V3, 32GB, 4x1TB, 2x480GB SSD, GTX980 (to-VM).
                Devel-2: Asus H110M-K, i5-6500, 16GB, 3x1TB + 128GB-SSD, F33.



                • #88
                  Originally posted by flower View Post
                  JBOD is not RAID. HBAs are still widely used and a necessity.
                  Mis-wording on my end. I meant HBA.

                  Hardware RAID controllers just introduce an additional point of failure and make recovery more painful (e.g. you need to keep another one around).
                  Not much of an issue if you have rack(s) full of HP DLXXX or Dell RXXX with 24x7 4H (or even NBD) support.
                  Plus, with over 100 active servers, I've only seen one (1) HW RAID controller fail, and even this one was due to gross mishandling by a customer.

                  BTW, onboard RAID controllers are even worse. The ones found on consumer hardware are just software RAID anyway (with the drawbacks of HW RAID controllers).
                  Real software RAID is just good enough (pure mdadm / ZFS at least). The only problem is that you need more bandwidth (RAID10 over 4 NVMe disks behind a HW RAID controller can get by with 8 lanes, while software RAID10 over the same 4 disks needs 16 lanes, 4 per drive, for full performance).
                  Those are two unrelated points:
                  1. Most mid-to-high-end servers rarely use "firmware" RAID. They usually ship with a real hardware RAID controller w/ cache and battery backup.
                  2. Hardware RAID controllers are not designed to handle NVMe. (On the other hand, A. anything beyond RAID10/1/0 on NVMe will simply slow the NVMe drives down, and B. there's little reason, if any, to do write caching on NVMe.)

                  In short, if you're doing RAID5/6 and/or the respective 50/60, especially with rotating rust and/or SATA SSDs, you'd be wise to consider using HW RAID with a large battery-backed cache.
                  If you're doing RAID0/1/10, software RAID should be fine.

                  Sure, there are still people using them, but I wouldn't advise anyone to build a new system with them.
                  Don't get me wrong: I'm typing this on a workstation with LVM + MD RAID10 (rotating rust + SSDs), and all my VMs are on an oVirt server w/ a backup machine, both using LVM + MD RAID6 (rotating rust).

                  I __love__ software RAID, but I'd rather use HW RAID (again, with a battery-backed cache) in production whenever possible.

                  - Gilboa
                  oVirt-HV1: Intel S2600C0, 2xE5-2658V2, 128GB, 8x2TB, 4x480GB SSD, GTX1080 (to-VM), Dell U3219Q, U2415, U2412M.
                  oVirt-HV2: Intel S2400GP2, 2xE5-2448L, 120GB, 8x2TB, 4x480GB SSD, GTX730 (to-VM).
                  oVirt-HV3: Gigabyte B85M-HD3, E3-1245V3, 32GB, 4x1TB, 2x480GB SSD, GTX980 (to-VM).
                  Devel-2: Asus H110M-K, i5-6500, 16GB, 3x1TB + 128GB-SSD, F33.



                  • #89
                    Originally posted by cynic View Post

                    Yup! Btrfs RAID, on the other hand, does protect you against data corruption too.
                    Yes, I am aware that BTRFS in any redundant storage configuration ("RAID1", 10, 5, or 6) does protect against corruption, but BTRFS "RAID" is not really RAID in the traditional sense, as I wrote later in my comment. BTRFS should never have used the term "RAID", in my opinion, which is precisely why you wrote what you did.

                    http://www.dirtcellar.net



                    • #90
                      Originally posted by waxhead View Post

                      Yes, I am aware that BTRFS in any redundant storage configuration ("RAID1", 10, 5, or 6) does protect against corruption, but BTRFS "RAID" is not really RAID in the traditional sense, as I wrote later in my comment. BTRFS should never have used the term "RAID", in my opinion, which is precisely why you wrote what you did.
                      Indeed, the naming might be misleading. I'd call it RAID++ since it's way better than classic RAID.

