XFS Copy-On-Write Support Being Improved, Always CoW Option

  • #31
    Originally posted by gbcox View Post
    You need to re-read my post in context with the reply for which it was intended.
    Take a close look at the problems.

    Btrfs is mainline Linux, so could it be modified so that btrfs snapshots appear as LVM volumes? The answer is yes.

    Checksum offloading: this is a problem XFS is pushing down into the block layer. Is this something btrfs could be modified in time to exploit? Yes.

    ZFS's biggest problem is that it is not mainline.

    By the way, the Debian installer is also designed so that you can choose to install to btrfs. This does lead to annoying btrfs drivers being loaded when you are not using btrfs. So btrfs is not out of the running, but it will need to address its problems.

    Also, XFS is really late to CoW, but its history of being trustworthy has Red Hat's attention.

    If ZFS does not get mainlined, long term it will lose, because it will not have the hardware support to perform.

    Bcachefs is not in any major distribution's installer at all.

    The race has come down to four contenders:
    btrfs: SUSE and other distributions as a secondary install option.
    ZFS: an Ubuntu install option, and that is mostly it.
    XFS: Red Hat-related distributions, and this is a long-term partnership.
    And the new kid, bcachefs.



    • #32
      Originally posted by oiaohm View Post
      Checksum offloading: this is a problem XFS is pushing down into the block layer. Is this something btrfs could be modified in time to exploit? Yes.
      Wait, I was under the impression that btrfs had checksum accelerator support. It may not use a RAID card's accelerators, but CPUs and SoCs do have accelerators for the CRC32 algorithm, along with a Linux driver (which is used by btrfs checksums, as well as by most integrity checks of Ethernet packets and other things).

      Last edited by starshipeleven; 20 February 2019, 08:10 PM.



      • #33
        Originally posted by waxhead View Post
        btrfs does support DUP profile for both data and metadata. Btrfs also tries to place each copy apart from eachother physically on the drive. This increases the chance that one copy is still good in case of a physical defect.
        OK for HDDs, but not OK for SSDs, where the controller will try to bunch the copies together since they are the same data.
        Still, that is technically RAID1 on the same drive, which means you double the space used. Something like par2drive (the par2 tool) wastes much less space, as it does not store a full copy of the file while still being able to repair the whole file for up to X corrupted blocks, depending on how much parity information you decide to allocate. Realistic corruption isn't massive enough to require a full copy; 10% parity is plenty.
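
        To put rough numbers on the space argument, a back-of-the-envelope sketch (plain arithmetic; the function names and the 10% figure are illustrative, not tied to any real par2 configuration):

```python
# Rough space-overhead comparison: full duplication (DUP / RAID1 on one
# drive) vs. ~10% parity in the style of par2. Illustrative arithmetic
# only; real par2 overhead depends on block size and recovery settings.

def dup_overhead(file_bytes: int) -> int:
    """DUP stores a second full copy: overhead equals the file size."""
    return file_bytes

def parity_overhead(file_bytes: int, parity_fraction: float = 0.10) -> int:
    """Parity-based redundancy stores only a fraction of the file size."""
    return int(file_bytes * parity_fraction)

gib = 1024 ** 3
size = 10 * gib
print(dup_overhead(size) // gib)     # 10 (GiB of extra space for a full copy)
print(parity_overhead(size) // gib)  # 1 (GiB of extra space at 10% parity)
```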

        Scrub does read both copies.
        Can you tell me where I can confirm this? I must have missed the mail about that addition on the mailing list.

        Normal read only read one copy which is good for performance.
        Normal reads have significantly lower performance than they would if they read from both copies at the same time.



        • #34
          Originally posted by starshipeleven View Post
          Wait, I was under the impression that btrfs had checksum accelerator support. It may not use a RAID card's accelerators, but CPUs and SoCs do have accelerators for the CRC32 algorithm, along with a Linux driver (which is used by btrfs checksums, as well as by most integrity checks of Ethernet packets and other things).
          It does, but there are some seriously good reasons to push it down to the block layer. Btrfs uses the accelerators where ZFS does not, but this is not exactly the problem.

          Removing checksum work from the file system driver and putting it in the block device provides some advantages. Offloading the problem to the block device gives the following differences.

          Having the checksum in the block device instead of the file system driver allows loopback to be done more sanely.

          Think of a btrfs file system with a file on it, looped back to contain another btrfs. Now you are checksumming twice. Accelerators can only do so much; working twice as hard as you need to is not good. In the XFS case the checksum will only be performed once, which is why one of the features XFS is working on is XFS on XFS (and on other file systems as well), so that stacking file systems is clean.

          Also, it's not just loopback; it's the same with a virtio block device, where the hypervisor host could be performing the checksum.

          Performing the checksum as part of the file system driver leads to the nightmare where, in lots of different cases, the same checksum will be performed multiple times.

          Yes, btrfs sitting on a dm-integrity block device is going to have CRC32 performed twice: once by btrfs and once by dm-integrity.

          It's not just about accelerating the processing of checksums; it's about keeping the number of times you checksum the same block of data read from disk to the minimum required.
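
          The duplicated-work point can be sketched in a few lines. Here `zlib.crc32` stands in for whatever checksum each real layer (btrfs, dm-integrity, a hypervisor) would compute; the stacked-layer model is purely illustrative:

```python
import zlib

# Toy model of stacked storage layers that each checksum the same block
# on a read. Each "layer" stands in for e.g. btrfs or dm-integrity;
# zlib.crc32 is a stand-in for each layer's real checksum.

def read_through_layers(block: bytes, n_layers: int) -> int:
    """Return how many checksum computations a single read triggers."""
    checksums = [zlib.crc32(block) for _ in range(n_layers)]
    # Every layer computed the same value over the same bytes,
    # so all work past the first computation is redundant:
    assert len(set(checksums)) == 1
    return len(checksums)

block = b"some 4 KiB of file data" * 100
print(read_through_layers(block, 1))  # 1: filesystem-only checksumming
print(read_through_layers(block, 2))  # 2: btrfs on dm-integrity
```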



          • #35
            Originally posted by oiaohm View Post
            Think of a btrfs file system with a file on it, looped back to contain another btrfs. Now you are checksumming twice.
            I don't think that is a valid complaint.
            In this case, doing CoW twice (once in the host and once inside the loop device) is going to slaughter performance already.

            You will 100% want the nodatacow mount option for the filesystem inside the loop device (it disables CoW, checksumming, and compression, none of which you would want inside a loop-device file on a normal btrfs filesystem).
            Also, it's not just loopback; it's the same with a virtio block device, where the hypervisor host could be performing the checksum.
            Same as above.

            Yes, btrfs sitting on a dm-integrity block device is going to have CRC32 performed twice: once by btrfs and once by dm-integrity.
            Seriously what the fuck?
            Why would anyone want to put a btrfs filesystem on a dm-integrity block device to begin with? What's the benefit?
            Does the filesystem need to account for retarded users now?

            I mean, what's stopping a moron from stacking a dm-integrity on another dm-integrity block device for maximum retardation?
            Last edited by starshipeleven; 20 February 2019, 11:38 PM.



            • #36
              Originally posted by starshipeleven View Post
              In this case, doing CoW twice (once in the host and once inside the loop device) is going to slaughter performance already.
              This is you not understanding the problem.

              Originally posted by starshipeleven View Post
              You will 100% want the nodatacow mount option for the filesystem inside the loop device (it disables CoW, checksumming, and compression, none of which you would want inside a loop-device file on a normal btrfs filesystem).
              This is a no-go for Red Hat. You want to be able to run a CoW file system on a loopback while the loopback file itself sits on a CoW file system, because how else will the OS on the loopback have snapshots to automatically roll back failed updates?

              This is why btrfs and ZFS do not fly for Red Hat.

              Originally posted by starshipeleven View Post
              Why would anyone want to put a btrfs filesystem on a dm-integrity block device to begin with? What's the benefit?
              Does the filesystem need to account for retarded users now?
              You might call it retarded, but this is a lack of flexibility in btrfs.

              Originally posted by starshipeleven View Post
              I mean, what's stopping a moron from stacking a dm-integrity on another dm-integrity block device for maximum retardation?
              Exactly: nothing stops them from stacking like this. This is partly why the block device layer information has to be extended, because it might not be dm-integrity on dm-integrity; it might be dm-integrity on a RAID controller doing the same thing. Finally, btrfs sitting in a RAID-formatted partition on a RAID controller is also already checksummed.

              If you can properly see which features are present in the block layers and file system layers underneath the file system, you can avoid duplicating processing unless it is truly required, avoiding most of the massive performance cost of running a CoW file system on top of a CoW file system.

              With the two stacked dm-integrity devices, the checksum written into both should be absolutely identical for each block.

              This is the route XFS development is following. It is a very different route from btrfs or ZFS.



              • #37
                Originally posted by PuckPoltergeist View Post
                Why is CoW a main reason to use Btrfs for a VM image?
                Well you need CoW for snapshots. Just take a snapshot, then use the filesystem's send ability to transfer it to a remote server, and you have a quick and easy backup system. What happens in Btrfs if you set a file as NOCOW and take a snapshot? Any further changes after the snapshot would have to be written to new areas of the drive, so wouldn't that setting basically be useless then? The documentation I have found is not real clear on that, but it is clear that you lose checksumming for NOCOW files.
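
                A toy model of that interaction (entirely invented for illustration; the `Volume` class and its location tags are not btrfs internals): a NOCOW file is normally overwritten in place, but a block pinned by a snapshot is copied to a new location once on the next write, after which in-place overwrites resume.

```python
# Toy model of the btrfs NOCOW + snapshot interaction. Entirely
# illustrative: real btrfs extent handling is far more involved.

class Volume:
    def __init__(self):
        self.file_blocks = {0: "loc1"}  # logical block -> location tag
        self.snapshot_refs = set()      # locations pinned by a snapshot
        self.next_loc = 2

    def snapshot(self):
        # A snapshot pins the current locations; they may no longer be
        # overwritten in place, even for a NOCOW file.
        self.snapshot_refs = set(self.file_blocks.values())

    def write_nocow(self, block: int) -> str:
        loc = self.file_blocks[block]
        if loc in self.snapshot_refs:
            # Shared with a snapshot: one-time CoW to a new location.
            loc = f"loc{self.next_loc}"
            self.next_loc += 1
            self.file_blocks[block] = loc
        return loc

v = Volume()
print(v.write_nocow(0))  # loc1: overwritten in place
v.snapshot()
print(v.write_nocow(0))  # loc2: CoW forced once by the snapshot
print(v.write_nocow(0))  # loc2: in-place overwrites resume
```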

                Originally posted by oiaohm View Post
                If ZFS does not get mainlined, long term it will lose, because it will not have the hardware support to perform.
                Lose what exactly? ZFS will remain a well-polished, reliable filesystem, and its success isn't tied to Linux alone. Long term, you have no idea what will happen. Maybe Bcachefs will gain features and become the Linux favorite, and then someone else will come along and design an even better filesystem.



                • #38
                  Originally posted by oiaohm View Post
                  This is a no-go for Red Hat. You want to be able to run a CoW file system on a loopback while the loopback file itself sits on a CoW file system, because how else will the OS on the loopback have snapshots to automatically roll back failed updates?
                  nodatacow does not disable snapshots, even though in theory something that is not CoW should not be able to do snapshots.
                  This is because btrfs is flexible where it actually matters.

                  If you ask to make a snapshot of a nodatacow volume, btrfs will temporarily act as a CoW filesystem for the sake of making the snapshot, then return to no-CoW.

                  https://btrfs.wiki.kernel.org/index....data_blocks.3F

                  Disable it by mounting with nodatacow. This implies nodatasum as well. COW may still happen if a snapshot is taken.

                  This is why btrfs and ZFS do not fly for Red Hat.
                  I doubt that RedHat didn't fully understand that you can still snapshot a nodatacow volume.

                  You might call it retarded, but this is a lack of flexibility in btrfs.
                  No seriously, fuck off. I don't see why anything should support plain retarded setups.

                  the block device layer information has to be extended, because it might not be dm-integrity on dm-integrity; it might be dm-integrity on a RAID controller doing the same thing.
                  How can we know anything about what a closed-source RAID controller is doing? How do you plan to even have the block layer in the kernel know this?
                  And why should you even care, since that is done with the RAID controller's own resources anyway?

                  Finally btrfs sitting on a raid controller with partition in a raid formatted partition is also already checksum-ed.
                  The issue here is that this kind of checksumming is opaque. You don't know wtf is going on in the RAID controller, how well it checksums, or whether it checksums at all.

                  If you can properly see which features are present in the block layers and file system layers underneath the file system, you can avoid duplicating processing unless it is truly required, avoiding most of the massive performance cost of running a CoW file system on top of a CoW file system.
                  It takes only 3 seconds of using your brain to not do retarded bullshit like placing btrfs on dm-integrity, or to know that you need to disable CoW in the VMs or loop devices you keep on a btrfs filesystem.

                  I don't see the value of integrating all this logic in a file system or in the block layer in the kernel.
                  Last edited by starshipeleven; 21 February 2019, 08:57 AM.



                  • #39
                    Originally posted by Chugworth View Post
                    The documentation I have found is not real clear on that
                    see above



                    • #40
                      Originally posted by oiaohm View Post

                      Take a close look at the problems.

                      Btrfs is mainline Linux, so could it be modified so that btrfs snapshots appear as LVM volumes? The answer is yes.

                      Checksum offloading: this is a problem XFS is pushing down into the block layer. Is this something btrfs could be modified in time to exploit? Yes.

                      ZFS's biggest problem is that it is not mainline.

                      By the way, the Debian installer is also designed so that you can choose to install to btrfs. This does lead to annoying btrfs drivers being loaded when you are not using btrfs. So btrfs is not out of the running, but it will need to address its problems.

                      Also, XFS is really late to CoW, but its history of being trustworthy has Red Hat's attention.

                      If ZFS does not get mainlined, long term it will lose, because it will not have the hardware support to perform.

                      Bcachefs is not in any major distribution's installer at all.

                      The race has come down to four contenders:
                      btrfs: SUSE and other distributions as a secondary install option.
                      ZFS: an Ubuntu install option, and that is mostly it.
                      XFS: Red Hat-related distributions, and this is a long-term partnership.
                      And the new kid, bcachefs.
                      Again... You need to re-read my post in context with the reply for which it was intended. I don't get into discussions with a strawman.

