XFS Copy-On-Write Support Being Improved, Always CoW Option


  • #41
    Originally posted by starshipeleven View Post
    see above
    Yes, I did notice that. But I thought their wording was kind of odd.

    "COW may still happen if a snapshot is taken"
    should be
    "COW absolutely happens if a snapshot is taken"

    Basically any change you make after the snapshot is going to be written to a different location, so you lose the benefit of the NOCOW setting. I suppose if data is changed after a snapshot, then any further changes to that data could be made in place rather than copied. But that wouldn't matter much since the data is already fragmented.



    • #42
      Originally posted by Chugworth View Post
      Basically any change you make after the snapshot is going to be written to a different location, so you lose the benefit of the NOCOW setting.
      I'm not sure about what you mean with "you lose the benefit of NOCOW", no you don't.

      The only way to do a snapshot is to work like CoW does: write the changes somewhere else, and from then on any change goes to this new place instead of the old place, which is read-only from now on (in the sense that until you change these files you are still reading them from the snapshot; there is no copy).

      I suppose if data is changed after a snapshot, then any further changes to that data could be made in place rather than copied. But that wouldn't matter much since the data is already fragmented.
      This happens in any case when you make a snapshot: you freeze the data in place, and then start recording only the changes from this frozen state, while still referencing it for the non-modified data. There is nothing you can do differently if you want online snapshots.

      The only difference is that if you have the "nodatacow" option enabled, then it will not do CoW on further changes to already modified data.
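
      For reference, the per-file NOCOW flag is the same bit that "chattr +C" sets. A minimal sketch of setting it from Python, assuming 64-bit Linux ioctl numbers and a btrfs mount; the path is made up, and the flag only takes effect on an empty file:

        import fcntl
        import os
        import struct

        # Inode flag constants from linux/fs.h (64-bit ioctl numbers)
        FS_IOC_GETFLAGS = 0x80086601
        FS_IOC_SETFLAGS = 0x40086602
        FS_NOCOW_FL = 0x00800000  # "do not copy-on-write this file"

        def set_nocow(path):
            """Equivalent of `chattr +C`: mark a file NOCOW on btrfs.

            Only effective while the file is still empty. Note that btrfs
            also stops checksumming the file's data once this is set.
            """
            fd = os.open(path, os.O_RDONLY)
            try:
                buf = fcntl.ioctl(fd, FS_IOC_GETFLAGS, struct.pack("L", 0))
                flags = struct.unpack("L", buf)[0]
                fcntl.ioctl(fd, FS_IOC_SETFLAGS, struct.pack("L", flags | FS_NOCOW_FL))
            finally:
                os.close(fd)

        # Hypothetical usage on a btrfs mount:
        #   open("/mnt/btrfs/vm.img", "w").close()  # create the file empty
        #   set_nocow("/mnt/btrfs/vm.img")
        # After a `btrfs subvolume snapshot`, the first write to each shared
        # extent is still CoW'd once (the "CoW may still happen" case above);
        # later writes to the now-unshared extent go in place again.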



      • #43
        Originally posted by starshipeleven View Post
        Ok for hdds, not ok for SSDs where the controller will try to bunch it together as it is the same data.
        Still technically a RAID1, on the same drive. Means you double the space used. Something like par2drive (par2 tool) wastes much less space, as it does not store a full copy of the file while still being able to fix the whole file for up to X bad blocks, depending on how much parity information you decide to allocate. Realistic corruption isn't as massive as to require a full copy; 10% parity is plenty.
        True, for SSDs that attempt deduplication the DUP profile may not necessarily solve it.
        As for parity: well, you need to checksum that as well if you want to know whether the parity is good or not. I have not done the math, but I imagine things like Reed-Solomon codes, similar to what is used on CDs, have been considered. For hard drives, corruption typically affects a fairly large portion of data (in the case of a physical defect), so I at least would feel better with a full copy on a small array of perhaps only two storage devices.
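
        To put a rough number on the parity-versus-full-copy trade-off: a single XOR parity block can rebuild any one lost block out of N, at about 1/N the space overhead of a DUP-style full copy. par2 uses Reed-Solomon, which generalizes this to multiple bad blocks; this toy sketch only handles one:

          from functools import reduce

          def xor_blocks(a, b):
              """Byte-wise XOR of two equal-length blocks."""
              return bytes(x ^ y for x, y in zip(a, b))

          def make_parity(blocks):
              """One parity block for N data blocks: ~1/N space overhead,
              versus 100% for a full second copy (DUP)."""
              return reduce(xor_blocks, blocks)

          def recover(blocks, parity, lost):
              """Rebuild the one missing block from the survivors plus parity."""
              survivors = [b for i, b in enumerate(blocks) if i != lost]
              return reduce(xor_blocks, survivors, parity)

          data = [bytes([i]) * 8 for i in range(10)]  # ten toy 8-byte "blocks"
          parity = make_parity(data)
          assert recover(data, parity, lost=3) == data[3]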

        Originally posted by starshipeleven View Post
        Can you tell me where I can confirm this? I must have missed the mail for that addition in the mailing list.
        Sure... Regarding scrub:

        "will read all data and metadata blocks from all devices and verify checksums"

        And it has always been like that as far as I know. Looking at the kernel source for 3.16 shows the same notice...


        Note that scrub does NOT check empty space, which may or may not be such a good idea depending on the use case. Perhaps that is the confusion? Is this what you really meant?
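
        The reason empty space can't be scrubbed follows from how the scrub loop works: it walks the stored checksums and re-reads the blocks they describe, and unallocated space simply has no stored checksum to compare against. A toy sketch of that loop (plain CRC-32 here; btrfs actually defaults to CRC-32C):

          import zlib

          def scrub(read_block, stored_csums):
              """Re-read every allocated block and verify it against the
              checksum recorded at write time.

              `read_block(n)` returns block n's data; `stored_csums` maps
              block number -> stored checksum. Blocks absent from the map
              are unallocated, so there is nothing to verify them against.
              """
              return [n for n, want in stored_csums.items()
                      if zlib.crc32(read_block(n)) != want]

          # Toy usage: block 1 got corrupted after its checksum was stored.
          blocks = {0: b"good data", 1: b"bit-flipped"}
          csums = {0: zlib.crc32(b"good data"), 1: zlib.crc32(b"original")}
          print(scrub(lambda n: blocks[n], csums))  # -> [1]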

        Originally posted by starshipeleven View Post
        Normal read has significantly lower performance than it would have if it read from both at the same time.
        Reading from both drives in a RAID1-like setup is actually not such a good idea. MDRAID does not do it either; check out "man md" and read under RAID1. As you will see, an n-way RAID1 is only useful when you have multiple threads. BTRFS is pretty similar: it distributes reads by pid % num_devices, which in a 2-device setup could mean that if you have 10 threads whose PIDs all end in 0, all of them might read from the same disk while the other disk remains idle. Yes, this sucks in theory, but it works well in practice.

        There have been quite a few patches trying to improve this by looking at the device queue, but they have not yet been merged. I personally think that device bandwidth is more important, so perhaps there will be a patch in the future that considers device bandwidth when distributing reads and writes... and as we all know, read and write performance can be a bit different.
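
        The pid % num_devices policy is trivial to simulate, including the worst case described above where every reader lands on the same mirror (the PIDs are made up):

          from collections import Counter

          NUM_DEVICES = 2  # a 2-device RAID1

          def pick_mirror(pid):
              # btrfs pins each process to one mirror by its PID:
              return pid % NUM_DEVICES

          # Ten reader threads that all happen to have even PIDs:
          unlucky = range(1000, 1020, 2)
          print(Counter(pick_mirror(p) for p in unlucky))
          # Counter({0: 10}) -> all ten hammer device 0; device 1 idles

          # A typical mixed set of PIDs spreads the load out:
          print(Counter(pick_mirror(p) for p in range(1000, 1006)))
          # Counter({0: 3, 1: 3})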

        http://www.dirtcellar.net



        • #44
          Originally posted by Chugworth View Post
          Lose what exactly? ZFS will remain a well-polished, reliable filesystem, and its success isn't tied to Linux alone. Long term, you have no idea what will happen. Maybe Bcachefs will gain features and become the Linux favorite, and then someone else will come along and design an even better filesystem.
          Is ZFS well polished for Linux? No. Did you miss where ZFS lost its method of accelerating checksums under Linux? Or did you miss that ZFS has high RAM usage under Linux due to duplicated caching systems?

          The reality here is a hard one. Without being in the mainline Linux kernel (or in whatever OS kernel they support), the result will at times be slow performance and high RAM consumption.

          Basically the clock is ticking on ZFS. The day a competitor has enough features and is cross-platform enough, but is also mainline in Linux, ZFS will lose quickly. The only move to prevent this is mainlining.



          • #45
            Originally posted by starshipeleven View Post
            nodatacow does not disable snapshots, even if it should not be able to do snapshots if it's not CoW.
            This is because btrfs is flexible where it actually matters.
            It disables what enterprise calls a snapshot. NOCOW disabling checksums means that when you roll back to a snapshot, it's now a leap of faith.

            The reality is that disabling CoW and disabling checksums should be independent; with btrfs they are not. A snapshot lacking proper validation does not count. This is why Red Hat wants to be able to see down to the lower layers: if the data is validated by checksums there, then it is valid to skip doing checksums again. Say you have a system image made nodatacow and it gets transferred between hosts. One host can have block-level checksums on and the other host does not. So nodatacow disabling checksums becomes a path to extra human error because of this design error. It would be better to make the block layer queryable, and to alter settings based on the query.
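
            In that "query the lower layer, then alter settings" spirit: the NOCOW flag itself is at least queryable today, so a tool receiving an image could detect that btrfs is no longer checksumming it and fall back to its own verification. A hypothetical sketch, assuming 64-bit Linux ioctl numbers; verify_with_own_checksums is a placeholder:

              import fcntl
              import os
              import struct

              FS_IOC_GETFLAGS = 0x80086601  # from linux/fs.h (64-bit)
              FS_NOCOW_FL = 0x00800000

              def is_nocow(path):
                  """True if the file carries the NOCOW flag, i.e. btrfs keeps no
                  data checksums for it and a snapshot rollback is unverified."""
                  fd = os.open(path, os.O_RDONLY)
                  try:
                      buf = fcntl.ioctl(fd, FS_IOC_GETFLAGS, struct.pack("L", 0))
                      return bool(struct.unpack("L", buf)[0] & FS_NOCOW_FL)
                  finally:
                      os.close(fd)

              # Hypothetical policy when a system image arrives from another host:
              #   if is_nocow("/mnt/btrfs/image.raw"):
              #       verify_with_own_checksums("/mnt/btrfs/image.raw")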



            • #46
              Originally posted by oiaohm View Post
              It disables what enterprise calls a snapshot. NOCOW disabling checksums means that when you roll back to a snapshot, it's now a leap of faith.
              The host btrfs filesystem you have put this loop device in is still CoW and checksumming, so no, it is not a leap of faith. Grasping at straws?

              Red Hat didn't go for btrfs because their clients quite frankly didn't give a shit, and wanted easy-mode tools to manage their servers' storage instead. I can't blame them; I wouldn't mind a single interface to control mdadm, LVM, crypto block devices and the filesystem at once.



              • #47
                Originally posted by oiaohm View Post

                Is ZFS well polished for Linux? No. Did you miss where ZFS lost its method of accelerating checksums under Linux? Or did you miss that ZFS has high RAM usage under Linux due to duplicated caching systems?

                The reality here is a hard one. Without being in the mainline Linux kernel (or in whatever OS kernel they support), the result will at times be slow performance and high RAM consumption.

                Basically the clock is ticking on ZFS. The day a competitor has enough features and is cross-platform enough, but is also mainline in Linux, ZFS will lose quickly. The only move to prevent this is mainlining.
                That high RAM consumption is one of the things that I find attractive about ZFS. That's the ARC cache, which works differently from the kernel's caching, and it's how ZFS deals with CoW fragmentation. I run a FreeNAS server that has 16 GB of memory, and the ARC cache often sits around 12 GB in size. That's 12 GB of frequently accessed data that the drives don't have to seek for each time. Of course, if the system needs more memory, then the ARC cache will shrink. If the ZFS license were to change to GPL tomorrow and it went into the Linux kernel, ARC caching would come right along with it.

                As for the accelerated checksums, I'm not overly concerned about that. The ZoL developers have been working on it. And even if it does hurt performance in most distributions, I imagine Ubuntu could adjust the kernel so that ZFS gets the access it needs. If not, and the performance of ZoL is simply harmed, then I'll use FreeBSD more until the Linux world can come up with a better filesystem. If they do, I'll be thrilled. There are some things about ZFS that annoy me, like the lack of reflink support. I believe Oracle ZFS has already implemented that, but OpenZFS hasn't, and there don't appear to be any plans to. But at the moment ZFS appears to be the overall best option for file storage.



                • #48
                  Originally posted by starshipeleven View Post
                  The host btrfs filesystem you have put this loop device in is still CoW and checksumming, so no, it is not a leap of faith. Grasping at straws?
                  When your image ends up at a cloud provider, it is a leap of faith that CoW and checksumming will be provided by the host if you have not checked.

                  Originally posted by starshipeleven View Post
                  Red Hat didn't go for btrfs because their clients quite frankly didn't give a shit, and wanted easy-mode tools to manage their servers' storage instead. I can't blame them; I wouldn't mind a single interface to control mdadm, LVM, crypto block devices and the filesystem at once.
                  They also want their virtual machines to perform well and to have snapshotting and checksums, when configured that way, without extra overheads.



                  • #49
                    Originally posted by Chugworth View Post
                    That high RAM consumption is one of the things that I find attractive about ZFS. That's the ARC cache, which works differently from the kernel's caching, and it's how ZFS deals with CoW fragmentation. I run a FreeNAS server that has 16 GB of memory, and the ARC cache often sits around 12 GB in size. That's 12 GB of frequently accessed data that the drives don't have to seek for each time. Of course, if the system needs more memory, then the ARC cache will shrink. If the ZFS license were to change to GPL tomorrow and it went into the Linux kernel, ARC caching would come right along with it.
                    The ARC cache in ZoL is the Solaris page cache put into Linux and not integrated with everything else. So of course the ARC cache works differently; it's an alien cache put into your OS. The ARC cache is unlikely to be accepted mainline. You don't have proper memory deduplication between the block caches and the ARC cache as you have between the block caches and the standard Linux page cache.

                    This need for deduplication becomes important (see https://www.redhat.com/en/blog/look-...pression-layer), particularly as this kind of stuff is appearing at the block layer.

                    Btrfs and XFS are able to use the standard Linux page cache to deal with CoW fragmentation.


                    There was a time frame when it was thought that the Linux page cache could be gotten rid of, but that has proven to be impossible. Yes, bringing the ARC cache across at that time gave ZFS on Linux higher DAX performance, because it had a page cache while normal Linux filesystems were attempting to avoid the page cache.

                    Originally posted by Chugworth View Post
                    As for the accelerated checksums, I'm not overly concerned about that. The ZoL developers have been working on it. And even if it does hurt performance in most distributions, I imagine Ubuntu could adjust the kernel so that ZFS gets the access it needs. If not, and the performance of ZoL is simply harmed, then I'll use FreeBSD more until the Linux world can come up with a better filesystem. If they do, I'll be thrilled. There are some things about ZFS that annoy me, like the lack of reflink support. I believe Oracle ZFS has already implemented that, but OpenZFS hasn't, and there don't appear to be any plans to. But at the moment ZFS appears to be the overall best option for file storage.
                    Reflinks work in btrfs and XFS. This is also another problem: XFS and btrfs are ahead of ZFS in some features. Once btrfs and XFS catch up on the remaining features, there will be no point to ZFS.
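
                    For what it's worth, a reflink copy on btrfs/XFS is a single ioctl, the same FICLONE that "cp --reflink" uses under the hood. A minimal sketch, assuming a reflink-capable filesystem (XFS must be formatted with reflink support); the paths are made up:

                      import fcntl
                      import os

                      FICLONE = 0x40049409  # linux/fs.h: _IOW(0x94, 9, int)

                      def reflink(src, dst):
                          """Clone src to dst by sharing extents (a CoW copy; no data is moved).

                          Works on btrfs and on XFS formatted with reflink support;
                          fails on filesystems without it (e.g. current OpenZFS).
                          """
                          src_fd = os.open(src, os.O_RDONLY)
                          dst_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
                          try:
                              fcntl.ioctl(dst_fd, FICLONE, src_fd)
                          finally:
                              os.close(src_fd)
                              os.close(dst_fd)

                      # reflink("/mnt/xfs/big.img", "/mnt/xfs/big-clone.img")  # instant, shares blocks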



                    • #50
                      One of my sites has regular power failures. When using BTRFS for the HOME folder on desktops I had endless corruption issues: folders that couldn't be deleted, data that went missing (normally in the folders I couldn't delete), locked files that couldn't be deleted, and so on. Hence data loss.

                      Installed XFS there and no more issues. Except that XFS needs a proper undelete app. I WISH BTRFS were more reliable so I could get rid of my annoying, heavily fragmented, size-limited ZFS pools.
                      Last edited by dfyt; 23 February 2019, 06:20 PM.

