
ZFS On Linux Is Called Stable & Production Ready


  • #11
    why again would I use that over btrfs? It's userspace, and even the name is a lie; it's a ZFS fork.

    Userspace filesystems are OK for compatibility, if you want to read an NTFS disk or something like that, but to build something on top of one? I don't get it.

    Sorry, had to be the first troll.



    • #12
      Originally posted by blackiwid View Post
      why again would I use that over btrfs? It's userspace, and even the name is a lie; it's a ZFS fork.

      Userspace filesystems are OK for compatibility, if you want to read an NTFS disk or something like that, but to build something on top of one? I don't get it.

      Sorry, had to be the first troll.
      I think that you have OSS projects confused. ZFSOnLinux and ZFS-FUSE are separate projects: ZFSOnLinux is a kernel-space driver, while ZFS-FUSE is a user-space driver. The ZFS-FUSE developers discontinued development when ZFSOnLinux surpassed ZFS-FUSE, which is why you do not hear about ZFS-FUSE anymore.

      That said, your btrfs question was already addressed on Hacker News: https://news.ycombinator.com/item?id=8303333

      Last edited by ryao; 11 September 2014, 08:32 PM.



      • #13
        Originally posted by ryao View Post
        "IOMMU" is the technical term that AMD uses, while "VT-d" is Intel's marketing term. It does need to be enabled in the BIOS. You can check dmesg to see if it is on. It will sometimes break poorly written device drivers.
        This did not quite answer the question. I will have to write more on this tomorrow. I have been responding to questions since the blog post launched and I am ready to call it a day.



        • #14
          Been rockin' a ZoL install here at home for about two years now. Had a few drives fail, swapped in some new ones, etc., and everything has been absolutely perfect!



          • #15
            Originally posted by ryao View Post
            "IOMMU" is the technical term that AMD uses, while "VT-d" is Intel's marketing term. It does need to be enabled in the BIOS. You can check dmesg to see if it is on. It will sometimes break poorly written device drivers.

            As for the NCQ, there should be no reason to disable it. There exist block devices that reorder across flushes. If an AHCI or SCSI drive is one of them, its firmware is broken. It is typically RAID that will reorder across flushes, but that is not the only place where it can happen.

            FUA is another animal. ZFS relies on flushes rather than FUA to ensure consistency. I am thinking of modifying ZIL to utilize it on slog devices, but drives that ignore FUA are a concern for that.
            thank you

            Originally posted by ryao View Post
            This did not quite answer the question. I will have to write more on this tomorrow. I have been responding to questions since the blog post launched and I am ready to call it a day.
            alright, looking forward to more explanation

            another thing:

            you mentioned issues with e.g. dm-crypt, but that was in conjunction with zpools on zpools; ok, that sounds like some exotic setup

            how about concerned desktop users such as myself?

            from bottom (close to the hard drive) to top (ZFS):

            cryptsetup/dm-crypt
            ZFS

            or

            cryptsetup/dm-crypt
            LVM
            ZFS

            how well do flushes work with ZFS on top of cryptsetup?

            I know this has been an issue in the past with the in-kernel Linux filesystems,

            but (fortunately) I haven't heard anything serious (yet) about ZFS in conjunction with device-mapper


            Many thanks in advance



            • #16
              Originally posted by kernelOfTruth View Post
              what role does NCQ play in the guaranteed ordering of data?
              None. NCQ requests are unordered, just like SCSI TCQ (well, SCSI does optionally have ordered-tag semantics, but nothing supports that due to problems with the spec, etc.).

              does it have to be disabled? (queue depth set to 1)
              No.

              most if not all hard drives driven by SATA don't support flush/FUA (Write cache: enabled, read cache: enabled, doesn't support DPO or FUA); so does this mean the write cache has to be disabled to guarantee the order, and thus the integrity, of data?
              No.

              The typical situation is that you have writes A, B, C that you want on disk before doing a final write X, which you also want on stable storage before you can say some transaction is completed (think of a journal commit record, or the root block in a COW filesystem). The way to do that in Linux is: the filesystem issues writes A, B, C (these can all be in flight concurrently thanks to NCQ/TCQ), then waits for them to complete. Then a cache flush is issued, which guarantees that A, B, C are on stable storage rather than in the drive cache. Finally, write X is issued with the FUA (force unit access) bit set. If FUA support is missing, the Linux kernel block layer emulates the (FUA write, wait for completion) sequence by instead issuing (normal write, wait for completion, cache flush).

              Disclaimer: the above is how the Linux block layer and common Linux filesystems such as ext4, XFS and btrfs work. I don't know whether, and if so to what extent, ZoL does things differently.
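              The commit protocol described above can be sketched as a small simulation (illustrative only; the Drive class, its methods and all names here are made up for the sketch, not a real block-layer API):

```python
# Sketch of the commit protocol: writes A, B, C must be durable before
# the commit record X. All names here are illustrative.

class Drive:
    """A drive with a volatile write cache; data is durable only after
    a cache flush or a write with the FUA (force unit access) bit."""
    def __init__(self, supports_fua=True):
        self.supports_fua = supports_fua
        self.cache = {}      # volatile: lost on power failure
        self.platter = {}    # stable storage

    def write(self, block, data, fua=False):
        if fua and self.supports_fua:
            self.platter[block] = data   # FUA bypasses the volatile cache
        else:
            self.cache[block] = data

    def flush(self):
        self.platter.update(self.cache)  # everything cached becomes durable
        self.cache.clear()

def commit(drive, payload, commit_block, commit_data):
    # 1. Issue writes A, B, C (may be in flight concurrently under NCQ/TCQ).
    for block, data in payload.items():
        drive.write(block, data)
    # 2. Cache flush: A, B, C are now on stable storage.
    drive.flush()
    # 3. Write the commit record X with FUA set. If the drive lacks FUA,
    #    emulate with (normal write, then cache flush), as the Linux
    #    block layer does.
    if drive.supports_fua:
        drive.write(commit_block, commit_data, fua=True)
    else:
        drive.write(commit_block, commit_data)
        drive.flush()

d = Drive(supports_fua=False)
commit(d, {"A": 1, "B": 2, "C": 3}, "X", "commit")
assert d.platter == {"A": 1, "B": 2, "C": 3, "X": "commit"}
```

              Either way, X only becomes durable after A, B and C already are, which is the invariant a journal commit record or a COW root block depends on.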



              • #17
                Originally posted by ryao View Post
                That said, your btrfs question was already addressed on hacker news:

                https://news.ycombinator.com/item?id=8303333
                Yes, maybe I don't care too much about ZFS, so I don't follow the news around it too closely. Even in the post you link to, major problems are listed against ZFS, not even mentioning the potential patent issues:
                - Memory. I've got out-of-memory errors because ARC memory is not freed fast enough. Also, ZFS hogs a lot of memory: at least 750 MB to 1 GB of kernel memory with several disks.
                - It's not native Linux. You have some Solaris porting layer modules, and integration into the Linux kernel is not as tight as you may like it to be.
                - btrfs will be supported and be a default choice in 1-2 years in likely all major Linux enterprise distributions.
                Maybe there are some advantages for hyper-conservative people who need extremely good checksums, so that not even 1 file in 1 million hard disks gets corrupted undetected, or something like that. But even that problem will maybe be addressed in the future. At least for Linux DESKTOPS this fs makes zero sense, and when people could live for 30 years with XFS and ext2, ext3 or ext4 on servers, with backups and so on, then they can also live for some more years with slightly imperfect checksums.

                For me the last point is what is important: when btrfs is the default in distributions and it's IN the kernel, while it's very unlikely that the ZFS code ever goes into vanilla kernels, I am sure all its small problems will be addressed.

                Another point: this Hacker News page looks like only ZFS fans wrote there; at least, many disadvantages of ZFS did not get mentioned. I am no fs expert, but googling for 10 seconds I found many disadvantages of ZFS that are not listed there, for example:
                - ZFS lets you specify compression and other properties per file system subtree, while btrfs allows these options to be specified on a per-file basis.
                - ZFS distinguishes snapshots from file systems. This seems to be an issue with the author's experience with btrfs: a snapshot does not have to be a peer with its original; snapshots, like any subvolume, can be given a destination.

                The deduplication implementation seems to be even a kind of anti-feature, because the btrfs implementation does the job well enough, and you don't need a Core i7 and 8 GB of RAM to drive that system.

                So again, for desktop users ZFS makes no sense in any way: not as your Linux root, not for a small NAS. Only if you have your own professional file server with customers is ZFS even viable, and not that btrfs is completely out of the race then; maybe today sometimes, but even today it would be good enough for many of these more professional tasks. Again, there were people who managed to live with ext2 or ext4 for such tasks for ages.

                And as for the maybe small advantages it has, if you manage, with big CPUs and much RAM and much time, to get around its disadvantages, I think most of that should be addressed within a year if needed.

                But yes, maybe there are some good corner cases where at least for now it is better, and maybe it stays that way; again I doubt that, but we will see. For Linux desktop users I don't see any future for ZFS.

                I find it just funny, because deduplication works in btrfs too, and even if it didn't, you would maybe have to buy one or two more hard disks, while with ZFS you need, for the same tasks, a much bigger CPU that consumes much more power, and much more RAM, to do the same, and it should cost more.

                But maybe for people who don't care about power consumption and are very anxious about losing maybe one file that they only open every 10 years, so that it is not in a good state in the backup either, it is worth it... let's hope btrfs never gets better checksums.



                • #18
                  Originally posted by kernelOfTruth View Post
                  thank you

                  alright, looking forward to more explanation

                  another thing:

                  you mentioned issues with e.g. dm-crypt, but that was in conjunction with zpools on zpools, ok; that sounds like some exotic setup

                  how about concerned desktop users such as myself?

                  from bottom (close to the hard drive) to top (ZFS):

                  cryptsetup/dm-crypt
                  ZFS

                  or

                  cryptsetup/dm-crypt
                  LVM
                  ZFS

                  how well do flushes work with ZFS on top of cryptsetup?

                  I know this has been an issue in the past with the in-kernel Linux filesystems,

                  but (fortunately) I haven't heard anything serious (yet) about ZFS in conjunction with device-mapper

                  Many thanks in advance
                  Originally, userland programs were able to read and write any location in system memory. The introduction of virtual memory, enabled by Memory Management Units, changed that, with rather significant benefits: it secured the integrity of one program's memory against accidental (or intentional) tampering by another program. Direct Memory Access has traditionally permitted a similar situation with devices, and the IOMMU serves an analogous role there. In order to use it, you need hardware that supports an IOMMU, it must be enabled in your BIOS, and your kernel must support it. You can check whether it was enabled by inspecting the output of dmesg for information about an IOMMU. ZFS' data integrity benefits from this because it eliminates a vector through which its in-kernel data structures could be corrupted.
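                  For a quick check along those lines (a sketch; the exact log strings vary by kernel version and vendor), one can search dmesg for IOMMU-related lines:

```shell
# Look for IOMMU initialization messages in the kernel log.
# Wording varies: Intel kernels log "DMAR" lines, AMD kernels log
# "AMD-Vi". The fallback message is just for convenience.
dmesg 2>/dev/null | grep -i -e iommu -e dmar -e amd-vi \
    || echo "no IOMMU messages found (or dmesg requires privileges)"
```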

                  As far as cryptsetup is concerned, there were reports of problems, but they have been almost entirely eliminated since 0.6.3 was released. The reason people had issues is not known, but I have had only one report of an issue since 0.6.3 was tagged, and those who had problems no longer report issues. That said, I prefer to recommend ecryptfs to those interested in encryption at this time.



                  • #19
                    Originally posted by blackiwid View Post
                    Yes maybe I dont care to much about zfs, so I dont follow the news around it to closely.
                    This in itself prevents you from making actionable criticism, and quite frankly your post reads as an attack post, but I will say one thing about memory management. The out-of-memory issues that people report occur only when making large allocations (e.g. 20% of system memory), because of Linux's internal memory accounting. Specifically, the function __vm_enough_memory(), which is involved in overcommit handling, will refuse to permit allocations if they would require dipping into the kernel's emergency memory pools:

                    Linux kernel source tree. Contribute to torvalds/linux development by creating an account on GitHub.
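                    As a rough illustration of that accounting (a deliberate simplification; the real __vm_enough_memory() in the kernel's mm code also considers page cache, reclaimable slab, swap and the overcommit mode, and the numbers below are made up):

```python
# Rough sketch of the overcommit check described above. The simplified
# formula and the page counts are illustrative only.

def vm_enough_memory(pages_requested, free_pages, reserve_pages):
    """Return True if the allocation may proceed without dipping into
    the kernel's emergency reserves."""
    available = free_pages - reserve_pages
    return pages_requested <= available

# A large allocation can be refused even though "free" memory nominally
# exists, because the emergency reserve is off-limits.
assert vm_enough_memory(1000, free_pages=2000, reserve_pages=500)
assert not vm_enough_memory(1600, free_pages=2000, reserve_pages=500)
```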



                    In general, the kernel is good at freeing memory in advance, so people do not see this in practice, but it can happen independently of any filesystem. In the case of ZFS, the implementation of the ARC on top of virtual-memory-allocated SLAB buffers makes it slightly more likely that people will encounter this, because memory reclamation is somewhat slower. Improvements to resolve this are under development. Another contributor wrote patches that those testing them report make this problem disappear:

                    _Warning_ This patchset requires further testing and review. DO NOT USE IN PRODUCTION. Introduce ABD: linear/scatter dual typed buffer for ARC zfsolinux currently uses vmalloc backed slab for ARC ...


                    However, the approach he took is suboptimal because it introduces additional copying of data. The approach that the project would prefer is to restrict the additional copies to metadata, which is believed to require less CPU time. The latter will take longer to develop, but it is most likely what will be merged. Resolving this is a priority for the project, and it will likely be resolved within the next 6 months.

                    With that said, I suggest that you try ZFS. I believe that you would find it a joy to use, and if you encounter actual issues that affect you, please do not hesitate to file a report with the project. We are working very hard to resolve the issues that are reported to us, with the result that the software improves with each release.



                    • #20
                      Originally posted by ryao View Post
                      With that said, I suggest that you try ZFS. I believe that you would find it a joy to use, and if you encounter actual issues that affect you, please do not hesitate to file a report with the project. We are working very hard to resolve the issues that are reported to us, with the result that the software improves with each release.
                      Of course I troll a bit; I said that I would do that. But there is one thing I would have to do, and not just once but on every system I want to use it on for years from now: I would have to patch the kernels and build them myself.

                      The main feature of btrfs for me, why I use it, is that I don't have to use stupid partition tables, LVM and software RAID to get things done. And even if I use those tools, I have to manually shrink and grow filesystems, which I don't need to do with systemd; some other features, like better backup, encryption, the higher MB/s speed and so on, are bonuses on top of that.

                      But I don't have multi-TB disks; I don't have that much data to save...

                      That's the reason that, for me, btrfs has won for most users, but it's OK to have an alternative. It just drives me crazy sometimes how proud these BSD guys are and how elite they feel because they are maybe one year ahead with their filesystem, and how they act as if it's 1000x better than btrfs, and even that BSD is better than Linux because of that fs, when apart from that fs it's pretty shitty and not usable for the desktop.

                      Sorry, have to troll a bit, but saying that something gets fixed someday does not make anything better; then I could also say that about every issue btrfs has.

                      EXAMPLE OF THAT:
                      (from the FAQ)

                      Currently Btrfs uses crc32c for data and metadata. The disk format has room for 256bits of checksum for metadata and up to a full leaf block (roughly 4k or more) for data blocks. Over time we'll add support for more checksum alternatives.

                      That was the biggest criticism I found in the link against btrfs. Everything else was a pretty weak argument, in my opinion.
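                      For context on what the FAQ quote means, here is a sketch of how such a per-block checksum catches corruption, using zlib's crc32 as a stand-in (same idea, but note that btrfs actually uses crc32c, which is a different polynomial; the helper functions are made up for illustration):

```python
import zlib

# Per-block checksum verification, illustrated with zlib.crc32.
# The principle is the same as btrfs's crc32c: store a checksum with
# each block, recompute it on read, and flag any mismatch.

def write_block(data: bytes) -> dict:
    return {"data": data, "csum": zlib.crc32(data)}

def read_block(block: dict) -> bytes:
    if zlib.crc32(block["data"]) != block["csum"]:
        raise IOError("checksum mismatch: block is corrupt")
    return block["data"]

blk = write_block(b"hello world")
assert read_block(blk) == b"hello world"

blk["data"] = b"hello w0rld"   # simulate silent corruption on disk
try:
    read_block(blk)
except IOError:
    print("corruption detected")
```

                      A 32-bit CRC catches all single-bit and small burst errors but can in principle collide, which is why some people want stronger hashes; that is the trade-off the FAQ is talking about.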
                      Last edited by blackiwid; 13 September 2014, 06:02 PM.
