Linus Torvalds Doesn't Recommend Using ZFS On Linux


  • Originally posted by allquixotic View Post
    the ability to efficiently use tiered storage.
    This is the problem.

    Originally posted by allquixotic View Post
    You would think that Linux would have a stable, mature, tested, highly optimized filesystem in-house for handling Tiered Storage properly, but it actually doesn't. Not at all. None of the solutions available with Btrfs, XFS, Ext4, LVM2, MD, and family even come close to the performance and feature-set of ZFS with tiered storage. Not to mention that the closest feature competitor, Btrfs, is still such a boondoggle stability-wise that Red Hat is abandoning it as a supported filesystem in RHEL. They also don't have any engineers to work on it, but if it were stable, they wouldn't need to.
    Now your logic is in trouble. RHEL dropped Btrfs, but it did not give up on the idea of tiered storage.



    Originally posted by allquixotic View Post
    I will continue to use ZFS on Linux (at my own peril? Fine.) until Linux offers an in-kernel alternative that matches its performance, featureset and maturity. LLNL has the right idea -- they knew what they were doing when they invested so many dollars into the development of ZoL. They needed a tool that didn't exist, so they built one.
    To be correct: it is unlikely to be a complete in-kernel alternative to ZFS. It's more likely to be something like Stratis.

    The XFS lead developer has taken a different route to the tiered storage problem. Btrfs started as a clone of the ZFS idea, and ZFS started as something modelled after the WAFL idea. All of these share the idea that you integrate the block layer into the file system layer.

    The XFS developer's route is more interesting. It starts at https://lwn.net/Articles/747633/ with a simple question: why do I need a loopback device to mount a file system image stored in a file? The answer is that you shouldn't. One of the alterations to XFS was a means to pass the blocks of a file through to a file system driver without using a loopback device. Over time all Linux file system drivers could be updated the same way, rendering loopback useless.
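    For anyone who has not run into this, here is a rough C sketch of what the loopback detour looks like today (the image path and mount point are invented for illustration): the image file has to be attached to a /dev/loopN block device before it can be mounted, which is exactly the indirection the XFS change wants to make unnecessary.

    ```c
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/mount.h>
    #include <linux/loop.h>

    int main(void)
    {
        int ctl = open("/dev/loop-control", O_RDWR);
        int img = open("/srv/images/test.img", O_RDWR);   /* image path is made up */
        if (ctl < 0 || img < 0) { perror("open"); return 1; }

        int n = ioctl(ctl, LOOP_CTL_GET_FREE);            /* ask for a free /dev/loopN */
        if (n < 0) { perror("LOOP_CTL_GET_FREE"); return 1; }

        char dev[32];
        snprintf(dev, sizeof dev, "/dev/loop%d", n);

        int loop = open(dev, O_RDWR);
        if (loop < 0 || ioctl(loop, LOOP_SET_FD, img) < 0) { perror("loop"); return 1; }

        /* Only now can the image be mounted -- via the loop block device,
         * not directly from the file that actually holds the blocks.      */
        if (mount(dev, "/mnt/img", "xfs", 0, NULL) < 0) { perror("mount"); return 1; }
        return 0;
    }
    ```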

    You also have the ENOSPC handling change. Something else being worked on is a means to see block-level checksums from general file system operations. Yes, software RAIDs do put checksums on all the blocks, and so do hardware RAID controllers.

    There is a trend here.

    Why are so-called tiered storage file systems reinventing the wheel? And is that right?

    Yes, the wheel that so-called tiered storage file systems are all reinventing is the block layer. Maybe, just maybe, we should improve the block layer API/ABI exposed to file systems, so that every file system Linux supports can become a tiered storage file system with minimal code.


    Originally posted by k1e0x View Post
    Well.. yeah.. ? FreeBSD is a server OS.. so is Linux really (at least it used to be..) And ZFS is an enterprise filesystem.. It has some cool uses in the home, but ZFS runs 100 petabyte arrays. ZFS doesn't compete with Ext4.. it competes with NetApp's WAFL filesystem. People don't really want it in the Linux tree, as ryao has said. Most people are pretty happy with where it is.. they just want it to work well with Linux so they can use it to free their enterprise infrastructure from the likes of EMC, NetApp and DDN.
    Problem: in performance, ZFS and WAFL do not compete well with many file systems. Also, no single file system is going to be perfect for every single workload. The Red Hat/IBM-backed route is interesting. In time we may remember ZFS as the biggest historic design mistake.

    Please note that saying ZFS is an enterprise file system does not really mean much, considering that XFS is one of your oldest enterprise file systems. More interesting, XFS's leadership has had the least change of any file system.

    Comment


    • Originally posted by oiaohm View Post
      Problem: in performance, ZFS and WAFL do not compete well with many file systems. Also, no single file system is going to be perfect for every single workload. The Red Hat/IBM-backed route is interesting. In time we may remember ZFS as the biggest historic design mistake.

      Please note that saying ZFS is an enterprise file system does not really mean much, considering that XFS is one of your oldest enterprise file systems. More interesting, XFS's leadership has had the least change of any file system.
      What are the design mistakes? That it wasn't written by Red Hat? lol.

      You're such a shill oiaohm. Please stop.

      ZFS only changed how people thought about filesystems and moved the entire landscape to copy it. No filesystem created since has failed to take inspiration from it. And XFS is trying -trying- to bolt on its feature set. I've seen some of the hoops they are jumping through to make that happen and it's terrifying.. like loopback volumes and extents. Ugh, no thanks.

      Comment


      • Originally posted by oiaohm View Post
        Now your logic is in trouble. RHEL dropped Btrfs, but it did not give up on the idea of tiered storage.
        I read about Stratis before. I actually like what Red Hat is doing. Unfortunately, Stratis right now is about as mature as ZFS was in 2002 or so. It has some years to go before it's ready for prime time. For those not eager to wait 3+ years, ZFS is the only option.

        If Stratis had been started, I don't know, in 2007, and already had 13 years of varied workload testing under its belt, ZFS on Linux might not even exist, because nobody would have bothered to port it. They'd be using Stratis with XFS.

        Now you might be saying "But wait, Stratis is layered on top of existing technologies, so it won't take 10 years to be ready." Maybe so -- I've seen successful "lightweight" projects that build on a combination of userspace and kernel primitives that hit the enterprise in a few years; Podman and LXD come to mind. But for people who use Ubuntu LTS or RHEL/CentOS releases only (for their stability and long support lifecycle), realistically those releases are not going to retroactively add in something like Stratis after release... probably. Or if they do, it will require a proprietary paid subscription until the next major release of the OS.

        Looks like RHEL 8 has Stratis as a "tech preview" -- that's all well and good, but tech previews generally would not get the Authority To Operate in a production environment. I might consider doing that on my home servers, but... it'd be like using LXD in Ubuntu 16.04. Rough around the edges. Plenty of gotchas and missing features. ZFS has none of these problems.

        Originally posted by oiaohm View Post
        The XFS lead developer has taken a different route to the tiered storage problem. Btrfs started as a clone of the ZFS idea, and ZFS started as something modelled after the WAFL idea. All of these share the idea that you integrate the block layer into the file system layer.
        I am aware of the technical reasons why it has been argued to keep the block layer and FS layer separate, and some of them are pretty compelling. But from what I've seen, this layering also creates performance problems with certain types of behaviors, and makes certain features extremely difficult to implement efficiently. If they can figure out how to keep these layers decoupled, or just very loosely coupled at best, while maintaining the same level of performance as a filesystem that tightly integrates block layer and FS layer concepts, great. But that's a development task, meaning it's probably months or years away from hitting Linux git master, and another couple years later until it's debugged, tested, and ready for enterprise distros to build on.


        Originally posted by oiaohm View Post
        Yes, the wheel that so-called tiered storage file systems are all reinventing is the block layer. Maybe, just maybe, we should improve the block layer API/ABI exposed to file systems, so that every file system Linux supports can become a tiered storage file system with minimal code.
        I think this is kind of the "Storage Spaces" idea from Windows. The closest equivalent Linux has today is LVM2, but that has a lot of problems, not the least of which is performance. Storage Spaces on Windows isn't a great implementation, either. It seems that whenever someone tries to implement a block layer that can handle tiered storage on its own, there are major limitations and performance gotchas, but whenever a filesystem tightly integrates the block and FS layers, it turns out great. Compare LVM2/Storage Spaces with ZFS/APFS. Which one is faster, least buggy, and more relied upon?

        The other problem is, to get this totally right and really nail down this problem, the block layer solution needs to be absolutely immaculate -- as in, perfect. It needs to be so good that any conceivable filesystem concept can use it. The only way the current block layer can be said to satisfy that criterion is because it's so feature-deprived that it literally implements the lowest common denominator, and is extremely simplistic in design. But it then puts a huge burden on filesystems to build on top of it for any non-trivial features, like tiered storage.


        Originally posted by oiaohm View Post
        Problem: in performance, ZFS and WAFL do not compete well with many file systems. Also, no single file system is going to be perfect for every single workload.
        ZFS is fast enough for the people who use it. In fact, being able to take advantage of tiered storage probably results in a faster overall experience compared to having to directly write to the HDDs. Using ARC, L2ARC and ZIL when you have this kind of hardware (big HDDs + fast/small SSDs) will probably get you the highest total system performance (real-world, not microbenchmarked) for that given hardware. Obviously, for a single NVMe SSD in a laptop, a filesystem that doesn't do checksums like ext4, or one built for flash devices from the ground up like f2fs, will probably be faster.

        Originally posted by oiaohm View Post
        Please note that saying ZFS is an enterprise file system does not really mean much, considering that XFS is one of your oldest enterprise file systems. More interesting, XFS's leadership has had the least change of any file system.
        Calling something enterprise doesn't make it stable. Having it run successfully in mission-critical applications on expensive hardware for many years without falling over, is what makes a filesystem earn the name "enterprise". I do believe XFS has also earned the moniker "enterprise" just like ZFS has, and that's totally fair. But XFS also does not compete with ZFS on features -- at least not by itself. Maybe in a few years, once Stratis is out of tech preview in RHEL 9 (??? maybe?) it will be a viable replacement for ZFS.

        Can you see the problem here? People have been needing to launch tiered storage solutions into production for years. Some people have said they've been using ZFS on Linux in production for almost 10 years now. And we're looking at maybe-possibly-soon (1 to 3 years) Linux coming up with some kind of an answer to tiered storage just now, in the early 2020s. Sure, hindsight is 20/20, and going forward with a green field project today the answer might be easier, but the need for tiered storage went unanswered by the mainline Linux developers and OS distros for many, many, many years, and ZoL filled the gap.

        ZoL will continue to fill that gap, until the Linux community at large settles on a proper solution.

        Comment


        • Stratis's entire introduction was about how sad Red Hat was that it can't use ZFS due to licensing (something that isn't true, as Canonical has shown) and how Btrfs sucks too much to develop. So they looped a bunch of technologies together in userspace and glued them together with D-Bus. I don't have a lot of faith in this. It has a lot of moving parts and they have to herd technologies heading in different directions. Gluster on ZVOLs seems far better.. and it works today.

          ZFS's development isn't on making it work.. it's on making it work better, faster and leveraging it in new ways.

          I wonder if RedHat even did a legal review on this outside the FSF lawyers? Or maybe they are one and the same? lol

          And the argument that the second largest Linux vendor "isn't big enough" for Oracle to sue doesn't hold water either from a company that will threaten to take you to court over a $5 Virtual Box license.
          Last edited by k1e0x; 15 January 2020, 03:23 AM.

          Comment


          • Originally posted by k1e0x View Post
            And XFS is trying -trying- to bolt on its feature set. I've seen some of the hoops they are jumping through to make that happen and it's terrifying.. like loopback volumes and extents. Ugh, no thanks.
            There is a problem here: what if XFS is not in fact bolting on new features, but reimplementing old, lost XFS features? This will become clear later.

            Originally posted by allquixotic View Post
            Calling something enterprise doesn't make it stable. Having it run successfully in mission-critical applications on expensive hardware for many years without falling over, is what makes a filesystem earn the name "enterprise". I do believe XFS has also earned the moniker "enterprise" just like ZFS has, and that's totally fair. But XFS also does not compete with ZFS on features -- at least not by itself.
            XFS was the first to do tiered storage, but not by itself. IRIX XVM allowed tiered storage with XFS, with RAM drives, and we are talking before ZFS and WAFL were even an idea. When XFS was ported to Linux on top of LVM2, these features were lost.

            In features, XVM plus XFS was a lot closer to ZFS than Stratis is today. So a lot of what is happening to XFS today is reimplementing what the XVM/XFS combination had, but now as an LVM2/XFS combination.

            Originally posted by allquixotic View Post
            The only way the current block layer can be said to satisfy that criterion is because it's so feature-deprived that it literally implements the lowest common denominator, and is extremely simplistic in design. But it then puts a huge burden on filesystems to build on top of it for any non-trivial features, like tiered storage.
            This is absolutely correct. A lot of the block storage layers have been absolute crud.


            Originally posted by allquixotic View Post
            ZoL will continue to fill that gap, until the Linux community at large settles on a proper solution.
            The reality here is that Sun put a hell of a lot of marketing into ZFS, and the result was Btrfs cloning the ZFS way with no one really looking into what had been done before. By filling the gap, people also did not have to look.

            With ZoL's license issues, maybe the long-term plan should be ZoL extermination.


            Originally posted by k1e0x View Post
            Stratis's entire introduction was about how sad Red Hat was that it can't use ZFS due to licensing (something that isn't true, as Canonical has shown)
            This is not a good point. Canonical is in fact registered in the Isle of Man and Red Hat is registered in the USA, and that makes quite a huge difference. Canonical goes ahead and disregards how USA courts might read the CDDL and GPLv2; this may or may not come back and bite you, but it will not come back and bite Canonical. Linux distributions are not created equal in legal liability.

            Originally posted by k1e0x View Post
            ZFS's development isn't on making it work.. it's on making it work better, faster and leveraging it in new ways.
            This exactly describes what the XFS lead developer is up to as well.

            Originally posted by k1e0x View Post
            I wonder if RedHat even did a legal review on this outside the FSF lawyers? Or maybe they are one and the same?
            Yes, they did have reviews outside the FSF. They had already done a set for MPL 1.1, because back in the day there was a kernel driver licensed under MPL 1.1. And yes, your CDDL is no different; the same problems exist in it as in MPL 1.1.

            Originally posted by k1e0x View Post
            And the argument that the second largest Linux vendor "isn't big enough" for Oracle to sue
            The odds of Oracle getting any money out of Canonical, even if they win, are almost zero due to the legal status of the Isle of Man. The correct answer is that, from a collectable-money point of view, Canonical is not big enough for Oracle to go after. Red Hat, on the other hand, would be.

            Really, it's about time you stop bringing up Canonical as the reason ZFS is safe. Canonical themselves are safe because of their country of registration. Any end user in the USA or Australia, if Oracle does sue, is basically left holding the bag while Canonical walks away scot-free. In a lot of ways Canonical doesn't care about their customers' liability, only their own.

            It would be better for Oracle to target the likes of Azure or AWS over Ubuntu usage than to go after Canonical over ZFS.

            Comment


            • Originally posted by oiaohm View Post
              (red hat guy said stuff)
              If old is good why not just improve UFS? It has the simplest block design ever. (they actually do improve it, they added snapshot support to it recently) The reason is this stuff needs to be designed from the ground up. And in filesystem land that takes 10 years minimum. They can't (shouldn't) horseshoe everything else on and expect everything to be ok.

              End note there are a lot of good filesystems out there Linux (I mean redhat) could use and improve on if they don't like ZFS. bcachefs and HAMMER2 come to mind.. "But HAMMER is really ingrained into Dragonfly" Yes, it is. So was ZFS in Solaris. They still ported it. This is hard work and RedHat always seems to take the easiest approach. Like them "fixing" the problem with the kernel OOM killer hanging the system by using a userland systemd daemon to make sure the kernel OOM killer is never called.. good job fixing that kernel Redhat! lol

              I got to ask... do you really think you're going to end up with a good OS like this? This is why people are using FreeBSD.. because yes.. they implement things slower but they take their time and make sure it's engineered right. It's intentionally designed and changes are heavily debated, Linux randomly evolves with whatever is popular at the moment and whoever gets traction first.
              Last edited by k1e0x; 15 January 2020, 04:27 AM.

              Comment


              • I am not a Red Hat guy; you just don't like my answers.

                Originally posted by k1e0x View Post
                If old is good why not just improve UFS? It has the simplest block design ever. (they actually do improve it, they added snapshot support to it recently) The reason is this stuff needs to be designed from the ground up. And in filesystem land that takes 10 years minimum. They can't (shouldn't) horseshoe everything else on and expect everything to be ok.
                Really, a lot of what XFS is doing is not horseshoeing more on top of the file system. It is providing functions to get access to stuff hidden behind the file system, like the blocks a file is made up of, for uses other than direct IO and so on.

                Yes, one of my questions is: if XFS's integration with the block layer behind it can be improved a lot and made more functional for tiered storage, can items like UFS most likely be improved as well? UFS under Linux has not had the backing of IBM providing servers and other things to test the file system to its limit.


                Originally posted by k1e0x View Post
                End note there are a lot of good filesystems out there Linux (I mean redhat) could use and improve on if they don't like ZFS.
                The licensing of ZFS fairly keeps it out of the mainline kernel tree. No matter whose review of the license you read, ZFS is screwed for mainline Linux while it remains CDDL.

                Please note that Stratis was not designed to be restricted to XFS only, if a more suitable file system gets into mainline Linux. A suitable file system has to be under a Linux-kernel-compatible license.


                Originally posted by k1e0x View Post
                bcachefs and HAMMER2 come to mind.. "But HAMMER is really ingrained into Dragonfly" Yes, it is. So was ZFS in Solaris. They still ported it.
                bcachefs, hopefully this year. The peer review of bcachefs for Linux kernel mainline has found many possible data-eating errors that will be fixed before it is merged. One thing to come out of the Btrfs mess was better general file system testing tools.

                Some of ZFS's issues with Linux come from the fact that it expects the Solaris block layer, which Linux does not really have. HAMMER may be a very bad fit for the same reason. It's one of the ways that porting ZFS out of Solaris has caused some of ZFS's performance problems.

                Originally posted by k1e0x View Post
                This is hard work and RedHat always seems to take the easiest approach. Like them "fixing" the problem with the kernel OOM killer hanging the system by using a userland systemd daemon to make sure the kernel OOM killer is never called.. good job fixing that kernel Redhat! lol
                This problem comes about because of one particular difference between FreeBSD and Linux.

                Linux does way more aggressive overcommit. One advantage of moving the OOM handling to userspace is the ability to change it on the fly without rebuilding the kernel.
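                As a rough illustration of that flexibility, here is a toy C sketch of a userspace OOM helper (my sketch, not systemd-oomd's or earlyoom's actual code): the threshold is re-read from an ordinary config file every cycle, so the policy changes live with no kernel rebuild or reboot. The config path and numbers are invented.

                ```c
                #include <stdio.h>
                #include <unistd.h>

                static long mem_available_kb(void)
                {
                    char line[256];
                    long kb = -1;
                    FILE *f = fopen("/proc/meminfo", "r");
                    if (!f)
                        return -1;
                    while (fgets(line, sizeof line, f))
                        if (sscanf(line, "MemAvailable: %ld", &kb) == 1)
                            break;              /* stop once the field we need is found */
                    fclose(f);
                    return kb;
                }

                int main(void)
                {
                    for (;;) {
                        long limit_kb = 200 * 1024;                   /* fallback: ~200 MiB */
                        FILE *cfg = fopen("/etc/toy-oomd.conf", "r"); /* hypothetical config */
                        if (cfg) {                                    /* re-read each cycle so  */
                            if (fscanf(cfg, "%ld", &limit_kb) != 1)   /* the policy changes live */
                                limit_kb = 200 * 1024;
                            fclose(cfg);
                        }
                        if (mem_available_kb() < limit_kb)
                            fprintf(stderr, "toy-oomd: low memory, would pick a victim and kill it here\n");
                        sleep(1);
                    }
                }
                ```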

                Originally posted by k1e0x View Post
                I got to ask... do you really think you're going to end up with a good OS like this?
                This problem exists to allow the use of something FreeBSD cannot do, and that something gives Linux major advantages in particular workloads.

                Originally posted by k1e0x View Post
                This is why people are using FreeBSD.. because yes.. they implement things slower but they take their time and make sure it's engineered right. It's intentionally designed and changes are heavily debated, Linux randomly evolves with whatever is popular at the moment and whoever gets traction first.
                This is also why FreeBSD lost the supercomputer market, has fallen behind in the web server market, and never got a foothold in the mobile market.

                Sorry, you are trying to make out that FreeBSD is in a strong position. People using FreeBSD for these workloads are a dying breed, even with the havoc systemd caused.

                Comment


                • Originally posted by ernstp View Post
                  I've never seen the point of ZFS when we have Btrfs...
                  I'm mainly interested in the selfhealing and "RAID" ZFS capabilities for my personal central storage.
                  If I check the Arch topic for btrfs it says that those features are unstable, contain errors or have significant downsides...
                  I would prefer the in-tree Btrfs over the license-incompatible ZFS, but one thing I don't want in an FS is it being error-prone or unstable....
                  Anything I'm missing here?

                  Comment


                  • Originally posted by LxFx View Post
                    I'm mainly interested in the selfhealing and "RAID" ZFS capabilities for my personal central storage.
                    If I check the Arch topic for btrfs it says that those features are unstable, contain errors or have significant downsides...
                    I would prefer the in-tree Btrfs over the license-incompatible ZFS, but one thing I don't want in an FS is it being error-prone or unstable....
                    Anything I'm missing here?
                    You are missing a lot.
                    https://www.jodybruchon.com/2017/03/...ot-and-raid-5/


                    Let's cover some facts. Your basic hard drives and SSDs are in fact self-healing at the controller level. The horrible point is that our block layers in operating systems do not allow us to simply access the controller-generated ECC data. Adding ZFS to the operating system does not address this weakness in the block layer; instead you end up calculating checksums basically twice. The horrible reality here is that OS block layers need a major rework to give access to information that already exists.
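                    To make the "twice" concrete, here is a toy C sketch of a fletcher-4 style checksum, the kind of per-block sum a file system like ZFS computes in software on top of whatever ECC the drive's controller already keeps internally. This is only an illustration of the idea, not ZFS's actual code.

                    ```c
                    #include <stdint.h>
                    #include <stddef.h>

                    /* Running sums over the 32-bit words of a block, kept in four
                     * 64-bit accumulators; computed entirely in the file system
                     * layer, independent of any controller-level ECC.             */
                    void fletcher4(const void *buf, size_t size, uint64_t cksum[4])
                    {
                        const uint32_t *p = buf;
                        const uint32_t *end = p + size / sizeof(uint32_t);
                        uint64_t a = 0, b = 0, c = 0, d = 0;

                        for (; p < end; p++) {
                            a += *p;
                            b += a;
                            c += b;
                            d += c;
                        }
                        cksum[0] = a; cksum[1] = b; cksum[2] = c; cksum[3] = d;
                    }
                    ```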

                    Next, Btrfs's own built-in RAID is marked error-prone, but that is not the only option.
                    https://wiki.archlinux.org/index.php...e_RAID_and_LVM
                    You still have your general operating system RAID options and other options.

                    ZFS not being in the mainline kernel does in a lot of ways increase your risk of errors, coming from the fact that the upstream kernel can fix something without considering how your ZFS file system driver will be doing things.

                    This is my problem with the ZFS-or-nothing crowd: normally they are not really considering the full problem at hand, and whether ZFS is really fixing the problem or just adding duplicated functionality that in fact increases the risk of data loss.

                    Comment


                    • Originally posted by oiaohm View Post

                      You are missing a lot.
                      https://www.jodybruchon.com/2017/03/...ot-and-raid-5/

                      Let's cover some facts. Your basic hard drives and SSDs are in fact self-healing at the controller level. The horrible point is that our block layers in operating systems do not allow us to simply access the controller-generated ECC data. Adding ZFS to the operating system does not address this weakness in the block layer; instead you end up calculating checksums basically twice. The horrible reality here is that OS block layers need a major rework to give access to information that already exists.

                      Next, Btrfs's own built-in RAID is marked error-prone, but that is not the only option.
                      https://wiki.archlinux.org/index.php...e_RAID_and_LVM
                      You still have your general operating system RAID options and other options.

                      ZFS not being in the mainline kernel does in a lot of ways increase your risk of errors, coming from the fact that the upstream kernel can fix something without considering how your ZFS file system driver will be doing things.

                      This is my problem with the ZFS-or-nothing crowd: normally they are not really considering the full problem at hand, and whether ZFS is really fixing the problem or just adding duplicated functionality that in fact increases the risk of data loss.
                      That article is a bunch of nonsense. ZFS does not use CRCs. There are plenty of other things wrong there too, but how to ensure data integrity is the crux of things, so let’s focus on that.

                      The presence of some sort of calculation at the controller is useless for ensuring integrity when a valid calculation can be sent with the wrong data. The only way to handle this is to calculate a checksum as early in the stack as possible on write, store it separately and verify it in the same place on read. Expecting the controller to do something for you is letting things happen too late.

                      Whatever the controller calculates is also not what is stored on disk, or sent back with the data. It gets recalculated each time in a different way. This is a great way to get served the wrong data with a valid checksum/ECC calculation. The proper way to address it is to have checksums stored with pointers that are verified by the kernel.

                      It is like getting change back when making a purchase. You can rely on the other guy to count it, or you could count it yourself to be sure that you received what you were supposed to receive. The other guy counting it never means that your count of it is redundant. That would just be blind trust that is prone to abuse.
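                      Here is a minimal C sketch of that idea, using invented struct and function names rather than ZFS's real on-disk format: the checksum is stored in the parent block pointer and recomputed by the kernel on every read, so data that comes back wrong fails verification no matter what the controller claimed.

                      ```c
                      #include <stdint.h>
                      #include <string.h>
                      #include <stdbool.h>
                      #include <stddef.h>

                      struct blkptr {
                          uint64_t offset;      /* where the child block lives on disk      */
                          uint64_t cksum[4];    /* checksum of the child block, stored here */
                      };

                      /* Same fletcher-4 style sum as sketched earlier in the thread. */
                      static void fletcher4(const void *buf, size_t size, uint64_t ck[4])
                      {
                          const uint32_t *p = buf, *end = p + size / sizeof(uint32_t);
                          uint64_t a = 0, b = 0, c = 0, d = 0;
                          for (; p < end; p++) { a += *p; b += a; c += b; d += c; }
                          ck[0] = a; ck[1] = b; ck[2] = c; ck[3] = d;
                      }

                      /* Read a block through whatever stack sits below us, then verify it
                       * against the checksum the parent pointer carries. A mismatch means
                       * the layers below returned bad data, and a redundant copy (mirror,
                       * raidz) could be tried instead of silently passing garbage up.     */
                      static bool read_and_verify(const struct blkptr *bp, void *buf, size_t size,
                                                  void (*read_block)(uint64_t off, void *dst, size_t n))
                      {
                          uint64_t actual[4];
                          read_block(bp->offset, buf, size);
                          fletcher4(buf, size, actual);
                          return memcmp(actual, bp->cksum, sizeof actual) == 0;
                      }
                      ```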

                      I also wrote a list of issues that hardware RAID controllers have and most of them apply to software RAID:

                      http://open-zfs.org/wiki/Hardware#Ha...ID_controllers

                      The claim that a second drive failure during a RAID 5 rebuild is statistically unlikely has been thoroughly debunked by people who did actual statistics:



                      The risk of integrity issues with ZFS is lower than with in-tree filesystems, not higher.
                      Last edited by ryao; 15 January 2020, 10:56 AM.

                      Comment
