Announcement

**ryao** · 18 April 2016, 06:36 PM

Originally posted by SystemCrasher View Post

Sure, because it missing in Windows you seems to be using most of time. Er, wait, someone has actualy wrote driver, though it is alpha quality obviously.

The idea of one filesystem replacing all others in a category (e.g. local block device based storage) is a pipe dream. There are design tradeoffs that make some better for purposes and others worse for other purposes. Those strengths might appeal to certain groups of people, but unless your new filesystem manages to correct a fatal flaw in existing filesystems without introducing new flaws, not many people are going to be interested in it.

The reason why ZFS has done as well as it has in spite of that is because it had the good fortunate of combining a substantial number of good ideas well before anyone else had tried. Consequently, by virtue of having done things well, new filesystems need to not only do them well, but avoid introducing any severe deficiencies while doing them substantially better. Going back to the design trade-offs remark, that probably is not going to happen.

Your "btrfs is as good as zfs" generalization suggests to me that your use cases are such that differences do not matter to you. Your advocacy of reflinks as a means of doing snapshots would seem to confirm that because reflinks are a really poor implementation of the concept of a snapshotting API. Skip the next inline reply for the one after that for an explanation of why they are a really bad implementation.

Originally posted by SystemCrasher View Post

Dare to point some major feature ZFS has got which is missing in btrfs? Sure, there're some shortcomings, but I'm actually using btrfs in like a dozen and half places already and it basically worksforme. I can imagine there could be some gotchas left, but it seems it fine for production use, at least with 4.x Linux kernels.

I think Microsoft would say the same about Linux vs Windows, in favor of getting people onto Windows. If you cannot treat an honest answer as take it or leave it information, please just disregard the remainder of this in-line remark. I am writing this for kernel filesystem developers who are interested in making a better driver, not for people who do not write code to try to engage me in some pointless dialogue. Here is a list of things btrfs lacks off the top of my head:

Stochastic testing in userspace. The core ZFS is driver is compiled into the libzpool.so library and exercised by a tool called ztest. This can help catch problems that tend to be found in production, such as ENOSPC.
The ability to run the latest code on older kernels, with a buildbot to verify sanity. The lack of backports is a pain for enterprise deployments because fixes after non-trivial refactoring are never backported. This also means that compatibility with new features is missing, such that you cannot take your storage from a newer system to an older one.
A Merkle tree based disk format. This is a double edged sword. It means that you cannot "reshape" easily and can easily respond to memory pressure, but you lose certain properties such as the ability to go back to a last good state after a crash and end up needing something like xfs_repair, which is put into a misnamed tool called fsck. Fixing this would require a new disk format.
Strong checksums. The crc32c checksum has weaknesses that could fail to detect odd bugs in the storage hardware and 32-bits is a bit on the weak side even for a decent checksum algorithm. ZFS uses the fletcher checksum. It was designed to avoid the problems in the CRC family of checksums.
A separate hierarchical namespace for what btrfs calls "subvolumes".
Support for creating and managing block devices just like any other volume. This is probably where the separate namespace matters. It also cannot be done by the loop device due to overhead.
Separate snapshot and clone functionality, rather than the awkward combination of snapshot+clone that btrfs implements.What btrfs has done is similar to the Windows `CreateProcess()` function versus the similar to POSIX fork() + execve() functions. It works when you want both functions, except for the times when you really only want the first of the two logically separate functions. A snapshot should only be able to be renamed, replicated, destroyed, cloned or mounted readonly. There should be no other operation that works on it. This is important for making sure that your snapshots contain what you expect them to contain. The fact that snapshots and clones were showhorned into the mount namespace might explain why these two functions are not separate in btrfs.
A disk format specification for new developers getting started. While ZFS' disk format specification is old, it is still a good starting place for new developers. btrfs does not seem to have anything like that.
A superior page replacement algorithm. ZFS uses ARC while btrfs is still using the LRU algorithm that the VFS provides helpers for using rather than implementing its own. Having metadata required to figure out where to place things on disk and reference them so you can write that out without blocking on a read tends to be more important in a CoW filesystem than an in-place filesystem, where you can just write in-place (and partial writes are the user's problem).
A mechanism to throttle IO operations that increase dirty data in increasingly large amounts before it fills. Failing to throttle userspace until the limit forces you to throttle userspace will either block userland for a long period of time (waiting for everything) or short periods (write out a tiny bit only to hit it again), which leads to unpredicatable performance. The latter might look like it works until you hit an fsync (or an operation like it), where everything stops from userland's perspective. This is a weakness of the generic code for handling dirty data writeout that btrfs uses than of btrfs itself, but the VFS API is generic enough that btrfs could handle this own its own like OpenZFS does. That being to insert `usleep()` into the VFS operations that increase dirty data to slow them down based on how close the system is to the dirty data limit to keep userland from experiencing seemingly random lags.
A way to perform writes without issuing reads without nodatacow on random-write intensive workloads such as databases and virtual machines. Avoiding read-copy-write on CoW operations of extent-backed files is hard, but that might be a reason to adopt to an indirect block tree like what ZFS uses rather than tell users that they are on their own for data integrity. Telling a VM it is on its own for data integrity would not kill data integrity when the guest uses a driver that can handle it, but the same cannot be said for userland applications such as databases.
Parity-based redundancy without read-modify-write like raidz. btrfs raid 5/6 is the MD RAID code copied into the filesystem and uses a stripe cache to try to get good performance. This might seem more acceptable given that btrfs' extents ensure that userland applications are likely to suffer from read-modify-write overhead no matter what you do, but duplicating the problem in another layer is just going to make fixing it that much more difficult.
N-way mirroring. btrfs does a weird thing where it stores two copies of data on separate disks and calls it a mirror while ZFS will support arbitrarily number of disks with each disk storing the same data at the same location as you would expect on a mirror.
Graceful handling of disk failures. Things like needing to pass a degraded mountflag (especially if this is your rootfs) should a disk not be working is definitely not graceful handling. (Ab)using mount to act as import/assembly probably made this seem like acceptable behavior, but it is a violation of postel's law. If you want a way to warn the system administrator that there is a problem, you should have userspace handle notification.

There are several design decisions that I mention there that would be a pain to address without breaking backward compatibility, although it probably can be done. For instance, you can have a feature flag to indicate indirect blocks can be used for data and then have the on-disk "inodes" indicate when they are using an indirect block addressing scheme. Things like snapshots vs clones and a separate namespace would be a bigger pain to implement, but it is probably doable too.

Originally posted by SystemCrasher View Post

When it comes to productivity, cp --reflink inceases my productivity by like an order of magnitude in some cases. I can bring up new instance of container or VM disk or experimental disk image in virtually no time, initially sharing blocks on disk with its predecessor. Then CoW would unshare changed blocks as needed, storing differences somewhere else. It makes VMs/containers/disk images activities wildly efficient, allowing one to instantiate like a dozen of distinct versions of fairly large files/hierarchies without taking time to actually copy it, without CPU costs to dedup it, and without disk space needed to store dozen of independent copies in "usual" way. Perfect version of overcommit and dedup, where you hint CoW explicitly. Somehow, ZFS does not implements it to the date.

People have discussed implementing reflinks, but they have some downsides:

Initial implementation discussions have suggested that doing it needs an indirection, which is not nice. Maybe btrfs can get away without it, but I am not sure.
If you use the immutable bit to simulate a real snapshot, you have a racy situation where something else can modify it.
Rolling back is a hack. Rolling back requires doing two reflinks so that you can unlink the second one and do a new reflink if you want to rollback. This is a maintenance nightmare because it is up the the sysadmin to figure out what is what.
You cannot do send/recv.

Originally posted by SystemCrasher View Post

Negative. I can't imagine other sane reason to select some really exotic license. Which is also copyleft-style, yet incompatible with other copyleft style. Of course it could be extreme stupidity but I doubt company could achieve things Sun did by being terminlly stupid to THIS degree. But whatever, this company no longer exists and Oracle proven to be quite a troublesome entity who has caused ZFS fragmentation. Now there is Oracle's ZFS and community's ZFS and these aren't same. Seems it would be like UFS where every *nix around thrown own improvements, without giving slightest fuck what others do. The result is a half dozen of semi-(in)compatible UFS versions floating around.

Unless Oracle makes its own Oracle ZFS driver for Linux, Oracle's ZFS fork does not matter at all on Linux.

As for licensing, I am not a lawyer, but the CDDL is a "F/OSS" license according to the FSF. The FSF's publication seems to be almost in exact agreement with the SFLC's opinion, with the only difference being what constitutes an exception that allows Linux LKMs to be under otherwise incompatible licenses:

https://www.softwarefreedom.org/news...x_Kernel_CDDL/

The FSF appears to want one in writing while the SFLC appears to think that the kernel developers' actions created one. While a written exception would be nicer to have, there are plenty of situations in law where people's actions gave others' rights. An example of this is an implied easement in property law.

Under the assumption that everything that the FSF and SFLC claimed in common is correct, I am inclined to agree with the SFLC on the single point on which they differ. It is difficult to claim that there was no exception made when the mainline kernel developers made an interface for non-GPL software to use, the project lead (Linus Torvalds) claim that ports of non-GPL drivers such as the Andrew filesystem is not a violation and the idea went unchallenged for years. By this point, it is common practice. Ubuntu also was not the first distribution to ship binary ZFS kernel modules either. Gentoo and Sabayon did it on ISOs years before Ubuntu did.

Furthermore, there has yet to be a single person who thinks there is a violation that has claimed that a port of ZFS to Linux would not qualify under fair use, while there is a legal opinion that it does:

http://www.rtt-law.com/public/files/...te%20paper.pdf

As per the Berne convention, this is a matter of US law, so unless people who hold a majority of the Linux copyright simultaneously thinks that there is a violation and that there is no fair use defense under US law, there is nothing to discuss. Without a majority, you cannot dispel the idea that the majority implicitly allowed it and without a way to counter fair use arguments, even a majority cannot claim that the law gives them the right to stop people from distributing binary ZFSOnLinux kernel modules.

Lastly, what I have said is just the understanding of a non-lawyer who tried to understand what actual lawyers wrote. If you hold copyright on a part of the code in question and want to talk to someone about this matter, I suggest getting in touch with the SFLC to speak with actual lawyers. If you are not a copyright holder, this does not concern you under the law. If you think otherwise, you can check with an attorney.

**unixfan2001** · 19 April 2016, 01:22 AM

Originally posted by SystemCrasher View Post

Sure, because it missing in Windows you seems to be using most of time. Er, wait, someone has actualy wrote driver, though it is alpha quality obviously.

I only use Windows for Windows development. Since I'm a cross platform developer, that means I'm using each system equally. Given that my home computer runs Ubuntu 16.04 on its primary disk, GNU/Linux gets more than equal time.

Dare to point some major feature ZFS has got which is missing in btrfs? Sure, there're some shortcomings, but I'm actually using btrfs in like a dozen and half places already and it basically worksforme. I can imagine there could be some gotchas left, but it seems it fine for production use, at least with 4.x Linux kernels.

For one, it doesn't do triple-parity. It's also far less mature than ZFS.
When handling terabytes or even petabytes of data, the last you want is to lose your records to an immature filesystem.

It is a great benchmark of resource consumption, smartphones are relatively modest hardware, due to thermal & battery life concerns.

When running a datacenter, modest increases in the resource requirements for a filesystem are the least of your worries.

Negative. I can't imagine other sane reason to select some really exotic license.

Then, clearly, you aren't thinking hard enough.

The CDDL is file-based, the GPL is work-based.
Being file-based has one obvious benefit: It actually clears up linking policy issues.

GPL2 also doesn't include an explicit patent grant, similar to CDDL's patent peace provision.

Lastly, the CDDL was carefully crafted to adhere to the Digital Millenium Copyright Act and to European Union copyright law. GPL wasn't.

You seem to think (quite erroneously) that the CDDL is some sort of niche license, when it's in fact one of the preferred licenses of the OSI.

**SystemCrasher** · 20 April 2016, 10:30 AM

Originally posted by ryao View Post

The idea of one filesystem replacing all others in a category (e.g. local block device based storage) is a pipe dream.

I'm pretty sure one size does not fits all. I use wildly different filesystems. Hopefully local ZFS zealots would also get idea.

new filesystem manages to correct a fatal flaw in existing filesystems without introducing new flaws, not many people are going to be interested in it.

I'm not a big fan of ZFS assumptions. IMHO "should never happen" isn't a good answer. I could be running laptop with single drive, non-ECC RAM, it could face bad sectors, CPU could overheat and I wouldn't count on bugged ACPI. But I may want my data back. So it is nice if filesystem could live with these less convenient assumptions.

The reason why ZFS has done as well as it has in spite of that is because it had the good fortunate of combining a substantial number of good ideas well before anyone else had tried[...]

Yes. But to put it directly, ZFS design appears to be heavily biased to huge enterprise storages on expensive hardware, where admins are ok with inconvenient management demands if it is big deal.

Your "btrfs is as good as zfs" generalization suggests to me that your use cases are such that differences do not matter to you.

Yes, something like this. Say, I think golden age of huge centralized storages on expensive enterprise hardware is over. Humans invented other ways to store large amounts of data cheaper, easier and better. Ironically, distributed systems do not put much demands on underlying FS. What I see like btrfs use cases? If simple FSes like ext4 aren't enough, yet going distributed is an overkill, btrfs would do. It means medium-scaled storages or advanced uses, like mentioned cp --reflink trick. ZFS... what YOU see as its primary use cases today?

Your advocacy of reflinks as a means of doing snapshots would seem to confirm that because reflinks are a really poor implementation of the concept of a snapshotting API.

I do not consider reflinks like "snapshots". When I refer to "snapshot", I expect some consistent "saved state" (e.g. hierarchy and associated blocks). Like, say, state of fleet of VMs at some point of time. From user's perspective, reflinks are resource trick, "deferred copy" or "hinted dedup". It allows very rapid and space efficient deployment of VM/container fleets and great for testbeds, setting up "working copy" in "no time", taking "no management".

Skip the next inline reply for the one after that for an explanation of why they are a really bad implementation.

OTOH from practical standpoint it makes significant part of my activities very efficient while "management" is dead simple.

I think Microsoft would say the same about Linux vs Windows, in favor of getting people onto Windows.

I wish 'em luck, but I'll have inconvenient questions and remarks ready :P. I've had chance to see how they work internally. So I like Linux. It takes comparison to get idea. Say, I believe Internet of Things is a major part of future. I'm ready for it, ARMed to the teeth with affordable yet featured boards, openminded communities and highly competitive vendors allow me to be efficient in what I'm doing. MS can't offer me better techs or tools, better customization or something. I do care about MY efficiency, not MS success - GL with marketing BS. Proprietary ecosystems are slow, boring and uncooperative, real tarpit for ideas.

If you cannot treat an honest answer as take it or leave it information, please just disregard the remainder of this in-line remark. I am writing this for kernel filesystem developers who are interested in making a better driver, not for people who do not write code to try to engage me in some pointless dialogue. Here is a list of things btrfs lacks off the top of my head:

At least I'm up to learn more about kernel internals, techs I use, and honest evaluation of techs never hurts. Different point of view is also good.

[*]Stochastic testing in userspace. The core ZFS is driver is compiled into the libzpool.so library and exercised by a tool called ztest. This can help catch problems that tend to be found in production, such as ENOSPC.

If you do not mind dialogues and honest answers: I have quite some expertise in SW quality. My experience suggests best way is to give software to thousands of users. Automated tests? Fancy, but usually takes quite some efforts, yet yield is modest. So it is good to have if you could afford it, but if it is me who is going to be responsible for something, I would prefer 1000 real-world users hammering software under real-world conditions over 1000 automated tests. Users are better in breaking things and catching bugs which are really harmful. Btrfs got some advantages in this regard: it is part of mainline kernel, people hammering it during -RC series. Btw, kernel devs are noteworthy on their own: kernel usually usable even in pre-RC state, I can't readily name other programs exposing this level of quality during early development. Furthermore, users running -RCs inevitably test integration of all subsystems, while ZFS development processes are separated from kernel so it isn't a case.

I should admit I sometimes see XFS testsuite catching bugs, but major bug squashing occured after wide deployments took place, including infamous facebook servers. On ZFS side, it has got older code base. So it received better testing for sure. I would expect higher bug rates for btrfs due to newer code. Though from practical standpoint, bug flow is calming down and I'll be seriously worried only about RAID5/6 code paths. That's what we'll have if I'll try to be honest and apply my expertise in area as far as I can get.

[*]The ability to run the latest code on older kernels, with a buildbot to verify sanity. The lack of backports is a pain for enterprise deployments because fixes after non-trivial refactoring are never backported. This also means that compatibility with new features is missing, such that you cannot take your storage from a newer system to an older one.[..]

On other hand, what about testing integration? Buildbot only checks very basic things. Running such configurations looks like good way to ensure all bugs are yours. Epic win for QAs, bad idea for others. Furthermore, kernel faces plenty of other fixes and I'm not sure why these are of lower priority vs ZFS fixes, could be equally badass bugs. IMHO enterprises should be somewhat crazy to fiddle with such setups. Not to mention it likely to be unsupported configuration for most distro vendors. Sorry if this does not sounds great, but if we are up for honest answers...

[*]A Merkle tree based disk format. This is a double edged sword. It means that you cannot "reshape" easily and can easily respond to memory pressure, but you lose certain properties such as the ability to go back to a last good state after a crash and end up needing something like xfs_repair, which is put into a misnamed tool called fsck. Fixing this would require a new disk format.

This is probably one of strongest differences between designs, causing different tradeoffs.
Yet, btrfs got 3 tools.
- Btrfs.fsck: NOOP, keeps booting happy. Driver does rollback itself. Since data and metadata are CoWed, rollback ditches recent writes, going back to past to last consistent point. Full seq more complicated due to fsync journal tree. But WTF it "can't go back"? It actually does. As "normal" crash recovery action. IIRC it could also try various RAID copies, so while ZFS probably a bit more die-hard due to merkle, btrfs isn't as bad as you describe. It is also valid to claim it "does not needs fsck", for same reason ZFS claims it.
- There is btrfsck. Offline tool to parse, check and fix metadata. Never called during boot. Last-resort manual repair tool. When filesystem badly damaged beyond design assumptions, we need thoroughly walk and crosscheck metadata, fixing missing/bogus parts if possible. Goal is to make damaged filesystem mountable in cases damage is strong enough to cause problems mounting it by usual means. It needed if emergency exceeded design assumptions. IIRC ZFS lacks tools for such situation, devs prefer pedal it "should never happen". I dislike this approach: does not fits at least some of my use cases.
- If one is desperate to fix filesystem, there is "btrfs restore", it would not cause extra damage, nor it wold fix anything: it is a data recovery tool. One could try various "generations", going to various points in past. Eventually one is likely to stumble on relatively undamaged hierarchy, being able to read valuable data back. Really great data recovery option others are missing. Especially epic as part of stock toolkit.

Result? Data recovery from badly damaged non-mountable ZFS is hard and costly due to lack of tooling. Btrfs attempts to be better in this regard. I think I can handle most issues myself, unless there is need to fix HDD mechanics or firmware and no redundancy in effect.

[*]Strong checksums. The crc32c checksum has weaknesses that could fail to detect odd bugs in the storage hardware and 32-bits is a bit on the weak side even for a decent checksum algorithm. ZFS uses the fletcher checksum. It was designed to avoid the problems in the CRC family of checksums.

That's where I'm agree. Not really sure why btrfs devs were so inclined on crc32.

[*]A separate hierarchical namespace for what btrfs calls "subvolumes".

One could use btrfs in similar mode. Ubuntu even installs like this by default, assembling "/" posix hierarchy of several subvolumes, topmost FS level being invisible to POSIX, but mountable, if desired. I.e. it is a "separate hierarchical namespace". So, where ZFS advantage happens?

[*]Support for creating and managing block devices just like any other volume. This is probably where the separate namespace matters. It also cannot be done by the loop device due to overhead.

Ok. I'm not creative enough to understand why it is bad for me, but for those who do, it is disadvantage, sure.

[*]Separate snapshot and clone functionality, rather than the awkward combination of snapshot+clone that btrfs implements.

Does not feels awkward in terms of management, maps reasonably to my mindset and resembles other system level processes.

What btrfs has done is similar to the Windows `CreateProcess()` function versus the similar to POSIX fork() + execve() functions.

fork() is not a real thing in Linux. There is clone() and unshare() and I really prefer it over mere fork() who is a clone() with certain set of flags. Everything is LWP, "thread" shares almost everything with parent. "Process" shares less. "Containers" unshare even further. Btw, memory "unsharing" for "processes" in Linux quite similar to cp --reflink idea.

It works when you want both functions, except for the times when you really only want the first of the two logically separate functions.

I use snapshots when I want to keep consistent state of some fairly large set of entities as some point in time. I.e. if management cost is lower than re-creation cost. I use cp --reflink if I want this intelligent "unsharing" for rapid deployment and/or resources efficiency, while I may or may not care about "consistent state at some point of time". Overall, it maps reasonably to typical operations with VMs or disk images, as well as usable to keep several system states, etc.

A snapshot should only be able to be renamed, replicated, destroyed, cloned or mounted readonly. There should be no other operation that works on it. This is important for making sure that your snapshots contain what you expect them to contain.

This is somewhat true, but sometimes you have snapshotted, say, a large fleet of VMs, and figured out one VM is in wrong state, jeopardizing whole point. Redoing everything from scratch could be time consuming, so carefully booting single VM on writeable snapshot and getting it right could be better idea. I would agree it takes care to keep things consistent. Btrfs got readonly snapshots if someone thinks it matters.

The fact that snapshots and clones were showhorned into the mount namespace might explain why these two functions are not separate in btrfs.

Then, the next question is: do you also dislike runtime memory unsharing of processes in Linux which is similar to cp --reflink behavior in spirit?

[*]A disk format specification for new developers getting started. While ZFS' disk format specification is old, it is still a good starting place for new developers. btrfs does not seem to have anything like that.

Some structures and overall ideas are documented in wiki, but it is imperfect/outdated and could be improved, so I'm somewhat agree.

[*]A superior page replacement algorithm. ZFS uses ARC while btrfs is still using the LRU algorithm that the VFS provides helpers for using rather than implementing its own.

Any estimates ARC gains? Then, VFS means integration with rest of kernel memory management and allows e.g. evicting cache when some process needs memory. I do not need fucking crashey system prefferring to terminate apps to keep FS happy! It maybe ok for fileservs, but world does not revolves around them unlike Sun insited. Screw that.

[*]A mechanism to throttle IO operations that increase dirty data in increasingly large amounts before it fills.[...]

There're countless ways to hurt IO performance of virtually any filesystem, not sure why this particular one valued over anything else. Is it somehow special?

The latter might look like it works until you hit an fsync (or an operation like it), where everything stops from userland's perspective.

IIRC btrfs did some tricks to make fsyncs more efficient. Is it still a pressing issue for real-world workloads, etc? If someone thinks ZFS does it better, it should be an issue in real world, right?

keep userland from experiencing seemingly random lags.

One corner case ... ok, its bad, but why it more important than others? There're many ways to provoke lags or degrade performance for virtually any configuration.

[*]A way to perform writes without issuing reads without nodatacow on random-write intensive workloads such as databases and virtual machines. Avoiding read-copy-write on CoW operations of extent-backed files is hard, but that might be a reason to adopt to an indirect block tree like what ZFS uses rather than tell users that they are on their own for data integrity.

Speaking for myself, when there is intense random write like VM or DB, my primary concerns are:
- number of fragments which would occur over time, downgrading linear access to random - kills backups speed on mechanic drives.
- amounts of metadata requred to keep track of these countless fragments. In worst case metadata could grow large, degrading performance.
- double-fragmentation by fragmenting CoW drive/DB on its own and then even further fragmenting underlying storage by another CoW.

NODATACOW deals with all these concerns in one shot. No bunch of fragments -> no issues.

Telling a VM it is on its own for data integrity would not kill data integrity when the guest uses a driver that can handle it, but the same cannot be said for userland applications such as databases.

DBs are supposed to run on various filesystems, that's why they do everything themselves rather than relying on particular FS behavior. If it isn't a case, their users are going to have tough luck somewhere else. But ok, let consider NODATACOW implies NODATASUM strange/questionable tradeoff.

[*]Parity-based redundancy without read-modify-write like raidz. btrfs raid 5/6 is the MD RAID code copied into the filesystem and uses a stripe cache to try to get good performance. This might seem more acceptable given that btrfs' extents ensure that userland applications are likely to suffer from read-modify-write overhead no matter what you do, but duplicating the problem in another layer is just going to make fixing it that much more difficult.

On other hand, btrfs design assumes per-object RAID levels. Nice headroom for future improvements, isn't it? As far as I understand ZFS wasn't meant to be like this and got plenty assumptions which would prevent this feature.

[*]N-way mirroring. btrfs does a weird thing where it stores two copies of data on separate disks and calls it a mirror while ZFS will support arbitrarily number of disks with each disk storing the same data at the same location as you would expect on a mirror.

I do not see anything wrong with calling 2 copies a mirror, since each block gets 2nd copy on other device. Just a bit more flexible definition. Furthermore, idea that same data should be on same offsets of drives is a really narrow-minded and puts really bad constraints on drives selection and pool management. It could be better than that, btrfs is about to get it right.

If you want a way to warn the system administrator that there is a problem, you should have userspace handle notification.

It isn't something terribly hard to fix, etc. Most likely nobody cares. But probably you're right and it could be improved.

Things like snapshots vs clones and a separate namespace would be a bigger pain to implement, but it is probably doable too.

I do not really get why I should care if something is "clone" or not (if you refer to cp --reflink kind of things). Its like caring if process PID 5439 shares memory pages with PID 7145 or not. At very most, from management point I want to see how much pages it shares, if any. But splitting processes on grounds of memory sharing attitude into separate containers (namespaces) does not looks logical to me. Why it should be different for filesystems?

[*]Initial implementation discussions have suggested that doing it needs an indirection, which is not nice. Maybe btrfs can get away without it, but I am not sure.

Yet, you acknowledge I'm advocating this feature. Because it works great for my use cases. Sure, not everybody would benefit from it. Some ppl are ok with ext4, etc after all.

[*]If you use the immutable bit to simulate a real snapshot, you have a racy situation where something else can modify it.

OTOH if I did almost perfect snapshot but detected minor error, it happens to be a bummer if I have to redo everythnig from scratch.

[*]Rolling back is a hack. Rolling back requires doing two reflinks so that you can unlink the second one and do a new reflink if you want to rollback. This is a maintenance nightmare because it is up the the sysadmin to figure out what is what.

I do not get, why it have to be reflink? Normally snapshot rolled back by moving old subvolume out of the way and mounting another subvolume at same point (snapshot is a subvolume).Then old subvolume could be deleted. If one needs to retain snapshot to roll back to it again, one have to snapshot the snapshot and roll back to one of its copies, keeping another for later use. Somehow this logic is similar to fork(). Is fork() bad?

[*]You cannot do send/recv.

Hmm? What the hell you think "btrfs send" and "btrfs receive" are doing?

Unless Oracle makes its own Oracle ZFS driver for Linux, Oracle's ZFS fork does not matter at all on Linux.

What matters is 2 incompatible on-disk formats. It is undesirable outcome. UFS got like half dozens of flavours. Fortunately these aren't widespread, otherwise it would be a real PITA.

As for licensing, I am not a lawyer, but the CDDL is a "F/OSS" license according to the FSF.

While I generally dislike when lawyers trying to halt technical processes, in case of Linux it just never works and when someone trying tricks like this, it only sparks creation of better solution. Something I welcome. Bitbaker devs tried it. Now, there is git. I think best thing Oracle could do is to allow GPL licensing of their code. But it hardly something typical for Oracle, at least not before it is too late.

**chrisb** · 20 April 2016, 10:38 AM

Originally posted by k1l_ View Post

Funny how you cite the Text and marking the Text around the exact quote telling the exact opposite of what you try to make it look like.

That's because Linus explicitly says that the AFS argument is hard to apply to modern file systems:

"historically there was a much stronger argument for things like AFS... In contrast, these days it would be hard to argue that a new driver or filesystem was developed without any thought of Linux."

The Andrew File System was introduced by researchers at Carnegie-Mellon University (CMU) in the 1980’s. It isn't a modern file system and it predates the creation of Linux (which Linus pointed out).

The question is not whether ZFS was implemented on OpenSolaris first (obviously it was) - but whether the Linux implementation depends on Linux internals enough to be legally considered a derived work of Linux. I'm not familiar with the code to make that judgement either way.

**SystemCrasher** · 20 April 2016, 11:16 AM

Originally posted by unixfan2001 View Post

I only use Windows for Windows development.

Blah, blah, blah. Feel free to enjoy MS best practices, and superb tools. But I'm yet to see superb products made with "superb" tools. Not to mention "superb" tools can't do C99 as of 2016. How the hell one could make their devtools so retarded? I do not get it, nor I understand how ppl are using it.

[qoute]For one, it doesn't do triple-parity. [/quote]
Finally something sensible instead of marketing BS and generic words. Ok, this is good feature.

It's also far less mature than ZFS.

I would agree on "less" but would challenge "far". Well, at the end of day it works in most cases. Maybe it is possible to break it, but I do not know ovious fair ways. Which is quite an achievement for any program, as my nickname suggests I'm good at it.

When handling terabytes or even petabytes of data, the last you want is to lose your records to an immature filesystem.

If someone runs petabytes-scale storage as single, centralized thing, congratulations: they fucked everything up even before they've started, screwing system architecture, lol.

When running a datacenter, modest increases in the resource requirements for a filesystem are the least of your worries.

Sure, go learn Google to handle petabytes of data. These worthless nuts ran some EXT4 without journal, for speed reasons. Can you imagine it? But sorry to inform you, if you're not a complete moron, you do not have to care about journal in petabyte-scale setups. You do not even have to care if particular storage node stays operational at all. So it could completely die, damage data or whatever. If you do it right, it just not a big deal. It is also possible to do integrity checking as inherent part of many other things, like, say "content addressing" (also dubbing as redundancy and load balancing).

You seem to think (quite erroneously) that the CDDL is some sort of niche license, when it's in fact one of the preferred licenses of the OSI.

Realistically speaking, CDDL _is_ a niche license. ZFS is most famous product using it. I would have trouble naming another product using it. So it is niche license.

Then, licensing per file looks halfassy. Licensing work as whole looks logical. Licensing per file ... ok, it looks messy. Linking exceptions? If someone is GPL, the only "exception" treacherous wrenches trying to work around license should get is a middle finger. If someone is refuses to cooperate it would be fair they also have to write all the code, lol. Patent grants are maybe not so bad idea, but I guess sun only did it to protect itself, and it may or may not be relevant to others. At the end of day, if one has got patents and wants to share them they could issue separate patents grant license. If they want to sue someone.... from realistic standpoint, neither GPL nor CDDL would prevent it. So it maybe improvement, but not worth of breaking compatibility with loads of pre-existing softare. So I think Sun intentionally attempted to thwart Linux yet trying to pretend they had sensible reasons. While these reasons are looking quite weak.

**unixfan2001** · 21 April 2016, 02:52 AM

Originally posted by SystemCrasher View Post

Blah, blah, blah. Feel free to enjoy MS best practices, and superb tools. But I'm yet to see superb products made with "superb" tools. Not to mention "superb" tools can't do C99 as of 2016. How the hell one could make their devtools so retarded? I do not get it, nor I understand how ppl are using it.

Can you shut your bloody cake hole and stop sounding like a pretentious cracker for no good reason? This isn't about Microsoft or Windows or anything else.
The fact that you feel the need to bash Microsoft at every branch in the path displays a severe need to grow up.

I would agree on "less" but would challenge "far". Well, at the end of day it works in most cases. Maybe it is possible to break it, but I do not know ovious fair ways. Which is quite an achievement for any program, as my nickname suggests I'm good at it.

Since you don't seem to use it that extensively, I doubt you'd run into many issues.
Just know that there are definitely inherent issues with it. Like the way it stores its meta-data or uninterruptible I/O sleeps stalling the system.

If someone runs petabytes-scale storage as single, centralized thing, congratulations: they fucked everything up even before they've started, screwing system architecture, lol.

Who the heck said anything about running it as a single, centralised entity? Do you honestly think you don't need a reliable storage backend, if you've got tapes? Do you know how expensive disaster recovery can be?

Sure, go learn Google to handle petabytes of data. These worthless nuts ran some EXT4 without journal, for speed reasons. Can you imagine it?

ext4 is just one part of their storage stack. Did you miss the part where they were layering it on top of GFS and backing it via bigtable?

It is also possible to do integrity checking as inherent part of many other things, like, say "content addressing" (also dubbing as redundancy and load balancing).

'cept integrity checks are not only expensive but completely useless when you can't even trust the file system itself.

Realistically speaking, CDDL _is_ a niche license. ZFS is most famous product using it. I would have trouble naming another product using it. So it is niche license.

And you would be incredibly wrong. It's fairly well used.
Often with GPL dual licensing. There's Oracle GlassFish, cdrtools, NetBeans, OpenSolaris and many others.

Then, licensing per file looks halfassy. Licensing work as whole looks logical. Licensing per file ... ok, it looks messy.

Just another case of "I'm not a software developer, but..." I see.
Licensing work as a whole is entirely illogical and leads to lots of unnecessary issues.

Stallman would've known that as well, was he an actual, productive engineer.

Linking exceptions?

I didn't say linking exceptions. I said linking policy issues/violations.
It's easy to figure them out when licenses apply only to the perpetrating file.

If someone is GPL, the only "exception" treacherous wrenches trying to work around license should get is a middle finger.

If only.
I think I could live with being given the middle finger. It hurts the perpetrator's legitimacy more than it does mine.

If someone is refuses to cooperate it would be fair they also have to write all the code, lol.

No one refuses to cooperate. GPL demagogues like yourself just want to have their cake and eat it too.
On the one hand, you tell people they have the freedom to use and share code the way they see fit. On the other, you essentially demand they do it according to your personal TOS.

Patent grants are maybe not so bad idea, but I guess sun only did it to protect itself, and it may or may not be relevant to others.

I bet the people engaged in XimpleWare v. Versata and Ameriprise would disagree.

At the end of day, if one has got patents and wants to share them they could issue separate patents grant license.

Sure they could. But what if they (upstream) don't? How do people know they couldn't be sued over using GPLd software? Heck. How do we know the Linux kernel doesn't contain patented software someone might get sued over?

If they want to sue someone.... from realistic standpoint, neither GPL nor CDDL would prevent it.

Of course they don't. Unless you're an American arms manufacturer, you can always be sued. The point isn't being sued though. The point is the possible outcome.
With the GPL it looks much grimmer than with the CDDL.

So it maybe improvement, but not worth of breaking compatibility with loads of pre-existing softare. So I think Sun intentionally attempted to thwart Linux yet trying to pretend they had sensible reasons. While these reasons are looking quite weak.

They themselves didn't even know they would break compatibility with the GPL. Any sensible, neutral source seems to disagree with the FSF's contentions.

**ormaaj** · 24 April 2016, 03:31 AM

I don't see how re-licensing ZFS under the GPL could possibly improve the situation for anyone. The result would be Linux starting to accept GPL-only changes that will be unacceptable to everyone outside of Linux - thereby breaking one of ZFS's most significant advantages over btrfs - its reasonable degree of portability amongst non-Linux free OSes.

The way I see it, CDDL is the perfect license for ZFS. The free OSes all get a nice reliable portable file-system that everyone (even Oracle, to a limited extent) can share. As a side-benefit, Linux is prevented from a) breaking things b) benefiting from ZFS without contributing anything back.

Linux has almost nothing to gain from ZFS in the long run because Linux already has its non-portable GPL-infected answer to ZFS that's doomed to be forever Linux-only unless someone wants to waste their time on a clean-room maybe-almost-compatible reimplementation of Btrfs. Likewise, nobody other than Linux has anything to gain from ZFS on Linux unless somebody can think of a way for people to contribute to the Linux implementation and accept changes back under an acceptable license.

**bridgman** · 24 April 2016, 07:33 AM

There are a lot of licenses which are compatible with both GPL and the non-Linux free OSes, eg the X11 license used for the graphics subsystem. Choosing any of those would allow changes to flow freely in all directions.

**gondur** · 24 April 2016, 09:31 AM

Originally posted by SystemCrasher View Post

Negative. I can't imagine other sane reason to select some really exotic license. Which is also copyleft-style, yet incompatible with other copyleft style.

That's a general property of copyleft: they prevent by their nature sub-licensing and are therefore inherently incompatible with everything which adds restrictive terms beyond their own. The GPLv3 and Apache 2.0 were made compatible by having the same restrictive terms. Permissive licenses are compatible with GPL as they not introduce new restrictive terms with an copyleft license, not as the GPL is compatible. (The GPL does nothing for license comaptiblity)

About the CDDL, irocibcally it is not an exotic license but an member of the MPL family, which was started to address some of the GPL problems. In fact the CDDL is MORE compatible with other, also copyleft, licenses than the GPL (weak copyleft due to per file license possibilty and ). That was the very reason why the FSF fought the CDDL from the very beginning with "GPL incompatibly" FUD until to day. To prevent the adoption of the CDDL and kill this competitor in the field of copyleft licenses early enough. And they were quite successful, the CDDL was after that FUD campaign not selected anymore by many projects and you called it even "excotic".... *sigh*

**ormaaj** · 25 April 2016, 12:44 AM

Originally posted by bridgman View Post

There are a lot of licenses which are compatible with both GPL and the non-Linux free OSes, eg the X11 license used for the graphics subsystem. Choosing any of those would allow changes to flow freely in all directions.

It isn't merely an issue of license-compatibility. The GPL doesn't allow anyone that has incorporated their work from a non-GPL project into a GPL project to accept further derivative changes to that work back into their non-GPL project (unless that contribution has been explicitly dual-licensed) without re-licensing the entire project under the GPL, or at least those parts of their project that can be said to be integrated in a way that the GPL considers to make it derivative, which is always debated endlessly of course.

That's unacceptable to pretty much everybody that isn't already using the GPL and has no desire to re-license their work. This makes it extremely difficult to use a less-restrictive license, contribute that work to a GPL-licensed work, and keep both versions synchronized because it's not easy to require contributors to the GPL-licensed version to license their change in the same less-restrictive way.

Announcement

FSF Issues Fresh Statement Over ZFS On Linux With GPL Enforcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment