**skeevy420** · 27 November 2023, 03:01 PM

Originally posted by timofonic

Some people say that OpenZFS is a reliable and resilient file system that protects data from corruption.

However, recent reports show that OpenZFS 2.2 has a serious bug that causes data loss when copying files.

This is unacceptable for a file system that claims to have advanced features like checksums, snapshots, and clones.

How can users trust OpenZFS when it fails to perform basic operations without damaging data? OpenZFS should fix this bug as soon as possible and restore its reputation as a robust and secure file system.

OpenZFS is not part of Linux and does not work well with it. Avoid OpenZFS and use something better.

This problem is due to how OpenZFS and GNU Coreutils are interacting with each other. They just both happen to be updating parts of themselves that interact with each other at the same time. The newer a person's user space the more likely they are to encounter this bug.

I could just as easily say that this is an unacceptable bug because it's caused by something as essential as GNU Coreutils and could wonder why we should trust GNU software when it can't interact with other software to perform its basic operations without damaging data, but I'm not going to. Shit happens when multiple pieces of software update and this isn't anything more than that. To try and make one side or the other look bad due to that is just retarded.

**cynic** · 27 November 2023, 03:03 PM

Originally posted by rhavenn

Lol. No, I think the lesson here is don't jump on the latest and greatest release branch until it's been well vetted for anything enterprise and/or make sure you have backups. There is a reason RHEL isn't running the latest release kernel. Bugs happen. Heck, it took BTRFS years to figure out their RAID5/6 issues and the recommendation was to only use BTRFS in a RAID mirror.

the issue with RAID5/6 is not related to some obscure bug that needed to be figured out.
it was a well understood design problem that needed to be addressed, not a code bug.

(I'm using the past because apparently they got a solution)

**kylew77** · 27 November 2023, 03:11 PM

Originally posted by gggeek

Short answer: yes.

Long answer:
in the commercial world, time to market is everything. This means reducing anything which creates overhead. Planning and documenting create overhead, as they impede velocity. They don't fly well with managers.
In the open source world, it looks like developer burn-out is increasing year over year (at least in my sphere), leading to an "only scratching your own itch is enough" mentality. And it is never anyone's itch to document and architect things properly. Or to spend sleepless nights trying to reproduce heisenbugs. Works-on-my-computer is king.
In both environments, "agile" methodologies have been widely misused and misunderstood, leading developers to shun documentation, architecture and planning in favour of TDD and build-incrementally, refactor-frequently approaches - of course no-one refactors often in reality. If you have a good test coverage, what use is there for (always-out-of-sync) documentation?
Last but not least, no electronics product is designed to last more than a couple of years, so why would software ones? Funnily enough, we are all running decades-old software now...

Now, get off my lawn!

Are file systems like FFS/UFS and Ext2/3/4 subject to this or are they small enough to have proofs that they won't eat your data?

**cynic** · 27 November 2023, 03:16 PM

Originally posted by kylew77

Are file systems like FFS/UFS and Ext2/3/4 subject to this or are they small enough to have proofs that they won't eat your data?

even if they were perfect (which they are not, if you watch the recent history of ext4), not doing full data checksumming exposes you to the risk of losing data due to unnoticed bitrot.

**flower** · 27 November 2023, 03:22 PM

Originally posted by cynic

even if they were perfect (which they are not, if you watch the recent history of ext4), not doing full data checksumming exposes you to the risk of losing data due to unnoticed bitrot.

you can do full data checksumming with dm_integrity. i have done it in the past and it works well. it even allows for self-healing when combined with a mirror or raid5/6

but i highly suggest storing the checksums on a different drive. in that case there is no performance impact and you dont have any problems with data alignment in a raid configuration
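To make the idea of checksum-based self-healing concrete, here is a minimal Python sketch. It is a toy model only, not how dm_integrity or any real RAID layer is implemented: the `MirroredStore` class, its method names, and the choice of SHA-256 are all assumptions made for illustration. Checksums live in their own dictionary, standing in for the separate checksum drive suggested above.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    """Return a digest for one block (SHA-256 is an arbitrary choice here)."""
    return hashlib.sha256(block).digest()


class MirroredStore:
    """Hypothetical two-way mirror with per-block checksums stored separately."""

    def __init__(self) -> None:
        # One dict per simulated drive: block number -> block contents.
        self.copies = [{}, {}]
        # Block number -> digest; stands in for a separate checksum device.
        self.checksums = {}

    def write(self, n: int, data: bytes) -> None:
        """Write the block to both mirror halves and record its checksum."""
        for copy in self.copies:
            copy[n] = bytearray(data)
        self.checksums[n] = checksum(data)

    def read(self, n: int) -> bytes:
        """Return a verified block, repairing any mirror half that is corrupt."""
        expected = self.checksums[n]
        good = None
        for copy in self.copies:
            if checksum(bytes(copy[n])) == expected:
                good = bytes(copy[n])
                break
        if good is None:
            # Every copy is damaged: report an error instead of returning bad data.
            raise OSError(f"block {n}: all copies fail verification")
        # Self-heal: overwrite any copy that differs from the verified one.
        for copy in self.copies:
            if bytes(copy[n]) != good:
                copy[n] = bytearray(good)
        return good


if __name__ == "__main__":
    store = MirroredStore()
    store.write(0, b"important data")
    store.copies[0][0][0] ^= 0x01          # simulate bitrot on the first drive
    print(store.read(0))                   # served from the intact mirror half
    print(bytes(store.copies[0][0]))       # first drive has been repaired
```

Without the checksum, a plain mirror has no way to tell which of two differing copies is the correct one, which is the point being made about filesystems that do not checksum data.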

**rhavenn** · 27 November 2023, 03:54 PM

Originally posted by cynic

the issue with RAID5/6 is not related to some obscure bug that needed to be figured out.
it was a well understood design problem that needed to be addressed, not a code bug.

(I'm using the past because apparently they got a solution)

That's even worse though, because it was recommended as production, then data started disappearing and then they were like "whoa..whoa...JK...don't use that".

**phoronix_anon** · 27 November 2023, 04:35 PM

Eagerly awaiting a Jim Salter article about how this is actually a btrfs issue.

**muncrief** · 27 November 2023, 04:35 PM

Originally posted by gggeek

Short answer: yes.

Long answer:
in the commercial world, time to market is everything. This means reducing anything which creates overhead. Planning and documenting create overhead, as they impede velocity. They don't fly well with managers.
...

Then things have indeed changed since my day gggeek.

Back then management demanded thorough planning, documentation, projections on team members, organization, responsibilities, and a progressive ladder of goals.

And I'm not just talking about large companies like HP and Sun, but also the myriads of other small companies and startups I worked for. Whether it was hardware or software, or a combination of both, thorough planning was known to be the only path to sure technical success. Yes, there was always pressure from marketing to "hurry up", but they could be reasoned with because everyone agreed that quickly releasing a faulty product could mean doom, as customers were slow to forget disasters. While releasing a late one that worked well would succeed, as the slipped schedule would soon be forgotten.

So my advice to both the ZFS and BTRFS projects is simply this.

Immediately institute a feature freeze and develop the detailed architectural documents and support tools they should have created in the first place. Of course critical bugs must be addressed immediately as best as possible, and I would appoint a team specifically for that task.

But it's long past time for both projects to take a step back, organize into development teams with specific goals, review all code that has been developed thus far and assure it is extensively commented and adheres to agreed upon coding organization, and ultimately refine the code to adhere to the new architectural specifications.

Until then they will continue to flap in the wind, playing whack a mole with one of the most integral parts of a computing system.

If I weren't so freakin' old and worn out I would love to jump in at the organizational level and assist in their new direction, but my era is past and it's time for the younger generation to reacquaint themselves with the hard lessons we learned, but appear to have been forgotten.

**AlanTuring69** · 27 November 2023, 06:00 PM

Originally posted by timofonic

I really hope Bcachefs follows a better approach, sincerely...

Do you seriously think that Bcachefs will have fewer issues with it being a purely for-fun project and having Kent as the effective leader? ZFS has been around for a long, long time and has swallowed very little data. Bcachefs cannot possibly compete with ZFS unless it turns out it's the literal holy grail and someone spends millions on it. I've not seen any data to suggest that it is anything other than a very interesting filesystem, of which there are many.

Originally posted by muncrief

Then things have indeed changed since my day gggeek.

Back then management demanded thorough planning, documentation, projections on team members, organization, responsibilities, and a progressive ladder of goals.

And I'm not just talking about large companies like HP and Sun, but also the myriads of other small companies and startups I worked for. Whether it was hardware or software, or a combination of both, thorough planning was known to be the only path to sure technical success. Yes, there was always pressure from marketing to "hurry up", but they could be reasoned with because everyone agreed that quickly releasing a faulty product could mean doom, as customers were slow to forget disasters. While releasing a late one that worked well would succeed, as the slipped schedule would soon be forgotten.

So my advice to both the ZFS and BTRFS projects is simply this.

Immediately institute a feature freeze and develop the detailed architectural documents and support tools they should have created in the first place. Of course critical bugs must be addressed immediately as best as possible, and I would appoint a team specifically for that task.

But it's long past time for both projects to take a step back, organize into development teams with specific goals, review all code that has been developed thus far and assure it is extensively commented and adheres to agreed upon coding organization, and ultimately refine the code to adhere to the new architectural specifications.

Until then they will continue to flap in the wind, playing whack a mole with one of the most integral parts of a computing system.

If I weren't so freakin' old and worn out I would love to jump in at the organizational level and assist in their new direction, but my era is past and it's time for the younger generation to reacquaint themselves with the hard lessons we learned, but appear to have been forgotten.

What you're describing is a professional, commercialized development process typically seen with serious projects and real, paying customers. OpenZFS is not a particularly commercialized project; its involvement is mostly hobbyist / borderline-hobbyist. What you're describing is at least several full-time jobs (more like a dozen just on the engineering front, if you want features and effective testing), which also does not translate well to projects such as these. If you want to establish a company which runs its own spin of ZFS with such architect-led decision making then I'm sure people would use it, but unless you have millions to spend for zero gain then it's not happening. It's outrageous to expect anything other than what's happening. Put even more simply, I think they know. I am grateful to have a fantastic filesystem work with the latest Linux kernel, for free.

With that said, this is the first time that I can remember something like this happening which is fewer than any other project of a similar lineage / userbase that I can recall.

**timofonic** · 27 November 2023, 06:08 PM

Originally posted by muncrief

Then things have indeed changed since my day gggeek.

Back then management demanded thorough planning, documentation, projections on team members, organization, responsibilities, and a progressive ladder of goals.

And I'm not just talking about large companies like HP and Sun, but also the myriads of other small companies and startups I worked for. Whether it was hardware or software, or a combination of both, thorough planning was known to be the only path to sure technical success. Yes, there was always pressure from marketing to "hurry up", but they could be reasoned with because everyone agreed that quickly releasing a faulty product could mean doom, as customers were slow to forget disasters. While releasing a late one that worked well would succeed, as the slipped schedule would soon be forgotten.

So my advice to both the ZFS and BTRFS projects is simply this.

Immediately institute a feature freeze and develop the detailed architectural documents and support tools they should have created in the first place. Of course critical bugs must be addressed immediately as best as possible, and I would appoint a team specifically for that task.

But it's long past time for both projects to take a step back, organize into development teams with specific goals, review all code that has been developed thus far and assure it is extensively commented and adheres to agreed upon coding organization, and ultimately refine the code to adhere to the new architectural specifications.

Until then they will continue to flap in the wind, playing whack a mole with one of the most integral parts of a computing system.

If I weren't so freakin' old and worn out I would love to jump in at the organizational level and assist in their new direction, but my era is past and it's time for the younger generation to reacquaint themselves with the hard lessons we learned, but appear to have been forgotten.

I'm pessimistic about it; they are poised to die. When a better designed and managed filesystem exists in Linux, both will be forgotten like the Reiser filesystems.

I have some hope in Bcachefs, but it's still too early to know it.

OpenZFS Is Still Battling A Data Corruption Issue
