Those Using The XFS File-System Will Want To Avoid Linux 6.3 For Now


  • #31
    One of the experts (Eric Sandeen) has identified 74c36a8689d ("xfs: use xfs_alloc_vextent_this_ag() where appropriate") as the commit which appears to cause the corruption in 6.3 (the bug has apparently been fixed in 6.4, although that may be a side effect of other fixes in the same area). That is an important step along the way to resolving the bug. Although this is an extended holiday in the US, I expect this to be fully understood, with a fix available, in the next few days.
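    As a quick sanity check while the fix is pending, a minimal shell sketch like the one below can tell you whether the kernel you are running falls in the affected 6.3 series. This assumes, per the reports above, that 6.2 and earlier are unaffected and that the issue is resolved in 6.4; the function name is made up for illustration.

    ```shell
    # Report whether a kernel release string falls in the 6.3 series,
    # which carries the suspect commit.
    affected_by_xfs_63_bug() {
        # $1 is a kernel release string such as "6.3.4-xanmod1"
        major=${1%%.*}
        rest=${1#*.}
        minor=${rest%%.*}
        if [ "$major" -eq 6 ] && [ "$minor" -eq 3 ]; then
            echo "possibly affected"
        else
            echo "not in the 6.3 series"
        fi
    }

    # Check the currently running kernel.
    affected_by_xfs_63_bug "$(uname -r)"
    ```

    Of course this only checks the version number, not whether a given distro kernel has already backported a fix.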



    • #32
      Originally posted by Chugworth View Post
      At this point I don't see why anyone would still be messing with XFS. I've heard enough corruption stories about it to not trust it any more than the other filesystems. If you're scared of features and change then just stick with ext4. Otherwise use Btrfs and/or ZFS. Personally I use both. I find Btrfs to be great for the mirrored OS partition, and ZFS to be great for the RAID6 data partitions.
      If you are worried about data corruption, btrfs is the absolute last filesystem you should be using. Btrfs on single drive filesystems is an absolute joke. Not only is it incredibly slow, but with no fsck/repair, you are a single power loss event away from an unmountable filesystem.



      • #33
        For two years now, all my partitions have been XFS and I have not experienced anything strange yet, even using the 6.3.4 kernel.



        • #34
          Originally posted by partcyborg View Post

          If you are worried about data corruption, btrfs is the absolute last filesystem you should be using. Btrfs on single drive filesystems is an absolute joke. Not only is it incredibly slow, but with no fsck/repair, you are a single power loss event away from an unmountable filesystem.
          Thanks avis and others here (including stormcrow etc.) for weighing in on these discussions...

          In terms of XFS itself - this bug has convinced me to place less trust in filesystems being developed within the mainline kernel (including ext4, btrfs and so on). Just generally and broadly speaking. It's not that I don't place some level of trust in them - I do. However the history, the track record, and the general approach to how testing is done (before release along with the kernel) does show a level of uncertainty. More to the point, it reveals what avis was talking about: the gap between the mainline kernel and proper testing. Maybe only Red Hat and Google really put kernels through their paces thoroughly before they hit people (like me) who are using the latest so-called 'stable' kernels (not LTS, just released kernels) - such as 6.3 as built and distributed by people like xanmod.

          However it is very much a double-edged sword. And the point about old bugs versus new bugs being a wash stands as a very strong defense. With those specific kernels (xanmod or liquorix etc.), if important bugs are found, then YES they get integrated pretty rapidly into the next minor patch release - say "6.3.11" or whatever - which comes very soon. Those get built and redistributed (by xanmod) automatically, pretty darn fast, and that is pretty nice. And if you think a bug is fixed in LTS but not in those... well, IDK what to say about that, because I am much less familiar with LTS. But I would have expected such fixes to come to the mainline kernel too, if it is a known bug of much significance (that actually applies to both branches).

          Moving on to the second point. OK, so there will be bugs, and not enough regression testing in those kernels. Yes. However, of all the kinds of bugs, the only critical ones I have any *sufficient* level of concern over are filesystem bugs. Right? Because even a bug that destroys my hardware - well, I happen to be absolutely FINE with those. They are pretty darn rare, and I can replace all my hardware fairly economically. Being an individual user (and not a datacenter), I don't have millions of dollars of rack-server assets to manage here.

          OK, so what filesystems do I use? Well, I have already established that I steer well clear of anything in the way of "new filesystems", or heavily developed filesystems in the mainline kernel. I will use EXT4, but not btrfs, xfs, or even LVM, mdadm etc. Maybe later on I might consider something like bcache in future (but it depends). Anyhow. We can probably assume EXT4, being so mature, isn't going to receive a lot of patches, or carry so much risk (I would dearly hope). But let's say it did... then at least literally everybody out there is using it, so the chances are much higher that somebody else will be unlucky before it gets to me.

          So what does that leave? Well, ZFS. I use ZFS.

          The thing is, we have had issues in the past; issues can occur. However there seem to be enough concerned parties who beta-test and look out for bugs - people participating in enterprise settings (who are not RHEL or Google) - to check the bleeding edge and new releases for regressions. With enough vested data on the line and at stake, we do seem to have a fairly good process now. And of course this is all outside of what the kernel is doing.

          When it comes to kernel breakages, that takes time. For example: the kernel keeps breaking ZFS with GPL-only symbols / APIs, or whatever. We recently had a larger breakage, so the 6.3 backporting is not ready yet and is delayed for some weeks - we don't have patches for 6.3 yet. But even if it weren't delayed for that reason, we still have a deliberate delay in the process, so that there is a significant time window for testing to take place before the patchset is officially released in the next ZFS release.

          So my trust is relatively high(er) in ZFS than in the stuff going into the kernel. I can see the people doing the testing, opening issues, and reporting possible concerns from the enterprise customers of ZFS - of which there are, let's say, "probably enough", IDK. It seems popular "enough" in the enterprise space, with multiple smaller independent outfits who are fully using ZFS and really dedicating their own testing server resources (not production) to it. And so on. If there is a concern, I can ask about it, or rule it out as applying only to configurations specific to those other enterprise setups, which are then unlikely to affect my own (much simpler and more basic) ZFS setups. I don't deal with that many terabytes or that many disks, and I don't use the newer or advanced features. My use of the ZFS feature set is relatively conservative; I don't stress the system much.

          So anyhow, that all amounts to an out-of-kernel, independent testing system for that filesystem. Then I really don't need to worry much about other kernel bugs, or even give two hoots about them. Because if a bug isn't going to cause data corruption, then the next item down that list is a security vulnerability. And if there are actual breakages (in the runtime environment - something I actually use gets broken, like graphics or networking, or anything else bad)... then I can just reboot into an older kernel from before the regression happened. That can be pretty much any kernel I like, including LTS ones, or just 6.1, or 5.12, whatever. I don't need to worry; those kernels are just a reboot away, so long as my ZFS dkms module is built for them (which is always fine for older kernels).

          So my current kernel is 6.2.12. ZFS is not going to move up to 6.3 so soon, due to the larger breakages. That extra bit of delay actually helps me avoid bugs that hit other people on xanmod... I am usually several weeks behind them. It's a kind of reasoned 'balance' - I am balancing my position, my risk, to sit somewhere 'in between' these two extremes.

          So rather than participate in some nonsensical black-and-white debate, I can instead enjoy the best of both worlds by taking a more reasoned approach to my individual risk exposure. I might still be at some level of risk from time to time, but there are mechanisms which notify me and help me become aware of the periods when that occurs. So I get a heads-up, and I can make my own judgement calls about when it is safe to move forward to point releases (patches are automatic). This does not reduce my risk to zero, but then I also get to enjoy many other benefits of running recent Linux kernels. Not the latest, but slightly behind, so I get MOST of those benefits within a timeframe of N weeks - a timeframe of my own choosing. Quite frankly, that is plenty good enough, since I don't need ALL the new features in new kernel releases, only SOME of them, depending on which hardware I have or which specific features I actually use.

          I don't run LTS kernels, but they are installed (at least the ones from Ubuntu here), and I will consider booting into an LTS kernel when I am trying to recover from a major system issue. But that happens very rarely - maybe once every few years.

          So I suppose my only remaining question is: being on Ubuntu 23.04... can I also install LTS kernels from Google or RHEL onto my system, just to have more fallback options? Or is that pointless to ask, since the Ubuntu LTS kernels are already of comparable quality and reliability?
          Last edited by dreamcat4; 27 May 2023, 05:52 AM.



          • #35
            Originally posted by dreamcat4 View Post

            So what that leaves? Well ZFS. I use zfs.
            I've been using ZFS on FreeBSD for over 5 years and I'm very happy with it. I've never seen a corrupt file in all these years, and my desktop hasn't crashed 'completely' once.

            A long time ago, HAMMER was even more stable than ZFS in some situations.

            I think HAMMER2 is still a better file system than EXT4, BTRFS and XFS.

            If I have a choice between EXT4, BTRFS and XFS I always choose EXT4.
            On some systems like Clear Linux I have already seen a lot of strange EXT4 errors, but on most distros EXT4 works the most reliably of the three.



            • #36
              Originally posted by partcyborg View Post

              If you are worried about data corruption, btrfs is the absolute last filesystem you should be using. Btrfs on single drive filesystems is an absolute joke. Not only is it incredibly slow, but with no fsck/repair, you are a single power loss event away from an unmountable filesystem.
              Have you read the btrfs manual? Specifically:

              Traditional filesystems need to run their respective fsck utility in case the filesystem was not unmounted cleanly and the log needs to be replayed before mount. This is not needed for BTRFS. You should set fs_passno to 0.

              If you wish to check the consistency of a BTRFS filesystem or repair a damaged filesystem, see btrfs-check(8). By default filesystem consistency is checked, the repair mode is enabled via the --repair option (use with care!).

              How come it's not needed? Copy on write.
              ZFS and Btrfs are two full copy-on-write file systems. They avoid in-place changes to assure levels of consistency similar to a journal. They also provide a dummy fsck.[6] btrfs-check is still available to check for suspected problems in filesystem structure (e.g. when a software bug or hardware issue is suspected).
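              For reference, the manual's fs_passno advice corresponds to an /etc/fstab line like the sketch below; the UUID and mount point are made up for illustration, and the sixth field (fs_passno) is 0, so no boot-time fsck is attempted:

              ```
              # <file system>                            <mount>  <type>  <options>  <dump>  <pass>
              UUID=0123abcd-0000-0000-0000-000000000000  /data    btrfs   defaults   0       0
              ```

              An offline consistency check remains available via btrfs check, which is read-only by default; --repair is opt-in and, as the manual says, to be used with care.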



              • #37
                Getting back to underline the main point (which I already mentioned in my longer commentary)...

                The counter-argument people make to bugs in XFS and BTRFS (other than "it's fine for me")... the real argument is: all filesystems get bugs in them, ZFS included. This is very much a true statement.

                The DIFFERENCE is in the way the filesystem code gets released to the public, and in the presence / absence of testing, or regression testing, and who is responsible for that testing.

                Because what is important to realize is that the whole Linux kernel development model is a developer-focused model. The system is designed to be highly streamlined for reviewing patchsets and maintaining code quality, and - once bugs become known and patches are made - for very quickly and effectively getting those patches into the next kernel release, which then comes out very soon for all to enjoy. Great, fantastic.

                However, while code can be reviewed well to a high quality, this leaves the actual real-world testing and/or regression testing outside of that structure. "There is no warranty supplied with this software" - that is fundamentally a very bold and underlined statement.

                So then you have to rely upon (as avis originally said) other third-party players - "whomever" outside of those kernel team structures. For example, the people who maintain LTS kernels (and maybe their own patchsets on top of those kernels), which could be anybody from Red Hat to Google, to Canonical, to the Raspberry Pi Foundation.

                The point is that this is where the issue lies with FS bugs: they may remain undetected and fly under the radar until they become apparent or clearly understood. Then it's back to those kernel devs to say: hey, we found something. And OK, they are only human if they missed it; they can then make patches and fixes. All great.

                So here is where my point lies: ZFS, as an enterprise FS, does have a similar-looking model - superficially speaking. But the culture and mindset may be a bit more conservative, and the path to releases a bit more conservative. The software itself is a moving target, so at times there have been greater risks from new code than at others. But it seems the recent policy changes in ZFS mean there are now more delays before release, more conservatism, and/or more regression testing. And within its own community, it is a more apparent, easier landscape to take in.

                Whereas with a project like BTRFS, the developers are a small subset within however many thousand broader kernel devs. I know they have their own subsystems and trees, but still, you are less clear on who is who. As a general member of the public, I have to do my own legwork to investigate who is who, which company they work for, and whether that kernel dev really does have backing from some enterprise to do lots of regression testing or not. It is just less clear, given the way everything is done mostly on mailing lists for the kernel. So transparency, individual accountability, or just plain consequences seem, on the surface, a bit lesser than with the ZFS project, where devs and other prominent members are more well known within the community. There, for example, it's a piece of cake to evaluate the relationships of trust in what is being done and by whom, how thorough it is likely to be, and on what types of RAID or which specific features - be it deduplication or something else. And if I don't know, I can ask somebody within that pretty friendly community, and they will give very certain-sounding and helpful answers that paint an accurate and reasonable picture of trust. So if I do want to take up some newer feature, I am in a better position than just jumping in because some random person said "works for me" about some specific BTRFS feature. But then wait... and they say: oh, but there was an issue with this thing or that thing, or there was some data corruption - but don't worry, it's all fixed now. Which might be true statements, but it dilutes the blind trust into being... well, just blind trust, instead of a properly informed structure of trust.

                I can safely say there are some features of ZFS I have some apprehension about using, but at the same time I have no real-world need for those features. And the ones I do need are all rock solid as far as I can tell (for YEARS, even decades), which is where you want to be with a filesystem. Rather than something like - oh, IDK (this is a made-up number) - "only 3 years ago" such-and-such was fixed in BTRFS: some feature that many people use, or want to use, but that was only actually added pretty recently in the overall lifetime.

                So that is where I am at with something like BTRFS. I might consider it again in a few years' time (or less). But at the same time, I will be keeping my eye on the track record of ZFS going forward too, because all of it is still something of a moving target, to a greater or lesser degree.



                • #38
                  dreamcat4

                  On Arch with OpenZFS I've booted into LTS more often than I'd like. It's just something that happens when upstream pushes linux-stable on you without caring about out-of-tree modules*. That's literally why I use CachyOS from ptr1337 these days. He does a damn good job keeping OpenZFS and Linux in sync with each other. The fact that it offers v3 repositories is just icing on the cake.

                  *I've had the same thing happen with NVIDIA drivers and **trigger warning** AMD Catalyst.



                  • #39
                    Originally posted by partcyborg View Post

                    If you are worried about data corruption, btrfs is the absolute last filesystem you should be using. Btrfs on single drive filesystems is an absolute joke. Not only is it incredibly slow, but with no fsck/repair, you are a single power loss event away from an unmountable filesystem.
                    I always find it amusing to see people worrying over the lack of an fsck. No, there is not an fsck, and there is not going to be an fsck. That's by design. Btrfs and ZFS are filesystems that detect errors and repair themselves as they run. If your data is ever screwed up so badly that they can't even recognize it as a valid filesystem, then you've got some serious issues and would be better off just restoring from a backup. If you had that bad of an issue on a filesystem that has an fsck and it could make it readable again, I wouldn't trust all of the data to be correct.
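                    To make the "repair as they run" point concrete: instead of an offline fsck, both filesystems expose an online scrub that walks checksums on a mounted filesystem and repairs from redundancy where it can. The sketch below only assembles the command lines rather than running them (the pool name "tank" and mount point "/mnt/data" are hypothetical, and a real scrub needs a live filesystem and root):

                    ```shell
                    # Build (but do not run) the scrub invocation for each filesystem,
                    # so the sketch is safe to execute with no btrfs/ZFS storage attached.
                    scrub_cmd() {
                        case "$1" in
                            btrfs) echo "btrfs scrub start -B $2" ;;  # -B: run in foreground
                            zfs)   echo "zpool scrub $2" ;;
                            *)     echo "unsupported filesystem: $1" >&2; return 1 ;;
                        esac
                    }

                    scrub_cmd btrfs /mnt/data   # prints: btrfs scrub start -B /mnt/data
                    scrub_cmd zfs tank          # prints: zpool scrub tank
                    ```

                    Running these on a schedule is the usual way to surface silent corruption early, which is the role an fsck pass at boot plays on traditional filesystems.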
                    Last edited by Chugworth; 27 May 2023, 09:25 AM.



                    • #40
                      Originally posted by Vistaus View Post

                      That's what I keep saying, but every time I say IT admins only run LTS, I always get 10 comments of IT admins or otherwise IT-related people that say their company doesn't run LTS…
                      Context: Work at a massive cloud computing company

                      We run Fedora and a couple of times we actually found kernel bugs and submitted patches upstream

                      ¯\_(ツ)_/¯

