XFS With Linux 6.9 Brings Online Repair Improvements


  • #11
    Originally posted by muncrief View Post
    That's a lot to quote.
    8 years ago I switched from XFS (root) and NTFS/BTRFS (data) to straight OpenZFS and couldn't be happier. For the past year I've run CachyOS and haven't had a single issue using their kernels that offer ZFS support. I've tried other file systems on and off for root volumes, but my data volume has become the RAID of Theseus with OpenZFS. My OpenZFS journey started with me formatting a 500GB BTRFS partition; then one day I copied some NTFS data over to it, expanded, copied more data, and repeated until only OpenZFS was left. A year later that became a 2x2TB mirror, and it's currently a 3x4TB raidz (2 HDD, 1 SSD, with a 1TB cache drive).

    It'll be 9 years sometime this summer.

    Having a mirror or better means that when OpenZFS, your BIOS, or KDE tells you "Hey, your disk is going bad," you can detach it before real corruption happens. That's a real data saver and takes a lot of worry away. When your one and only disk starts to go bad, you go into Pucker Factor 9.99 worrying, "Will my old disk fail while I'm trying to copy the data to the new disk?"

    That peace of mind is worth pinning the kernel or using a distribution like CachyOS that provides ZFS modules. Also, thanks ptr1337 for giving a damn about OpenZFS and taking away the OpenZFS "sudo pacman -Syu" worry.
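The "detach it before real corruption happens" workflow above comes down to watching `zpool status` for devices that leave the ONLINE state. Here is a minimal sketch of that check; the sample output, pool name, and device names are made up for illustration, and on a real system you would capture the actual `zpool status` text via subprocess:

```python
import re

# Illustrative `zpool status` output for a degraded raidz pool.
# (Made-up sample, not from a real system.)
SAMPLE_STATUS = """\
  pool: tank
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            sda     ONLINE       0     0     0
            sdb     FAULTED      3     0     0
            sdc     ONLINE       0     0     0
"""

STATES = r"(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"

def unhealthy_vdevs(status_text):
    """Return pool/vdev/device names whose STATE column is not ONLINE."""
    bad = []
    for line in status_text.splitlines():
        # Config rows are deeply indented: name, then a state keyword.
        m = re.match(r"\s{8,}(\S+)\s+" + STATES + r"\b", line)
        if m and m.group(2) != "ONLINE":
            bad.append(m.group(1))
    return bad

print(unhealthy_vdevs(SAMPLE_STATUS))  # ['tank', 'raidz1-0', 'sdb']
```

If anything shows up in that list, that is the moment to swap the disk (e.g. `zpool replace`, or `zpool detach` on a mirror) while the redundancy is still intact.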

    Comment


    • #12
      Originally posted by skeevy420 View Post

      8 years ago I switched from XFS (root) and NTFS/BTRFS (data) to straight OpenZFS and couldn't be happier. For the past year I've run CachyOS and haven't had a single issue using their kernels that offer ZFS support. I've tried other file systems on and off for root volumes, but my data volume has become the RAID of Theseus with OpenZFS. My OpenZFS journey started with me formatting a 500GB BTRFS partition; then one day I copied some NTFS data over to it, expanded, copied more data, and repeated until only OpenZFS was left. A year later that became a 2x2TB mirror, and it's currently a 3x4TB raidz (2 HDD, 1 SSD, with a 1TB cache drive).

      It'll be 9 years sometime this summer.

      Having a mirror or better means that when OpenZFS, your BIOS, or KDE tells you "Hey, your disk is going bad," you can detach it before real corruption happens. That's a real data saver and takes a lot of worry away. When your one and only disk starts to go bad, you go into Pucker Factor 9.99 worrying, "Will my old disk fail while I'm trying to copy the data to the new disk?"

      That peace of mind is worth pinning the kernel or using a distribution like CachyOS that provides ZFS modules. Also, thanks ptr1337 for giving a damn about OpenZFS and taking away the OpenZFS "sudo pacman -Syu" worry.
      Thank you for your reply skeevy420, but the reason I switched to CachyOS was that others had said OpenZFS was a priority for it, so I assumed they would patch the last supported kernel until the latest one was supported. However, I was very disappointed when they suddenly switched to 6.8.0 a few days after it was released and completely dropped 6.7. I searched everywhere for the previous 6.7.9 kernel package, hoping to patch it myself to 6.7.10, but could only find binaries with no PKGBUILD. I even asked where to find it on their forum but received no reply. So I just did what I had to do with Arch and am now using the latest Manjaro kernel.

      But that's why I'm asking here before doing any more distro and filesystem switching. Converting my Manjaro media server to CachyOS was a real pain, but they have a script for Arch, so converting my Arch workstation was easy. And wow, converting 11 TB of ext4 files on my media server to OpenZFS was really time consuming and difficult. So rather than assuming anything, I'm getting as many respected opinions as I can this time, because I don't want to keep doing this stuff over and over again.

      Comment


      • #13
        Originally posted by muncrief View Post
        So what are your opinions, fellow Phoronix readers? Is XFS an appropriate replacement for OpenZFS in my situation? Keep in mind I don't have any RAID systems, as in my experience they're far too expensive and complex to be worth it, and I've actually lost much more data to RAID failures than to silent data corruption. So my multiple backups are sufficient. I'm also not concerned with volume management, as I use mergerfs for that. I simply need a filesystem that can be installed on single disks and reliably detect silent data corruption.
        I had tried BTRFS but had issues with the RAID bug, so I just decided to switch to XFS. I figure if it's good enough for RHEL customers, it's good enough for me.

        Comment


        • #14
          gbcox

          How should we update the kernel without removing the old version?
          We provide proper support for EVERY major update of the kernel and the zfs module. Yes, upstream doesn't officially support it yet, due to ONE missing PR, but we have pulled this into our zfs module and maintain our own branch.

          This has also been tested by several people with different configurations. Same for the NixOS people, who use the CachyOS kernel.
          We really care about zfs support, especially when the major version changes, and we go through a long period of testing and also report bugs to zfs.

          If you are using zfs-dkms, then you need to handle it on your own, but the CachyOS configuration uses the precompiled zfs module from our side, which is compatible at all times.
          If you want to use an older kernel, you'll find it in your cache.

          Edit:
          And if you really want to downgrade the kernel to version X, you can always do this by compiling it on your own.
          Just check out an older commit, then set your options and compile your kernel.
          Archlinux Kernel based on different schedulers and some other performance improvements. - CachyOS/linux-cachyos

          Comment


          • #15
            Originally posted by gbcox View Post

            I had tried BTRFS but had issues with the RAID bug, so I just decided to switch to XFS. I figure if it's good enough for RHEL customers, it's good enough for me.
            Thanks for your reply gbcox, that's why I'm considering switching. What's your experience with undetectable data corruption, though? As I said, that's my primary concern with XFS. OpenZFS has really worked great, and 4 months ago it detected corruption in a 23-year-old file, which was really awesome. It was from a complex project I was developing back in the day, and losing it would have really hurt. But I was able to easily restore it from a recent backup, so all is well.

            But as I said, OpenZFS at times runs so far behind the latest kernel that I've had to take raw patches and apply them to the latest supported kernel myself, and I'm really not qualified to do that; I worry that my patch attempts might make things even worse. So far I've been able to do it without any apparent issues, but it's just too much worry and strain for me to want to continue. As I said in a previous post, I really thought CachyOS would take care of it for me, but they're even worse than Arch in that they release the .0 version of the kernel and don't even wait for .1. It really was surprising, and I couldn't even find their PKGBUILD for 6.7.9, even after inquiring on their forum, so I just have to use the Manjaro kernel for now. But even Manjaro will drop 6.7 in a week or so, and I'll be back to hunting for raw patches and attempting to modify and apply them myself.

            Comment


            • #16
              Originally posted by muncrief View Post
              I don't want to start any flame wars, so please everyone, if you care to respond to this post please do so civilly with technical observations and opinions. I'm asking this question here because of the wealth of technical knowledge and experience of the many readers and contributors at Phoronix.

              My problem, and question, is a common one with a plethora of answers when searching the internet. I have a huge media server, with approximately 11 TB of data accumulated over 4 decades, and was plagued for years by silent data corruption. Though I have both local and cloud backups of everything, if an issue wasn't discovered for 2 or 3 or 4 years, or more, at times it was impossible to find and restore the uncorrupted data.

              So about eight months ago I finally decided to convert all my local data filesystems to OpenZFS. My goal was simply to discover silent data corruption as quickly as possible. Repairing it isn't an issue for me because of the multiple backups, just detecting it.

              And OpenZFS has worked magnificently, with one exception. Linus will simply never allow it to be incorporated into Linux so it always falls behind the latest kernel version. The claim is that its CDDL license is unacceptable, though other exceptions have been made. But there's no point in arguing about it as Linus will never change his mind.

              At first I didn't think this would be an issue because most people said it would only be a few weeks behind the latest kernel, but I've found that not to be true. So trying to run a rolling distro like Arch (actually CachyOS as of a month or so ago) has become problematic. I know the simple answer is to run an LTS kernel, but as LTS kernels become more and more ancient I end up missing out on a lot of kernel improvements and new features, with KVM being my primary concern.

              So I've been looking into XFS more and more, but the myriad of opinions and experiences has simply left this old R&D engineer confused. And by the way, I'm not interested in newer filesystems like btrfs, etc., because the problems with even mature filesystems are already enough to handle.

              My worries about XFS are the claims that it is easily self corrupted by power outages, failed writes, etc., and at times this corruption cannot be detected. I'm not worried about power failures because I have multiple UPS devices for my computers, TVs, AVRs, etc. It's the other self corruption issues, and their purported undetectability, that concern me.

              So what are your opinions, fellow Phoronix readers? Is XFS an appropriate replacement for OpenZFS in my situation? Keep in mind I don't have any RAID systems, as in my experience they're far too expensive and complex to be worth it, and I've actually lost much more data to RAID failures than to silent data corruption. So my multiple backups are sufficient. I'm also not concerned with volume management, as I use mergerfs for that. I simply need a filesystem that can be installed on single disks and reliably detect silent data corruption.
              I don't think that btrfs can be called "new": it is 15 years old. I haven't tested it on an 11TB filesystem, but it should work. If you avoid raid5/6 and quotas (a performance concern), it should be stable. Recently (Linux kernel 6.1) the "block-group-tree" feature was added, for large disks (more than a few TB).
              I don't think that XFS would solve your problem; if you have already experienced corruption problems, you need a filesystem with checksums, and the only options are zfs and btrfs. bcachefs is too young.

              It would be interesting to know how often these data are read. The typical UER is around one error per 10**14-10**15 bits read, which is not far from 12TB. However, if you read 1kb/month ( :-) ) this is not the case, and I suggest you investigate whether it is a hardware reliability problem (e.g. the power supply?).

              For what it's worth, I want to share a past experience of mine: I used btrfs with an unreliable power supply for a few months. Sometimes the filesystem went RO and/or the hard disk went offline, but I never experienced filesystem corruption: the data were either good or not written (committed). The quality of the hard disk certainly helped, but BTRFS held up well too.
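The UER figure quoted above can be sanity-checked with two lines of arithmetic: at one unrecoverable bit error per 10**14 bits, a single full read of roughly 12.5 TB is expected to hit one bad bit. This is a back-of-the-envelope estimate, not a spec for any particular drive:

```python
# Back-of-the-envelope check of the UER figure above:
# one unrecoverable bit error per 10**14..10**15 bits read.
BITS_PER_TB = 8 * 10**12  # 1 TB = 10**12 bytes = 8 * 10**12 bits

for exponent in (14, 15):
    uer_bits = 10**exponent
    tb_per_error = uer_bits / BITS_PER_TB  # 12.5 TB and 125 TB
    print(f"UER 1e{exponent}: one expected error per ~{tb_per_error:g} TB read")
```

So one full scrub of an 11TB pool sits right at the 1e14 figure, which is why checksumming matters at this scale but not for data that is almost never read.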

              Comment


              • #17
                Originally posted by muncrief View Post
                I don't want to start any flame wars, so please everyone, if you care to respond to this post please do so civilly with technical observations and opinions. I'm asking this question here because of the wealth of technical knowledge and experience of the many readers and contributors at Phoronix.

                My problem, and question, is a common one with a plethora of answers when searching the internet. I have a huge media server, with approximately 11 TB of data accumulated over 4 decades, and was plagued for years by silent data corruption. Though I have both local and cloud backups of everything, if an issue wasn't discovered for 2 or 3 or 4 years, or more, at times it was impossible to find and restore the uncorrupted data.
                You don't even mention the filesystem you're using now, so it's very hard to understand _what_ your issue actually is. It's more likely hardware (lack of ECC? something else?), but it can also be software, if you're running something exotic.

                I've been running XFS for more than 20 years, on tens of TBs of storage, and I haven't found any corruption yet. There are probably flipped bits in some files, though, because of the following:

                But note that XFS is *not* designed to catch user data corruption. The recent work improves *metadata* health.
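Since XFS checksums metadata but not file contents, detecting user-data corruption on such a setup has to happen in userspace. A minimal sketch of a manifest-based "scrub" using SHA-256 follows; the function names are illustrative, and dedicated tools (e.g. storing checksums in extended attributes) do this more robustly:

```python
import hashlib
import os

# Sketch of a userspace "scrub" for filesystems that do not checksum
# user data: record the SHA-256 of every file once, re-hash later, and
# report any file whose content changed behind the filesystem's back.

def hash_file(path, chunk=1 << 20):
    """SHA-256 of a file, read in 1 MiB chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for d, _, files in os.walk(root):
        for name in files:
            full = os.path.join(d, name)
            manifest[os.path.relpath(full, root)] = hash_file(full)
    return manifest

def verify(root, manifest):
    """Return the files whose current hash differs from the recorded one."""
    return [p for p, digest in manifest.items()
            if hash_file(os.path.join(root, p)) != digest]
```

Build the manifest right after a known-good backup, store it off-disk, and run verify on a schedule; any hit is a candidate for restore from backup, which matches the "detect, then restore" approach described earlier in the thread.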

                Comment


                • #18
                  Originally posted by muncrief View Post
                  What's your experience with undetectable data corruption though?
                  I did have a weird issue with USB drives that I reported, but it appeared to be a false positive, a software or potentially hardware issue; it was never resolved. I haven't seen any issues with my SATA drives, and my backups to the USB drives are good, so I'm thinking it was a false positive. I don't run xfs_repair now unless I get an alert that I need to run it, and I haven't received any alerts.
                  Running into a strange issue: I had a USB disk that didn't have a clean mount, so I ran xfs_repair against it. Numerous errors were encountered and files were moved to lost+found. I then repeated the xfs_repair, and each time it finds more problems and moves more files to lost+found. I then ran smartctl short and long tests and it found no errors at all, which seems weird. Does anyone know what might be happening? Seems weird that XFS is having issues and smartctl finds no problems. Edit: Also ins...

                  Comment


                  • #19
                    Originally posted by sarfarazahmad View Post
                    Still can't shrink it. Will we ever get that?
                    No, shrinkage is out of the question, we require eternal growth.

                    Comment


                    • #20
                      Originally posted by gbcox View Post

                      I had tried BTRFS but had issues with the RAID bug, so I just decided to switch to XFS. I figure if it's good enough for RHEL customers, it's good enough for me.
                      One uses btrfs if one needs compression or subvolumes. XFS provides neither. So if you don't need either of those features, there is little reason for btrfs; there are faster file systems.

                      Comment
