Some Users Have Been Hitting EXT4 File-System Corruption On Linux 4.19

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • x4mer
    replied
    Haven't noticed any corruption problems using 4.19 under Mint. However, I recently pulled my Radeon 5450 card and put in a GCN 1.0 HD7770 instead. I added the kernel switches to turn off the radeon driver and turn on the amdgpu driver for SI and CI cards. After that I can boot the installed 4.15, 4.17 and 4.18 kernels and they're all fine: inxi -xxxG shows the amdgpu driver active for the card, with working Vulkan. Booting 4.19 causes Cinnamon to complain that it's running in software mode, and inxi -xxxG shows that neither the amdgpu nor the radeon driver is loaded and it's using the vesa driver.
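The post doesn't list the exact switches. A minimal diagnostic sketch, assuming they are the module parameters commonly used to hand SI (GCN 1.0) and CIK (GCN 1.1) cards to amdgpu: it reports which of those parameters are missing from a kernel command line and which driver, if any, the sysfs tree says is bound to the GPU. The function names are hypothetical and `card0` is assumed to be the discrete card.

```python
import os
from pathlib import Path
from typing import Optional

# Assumed switches (not listed in the post): disable radeon for SI/CIK,
# enable amdgpu for SI/CIK.
WANTED = {
    "radeon.si_support": "0",
    "radeon.cik_support": "0",
    "amdgpu.si_support": "1",
    "amdgpu.cik_support": "1",
}

def missing_switches(cmdline: str) -> dict:
    """Return the wanted parameters that are absent or set to another value."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep:  # skip bare flags such as "ro" or "quiet"
            params[key] = value
    return {k: v for k, v in WANTED.items() if params.get(k) != v}

def bound_driver(card_dir: str = "/sys/class/drm/card0") -> Optional[str]:
    """Name of the kernel driver bound to the card's device, or None.

    With only a vesa/firmware framebuffer there is usually no DRM card
    directory or no driver link, so None is returned.
    """
    link = Path(card_dir) / "device" / "driver"
    if not link.exists():
        return None
    return os.path.basename(os.path.realpath(link))
```

On an affected 4.19 boot one would compare `missing_switches(Path("/proc/cmdline").read_text())` with the result of `bound_driver()`: an empty dict plus `None` would point at the kernel rather than the boot configuration.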

  • profoundWHALE
    replied
    Originally posted by AndyChow
    Sometimes, you just can't fix a bad design. BTRFS is COW, but not COW done right. It is very fast, I'll give it that, and when it works, it works great. But spectacular catastrophic failures still happen all the time. Just ask the BTRFS devs on freenode.
    I almost lost hundreds of gigabytes of video (wedding camera footage) to btrfs and wasted 2 weeks attempting to repair or recover it with the tools. I eventually started manually copying every not-broken thing to some other hard drives. The problem is that when my 12TB btrfs RAID10 pool did that, I needed another 12TB, so I had to delete my several-terabyte Steam library, and I've been redownloading it for a week now. My internet max speed is 1MB/s down...

    Luckily, when originally transferring those video files, I remembered how technology tries to screw me over, so I copied the files in two other places.

  • dwagner
    replied
    Originally posted by sandy8925
    Ah, I was wondering where the raving anti-CoC lunatics had gone.
    And calling people of different opinion "lunatics" reflects the spirit of that CoC, I guess?

  • AndyChow
    replied
    Originally posted by jpg44

    A purely COW filesystem does not use a journal, so it may be more resilient to the problem, since none of the existing disk structures are modified at all: it can read the existing disk structures and write modified versions to new locations. This allows the old, unmodified structures to be used for recovery. Not sure if btrfs actually can do this.
    IME, no, BTRFS can't do this all the time, because of the way it handles tree extents. You can easily lose everything since your last snapshot if a tree block can't figure out where its node or leaf is supposed to be. You will not find this problem with ZFS, HAMMER, or BcacheFS. BTRFS is rather badly constructed, certainly not for lack of resources or intelligence on the part of the devs. Sometimes, you just can't fix a bad design. BTRFS is COW, but not COW done right. It is very fast, I'll give it that, and when it works, it works great. But spectacular catastrophic failures still happen all the time. Just ask the BTRFS devs on freenode.

  • jacob
    replied
    Originally posted by ALRBP
    And I was thinking that maybe switching back to EXT4 (+HW RAID/MDADM) was safer than keeping Btrfs (no RAID5/6)…
    Btrfs without RAID5/6 is perfectly safe. But then, so should be ext4.

  • fuzz
    replied
    Originally posted by ALRBP
    And I was thinking that maybe switching back to EXT4 (+HW RAID/MDADM) was safer than keeping Btrfs (no RAID5/6)…
    Actually, this ext4 issue ruined my Ubuntu install and made me go back to btrfs. I also wanted an excuse to go back to Gentoo, so that helped.

  • lectrode
    replied
    Originally posted by Weasel
    Rolling Release must be so awesome to force this kind of breakage on you right?
    Are there rolling release distros that automatically update the major/minor versions of kernels?
    I have a couple of systems still using 4.14 LTS. On Manjaro, the kernel major/minor version isn't updated unless you specifically install the new version. Only point releases are updated automatically (e.g. 4.14.1 to 4.14.2).
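The update policy lectrode describes (stay on a major.minor series, take only its point releases) can be expressed as a small check. This is a hypothetical sketch of the rule, not Manjaro's actual packaging logic, and it assumes plain dotted version strings without distro suffixes.

```python
def parse_version(v: str) -> tuple:
    """Split '4.14.2' into (4, 14, 2); a missing patch level counts as 0."""
    parts = [int(p) for p in v.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])

def auto_update_allowed(installed: str, candidate: str) -> bool:
    """True only for a newer point release within the installed series."""
    cur, new = parse_version(installed), parse_version(candidate)
    # Same major.minor (e.g. 4.14), strictly newer patch level.
    return cur[:2] == new[:2] and new[2] > cur[2]
```

Under this rule a 4.14 system picks up 4.14.2 on its own, while moving to 4.19 remains an explicit choice.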

  • Guest
    Guest replied
    Originally posted by bitman

    Hah, I came here just after realizing that. Linus becomes nice and the kernel goes to shitz.
    Ah, I was wondering where the raving anti-CoC lunatics had gone.

  • Guest
    Guest replied
    Originally posted by bitman
    I tend to believe corruption really does come from outside of ext4 driver. 4.19 is a total wreck of a release. People report all kinds of problems. I myself was getting random freezes every few hours. I do not recall such a disastrous release.
    I had problems with 4.19 too, but it was just the AMDGPU driver. I switched over to the radeon driver, and now just use the Intel GPU on Linux.

  • jpg44
    replied
    Originally posted by AndyChow
    I've experienced corruption on a few hundred files after switching the blk-mq I/O scheduler from mq-deadline to none. Might not be related to blk-mq, but it happened in the past 3 weeks. I have full backups, so it's not that bad. What's terrible is that I only found most of the errors because I keep checksum log audits. If someone doesn't, and got some corruption, they might never know it.

    I've since re-migrated to btrfs, but hey, with my luck, that will get corrupted also. I actually switched from btrfs to ext4 because I was tired of unfixable problems btrfs would throw up every few months when running a scrub. Can't wait for bcachefs.
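The checksum log audit AndyChow relies on can be sketched as follows. This is a minimal version under stated assumptions: the log is a JSON file mapping relative paths to SHA-256 digests, and the function names and format are hypothetical, not his actual tooling.

```python
import hashlib
import json
from pathlib import Path

def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large videos don't fill RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_log(root: Path) -> dict:
    """Map each regular file's path (relative to root) to its digest."""
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def audit(root: Path, log_file: Path) -> list:
    """List files whose current digest differs from the stored log.

    Deliberately edited files show up too, so a mismatch is a prompt
    to investigate, not proof of corruption.
    """
    old = json.loads(log_file.read_text())
    new = build_log(root)
    return sorted(name for name, digest in old.items()
                  if name in new and new[name] != digest)
```

A first run would save `json.dumps(build_log(root))` to a log file kept outside the audited tree; later runs call `audit(root, log_file)`. Because nothing touches the files in between, any non-empty result on static data such as archived video is exactly the silent corruption the post warns about.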
    The ext4 problem could be caused by broken hardware and the SCSI subsystem. It could affect other FSs too, but it could be that only ext4's usage pattern triggers it. This has been a problem before with hardware that poorly supported the SCSI commands needed for flushing buffers to the disk and for ensuring data actually gets written to disk. It's vital that the journal gets to disk before any of the filesystem's data structures are changed, in case there is a power outage. Caches can flush out of order, which makes it harder to predict when data is written to disk, so the upper layers depend on getting a notification that a block actually reached the disk. Some hardware supported this badly, and it is necessary for implementing the write barriers a journalling FS needs. So maybe there is some sort of strange flushing issue going on. It could also be something with the PCI bus or memory: memory corruption, DMA problems, a bug that trashes the memory of other kernel subsystems, etc.
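The ordering rule jpg44 describes (make the journal record durable before touching the real structures) has a user-space analogue. In this hypothetical sketch a JSON-lines file stands in for the journal and `os.fsync` stands in for the flush/barrier; it only illustrates why the ordering matters. If a drive acknowledges a flush it hasn't actually completed, which is the failure mode the post suspects, no software ordering can help.

```python
import json
import os

def _write_synced(path: str, text: str) -> None:
    """Overwrite a file and fsync it.

    Full durability for a newly created file would also need an fsync of
    its directory; that step is omitted here for brevity.
    """
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
        f.flush()             # Python buffer -> kernel page cache
        os.fsync(f.fileno())  # page cache -> device (the "barrier")

def append_durably(journal: str, record: dict) -> None:
    """Append one JSON record to the journal and fsync before returning."""
    with open(journal, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())

def update(journal: str, data_file: str, new_text: str) -> None:
    # 1. Journal the intent and make it durable first.
    append_durably(journal, {"op": "write", "file": data_file, "text": new_text})
    # 2. Only then modify the "real" structure in place.
    _write_synced(data_file, new_text)
    # 3. Mark the operation as complete.
    append_durably(journal, {"op": "commit", "file": data_file})

def recover(journal: str) -> None:
    """Redo writes that were journaled but never committed (idempotent)."""
    pending = {}
    with open(journal, encoding="utf-8") as f:
        for line in f:
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                break  # torn record from a crash mid-append: ignore the tail
            if rec["op"] == "write":
                pending[rec["file"]] = rec["text"]
            elif rec["op"] == "commit":
                pending.pop(rec["file"], None)
    for path, text in pending.items():
        _write_synced(path, text)
        append_durably(journal, {"op": "commit", "file": path})
```

If a crash lands between steps 1 and 3, `recover` replays the journaled text, so the data file never stays half-updated; that guarantee rests entirely on step 1 really being on disk before step 2 starts.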

    A purely COW filesystem does not use a journal, so it may be more resilient to the problem, since none of the existing disk structures are modified at all: it can read the existing disk structures and write modified versions to new locations. This allows the old, unmodified structures to be used for recovery. Not sure if btrfs actually can do this.
    Last edited by jpg44; 28 November 2018, 03:45 PM.
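The copy-on-write idea in jpg44's last paragraph (write modified copies to new locations and leave the old structures intact) can be illustrated with path copying in an in-memory tree. This is an analogy, not how btrfs, ZFS or bcachefs lay out their B-trees. An unbalanced binary search tree with hypothetical `insert`/`lookup` helpers is used purely for brevity.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)  # nodes are never modified after creation
class Node:
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(root: Optional[Node], key: int, value: str) -> Node:
    """Return a new root: nodes on the search path are copied, the rest shared."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    return Node(key, value, root.left, root.right)  # replace value in a copy

def lookup(root: Optional[Node], key: int) -> Optional[str]:
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None
```

After `v2 = insert(v1, 2, "B")`, `v1` still answers lookups with the old values. That is the recovery property described above: if the pointer to the new root never makes it to disk, the previous root still describes a complete, consistent tree. AndyChow's reply is that btrfs does not always deliver on this in practice.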
