Some Users Have Been Hitting EXT4 File-System Corruption On Linux 4.19

  • speedyb0y
    replied
    Originally posted by birdie View Post
    People who run the affected systems should either apply the patch immediately or downgrade to kernel 4.18. Running e2fsck/your-fs.fsck is pretty much mandatory.
    That's probably a good idea even for those without problems, but I think my fsck.ext4 didn't report any errors when I ran it.

    So maybe the kernel is reporting an error while the filesystem is actually ok?

    That would mean anyone having this problem may still have their FS intact...

    So, don't give up and format the disk like I did; just reboot with an older kernel before it's too late...

    Leave a comment:


  • speedyb0y
    replied
    Hi guys, and sorry for my English.

    I see that a lot of people can't reproduce the error, and most people say they don't see any problem at all.

    But I can reproduce the error.

    I even thought I was having memory/disk errors corrupting things, but no, I found all those bug reports so the problem was not only with me /o/.

    I can format a disk and reproduce the error once again in less than 5 minutes.

    The problem is that I can't send it in a bug report, because it depends on a HUGE amount of data I have.

    I've been collecting financial data, which I'm paying too much to receive, from the Barchart.com service...
    I collected 700GB of data over this whole year, downloaded it, and while using a Python script to "convert" it to another format for my own usage, I started to have this problem. Imagine my face when I saw 1 year of hard work and big money going down the hole because of a filesystem corruption!

    How I can reproduce it on my machine:

    The Python script loads a huge file into memory:
    myFile = cbor.loads(zstd.decompress(open('FILE', 'rb').read()))  # this loads an 8GB file into memory

    ... process the data...

    ... the process consumes about 25GB of RAM, leaving 7GB free

    ... I start creating many directories
    os.mkdir('output/' + someName)

    ... in each directory created, I save thousands of files. Remember I said I left about 7GB free? Each file generated consumes GBs, so when Python does a write(), memory usage is at 100% (~90% used memory + ~10% fs cache)

    ... now I open another file, and do the same work again.

    Now, I believe that maybe the problem is not in the EXT4 code itself, but in the VM code or something else related, because the problem only appears in this scenario:
    high CPU/RAM usage AND too many filesystem operations (mkdir(), open(O_CREAT), close()) AND too many malloc() and free() calls (the Python script does HOURS of work on dictionaries and lists, as the data is saved in a format like JSON). Because of this, I think that at this point the memory is very fragmented.
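
    For anyone who wants to try a similar load without the 700GB dataset, here is a scaled-down sketch of only the filesystem part of the workload described above. It is not the original script: the function name, directory layout, and sizes are made up for illustration, it creates no memory pressure, and there is no guarantee it triggers the corruption. Point it at a scratch filesystem, never at data you care about.

```python
import os
import tempfile


def stress_small_files(root, n_dirs=4, files_per_dir=50, payload=b"x" * 4096):
    """Create n_dirs directories under root/output and write files_per_dir files in each.

    Hypothetical helper: it mimics the mkdir()/open(O_CREAT)/write()/close()
    pattern from the post, at a much smaller scale. Returns the number of files written.
    """
    written = 0
    for d in range(n_dirs):
        dir_path = os.path.join(root, "output", f"dir{d:05d}")
        os.makedirs(dir_path, exist_ok=True)  # the mkdir() step
        for f in range(files_per_dir):
            file_path = os.path.join(dir_path, f"file{f:05d}.bin")
            with open(file_path, "wb") as fh:  # create, write, close
                fh.write(payload)
            written += 1
    return written


if __name__ == "__main__":
    # Uses a throwaway temporary directory; pass a path on the filesystem
    # under test instead if you want to exercise a specific disk.
    with tempfile.TemporaryDirectory() as tmp:
        count = stress_small_files(tmp)
        print(f"wrote {count} files under {tmp}")
```

    Raising n_dirs, files_per_dir, and the payload size moves it closer to the real load, but the memory-hungry processing step would still have to be simulated separately.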

    So, this is really a huge and severe bug: almost no one else hits it, and those who do have no idea what just happened. But I have 19 VMs on cheap cloud servers running the same distribution and kernel I've compiled at home, and I have no problems with any of those machines. And NOTE: I use linux-next (latest Git, an unreleased kernel, not even reviewed by Linus Torvalds!!!).

    I want to help them fix this issue. I'm going to recompile the 4.18 kernel, and also try other filesystems to see if I hit this error.

    Can anyone here give me some idea of what else I could include in the bug report? It's not a fault like a segmentation fault, kernel panic or core dump, so all I have to include is my (very minimalistic and customized!) .config
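
    One cheap thing that could go into a report alongside the .config is the running kernel version plus the I/O scheduler of each block device. A minimal sketch, assuming a Linux system where sysfs exposes `/sys/block/<dev>/queue/scheduler`; the helper name is made up, and whether the maintainers want exactly this data is an assumption:

```python
import glob
import os
import platform


def collect_report_info(sys_block="/sys/block"):
    """Return text lines with the kernel release and each block device's scheduler.

    Hypothetical helper for illustration; sys_block is a parameter only so the
    function can be pointed at a different directory when testing.
    """
    lines = [f"kernel: {platform.release()}", f"machine: {platform.machine()}"]
    pattern = os.path.join(sys_block, "*", "queue", "scheduler")
    for path in sorted(glob.glob(pattern)):
        # .../<device>/queue/scheduler -> <device>
        device = os.path.basename(os.path.dirname(os.path.dirname(path)))
        try:
            with open(path) as fh:
                scheduler = fh.read().strip()
        except OSError as exc:
            scheduler = f"unreadable ({exc})"
        lines.append(f"{device} scheduler: {scheduler}")
    return lines


if __name__ == "__main__":
    print("\n".join(collect_report_info()))
```

    The output can be pasted into the report as plain text next to the kernel log messages from the time of the error.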


    And those who didn't hit this problem yet: thank God, and start using another HD right now and leave the files you already have untouched, because eventually you will do a "ls -l /my/precious" and you will find out that the directory can't even be listed =]


    PS.: I do miss when Linus Torvalds was so rude and we didn't have those kinds of bugs... actually I only tried to report two things in the Kernel before, and... and that's why I'm writing this here instead of the LKML because I developed a phobia of their responses to newbies like me lol

    Leave a comment:


  • 0Yg7pQpFGiwcw
    replied
    Originally posted by [email protected] View Post
    I was laughing at all the Windows 10 updates shenanigans since October, but looks like we have our own problems too.

    I'm glad I stayed with kernel 4.15 (Kubuntu 18.04), after the first benchmark Michael did showing only marginal improvements in most games since the beginning of the year.
    Well, there is a reason why most distros won't just grab the latest mainline kernel immediately. Linux development is a kind of unusual case, since the kernel is developed separately from the rest of the OS. "Released" seems to mean something more like "released to distro developers for integration testing"...

    Leave a comment:


  • Michael
    replied
    Originally posted by birdie View Post
    Michael

    The bug has seemingly been identified and a fix has been submitted to mainline. 4.19/4.20 are currently still affected.

    Check the bug report discussion for more details.

    People who run the affected systems should either apply the patch immediately or downgrade to kernel 4.18. Running e2fsck/your-fs.fsck is pretty much mandatory.
    It has not been submitted to mainline yet. As far as upstream goes, it's just in Jens' "block" branch. It hasn't yet been sent in or pulled into Linus Torvalds' mainline branch. I've been monitoring it closely and will have a Phoronix article out when it's actually in Linux Git and/or back-ported.

    Leave a comment:


  • birdie
    replied
    Michael

    The bug has seemingly been identified and a fix has been submitted to mainline. 4.19/4.20 are currently still affected.

    Check the bug report discussion for more details.

    People who run the affected systems should either apply the patch immediately or downgrade to kernel 4.18. Running e2fsck/your-fs.fsck is pretty much mandatory.
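
    To see whether a machine is running one of the series named above, a small check on the kernel release string is enough. This is only a sketch: the function name is made up, and the set of affected series is taken straight from this post rather than from an authoritative list, so it says nothing about which point releases carry the fix.

```python
import platform
import re

# Series named in the post above; an assumption, not an authoritative list.
AFFECTED_SERIES = {(4, 19), (4, 20)}


def is_affected_series(release):
    """Return True if a release string such as '4.19.5-generic' is in an affected series."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) in AFFECTED_SERIES


if __name__ == "__main__":
    release = platform.release()
    status = "in an affected series" if is_affected_series(release) else "not in an affected series"
    print(f"{release}: {status}")
```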
    Last edited by birdie; 05 December 2018, 06:37 AM.

    Leave a comment:


  • lichtenstein
    replied
    I wasn't aware that there was a flame war between btrfs and ext4 users. They are both great and have been perfectly stable for me. I use ext4 on my local machine, jfs on my mini-server (scratch drive), and btrfs there too for the raid1'd hdds (checksums, bitrot prevention and all that).

    Leave a comment:


  • xorbe
    replied
    Pinned as blk-mq bug. Rip btrfs users clamoring for an ext4 bug. =P

    Leave a comment:


  • lichtenstein
    replied
    I had the issue with 4.19.5 on my old machine (intel broadwell nuc, samsung 850 pro ssd) and, as with the others here, fsck on reboot and reverting to 4.18.x fixed it. In my new machine (ryzen 2700x, evo 970 nvme) I don't have it even though I do use ext4 locally for everything (2 ssds & 1 hdd), (and CONFIG_EXT4_ENCRYPTION is on for ubuntu kernels). So maybe it's drivers+hardware related. Or could it be an intel (chipset) vs. amd thing?

    What do you guys use, those of you who experienced it?

    Leave a comment:


  • Charlie68
    replied
    I use only Btrfs and XFS on my PCs and I have never had any problems; in fact, I never had problems with Ext4 either. OK, some users have a problem, and that can happen with any file system, but all these file systems are still reliable. Reading the comments, I get the impression that even file systems have their die-hard fanboys, which is sad. When I read "I switched from Btrfs to Ext4 because of this or that", it's pretty ridiculous: learn to better configure what you need to use and you'll see that it works fine.

    Leave a comment:


  • Yae8xahch
    replied
    Originally posted by Flaburgan View Post
    Damn, I read this thread a few days ago, but I had just been hit by the corruption problem. Running 4.19.5. (And I was installing 4.19.6 during this time; I don't know if that version fixes the problem.) I have reverted to 4.18 now. Using Linux Mint 19 on an SSD.
    Would you mind participating in this thread/bug report:

    https://bugzilla.kernel.org/show_bug.cgi?id=201685

    Leave a comment:
