That Linux 5.12 Severe Data Corruption Bug Hits Intel CI Systems - Issue Caused By Swap File

  • Phoronix: That Linux 5.12 Severe Data Corruption Bug Hits Intel CI Systems - Issue Caused By Swap File

    Last week I issued a warning of possible data loss on the early Linux 5.12 kernel code that was reliably leaving my test systems severely corrupted. Intel's internal graphics test systems, it turns out, have now been bitten by this issue as well, encountering significant file-system corruption, and as such Intel has been quick to jump on it: there's now an idea of what's causing the nasty issue and a workaround by reverting select patches...

    http://www.phoronix.com/scan.php?pag...apfile-Corrupt

  • #2
    Swap files (and volumes) mess with ZFS too and, oddly enough, random swap file shenanigans like these are why I still do the old-school swap partition (size = memory + a hair) instead of giving the whole disk to a file system and using swap files. I like to go a hair larger than my memory size; better safe than sorry.

    Glad to know that being old school and prudent would have saved my ass.
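
    A quick way to see which camp a given Linux box falls into: the kernel lists every active swap area in /proc/swaps, and its Type column reads "partition" for block devices (which includes /dev/zram* devices) or "file" for swap files. Below is a minimal sketch, assuming the usual layout of one header row followed by whitespace-separated fields and paths without raw whitespace; parse_proc_swaps and active_swap_files are hypothetical helper names, not an existing tool. It only reports how swap is configured; it does not detect the 5.12 corruption bug itself.

```python
from __future__ import annotations

from pathlib import Path


def parse_proc_swaps(text: str) -> list[dict]:
    """Parse /proc/swaps-style text into one dict per active swap area.

    Assumes the first line is the header (Filename Type Size Used Priority)
    and that the fields are whitespace-separated, with no raw whitespace
    inside the paths themselves.
    """
    entries = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore anything that doesn't look like a swap entry
        filename, swap_type, size_kb, used_kb, priority = fields[:5]
        entries.append(
            {
                "filename": filename,
                "type": swap_type,  # "partition" (block device) or "file"
                "size_kb": int(size_kb),  # sizes are reported in KiB
                "used_kb": int(used_kb),
                "priority": int(priority),  # may be negative, e.g. -2
            }
        )
    return entries


def active_swap_files(text: str) -> list[str]:
    """Return the paths of swap areas backed by a regular file."""
    return [e["filename"] for e in parse_proc_swaps(text) if e["type"] == "file"]


if __name__ == "__main__":
    swaps = Path("/proc/swaps")
    if not swaps.exists():
        print("No /proc/swaps here (not Linux, or procfs not mounted).")
    else:
        text = swaps.read_text()
        for entry in parse_proc_swaps(text):
            print(f"{entry['filename']}: {entry['type']}, {entry['size_kb']} KiB")
        files = active_swap_files(text)
        if files:
            print("Swap files in use:", ", ".join(files))
        else:
            print("No swap files in use.")
```

    If the last line printed is "No swap files in use.", swap on that machine lives only on partitions or zram devices.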


    • #3
      That image of the dead SSD was heartbreaking 😢
      Last edited by lyamc; 02 March 2021, 02:59 PM.


      • #4
        That sounds very similar to a problem I had a couple of years ago. I was doing a big build (can't remember what now, but it may have been Chromium or something) across a lot of threads and suddenly realised I was going to run out of memory. I quickly created a big file on the SSD holding my root FS, rather than on one of my three "bulk storage" hard drives, mkswap'd it and swapon'd it while the build was running. A short while later, the build crashed out horribly and my system started falling apart while I was scratching my head over the weird error messages in the build log. dmesg revealed a whole bunch of ext4 errors on the SSD; a reboot showed it was unbootable and heavily corrupted.

        I just put it down to the relatively cheap SSD (a SanDisk Ultra, reputable but certainly not high-end) not being able to handle the absolute hammering it got from swapping, but it sounds remarkably like this issue.


        • #5
          Originally posted by skeevy420
          I still do the old school swap partition
          Yup. My standard partitioning scheme is: swap, /home, and one / partition per OS installation. The different OS images share /home and swap, but that's it. Within each OS image, I use Btrfs subvolumes instead of partitions.

          Originally posted by skeevy420
          I like to go a hair larger than my memory size; better safe than sorry.
          I seem to recall some advice, long ago, to make swap 2x RAM capacity. I think the idea was that you can't practically use more swap than that without spending so much time swapping that you're not getting any real work done anyway.

          I've got to admit that swap-induced performance throttling has saved me from out-of-memory situations on a few occasions, buying me enough time to notice that something had a massive memory leak/growth and kill it. So a large amount of swap is useful for that, too. Of course, that was back in the days of spinning-rust drives. On modern NVMe storage, I probably wouldn't even notice the performance hit before it was too late (if at all)! Maybe it'd be a non-issue with modern OOM-killer behavior.
          Last edited by coder; 02 March 2021, 03:13 PM.


          • #6
            At 30 years of age, you'd think Linux ought to be mature enough to handle a swap file properly.


            • #7
              Originally posted by ddriver
              At 30 years of age, you'd think Linux ought to be mature enough to handle a swap file properly.
              A pre-release kernel has a bug? Colour me surprised I guess?

              Not a nice one, sure, but if you are running this kind of kernel, it's at your own risk.


              • #8
                Originally posted by skeevy420
                Glad to know that being old school and prudent would have saved my ass
                Yup, I run bleeding-edge kernels too and hadn't seen this, as I have a swap partition as well.

                ... that being said, this is a reminder to back up my NVMe much more often; some other bug could have bitten me.


                • #9
                  Originally posted by franglais125

                  A pre-release kernel has a bug? Colour me surprised I guess?

                  Not a nice one, sure, but if you are running this kind of kernel, it's at your own risk.
                  I wasn't referring to the issue at hand, but to the statements in favor of dedicated swap partitions in general.


                  • #10
                    I go with zram-backed swap and earlyoom. Works perfectly for my needs with 16 gigs of RAM unless something's leaking, in which case something's going to get OOM-killed no matter how much there is.
