EXT4 Data Corruption Bug Hits Stable Linux Kernels


  • #31
    Originally posted by Pallidus
    LOL is on you because you could be running stable 3.6.1 and be unaffected as well.


    PROTIP wait for the kernels to mature, even the stable ones, for at least 15 days before upgrading them PROTIP


    "Still, the commit in question *does* change things, and so it's still the most likely culprit."


    name and shame plox
    I have only used RC/beta/alpha kernels for years, and most of the time alpha/beta/Git Radeon drivers as well.

    Stable releases are for pussies.



    • #32
      Originally posted by necro-lover
      Stable releases are for pussies.
      Stable releases are for people doing serious work who need their systems to function without chasing down kernel bugs all the time. I spent too much time on this as it is (though my employer has a direct interest in the stability of the Linux kernel, so nobody was too unhappy).

      I must say I'm very happy with responsiveness here: I first saw fs corruption on Monday, reported it on Tuesday after figuring out that it was definitely 3.6.3 at fault and thus not an already-fixed bug in an old stable kernel, and had a candidate patch from Ted within a few hours, even though I'd dropped this on him without warning and with so little info that he had to dig through every ext*-affecting patch between 3.6.1--3.6.3. I'm sure I couldn't respond to a bug described that vaguely anywhere near that fast. As ever, Ted provides the rest of us with something to aspire to!



      • #33
        Originally posted by enrico.tagliavini
        Feel free to help.
        Why do *I* have to "feel free" when Red Hat is paying its testers perfectly good money?

        If you think you can read and understand every detail of hundreds of thousands of lines of code, you can safely replace Linus.
        Now you say one has to be a gnarly kernel hacker to help! Which is it?

        Software has bugs. It is simply impossible to dodge them all. Just think about the notorious random number generator bug in Debian a few stable releases ago...
        Yes indeed I think all the time about the lack of even the most rudimentary sorts of regression testing.

        Thanks to the openness of Linux, this will hit only a very small fraction of Linux users, most likely geeks and contributors.
        Oh you mean only the people who are using the latest releases of the two most mainstream distributions?



        • #34
          That's why the kernel should have automated tests. Code review is important, but it's not a substitute for good test coverage.



          • #35
            Again, you can't spot such a bug with automatic tests.


            • #36
              Originally posted by tehehe
              That's why the kernel should have automated tests. Code review is important, but it's not a substitute for good test coverage.
              And how do you test for errors you can't reproduce?



              • #37
                Now I am confused...

                $ yum info kernel
                Loaded plugins: langpacks, presto, refresh-packagekit
                Available Packages
                Name : kernel
                Arch : i686
                Version : 3.6.2
                Release : 1.fc16
                Size : 26 M
                Repo : updates
                Summary : The Linux kernel
                URL : http://www.kernel.org/
                License : GPLv2
                Description : The kernel package contains the Linux kernel (vmlinuz), the core
                : of any Linux operating system. The kernel handles the basic
                : functions of the operating system: memory allocation, process
                : allocation, device input and output, etc.

                # yum update
                Loaded plugins: langpacks, presto, refresh-packagekit
                fedora-awesome | 2.8 kB 00:00
                fedora-chromium-stable | 3.4 kB 00:00
                rpmfusion-free-updates | 3.3 kB 00:00
                rpmfusion-nonfree-updates | 3.3 kB 00:00
                updates/metalink | 16 kB 00:00
                No Packages marked for Update

                $ uname -r
                3.4.11-1.fc16.i686.PAE

                I guess this is good, but...
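The transcript above compares a packaged kernel version (3.6.2) with the one actually running (3.4.11). A minimal Python sketch of doing that comparison programmatically follows, assuming `uname -r` strings begin with a dotted X.Y or X.Y.Z version followed by an optional distro suffix. The helper names `parse_kernel_release` and `is_at_least` are hypothetical, chosen for illustration only. This only compares version numbers. It says nothing about whether a given kernel is affected, since later in the thread the race is reported in 3.6.1 and possibly earlier kernels too.

```python
import re
from typing import Tuple


def parse_kernel_release(release: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a `uname -r` style string.

    Assumes the string begins with a dotted "X.Y" or "X.Y.Z" version,
    optionally followed by a distro suffix such as "-1.fc16.i686.PAE".
    A missing patch level is treated as 0.
    """
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def is_at_least(release: str, version: str) -> bool:
    """True if `release` is numerically at or above `version` (X.Y[.Z])."""
    # Tuple comparison is numeric per component, so 3.10 > 3.6 as expected.
    return parse_kernel_release(release) >= parse_kernel_release(version)


if __name__ == "__main__":
    running = "3.4.11-1.fc16.i686.PAE"  # the `uname -r` output shown above
    print(parse_kernel_release(running))  # (3, 4, 11)
    print(is_at_least(running, "3.6.2"))  # False
```

Comparing integer tuples instead of raw strings avoids the trap where "3.10" sorts before "3.6" lexically.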



                • #38
                  Originally posted by PuckPoltergeist
                  And how do you test for errors you can't reproduce?
                  Quite. My latest tests suggest that you have to reboot *while a umount is in progress* for this to go wrong -- and that this affects Linux 3.6.1 and quite possibly many earlier versions (untested as yet), though the dangerous race window is much narrower in kernels before 3.6.2 or 3.6.3 and you pretty much have to do the umount and then the reboot -f as the very next command to make it go wrong. It is not plausible that anyone would have thought of testing *that* before I ran into it. But my home server is a test platform that does just that!

                  This is, to be honest, a somewhat insane thing to do, even though I need to do it in order to reboot reliably due to nested NFS and non-NFS mounts, not all of which may be reachable at umount time. I'm not entirely convinced this is even a bug, though I hope it's a bug because I'm sick of seeing my filesystems corrupted!

                  It certainly explains why, myself apart, only people using ext4 on removable devices have seen it so far (though anyone making heavy use of umount -l in any context would probably see it soon enough).



                  • #39
                    I have a Google+ post where I've posted my latest updates:

                    https://plus.google.com/117091380454...ts/Wcc5tMiCgq7

                    I will note that before I send any pull request to Linus, I have run a very extensive set of file system regression tests, using the standard xfstests suite of tests (originally developed by SGI to test xfs, and now used by all of the major file system authors). So for example, my development laptop, which I am currently using to post this note, is currently running v3.6.3 with the ext4 patches which I have pushed to Linus for the 3.7 kernel. Why am I willing to do this? Specifically because I've run a very large set of automated regression tests on a very regular basis, and certainly before pushing the latest set of patches to Linus. So for all of the kvetching about people not willing to run bleeding edge kernels, please remember that while it is no guarantee of 100% perfection, I and many other kernel developers *are* willing to eat our own dogfood.

                    Is there more testing that we could do? Yes, and as a result of this fire drill, I will probably add some systematic power-fail testing before I send a pull request to Linus. But please rest assured that we are already doing a lot of QA work as a regular part of the ext4 development process.



                    • #40
                      Originally posted by NullNix
                      Stable releases are for people doing serious work who need their systems to function without chasing down kernel bugs all the time. I spent too much time on this as it is (though my employer has a direct interest in the stability of the Linux kernel, so nobody was too unhappy).

                      I must say I'm very happy with responsiveness here: I first saw fs corruption on Monday, reported it on Tuesday after figuring out that it was definitely 3.6.3 at fault and thus not an already-fixed bug in an old stable kernel, and had a candidate patch from Ted within a few hours, even though I'd dropped this on him without warning and with so little info that he had to dig through every ext*-affecting patch between 3.6.1--3.6.3. I'm sure I couldn't respond to a bug described that vaguely anywhere near that fast. As ever, Ted provides the rest of us with something to aspire to!
                      Be happy that people like me use the latest beta/alpha stuff and trash their systems, because I also report bugs when I find them.
