I have a Google+ post where I've posted my latest updates:
I will note that before I send any pull request to Linus, I have run a very extensive set of file system regression tests, using the standard xfstests suite of tests (originally developed by SGI to test xfs, and now used by all of the major file system authors). So for example, my development laptop, which I am currently using to post this note, is currently running v3.6.3 with the ext4 patches which I have pushed to Linus for the 3.7 kernel. Why am I willing to do this? Specifically because I've run a very large set of automated regression tests on a very regular basis, and certainly before pushing the latest set of patches to Linus. So for all of the kvetching about people not willing to run bleeding edge kernels, please remember that while it is no guarantee of 100% perfection, I and many other kernel developers *are* willing to eat our own dogfood.
Is there more testing that we could do? Yes, as a result of this fire drill, I will probably add some systematic power fail testing before I send a pull request to Linus. But please rest assured that we are already doing a lot of QA work as a regular part of the ext4 development process.
EXT4 Data Corruption Bug Hits Stable Linux Kernels
-
Originally posted by PuckPoltergeist:
And how do you test for errors you can't reproduce?
This is, to be honest, a somewhat insane thing to do, even though I need to do it in order to reboot reliably due to nested NFS and non-NFS mounts, not all of which may be reachable at umount time. I'm not entirely convinced this is even a bug, though I hope it's a bug because I'm sick of seeing my filesystems corrupted!
It certainly explains why, myself apart, only people using ext4 on removable devices have seen it so far (though anyone making heavy use of umount -l in any context would probably see it soon enough).
-
Now I am confused...
[hamish@Griffindor ~]$ yum info kernel
Loaded plugins: langpacks, presto, refresh-packagekit
Available Packages
Name : kernel
Arch : i686
Version : 3.6.2
Release : 1.fc16
Size : 26 M
Repo : updates
Summary : The Linux kernel
URL : http://www.kernel.org/
License : GPLv2
Description : The kernel package contains the Linux kernel (vmlinuz), the core
: of any Linux operating system. The kernel handles the basic
: functions of the operating system: memory allocation, process
: allocation, device input and output, etc.
[hamish@Griffindor ~]$
[root@Griffindor ~]# yum update
Loaded plugins: langpacks, presto, refresh-packagekit
fedora-awesome | 2.8 kB 00:00
fedora-chromium-stable | 3.4 kB 00:00
rpmfusion-free-updates | 3.3 kB 00:00
rpmfusion-nonfree-updates | 3.3 kB 00:00
updates/metalink | 16 kB 00:00
No Packages marked for Update
[root@Griffindor ~]#
[hamish@Griffindor ~]$ uname -r
3.4.11-1.fc16.i686.PAE
[hamish@Griffindor ~]$
I guess this is good, but...
-
Originally posted by tehehe:
That's why the kernel should have automatic tests. Code review is important, but it's not a substitute for good test coverage.
-
That's why the kernel should have automatic tests. Code review is important, but it's not a substitute for good test coverage.
-
Originally posted by enrico.tagliavini:
Feel free to help.
If you think you can read and understand hundreds of thousands of lines of code in every detail, you can safely replace Linus.
Software has bugs. It is simply impossible to dodge them all. Just think about the notorious random number generator bug in Debian a few stable releases ago...
Thanks to the openness of Linux, this will hit only a very small fraction of Linux users, most likely geeks and contributors.
-
Originally posted by necro-lover:
stable releases are for pussy's.
I must say I'm very happy with the responsiveness here: I first saw fs corruption on Monday, reported it on Tuesday after figuring out that it was definitely 3.6.3 at fault and thus not an already-fixed bug in an old stable kernel, and had a candidate patch from Ted within a few hours, even though I'd dropped this on him without warning and with so little info that he had to dig through every ext*-affecting patch between 3.6.1 and 3.6.3. I'm sure I couldn't respond to a bug described that vaguely anywhere near that fast. As ever, Ted provides the rest of us with something to aspire to!
-
Originally posted by Pallidus:
LOL is on you because you could be running stable 3.6.1 and be unaffected as well.
PROTIP wait for the kernels to mature, even the stable ones, for at least 15 days before upgrading them PROTIP
"Still, the commit in question *does* change things, and so it's still the most likely culprit."
name and shame plox
stable releases are for pussy's.
-
awww, on stable ? that's evil. last time i lost data on ext4 was on rc-kernels at least ...
glad I'm on btrfs now for all my fs's, although that'll probably blow up any second now :-P