
EXT4 Lets Us Down, There Goes Our R600/700 Mesa Tests


  • #41
    If you screw up, blame someone else. It's ok, everyone does it once in a while...



    • #42
I think a lot of people here (including the article) are not focusing on the bigger issue. It's more important to find the reason for the hard lock than it is to dwell on the data loss. With hard locks happening on a system, no file system is safe. The title could just as easily read "Radeon driver may cause hard locks resulting in possible data loss", among many others.

Likewise, people saying "xyz filesystem is stable because it works fine here" is really of no use. I noticed some were using home servers and power outages as examples. Home servers are probably the least susceptible to data loss even with power outages, as their writes are few and far between compared to their reads, and they operate in a relatively static scenario. Also, I have yet to see any file system that guarantees no data loss on a power outage, so testimonials on their reliability have to be taken for what they are: personal experiences with no real hard proof of any scenario.



      • #43
        Originally posted by energyman View Post
not only that - all those people who attacked and blocked reiser4 because of 'layer violations' have no problem with btrfs, which does the same but much, much worse.
        Heh, typical. This sort of crap happens all the time in the open source community, and they get away with it. People have such short attention/memory spans.



        • #44
[quote]
Likewise, people saying "xyz filesystem is stable because it works fine here" is really of no use. I noticed some were using home servers and power outages as examples. Home servers are probably the least susceptible to data loss even with power outages, as their writes are few and far between compared to their reads, and they operate in a relatively static scenario. Also, I have yet to see any file system that guarantees no data loss on a power outage, so testimonials on their reliability have to be taken for what they are: personal experiences with no real hard proof of any scenario.
[/quote]


What is hilarious is that somebody would choose XFS over ext4 when the problem they are experiencing is data loss on improper shutdown.

If you look at the history and current state of XFS development, you'd quickly realize that is like putting new tires on your car when the problem is that your engine keeps exploding into flaming debris.



          • #45
            Originally posted by drag View Post
What is hilarious is that somebody would choose XFS over ext4 when the problem they are experiencing is data loss on improper shutdown.

If you look at the history and current state of XFS development, you'd quickly realize that is like putting new tires on your car when the problem is that your engine keeps exploding into flaming debris.
Sure, but even using XFS does not guarantee you will lose data on power loss. With barriers enabled, I had a server with a weak power supply reboot spontaneously over 50 times within a 24-hour period, and it was a high-usage server with plenty of read/write operations every minute. Just a personal experience.



            • #46
              Hi Michael,

              Long time reader, but this I have to comment on.

Now, this touches on something I've really missed from all the filesystem tests on Phoronix: reliability. The single most important job of any disk filesystem is to keep your data safe. I took this for granted more than 5 years ago when I decided to use XFS on my desktop machine, and lost my entire /home after a crash + hard reboot. Some digging revealed that this was quite normal, as XFS was designed for UPS-backed computers. After that episode I switched to ext3, which has survived everything ever since.

Really, who cares if a filesystem can create 150 or 200 files per second if it is likely to kill your data in case of a power drop or hard reboot?

So, a filesystem robustness test would be totally sweet. Even nicer would be if some abuse were part of every filesystem test round, as benchmarks such as yours are one of the reasons some filesystem developers have started to compromise data integrity for performance.
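A robustness round like the one suggested could start as small as this sketch: append records that are only counted once fsync() returns, then after a (simulated) crash check that whatever survived on disk is internally consistent. The record format and function names here are made up for illustration:

```python
import os

def append_record(path, n):
    # Append one fixed-width record and force it to stable storage before
    # returning; a record only "counts" once fsync() has succeeded.
    with open(path, "ab") as f:
        f.write(f"record-{n:04d}\n".encode())
        f.flush()
        os.fsync(f.fileno())

def verify(path):
    # After a crash, every line present on disk must be a complete record:
    # the literal prefix "record-" plus a 4-digit counter (11 bytes).
    with open(path, "rb") as f:
        lines = f.read().splitlines()
    return all(l.startswith(b"record-") and len(l) == 11 for l in lines)
```

Run the writer in a loop, cut power (or kill the VM), and any truncated or garbled tail record shows up as verify() returning False.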

              BTW: Love your X.Org coverage :-)



              • #47
                Originally posted by chithanh View Post
                Next time store test results remotely, eg. on NFS. That way a software or hardware failure on the test box will not cause loss of the test data.
                Absolutely seconded! Why you don't have a NAS box to store all your data on is beyond me. Sure, for individual test runs you want them on a local disk... but core data should be on mirrored and backed up disks.

Belt and suspenders! But I suspect you know all this, and I'm sure you've already run into some of the problems with Ubuntu Lucid with an NFS home directory. It just sucks...

                John
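A minimal sketch of the mirroring idea, with hypothetical paths (in real use the destination would sit on an NFS mount or NAS share, and you'd run this right after each test round):

```python
import shutil

def mirror_results(src, dest):
    # Copy the local results tree into a second location so a crash on the
    # test box cannot take the only copy with it. dirs_exist_ok lets the
    # mirror be refreshed in place on every run (Python 3.8+).
    shutil.copytree(src, dest, dirs_exist_ok=True)

# Hypothetical paths for illustration:
# mirror_results("/var/lib/phoronix-test-suite/test-results", "/mnt/nas/pts-results")
```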



                • #48
                  Originally posted by deanjo View Post
I think a lot of people here (including the article) are not focusing on the bigger issue. It's more important to find the reason for the hard lock than it is to dwell on the data loss. With hard locks happening on a system, no file system is safe. The title could just as easily read "Radeon driver may cause hard locks resulting in possible data loss", among many others.

Likewise, people saying "xyz filesystem is stable because it works fine here" is really of no use. I noticed some were using home servers and power outages as examples. Home servers are probably the least susceptible to data loss even with power outages, as their writes are few and far between compared to their reads, and they operate in a relatively static scenario. Also, I have yet to see any file system that guarantees no data loss on a power outage, so testimonials on their reliability have to be taken for what they are: personal experiences with no real hard proof of any scenario.

Yes, because that is oh so different from Phoronix's idiotic statement blaming ext4 and using this as an indication that ext4 is bad...

The simple fact is that something caused a hard lock on their system, and *THAT* needs to be looked into. Likewise, more than likely something was still writing to the directory being accessed, so when the power button was held down (just to add insult to injury) it really fucked up the inode of the directory the live tests were being written into, which, shock horror, also contained the "backup" of past results...

Too many ifs, buts and unknowns for a straight witch-hunt on ext4, HENCE the equally stupid "it works for me" statements.



                  • #49
                    Sigh...

Well, personally I don't post here very often, but I think some people are missing the main point: a kernel lockup was the origin of the whole problem. In case people don't know, ANY filesystem that makes heavy use of caching (that is: data is read, kept in RAM, reported as written, but the real write is deferred until the system is mostly idle) will lose data in a lockup/panic event at some point. This is inevitable. The problem is known on every OS out there, and there are many solutions to it, some better than others. In ext4's case, the filesystem is still WAY more prone to these events than ext3, but less so than other high-performance filesystems. It's the cost of speed: if you want a faster filesystem, you have to trade reliability for speed. There's no other way to deal with it. (If you want a cheaper hard disk, then deal with short-lived hard disks that are more prone to the "Click of Death", for example.)
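The caching behaviour described above is exactly why applications that care about durability call fsync() themselves instead of trusting the write-back cache; a minimal sketch, with an arbitrary file name:

```python
import os

def durable_write(path, data):
    # Write, flush Python's userspace buffer to the kernel, then ask the
    # kernel to push the page cache through to the disk. Without fsync(),
    # the data may exist only in RAM when a lockup or power cut hits.
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
```

The trade-off the post describes is visible here too: every fsync() stalls the writer until the disk acknowledges, which is precisely the cost faster filesystems try to defer.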

In enterprise environments, you don't use the latest and greatest: you use the OLDER and SAFER versions of software, since data is critical to many operations. And back up often, as redundantly as your resources allow, since NO MEDIUM is free of data corruption.

It's a pity that the PTS data was lost in these events, but since you are testing the latest and greatest, the bugs that come with such systems are also the "greatest". That's how software development is done these days: you release something buggy, and you fix it later with a patch or service pack or whatever the fix is called. And sadly, this has become the norm in every development area, be it open source or closed source.

Times are changing, as somebody said a long time ago.

As usual (well... not usual here) my viewpoints come from my own experiences, and as usual in this funny world: Your Mileage May Vary.

                    Best Regards.



                    • #50
I am far from an expert on the matter, but I do agree that it sounds a little odd that no external backup was present at the time of the crash. Still, what exactly is the purpose of some of the bashing over the missing backups? I assume the testers had their reasons, since they are clearly no idiots. It would be more interesting to ask politely about the reasoning behind that.
If the discussion starts from the assumption that only idiots don't keep backups on external-mirrored-co-located-UPS'ed drives, chances are slim that an interesting exchange between whoever is involved can get started... As far as I am concerned, reading a forum thread is out of pure interest in the subject, and I hope to learn from others. I hope I do not offend anybody, but bashing around is a pure waste of time for everybody involved... </grandpa>

                      Nevertheless, I think one of the conclusions of this failure is already great: the remote syncing option in Phoromatic.

