EXT4 Lets Us Down, There Goes Our R600/700 Mesa Tests

movieman replied

03 February 2010, 03:31 AM
Originally posted by phtpht

But yeah right, blame ext4.

Sensible filesystems don't require you to sync if you want your data to actually be written to the disk in a safe manner, so if ext4 doesn't do that, it's simply broken as a general-use filesystem.

More to the point, the only reason to use ext4 is performance, and if you're now going to have to sync every time you write anything to disk, your performance is going to be worse than a reliable filesystem like ext3. So what's the point?

I've been running ext4 on my netbook because it's now the default for Ubuntu, and so far I'm amazed that nothing bad has happened after the times I've had to do a hard power down due to NFS lockups. I'm still wondering what may have been silently corrupted without my knowledge.
phtpht replied

03 February 2010, 01:52 AM
Learn to 'sync' when you want your data safe. File systems and block devices are at liberty to do anything but write the data until you 'sync'. But yeah right, blame ext4.
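
For anyone wondering what "syncing" looks like from inside an application, here is a minimal Python sketch, assuming a POSIX system. The function name `durable_write` and the file name in the usage note are hypothetical and are not taken from the Phoronix Test Suite. It writes to a temporary file, calls `fsync` on it, renames it over the target, and then calls `fsync` on the containing directory so the rename itself is flushed.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace `path` with `data` (bytes) and flush it to stable storage.

    Hypothetical helper for illustration only; assumes a POSIX system.
    """
    dirname = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # push Python's buffer to the kernel
            os.fsync(f.fileno())   # ask the kernel to write the file data out
        os.replace(tmp_path, path)  # rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Flush the directory entry too, so the rename survives a crash.
    dir_fd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling something like `durable_write("results.xml", b"...")` after each test run costs an extra flush per file, which is the performance trade-off being argued about in this thread.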
deanjo replied

02 February 2010, 11:55 PM
Originally posted by energyman

backup only reduces the chance of catastrophic loss. But you can never reach 0.

Yup, chances are that a simple rsync to a USB keychain after each test run would have been just as good as any of the other presented solutions in the long run.
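
As a rough illustration of the "copy the results somewhere else after each run" idea, here is a small Python sketch. It uses `shutil.copytree` as a stand-in for rsync, and the function name `backup_results`, the directory layout, and the run identifier are all assumptions made for the example.

```python
import os
import shutil


def backup_results(results_dir, backup_root, run_id):
    """Copy one test run's results into its own folder on another medium.

    Hypothetical helper: `backup_root` would be the mount point of the
    USB stick or a network share.
    """
    dest = os.path.join(backup_root, run_id)
    # copytree fails if `dest` already exists, so an earlier backup
    # is never silently overwritten.
    shutil.copytree(results_dir, dest)
    # Ask the kernel to flush pending writes before the next test starts.
    if hasattr(os, "sync"):
        os.sync()
    return dest
```

Keeping one folder per run means a bad copy cannot damage earlier backups, although, as noted above, it still cannot bring the risk down to zero.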
energyman replied

02 February 2010, 11:53 PM
Yeah. Or the enlightening moment when you realize that your backup solution has been writing garbage to the medium for the last couple of months. Or when the controller you want to use to replay the backup fries the disk.

backup only reduces the chance of catastrophic loss. But you can never reach 0.
deanjo replied

02 February 2010, 11:42 PM
Originally posted by energyman

Yeah yeah, backups. Some of you do live backups of everything at once. Right?

Fact is, data loss always appears at the most inconvenient point. Another fact is that ext4 is utterly broken.

No backups can help you with that.

Let's not forget that even the most prudent "backup" person has at some point had a moment of laziness, and that is when Murphy's Law kicks in.
SunnyDrake replied

02 February 2010, 11:23 PM
Ever since I used the SmartDrive tool in DOS to make drive writes "virtual", this has been a tradeoff between performance and reliability.
This topic is quite interesting, as it shows a weak point in filesystems that Phoronix has avoided in its articles. It would be nice to see a new article where filesystems are tested for data loss and corruption after kernel lockups, power cuts, and real data writes under heavy CPU load, along with memory usage, CPU usage, performance, and the number of hardware write calls, in both stock and fail-safe settings.
energyman replied

02 February 2010, 07:43 PM
Yeah yeah, backups. Some of you do live backups of everything at once. Right?

Fact is, data loss always appears at the most inconvenient point. Another fact is that ext4 is utterly broken.

No backups can help you with that.
Davidovitch replied

02 February 2010, 06:15 PM
I am far from an expert on the matter; however, I do agree that it sounds a little odd that no external backup was present at the time of the crash. But what exactly is the purpose of some of the bashing concerning the lack of backups? I assume the testers had their reasons, since they are clearly no idiots. It would be more interesting to have a polite query regarding the reasoning behind that.
If the discussion starts with the assumption that only idiots don't create backups on external-mirrored-co-located-ups'ed drives, chances are slim an interesting interaction between whoever is involved can get started... As far as I am concerned, reading a forum thread is out of pure interest in the subject and I hope to learn from others. I hope I do not offend anybody, but bashing around is a pure waste of time, for everybody involved...</grandpa>

Nevertheless, I think one of the conclusions of this failure is already great: the remote syncing option in Phoromatic.
stargeizer replied

02 February 2010, 03:20 PM
Sigh...

Well, personally I don't post here very often, but I think some people are missing the main point: a kernel lockup was the origin of the whole problem. In case people don't know, ANY filesystem that makes heavy use of caching (that is: data is read, kept in RAM, and reported as written, while the real write is mostly done when the system is otherwise idle) will lose data in a lockup/panic event at some point. This is inevitable. The problem is known for every OS out there, and there are many solutions to it, some better than others. In the EXT4 case, the filesystem is still WAY more prone to these events than EXT3, but less so than other high-performance filesystems. That is the cost of speed: do you want a faster filesystem? Then you have to trade reliability for speed. There's no other way to deal with it. (Do you want a cheaper HARD DISK? Then deal with short-lived hard disks that are more prone to "Click of Death" problems, for example.)

In enterprise environments, you don't use the latest and greatest: you use the OLDER and SAFER versions of software, since data is critical to many operations. And back up often, and as redundantly as your resources allow, since NO MEDIUM is free of data corruption.

It's a pity that the PTS data was lost in these events, but since you are testing the latest and greatest, the bugs that come with such systems are also the "greatest". That's how software development is done these days: you release something buggy, and you fix it later with a patch, a service pack, or whatever the fix is called. And sadly, this has become the norm in every development area, be it open source or closed source.

Times are changing, as somebody said a long time ago.

As usual (well.. not usual here ) my viewpoints are from my experiences and as usual in this funny world: Your Mileage May Vary.

Best Regards.
Naib replied

02 February 2010, 03:03 PM
Originally posted by deanjo

I think a lot of people here (including the article) are not focusing on the bigger issue. It's more important to find the reason for the 'hard lock' than the reason for the data loss. With 'hard locks' happening on a system, no file system is safe. The title could just as easily read "Radeon driver may cause hardlocks resulting in possible data loss", among many others.

As well, people saying "xyz filesystem is stable because it works fine here" is really of no use. I notice some were using home servers and power outages as examples. Home servers are probably the least susceptible to data loss even with power outages, as their writes are few and far between compared to their reads and they operate in a relatively static scenario. Also, I have yet to see any file system that guarantees no data loss on a power outage, so testimonials on their reliability have to be taken for what they are: personal experiences with no real hard proof of any scenario.

Yes, because it is oh so different from Phoronix's idiotic statement blaming ext4 and using that as an indication that ext4 is bad...

The simple fact is that something caused a hardlock on their system, and *THAT* needs to be looked into. Likewise, more than likely something had said files or the directory open at the time, so holding down the power button (just to add insult to injury) really fucked up the inode for the directory the live tests were being written into, which, shock horror, also contained the "backup" of past results...

Too many ifs/buts and unknowns for a straight witch hunt on ext4, HENCE the equally stupid "it works for me" statement.