
EXT4 Lets Us Down, There Goes Our R600/700 Mesa Tests


  • phtpht
    replied
    Originally posted by decep View Post
    This may help in the future. Assuming the kernel has not completely hard locked (and maybe even so), you can force filesystems to sync and remount read-only.

    http://en.wikipedia.org/wiki/Magic_SysRq_key
    Yeah. Just make sure it works well before it's needed, as some distros are stupid enough to disable it for you.

    That is, see if /proc/sys/kernel/sysrq contains a 1 (and try it). Also, you can invoke the functions without the keyboard by echoing the character to /proc/sysrq-trigger. Finally, IIRC there is an iptables module that can convert incoming magic packets into magic sysrq.
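
    The /proc interface described above can be scripted. A minimal Python sketch (assuming Linux; writing the trigger needs root, so the paths are parametrized for a dry run):

```python
def sysrq_enabled(ctl="/proc/sys/kernel/sysrq"):
    """True if magic SysRq is at least partially enabled (the file may
    hold a bitmask like 438 rather than a plain 1)."""
    with open(ctl) as f:
        return int(f.read().strip()) != 0

def sysrq_trigger(cmd, trigger="/proc/sysrq-trigger"):
    """Invoke a SysRq function without the keyboard by echoing its
    character, e.g. 's' to sync or 'u' to remount read-only."""
    with open(trigger, "w") as f:
        f.write(cmd)
```

    For example, sysrq_trigger("s") followed by sysrq_trigger("u") is the classic sync-then-remount-read-only sequence before a forced reboot.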



  • decep
    replied
    Magic SysRq keys

    This may help in the future. Assuming the kernel has not completely hard locked (and maybe even so), you can force filesystems to sync and remount read-only.

    http://en.wikipedia.org/wiki/Magic_SysRq_key



  • phtpht
    replied
    Originally posted by energyman View Post
    ext4 is broken.
    I'd say immature.



  • DanL
    replied
    Is ZFS use more widespread nowadays? I used to do tech support for Sun's storage division (left in 2007) and it was very rare to get calls about ZFS, even after being trained for it. Maybe it was too new back then, or maybe those customers weren't calling for a reason? :P



  • energyman
    replied
    so, what you guys are saying is:
    ext3 is a clusterfuck
    ext4 is broken.

    welcome to my world.



  • phtpht
    replied
    Originally posted by movieman View Post
    Assuming you're not mad enough to run a hardware RAID controller without battery backup
    There's not that much danger in that, now that we have filesystems with the concept of write barriers. As I said, most consumer drives have w/b caches and obviously no tiny batteries.
    And if you're running ext3, that only really matters if the writes were reordered by the drive. If you're running a file system like ext4 which believes it can write random data to random places at random times, well, you're toast.

    How does sync magically avoid data loss from the disk cache on a power failure? The only thing it guarantees is that the filesystem tries to write the data to the disk; there's no guarantee that it actually gets there if the system crashes while the sync is in progress. And if the filesystem writes in a random order, there's no guarantee that whatever part of the file does get to disk before a crash will be valid.

    Seriously, you're demanding that programmers return to the stone age of computing where they had to worry all the time about what the hardware was doing underneath them; you might as well demand they make low-level BIOS calls to write files to disk or write their own raw I/O routines and interrupt handlers to read them back.
    You got it upside down. The concept of block device caches is well known and well documented throughout the API (POSIX, for example). The contract for write() is that it SOMEHOW makes a note that this and that should go here and there, and all it guarantees is that a subsequent read() returns that data; the fsync() specification, on the other hand, says that it will WAIT until the data, the metadata and the directory entry are reported by the DEVICE to have been written to stable storage. If your OS or device behaves differently, then it's BROKEN and all bets are off.
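
    That contract is a few lines of Python (a hedged sketch, not any particular application's code): write() only promises that a later read() sees the data; durability requires fsync() on the file and, for the directory entry, on the parent directory.

```python
import os

def durable_write(path, data):
    """write() + fsync() the file, then fsync() the parent directory
    so the directory entry itself reaches stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)   # only guarantees a subsequent read() sees it
        os.fsync(fd)         # waits until the device reports stable storage
    finally:
        os.close(fd)
    dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dfd)        # the new name must be made durable too
    finally:
        os.close(dfd)
```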

    This is what I expect programmers to acknowledge and work with, nothing more, nothing less. It's an abstraction that shields you from the actual implementation, whether the file system writes to disk at once or the data takes a round trip through the solar system first.

    However, the concept of EXT3 "hoping that data won't be reordered" is the exact opposite: you're ASSUMING a certain geometry and behavior of the drives that may or may not hold. EXT4 actually steps away from this, treating the disk as a random-write device whose details are unknown.

    No, ext3 _IS_ more reliable, at least by default. That is a simple fact: the default configuration for ext3 on pretty much all distributions is set for reliability over performance, which is what the vast majority of users want for a general purpose filesystem.
    Oh yes, that's why barriers were initially OFF for ext3 and that's why some distros (Ubuntu) maintain that tradition even now that the default has changed, after a lengthy debate akin to the one we're having.

    So now people are being told 'dump ext3, which reliably stores your data in 99.999% of cases and replace it with ext4 which will happily corrupt it if your application doesn't use a transactional database model'. And you're surprised that people aren't rushing towards the brave new future of random data loss or lousy performance?
    They are told "leave your illusions and welcome to the real world". And in the spirit of freedom they can always revert to their old ways.

    In other words, "buy a UPS and back up your data, morons".

    A general purpose filesystem exists to reliably store user data on the disk. If a supposed general purpose filesystem deletes my bookmarks when a game crashes, then the filesystem is broken.
    You've reversed cause and consequence. Firefuck not syncing its bookmarks is the result of the broken filesystem the majority used. Again, an abstraction that leaked: if Firefox adhered to the standards, people would whine that their games run slow, because ext3 would also sync stuff from the game.

    That's not to say that such a filesystem doesn't have other uses where reliable data storage is less important than performance, but it certainly should not be pushed for general purpose use like storing user home directories.
    Home dirs are general purpose? C'mon.

    Or is ZFS for 'sissies' too?
    I don't know yet. Enlighten me.



  • movieman
    replied
    Originally posted by phtpht View Post
    Not true. Since the dawn of time, drives have used in-memory "write-back" caches. Your operating system has one, your RAID controller has one and hell, even most modern consumer drives have one
    Assuming you're not mad enough to run a hardware RAID controller without battery backup, the only one of those which could lose your files on a typical ext3 configuration is the disk cache. Which will flush itself very quickly, so it's really only an issue in a sudden power failure combined with a file system sync combined with a drive which reorders writes so that the metadata is updated before the file data, combined with the power going out before the write is complete... in other words, almost never. I guess you could get a partially written block, but I've never seen one in all the power failures I've had in all the computers I've used, so that seems very unlikely.

    Ext4, on the other hand, will lose or corrupt your data in normal operation if the system crashes or power fails, because it doesn't have all the reliability features that ext3 eventually added precisely because an unreliable filesystem is useless as a general purpose filesystem: most people would rather read valid data slowly than corrupt data fast.

    So what do you think happens when you cut off power while the data is still in cache but not on disk? Data lost.
    And if you're running ext3, that only really matters if the writes were reordered by the drive. If you're running a file system like ext4 which believes it can write random data to random places at random times, well, you're toast.

    If you want to guarantee your data on the disk, you will use sync, period.
    How does sync magically avoid data loss from the disk cache on a power failure? The only thing it guarantees is that the filesystem tries to write the data to the disk; there's no guarantee that it actually gets there if the system crashes while the sync is in progress. And if the filesystem writes in a random order, there's no guarantee that whatever part of the file does get to disk before a crash will be valid.

    Back in the real world, any general-purpose filesystem which expects every application to call sync every time it writes data to the disk which it doesn't want corrupted is simply broken. Firstly because at least 90% of applications don't call sync and won't be updated to do so within the next decade, secondly because sync is slow and unnecessary on the most common current Linux filesystem so you're now expecting application developers to check the filesystem they're writing to in order to determine whether or not they should bother to sync after each write if they don't want to cripple performance, and thirdly because I strongly suspect that 90% of the 10% of applications which do sync don't sync properly (e.g. syncing the directory as well as the file, when that's required).

    Worse than that, most applications that write to the disk do actually expect their data to get there, and even those which don't have to call sync if they don't want the file corrupted because the unsynced filesystem wrote out the metadata before it crashed but not the actual file data. So that means calling sync all the time and crippling performance, all in the name of supporting a cache which exists solely to improve performance: aka 'we had to kill our performance in order to save it'.

    Seriously, you're demanding that programmers return to the stone age of computing where they had to worry all the time about what the hardware was doing underneath them; you might as well demand they make low-level BIOS calls to write files to disk or write their own raw I/O routines and interrupt handlers to read them back.

    I honestly don't know of a better way to point out what a stupid idea this is.

    But that's also why each sync takes sooooooo looooooong on ext3 and that's why people have GIVEN UP on doing that and that's why IT SEEMS that ext3 is more reliable (unless you use Ubuntu which uses unsafe defaults).
    No, ext3 _IS_ more reliable, at least by default. That is a simple fact: the default configuration for ext3 on pretty much all distributions is set for reliability over performance, which is what the vast majority of users want for a general purpose filesystem.

    So now people are being told 'dump ext3, which reliably stores your data in 99.999% of cases and replace it with ext4 which will happily corrupt it if your application doesn't use a transactional database model'. And you're surprised that people aren't rushing towards the brave new future of random data loss or lousy performance?

    But because of EXT3 history, people forget to goddamn sync their data when they want them on the disk. Then they whine that if their game freezes they lose their goddamn bookmarks with EXT4.
    A general purpose filesystem exists to reliably store user data on the disk. If a supposed general purpose filesystem deletes my bookmarks when a game crashes, then the filesystem is broken. I don't care that you can save my game 0.1 seconds faster than ext3 if you delete my bookmarks when the game crashes, and I would note that Mozilla took an age to fix Windows deleting bookmarks from NTFS after a crash, and the consequent move to sqlite instead of a simple HTML file is probably responsible for its lousy performance on ext3 because sqlite syncs all the time _even though syncs are not required for that application on ext3_.

    That's not to say that such a filesystem doesn't have other uses where reliable data storage is less important than performance, but it certainly should not be pushed for general purpose use like storing user home directories.

    So you want to either use a sissy filesystem like EXT3 and forget about my lecture here or you use a real one and learn how to use it properly.
    A real filesystem like ZFS, for example, which is light years ahead of ext4 technically and suffers from none of its data corruption problems so you can actually trust that your data will still be there when you go to read it.

    Or is ZFS for 'sissies' too?



  • phtpht
    replied
    Originally posted by energyman View Post
    XFS has been criticized for its bad behaviour on crashes. There was even the rule that with XFS you must use a UPS or be prepared for catastrophic data loss.
    Everybody ridiculed it for that behaviour.

    And ext4 copied that crap. Zero length files. Too much data on the fly. Everything that is bad about XFS got copied.
    Urban legends. All XFS does is truncate files that on other FSs would have bullshit written into them; the idea being that you're more likely to notice the former.

    You don't need a UPS. You need to back up your data while you have it intact. And that's a general rule with any file system.

    And it has precisely nothing to do with cache usage. If you're complaining about too much data in the cache you can always shrink it or disable it totally. You have ample options and knobs to do whatever you please. Using EXT3, which ignores most of them, brings at most a false sense of being in control and secure.
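
    For the record, the "knobs" in question are the standard Linux writeback sysctls under /proc/sys/vm (a sketch; the exact set of tunables varies by kernel version):

```python
import os

WRITEBACK_KNOBS = ("dirty_ratio", "dirty_background_ratio",
                   "dirty_expire_centisecs")

def read_writeback_knobs(base="/proc/sys/vm"):
    """Return the current writeback tunables as a dict of ints.
    Lowering dirty_ratio shrinks how much unsynced data the page
    cache may accumulate; root can write new values to the same files."""
    out = {}
    for name in WRITEBACK_KNOBS:
        with open(os.path.join(base, name)) as f:
            out[name] = int(f.read().strip())
    return out
```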

    EXT3 is a pathetic caricature, a mild improvement over EXT2 but nothing more; it was brought into this world to let countless EXT2 disks migrate easily to journaling, and that took its toll. Every time I did some real work on EXT3 it took freaking longer than on XFS or even reiser; yet EXT3 has been the frigging default for almost all distros.

    You EXT3'ers have actually been using EXT2 all this time. (EXT2 is not bad, but it's simple and plain and has its limits.) Now that EXT4 brings real changes and features and finally pushes the default configuration to be on par with other FSs, you either mindlessly whine or turn all the cool stuff off, effectively reverting it to EXT2 again.

    You want a fs that cares about your data?

    [ 269.960844] reiser4[postdrop(3406)]: disable_write_barrier (fs/reiser4/wander.c:235)[zam-1055]:
    [ 269.960849] NOTICE: md1 does not support write barriers, using synchronous write instead.

    use reiser4
    If your block driver doesn't support write barriers then you're screwed. You cannot hope to have a functioning journaling FS without barriers.

    You can always mount almost any filesystem with -o sync to achieve the above-mentioned effect. Your write performance will suffer horribly: the old Zaurus handhelds had synchronous writes hardcoded in the kernel, and by the time one finished copying your photos onto a USB flash drive you could cook a ragù for dinner.
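
    The per-file analogue of mounting with -o sync is opening with O_SYNC, so each write() only returns once the data is on stable storage. A minimal sketch (Unix-only, hedged, not production code):

```python
import os

def write_sync(path, data):
    """O_SYNC write: the write() call itself blocks until the data and
    the metadata needed to retrieve it are on stable storage -- the
    same effect -o sync gives every write on the whole mount."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC,
                 0o644)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
```

    Which is exactly why it is so slow when applied wholesale, as on the Zaurus.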



  • energyman
    replied
    XFS has been criticized for its bad behaviour on crashes. There was even the rule that with XFS you must use a UPS or be prepared for catastrophic data loss.
    Everybody ridiculed it for that behaviour.

    And ext4 copied that crap. Zero length files. Too much data on the fly. Everything that is bad about XFS got copied.

    You want a fs that cares about your data?

    [ 269.960844] reiser4[postdrop(3406)]: disable_write_barrier (fs/reiser4/wander.c:235)[zam-1055]:
    [ 269.960849] NOTICE: md1 does not support write barriers, using synchronous write instead.

    use reiser4



  • phtpht
    replied
    Originally posted by movieman View Post
    Sensible filesystems don't require you to sync if you want your data to actually be written to the disk in a safe manner, so if ext4 doesn't do that, it's simply broken as a general-use filesystem.

    More to the point, the only reason to use ext4 is performance, and if you're now going to have to sync every time you write anything to disk, your performance is going to be worse than a reliable filesystem like ext3. So what's the point?
    Not true. Since the dawn of time, drives have used in-memory "write-back" caches. Your operating system has one, your RAID controller has one and hell, even most modern consumer drives have one:

    Code:
    $ cat /sys/block/sda/device/model 
    WDC WD6401AALS-0  
    $ cat /sys/block/sda/device/scsi_disk\:0\:0\:0\:0/cache_type
    write back
    So what do you think happens when you cut off power while the data is still in cache but not on disk? Data lost.
    When and how often a cache flushes its contents to the drive depends solely on its logic. If you want to guarantee your data on the disk, you will use sync, period. Transactional databases do that at the end of each transaction and every sane person does as well. Note that this has NOTHING to do with the file system used.

    However, on that matter: some file systems are jerkier with the cache and some are more intelligent, and EXT3 is the jerky kind. Normally you can ask a filesystem to synchronize the contents of one file (or a portion of it) to disk. Not ext3: it can't flush the contents of a single file; when you call sync, it flushes everything instead. Therefore it's more likely that your data gets on the disk eventually. But that's also why each sync takes sooooooo looooooong on ext3 and that's why people have GIVEN UP on doing that and that's why IT SEEMS that ext3 is more reliable (unless you use Ubuntu which uses unsafe defaults).
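
    The distinction being drawn here is visible directly in the POSIX API (a hedged sketch; os.sync() needs Python 3.3+ on Unix): fsync()/fdatasync() flush one descriptor, sync() flushes everything, and on ext3 even a single-file flush tends to degenerate toward the latter.

```python
import os

def flush_one(fd):
    """Flush just this file's data; fdatasync() may skip non-essential
    metadata such as mtime, so it can be cheaper than fsync()."""
    os.fdatasync(fd)

def flush_everything():
    """sync(2): ask the kernel to flush ALL dirty buffers -- what a
    'flush one file' request effectively turns into on ext3."""
    os.sync()
```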

    EXT4 is more mature in this regard. That means it works with the cache as well as other filesystems like XFS or JFS have for decades now. But because of EXT3 history, people forget to goddamn sync their data when they want them on the disk. Then they whine that if their game freezes they lose their goddamn bookmarks with EXT4.

    So you want to either use a sissy filesystem like EXT3 and forget about my lecture here or you use a real one and learn how to use it properly. The balance between reliable and fast is more in your hands: you control when and how the cache gets flushed. If you do it right you will gain MORE performance, contrary to your opinion.

    And by the way, the article says that they also lost data which was already on disk and which they had not touched for some time. That means they either hit a filesystem bug (ext4 is immature, though not weird in its cache usage); or they screwed up the hardware; or they use some unsafe mount option (it's Ubuntu, you know).

