I thought it was certain types of reads that are typically slower with COW file systems.
For example, assume you have three large files that are stored mostly sequentially on your drive, and then you open the three files and do various random writes across the three files. With a conventional filesystem, the writes will indeed be random, overwriting old data. With a COW filesystem, the writes will be mostly sequential (in the best case for the COW filesystem), but to a different part of the drive. So the COW filesystem may potentially be faster on the random writes (or not, if it has to do a read-modify-write operation, or if it has to write more data to the drive due to other COW features).
When you go to read back the three formerly sequential files, they will still be sequential with the conventional filesystem. But with the COW filesystem (assuming auto-defrag has not run yet), the reads will have a random component since some blocks of the files have been relocated to another part of the drive, and that will slow the reads down. This is why auto-defrag is important for btrfs. But if your filesystem is under heavy load, either auto-defrag won't run, or it will load the system down further. And I think btrfs auto-defrag only defragments smallish files. If you have a very large file, you are out of luck (which is one reason why btrfs users recommend turning off COW for large database or VM files).
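For anyone curious, this is roughly how btrfs users disable COW for a directory of large database/VM files, and how defragmentation can be triggered at mount time or manually. The paths and device names here are just examples for illustration:

```shell
# Example paths; adjust for your system. Requires root and a btrfs volume.
# The +C (NOCOW) attribute only affects files created after it is set,
# so apply it to an empty directory before populating it with VM images.
mkdir -p /srv/vm-images
chattr +C /srv/vm-images
lsattr -d /srv/vm-images        # the 'C' flag should now appear

# Mount-time alternatives:
#   -o nodatacow   disable COW for the whole filesystem
#   -o autodefrag  enable automatic (background) defragmentation
# mount -o autodefrag /dev/sdb1 /mnt/btrfs

# Manual defragmentation of an existing tree:
# btrfs filesystem defragment -r /srv/vm-images
```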
Look at how badly btrfs does on the fio test with the fileserver access pattern, which is mostly random reads: btrfs is six times slower than ext4. I wonder if reducing read_ahead_kb in the kernel would help btrfs's random-read performance. (Incidentally, Phoronix needs to update fio; version 1.57 is nearly two years old, and fio is at 2.1 now.)
I believe ZFS (partially) solves this problem through caches like L2ARC. And that is one of the reasons why ZFS has a reputation as a memory hog. But I suspect even ZFS would beat btrfs in most of the benchmarks in this article. It is a shame ZFS was not included.
Here is an interesting article that lists workarounds ("tuning") that you can try to optimize Oracle to work with ZFS: