Ext4+LVM stack tuning
I suspect that Ext4 could perform much better with even just basic tuning.
In particular, the fact that Ext4 on a ZVol outperforms Ext4 on LVM+mdadm probably comes down to ZFS auto-tuning every single write in the background.
I do not want to be guilty of the very thing I criticise in many performance test conclusions. Defaults are only a valid "test" if you reason "I don't care how it could perform, only how it performs by default".
I strongly suspect that the mismatch between write sizes, the Ext4 extent size, the LVM allocation (extent) size and the md-raid stripe width is to blame for the poor results seen from the Ext4+LVM stack.
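To make that suspicion concrete, here is a minimal sketch of the alignment arithmetic. Everything in it is an assumption for illustration: a hypothetical 5-disk RAID-5 (4 data disks), mdadm's default 512 KiB chunk, Ext4's default 4 KiB block, and placeholder device paths:

```python
# Minimal sketch: lining up Ext4's stride/stripe_width with an md-raid
# array. Assumptions: 5-disk RAID-5 (4 data disks), mdadm's default
# 512 KiB chunk, Ext4's default 4 KiB block. Device paths are placeholders.

CHUNK_KIB = 512   # md-raid chunk size
BLOCK_KIB = 4     # Ext4 block size
DATA_DISKS = 4    # data-bearing disks in the array

stride = CHUNK_KIB // BLOCK_KIB      # filesystem blocks per raid chunk
stripe_width = stride * DATA_DISKS   # filesystem blocks per full stripe

# Tell the Ext4 allocator about the raid geometry:
print(f"mkfs.ext4 -E stride={stride},stripe_width={stripe_width} /dev/vg0/lv0")

# Align LVM's physical extents to the full stripe as well, so LVM
# allocations don't straddle stripe boundaries:
print(f"pvcreate --dataalignment {CHUNK_KIB * DATA_DISKS}k /dev/md0")
```

Whether lining these numbers up actually closes the gap to the ZVol result is exactly what a properly defined test would have to show.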
Arguments for testing defaults:
1. Most people don't tune.
2. Tuning is too difficult, and you don't know when to stop tuning.
3. Tuning favours one particular workload and is not representative of what other people will experience.
4. It should be near enough to optimal out of the box.
5. You need to be a performance expert to do it correctly.
6. Tuning will hurt performance later on when the usage patterns change.
My issue with all of this is simple: the conclusions that are drawn are invalid BECAUSE default settings were used for testing, and defaults were used for no other reason than that the test objective was never properly defined. The conclusions have to follow from the test objective. The test platform must support evaluating for the objective. The evaluation must have a benchmark for success/failure. Otherwise, quite simply, you don't know what you are doing.
And then all of this is fed by sensationalism. Readers want the warm fuzzy feeling that they have "the best", so authors, feeding off that sensation, tend to reach simple conclusions (tm) like "A is better than B because A is faster than B".
But they did not test properly ... A is (possibly) only faster than B when tested in that specific way. Faster is not a synonym for better, or else I'd be testing xfs, btrfs, etc.
So put down your test objective (and scope). Write out your expected outcome. Define your test methodology. Design your test platform. Run your tests and record the results. Evaluate your results to check whether they are valid (in particular consider what system resources were the bottlenecks for each test). And then finally make a proper conclusion.
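As a toy illustration of that workflow (not a real framework; every name and number below is a hypothetical placeholder, not a measurement):

```python
# Toy sketch of the workflow above. All names and numbers are
# hypothetical placeholders, not real results.

test_plan = {
    "objective": "Compare 4 KiB random-write IOPS: Ext4 on LVM+mdadm vs Ext4 on a ZVol",
    "expected_outcome": "a tuned Ext4+LVM stack comes within 10% of the ZVol",
    # The success/failure benchmark, defined before running anything:
    "benchmark": lambda tuned, zvol: tuned >= 0.9 * zvol,
}

# Placeholder results; a real run records these from the test tool.
results = {"ext4_lvm_tuned": 9200.0, "ext4_zvol": 9800.0}

# Validity check first: a run that was, say, CPU-bound instead of
# disk-bound measured the wrong bottleneck and must be discarded.
run_is_valid = True

if run_is_valid and test_plan["benchmark"](results["ext4_lvm_tuned"], results["ext4_zvol"]):
    print("PASS:", test_plan["expected_outcome"])
else:
    print("FAIL: invalid run or benchmark not met")
```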
Question: Did I do proper testing above? No, not yet. I have actually just been playing with a test tool (the Phoronix Test Suite) to see how well it supports some of my theories. Ha ha ha. I'll try to do it right when I am on the target platform, probably still some time this week.