
Thread: ZFS vs EXT4 - ZFS wins


  1. #1

    ZFS vs EXT4 - ZFS wins

    I have said this numerous times, and now I am able to back my claims with results.

    When given multiple dedicated disks, ZFS is considerably faster than EXT4. The notable exceptions are
    a) No Sync/Fsync tests. Meaningless in any case other than for academic purposes.
    b) Very simple configurations, such as a single disk, single HW Raid volume, or a simple mirror.

    The reason for this is that software testing cannot be hardware-blind. Limited hardware will always make simpler software look better. The lightweight desktop environments are a case in point: on older systems with limited hardware resources, the heavyweights simply do not have the legroom to come into their own.

    Similarly, ZFS has a lot of features which are really good when you have the right hardware, but on limited hardware the reduced performance may outweigh the benefits, depending on your requirements.

    Two points that I must raise:

    Firstly: Testing ZFS on a solid-state disk takes away the advantage it gains from hiding the latencies of physically rotating disk drives. Don't get me wrong: SSDs are not bad for ZFS. On the contrary, traditional hard disk drives are bad for EXT2/3/4.

    Secondly: I watched CPU and run-queue while I performed the tests. With ZFS there was typically around 80% CPU idle; with EXT4 it varied around the 66% mark. This is on an otherwise idle system.

    The importance of this is two-fold. Firstly, the CPU was not the bottleneck in this case, with the storage link running at 2Gbps. I will soon replace the disk subsystem with one that supports 8Gbps. The disks/buses will then be better able to keep up with the requests, which means that the CPU will have to work harder to keep the disks fed with requests. If everything scales as expected, EXT4 will run into a CPU bottleneck first.

    Secondly, even with the existing disk/FC configuration, if one adds a workload, there will be less CPU time available for the file system. ZFS will then suffer less.
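
    For anyone wanting to repeat the CPU observation: idle percentage and run-queue length can be sampled with vmstat while a benchmark runs. The helper below is only a sketch and not necessarily how the figures above were collected. The function name summarize_vmstat is made up for illustration, and it assumes the usual procps vmstat layout, where the column-name header starts with "r" and contains an "id" column.

```shell
#!/bin/sh
# Average the run-queue ("r") and CPU-idle ("id") columns of vmstat output
# read from stdin. Assumption: procps-style vmstat, where the column-name
# header line starts with "r" and "r" is the first column of each sample.
summarize_vmstat() {
    awk '
        # Column-name header: remember where the "id" column sits.
        $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "id") idcol = i; next }
        # Sample lines start with a number once the header has been seen.
        idcol && $1 ~ /^[0-9]+$/ {
            # The first sample is the average since boot, so drop it.
            if (!skipped) { skipped = 1; next }
            runq += $1; idle += $idcol; n++
        }
        END {
            if (n) printf "samples=%d avg_runq=%.1f avg_idle=%.1f\n", n, runq / n, idle / n
        }
    '
}

# Usage while a test is running (13 samples, 5 seconds apart, first one dropped):
#   vmstat 5 13 | summarize_vmstat
```

    Looking the "id" column up by name rather than by position keeps the sketch working if a vmstat build adds or drops a column, but check the header on your own system before trusting the numbers.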

  2. #2
    Join Date
    Jan 2012


    Hardware info states that you have a kernel derived from 2.6.32, which makes me wonder how recent the ext4 code you're running is. Back when the actual 2.6.32 kernel was released, ext4 was still in its infancy. I know enterprise distros backport some things to older kernels, but this benchmark doesn't look representative of the current state of ext4.

    Could you also provide the ZFS code version you're using? It's out-of-tree code so I can't even guesstimate it based on the kernel version.
    Last edited by Shnatsel; 04-25-2013 at 05:08 PM.

  3. #3

    Software versions

    This server is running RHEL 6.4 with current patches as well as ZFS-on-Linux 0.6.1 using the ZFS-on-Linux team's repository.

    RHEL is in fact quite conservative about kernel versions, but I do NOT believe that current EXT4 versions are much faster. Very often the opposite is true: bug fixes often close shortcuts that were taken and, as a result, introduce extra logic, extra precautions, extra work for the system. Of course some bugs are performance bugs, e.g. where sloppy code was used or where someone discovers a smarter algorithm for the same thing.

    How can I check the versions of the mdadm, lvm and ext4 drivers? (Sorry for my lack of knowledge but the bulk of my experience is Unix, not Linux.)

  4. #4
    Join Date
    Mar 2012


    Originally posted by hartz:
    "I have said this numerous times and now I am able to back my claims with results"

    First of all, thank you for your time and effort, doing these benchmarks and sharing your results. Thanks a lot!
    ZFS benchmarks seem very rare, and I admit that doing them properly is difficult due to the high number of variables involved.

    Thus, may I ask you to clarify: I would like to know more about the disks used. What disks were used in what layout? All 22 mentioned disks at once? In one large ZFS pool / mdadm config? So a mirror would be 11 drives mirroring the other 11? How about Raid5 or RaidZ? How was the Raid10 laid out? What disk controller was used? Did you use separate ZIL / cache devices?

    To what percentage was the pool / raid filled during the test? Was fragmentation an issue?

    Would be glad if you could clarify.

  5. #5

    Disk configuration

    I decided that separate ZIL / cache (L2ARC) devices would be an unfair advantage for ZFS. I wanted the comparison to be as close as possible to like-for-like, functionality-wise.

    So for EXT4 I used Ext4 on LVM on mdadm-raid on multipathd. mdadm provides the striping/data protection. LVM provides functionality such as snapshots and volume grow/create/resize.

    For ZFS I used ZFS directly on multipathd. ZFS provides all the functionality.

    The disks are connected via both ports on a QLogic 2632 dual-port card, connected directly (No switch involved) to an EMC CX-310. The Clariion provides 10 x single-disk LUNs (As close as I can get to a JBoD) as well as a single 5-disk RAID-5 LUN, used in the HW tests.

    All of the JBoD-LUNs are in Tray0, the Raid5 Lun is in Tray1.

    Phoronix-Test-Suite reports 10 + 10 disks, but this is not accurate. I suspect that it is confused by seeing the disks both via multipathing (/dev/mapper/mpath{a..j}) and in one of the several other locations, e.g. /dev/sd*, /dev/disk/*/* ... I don't know exactly which.

    Ditto for the HW-Raid5 LUN.

    The CX310 only supports 2Gbps connection speed. I will re-do the tests with a VNX 5300 at full 8Gbps some time next week when I get access to that storage.

    OK, what else? Oh, the actual configuration for each test:

    - JHMultidisk-ZFS-RaidZ-JBoD (5-disk Raid-Z)

    zpool create -o ashift=12 POOL raidz /dev/mapper/mpath{a..e}

    - JHMultidisk-ZFS-Raid10 (4 pairs)

    zpool create -o ashift=12 POOL mirror /dev/mapper/mpath[ab] mirror /dev/mapper/mpath[cd] mirror /dev/mapper/mpath[ef] mirror /dev/mapper/mpath[gh]

    - JHMultidisk-ZFS-Mirror-JBod

    zpool create -o ashift=12 POOL mirror /dev/mapper/mpath[ab]

    - JHMultidisk-EXT4-LVM-mdRaid5-JBoD (5-disk Raid-5)

    mdadm --create /dev/md2 --level=5 --raid-devices=5 /dev/mapper/mpath{f..j}

    - JHMultidisk-EXT4-LVM-mdMirror

    mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/mapper/mpath[ab]

    - JHMultidisk-EXT4-LVM-HWRaid5

    The arrays were destroyed and re-created between tests. So the only data on them were the test data.

    The mount command for ZFS was:
    zfs create -o atime=off -o mountpoint=/test_mountpoint POOL/test

    The mount command (process) for Ext4 was:
    pvcreate /dev/md2; vgcreate testgroup /dev/md2; lvcreate -l xxxx -n ext4vol testgroup
    mkfs -t ext4 /dev/mapper/testgroup-ext4vol
    mount -o noatime /dev/mapper/testgroup-ext4vol /test_mountpoint
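
    The Ext4 steps above can be strung together roughly as follows. This is a hedged sketch, not the script that was actually run: build_ext4_stack and the DRY_RUN switch are made-up names, and the extent count (the xxxx above) has to be supplied by the caller. It uses mkfs -t ext4 and gives mount the device before the mountpoint, which is the argument order those tools expect. By default it only prints the commands.

```shell
#!/bin/sh
# Print (default) or, with DRY_RUN=0, execute the Ext4-on-LVM-on-md steps.
# Arguments: <md device> <LV size in extents> <mountpoint>
# The names testgroup and ext4vol are taken from the commands in the post.
build_ext4_stack() {
    md=$1 extents=$2 mnt=$3
    # run: echo the command in dry-run mode, otherwise execute it.
    run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }
    run pvcreate "$md" &&
    run vgcreate testgroup "$md" &&
    run lvcreate -l "$extents" -n ext4vol testgroup &&
    run mkfs -t ext4 /dev/mapper/testgroup-ext4vol &&
    run mount -o noatime /dev/mapper/testgroup-ext4vol "$mnt"
}

# Example (prints five commands, touches nothing):
#   build_ext4_stack /dev/md2 <extents> /test_mountpoint
```

    Chaining the steps with && means a failed pvcreate or vgcreate stops the run instead of formatting and mounting a half-built stack.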

    I have actually completed a test for ZFS on the HW-raid LUN. The performance was dismal, but I accidentally put the EXT4 "description" on the test and I don't know how to fix that, so it is not part of the group at the moment.

    I have a few more interesting ideas for tests:

    Ext4 on a zVol

    ZFS Raid-Z2 with all 10 drives (I will do this test now)

    P.S. I have been trying to get an mdadm raid-10 test done but have been failing dismally... The system crashes when I try to run pvcreate, and I then have to disconnect the FC cables to get it to complete the boot-up. I have basically given up on mdadm raid-10 testing.

  6. #6

    New Kernel

    I just patched again and got a few updates, including a new kernel.

    # uname -a
    Linux emc-grid 2.6.32-358.6.1.el6.x86_64 #1 SMP Fri Mar 29 16:51:51 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux

  7. #7

    Some notes/observations on the results

    1. When mounting ZFS with sync=disabled I actually get worse performance.

    2. I did not get better results when mounting ZFS with checksum=off .... The results were the same to within the margin of error. The conclusion is that the bottleneck is not the calculation of checksums. This will likely be different on a system where CPU is actually being stressed.

    3. Disabling file access time tracking is done to make the tests run quicker. I did some initial tests to check its impact and it appears to be about 10% consistently, for both ZFS and Ext4.

    4. I once ran a test on Ext2 (I forgot to add the -t ext4 flag to mkfs). It is remarkably faster than ext4!!!

    5. In case anybody wants to know, I have googled and googled and did not find a documented way to make fs-mark run against a "specified" mountpoint. What I did was replace the "-s scratch" in the file /root/.phoronix-test-suite/test-profiles/pts/fs-mark-1.0.0/test-definition.xml with -s /test_mountpoint/scratch ... I wish the test would just prompt you for a file system/directory to test. In any case the test results report only the file system type on which the directory /root/.phoronix-test-suite/installed-tests/pts/fs-mark-1.0.0/fs_mark-3.3/scratch/ resides, irrespective of what you are actually testing.
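
    The profile edit described in point 5 can be scripted instead of done by hand. A minimal sketch follows, assuming the profile contains the literal text "-s scratch" as described there. The function name retarget_fsmark is made up, and the target path must not contain "|" or "&", since it is pasted straight into a sed expression.

```shell
#!/bin/sh
# Rewrite "-s scratch" in a test profile so fs-mark writes to a chosen directory.
# Usage: retarget_fsmark <test-definition.xml> <scratch-dir>
retarget_fsmark() {
    file=$1 target=$2
    # Keep one pristine copy; later runs always start again from it.
    [ -e "$file.orig" ] || cp -p "$file" "$file.orig" || return 1
    sed "s|-s scratch|-s $target|g" "$file.orig" > "$file"
}

# Example with the paths from the post:
#   retarget_fsmark /root/.phoronix-test-suite/test-profiles/pts/fs-mark-1.0.0/test-definition.xml \
#       /test_mountpoint/scratch
```

    Because the substitution always reads from the saved .orig copy, the function can be re-run with a different directory for each file system under test without stacking edits on top of each other.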

  8. #8

    RaidZ2 test added to results

    Now including ZFS RaidZ2 over 10 disks
