ZFS Still Trying To Compete With EXT4 & Btrfs On Linux


  • ZFS Still Trying To Compete With EXT4 & Btrfs On Linux

    Phoronix: ZFS Still Trying To Compete With EXT4 & Btrfs On Linux

    With the recent release of ZFS On Linux 0.6.2, which provides an open-source native Linux kernel module implementation of the Sun/Oracle ZFS file-system, performance is faster, Linux kernel compatibility is broader, and there are other improvements. Here's a fresh round of ZFS Linux benchmarks against EXT4 and Btrfs.

    http://www.phoronix.com/vr.php?view=19059

  • #2
    Whenever other ZFS benchmarks have been posted, I've asked whether the pool was created with "ashift=12" for backing devices optimized for 4K writes, or even better, "ashift=13" for the 8K pages of modern SSDs (basically everything these days; though the 510 advertises 512-byte physical sectors for whatever reason, it should still do significantly better with 4K or 8K alignment). I never find out the answer, but it's probably 'no', which makes these tests kind of pointless.
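    For context, ashift is the base-2 logarithm of the sector size the pool assumes, and it can only be set at pool creation time. A minimal sketch, with hypothetical pool and device names:

```shell
# ashift is the base-2 logarithm of the assumed sector size:
echo $((2 ** 12))   # 4096 (4K)
echo $((2 ** 13))   # 8192 (8K)

# Hypothetical pool creation forcing 8K alignment (names assumed):
# zpool create -o ashift=13 tank /dev/sdb
# Confirm the value the pool actually uses:
# zdb -C tank | grep ashift
```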
    Last edited by pdffs; 08-27-2013, 04:30 AM.

    Comment


    • #3
      Meh. Would be much more interesting if you'd explore some of ZFS's and BTRFS's other features (e.g. snapshots and effectiveness of the journal when deliberately trying to introduce errors).

      Comment


      • #4
        Personally I've been waiting for a benchmark comparing ZFS and Btrfs using different compression algorithms, primarily LZ4.
        The package I'm currently using says it supports "lzjb | gzip | gzip-[1-9] | zle | lz4" (pulled from "zfs set").
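        Since compression is a per-dataset property, such a comparison would presumably just toggle the algorithm between runs. A minimal sketch, with hypothetical dataset names and the property values taken from the "zfs set" list above:

```shell
# Dataset names are hypothetical; the values come from the "zfs set" list above.
# zfs set compression=lz4 tank/data
# zfs set compression=gzip-9 tank/archive
# zfs get compression tank/data

# The advertised value list from above, one per line:
echo "lzjb | gzip | gzip-[1-9] | zle | lz4" | tr -d ' ' | tr '|' '\n'
```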

        Comment


        • #5
          Originally posted by pdffs View Post
          Whenever other ZFS benchmarks have been posted, I've asked whether the pool was created with "ashift=12" for backing devices optimized for 4K writes, or even better, "ashift=13" for the 8K pages of modern SSDs (basically everything these days; though the 510 advertises 512-byte physical sectors for whatever reason, it should still do significantly better with 4K or 8K alignment). I never find out the answer, but it's probably 'no', which makes these tests kind of pointless.
          Phoronix refuses to adjust ashift itself. However, ZFSOnLinux added a drive database that will automatically do this for known drives. It is incomplete, but it will grow as people send me information on drives that are missing. Instructions for those who wish to contribute are available on the mailing list. Note that the link to the database is outdated. The current database is visible in the repository.

          https://groups.google.com/a/zfsonlin...g/qCygxkVWam4J
          https://github.com/zfsonlinux/zfs/bl...ol_vdev.c#L108
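          The idea behind such a database is just a model-string-to-ashift lookup. A minimal sketch of the concept; the model prefixes below are illustrative, not copied from the real table in zpool_vdev.c:

```shell
# Map a reported drive model string to a sector-size exponent (ashift).
# These prefixes are illustrative only; the real table is in zpool_vdev.c.
ashift_for_model() {
    case "$1" in
        "INTEL SSDSC2CW"*)  echo 13 ;;   # assumed 8K-page SSD
        "HYPOTHETICAL-4K"*) echo 12 ;;   # assumed 4K-sector disk
        *)                  echo 9  ;;   # default: 512-byte sectors
    esac
}

ashift_for_model "INTEL SSDSC2CW120A3"   # prints 13
ashift_for_model "SOME OTHER DISK"       # prints 9
```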

          In this case, the drive was on the list, which means that the benchmarks were done with ashift=13. This is what enabled ZFSOnLinux to go from underperforming ext4 in the IOMeter file server benchmark to outperforming it significantly. With that said, it is not clear to me how the partitioning was done. ZFS would be somewhat handicapped (although not by much) if the partitioning were done for it rather than ZFS doing its own partitioning. This is because the Linux elevator is redundant, and it is set to noop when ZFS has full control of the disk. Anyway, I have a few comments on each of the benchmarks:

          1. Good benchmarking is hard and it is easy to do benchmarks that provide irrelevant results. I can usually find issues with the design of Phoronix's benchmarks, but in the case of IOMeter, I have not found anything wrong yet. Incidentally, ZFS does well here. That is likely because of a mix of ARC and ZIL.

          2. The FS-Mark benchmarks tested the creation of 1MB files. This is a purely synthetic benchmark that does not match any real workload, so it does not matter much to me. If anyone has a real workload that does this, please let me know so I can start caring. Of some interest is how the filesystems scaled from 1 to 4 threads: ZFS had a 3% increase while btrfs and ext4 had 82% and 59% increases respectively. It is probably worth investigating why ZFS' throughput did not increase as much. There is an Illumos patch to ZFS' internal IO elevator that might help with this. It will likely be merged in 0.6.3.

          3. Phoronix did not appear to use DBench as it was intended to be used. It is supposed to use a load file that simulates a specific application, but there is no information about that. Since DBench was designed to test network filesystems, it is most useful when data points at different client counts are taken, but Phoronix only tested 1 client. With that said, I am okay with how ZFS performed versus the other filesystems. The numbers here do not matter much to me.

          4. Compile Bench is a fairly useless benchmark because compilation is not IO-bound, yet it appears to run the IO workload without doing any real compilation. It is unlikely that a real build process will exceed a few megabytes per second, which basically any filesystem can handle. Despite that, it is interesting that ext4 managed to outperform the interface bandwidth of SATA III. The peak bandwidth of SATA III is approximately 600MB/sec, but ext4 managed 726MB/sec. This suggests that writes are being buffered. It is possible to get the same effect with ZFS by using a dedicated dataset and setting sync=disabled, which is what I do for builds on my computer. However, it does not make much of a difference because compilation is CPU-bound, not IO-bound.

          5. Postmark has a few interesting irregularities. The first is that it is absent from previous Phoronix benchmarks; I noticed this when I went to look at ZFS' performance relative to ext4 and the others to see how using a proper ashift changed things. Another is that the standard error for both ext4 and btrfs is 0. This suggests that ext4 and btrfs were not actually writing to disk. The benchmark was intended to measure mail server IO performance, but it does a remarkably poor job of that: it is single-threaded and it does not call fsync(). Good mail server software should call fsync() before reporting delivery to ensure data integrity, and mail server software intended to scale should be multithreaded, but neither happens here. The benchmark writes about 500 small files that in total are less than 5MB, which the kernel has no reason to flush to disk. In the case of ZFS, the non-zero standard error suggests that data is being written out. If a crash occurred during this benchmark, the simulated mail would be lost on ext4 and btrfs, while ZFS would have managed to save at least some of it. Doing better here means increased data loss in the event of a crash, which does not interest me very much.
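          The Postmark behaviour described in point 5 is easy to approximate: a few hundred small files written without fsync() simply sit in the page cache. A rough sketch of that write load, with the file count and sizes assumed from the figures above:

```shell
# Write ~500 simulated messages of 8KB each (about 4MB total, under the
# ~5MB figure above). No fsync() is issued, so nothing forces these to disk.
dir=$(mktemp -d)
for i in $(seq 1 500); do
    head -c 8192 /dev/zero > "$dir/msg$i"
done
total_kb=$(du -sk "$dir" | cut -f1)
echo "${total_kb} KB"   # roughly 4000 KB
```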
          Last edited by ryao; 08-27-2013, 02:00 PM.

          Comment


          • #6
            Originally posted by ryao View Post
            Phoronix refuses to adjust ashift itself. However, ZFSOnLinux added a drive database that will automatically do this for known drives. It is incomplete, but it will grow as people send me information on drives that are missing. Instructions for those who wish to contribute are available on the mailing list. Note that the link to the database is outdated. The current database is visible in the repository.
            Did you do all this just to win on phoronix benchmarks?

            Comment


            • #7
              Isn't Intel SSDSC2CW12 an Intel 520 SSD, not an Intel 510 SSD?

              ark.intel.com does not find a match when searching for SSDSC2CW12, but googling it does bring up a few mentions of the Intel 520 SSD.

              So which is it? Intel 510 SSD or Intel 520 SSD?

              It is a mistake to use any SandForce SSD for Phoronix benchmarks, since many of the Phoronix benchmarks write streams of zeros that are unrealistically easy for the SandForce controller to compress.
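              The compressibility gap is easy to demonstrate, with gzip standing in for the controller's transparent compression (the point is only the relative sizes, not SandForce's actual algorithm):

```shell
# gzip stands in for the drive's transparent compression here.
tmp=$(mktemp -d)
head -c 1048576 /dev/zero    > "$tmp/zeros.bin"
head -c 1048576 /dev/urandom > "$tmp/random.bin"
gzip "$tmp/zeros.bin" "$tmp/random.bin"
wc -c "$tmp/zeros.bin.gz" "$tmp/random.bin.gz"
# zeros.bin.gz compresses to about 1KB; random.bin.gz stays near 1MB.
```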

              Comment


              • #8
                More Drives and Data Reliability

                I would love to see tests on ZFS done with at least 4 drives using RAID-Z or RAID-Z2 modes, since that to me is the strongest feature of ZFS, along with some data-reliability tests compared to Btrfs and EXT4.
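                For reference, such a setup is a one-liner, and a scrub covers the data-reliability side. A hedged sketch with hypothetical pool and device names:

```shell
# A 4-drive RAID-Z2 pool survives the loss of any two drives
# (pool and device names are hypothetical):
# zpool create tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd
# A scrub walks every block and verifies checksums:
# zpool scrub tank
# zpool status tank

# Usable capacity: with n drives, n-2 hold data. For 4 drives:
n=4
echo "$(( (n - 2) * 100 / n ))% usable"   # 50% usable
```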

                Comment


                • #9
                  Originally posted by ryao View Post
                  Phoronix refuses to adjust ashift itself. However, ZFSOnLinux added a drive database that will automatically do this for known drives. It is incomplete, but it will grow as people send me information on drives that are missing. Instructions for those who wish to contribute are available on the mailing list. Note that the link to the database is outdated. The current database is visible in the repository.

                  https://groups.google.com/a/zfsonlin...g/qCygxkVWam4J
                  https://github.com/zfsonlinux/zfs/bl...ol_vdev.c#L108

                  In this case, the drive was on the list, which means that the benchmarks were done with ashift=13. This is what enabled ZFSOnLinux to go from underperforming ext4 in the IOMeter file server benchmark to outperforming it significantly. With that said, it is not clear to me how the partitioning was done. ZFS would be somewhat handicapped (although not by much) if the partitioning were done for it rather than ZFS doing its own partitioning. This is because the Linux elevator is redundant, and it is set to noop when ZFS has full control of the disk. Anyway, I have a few comments on each of the benchmarks:
                  Yeah, I saw your other thread about the 0.6.2 release after I posted here. Are you sure the Intel 510 is in your list? It doesn't appear to be, as far as I can tell. As I posted in the other thread:

                  Originally posted by pdffs View Post
                  You might want to check the FreeBSD 4k quirks (ADA_Q_4K) list from ata_da.c to boost your list.

                  Comment
