
Running ZFS With CAM-based ATA On FreeBSD 8.1


  • #11
    Originally posted by energyman
    When was it introduced?
    In 2008, with the release of FreeBSD 7.0.


    • #12
      Let me comment on the results.

      Sorry for the late response; I have only just noticed this thread. I believe some results in this test set are incorrect, and I would like to comment on them.

      The first 9 results look fine:
      1. LZMA compression - not really an I/O test, as the almost equal results show.
      2. Gzip compression - same.
      3. Compilebench - I am not sure exactly what this test does, but the result looks plausible. For non-threaded I/O, NCQ may give slightly lower performance at the drive firmware level.
      4. Postmark - a 44% benefit under parallel load is expected for CAM ATA because of NCQ.
      5. Unpacking the kernel - unpacking is a single-threaded process with a lot of flushing. The small slowdown may have the same cause as in 3.
      6. Write in 8 threads - CAM with NCQ wins slightly; OK.
      7/8. Write in 16/32 threads - increasing the number of threads makes the access pattern more random, which penalizes legacy ATA, while NCQ in CAM probably compensates for it.
      9. Write in 32 threads by 128MB - I can't explain why the results are slightly better than in 8, but CAM with NCQ still wins.
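      For items 7/8, the benefit of keeping several requests outstanding can be illustrated with a deliberately simplified model: if the drive sees the whole queue, it can service requests in an order that shortens head movement. The sketch below is a hypothetical illustration, not how any particular drive's NCQ firmware works. It counts only seek distance in LBAs (real NCQ scheduling also considers rotational position), uses a SCAN-like order, and the LBA values are made up.

```python
def seek_distance(start, order):
    """Sum of absolute LBA jumps when servicing requests in the given order."""
    total = 0
    pos = start
    for lba in order:
        total += abs(lba - pos)
        pos = lba
    return total


def elevator_order(start, queue):
    """SCAN-like order: requests at or above the head ascending, then the rest descending."""
    up = sorted(lba for lba in queue if lba >= start)
    down = sorted((lba for lba in queue if lba < start), reverse=True)
    return up + down


if __name__ == "__main__":
    head = 500                                   # hypothetical current head position (LBA)
    queue = [900, 100, 800, 200, 700, 300]       # hypothetical outstanding requests
    fifo = seek_distance(head, queue)            # one-at-a-time, arrival order
    reordered = seek_distance(head, elevator_order(head, queue))
    print(fifo, reordered)
```

      With this made-up queue, FIFO service travels 3400 LBAs while the reordered queue travels 1200. Without NCQ the drive receives one command at a time, so it has no queue to reorder.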

      But the remaining results are not:
      10. Random write in 8 threads - for random tests tiobench uses 4K blocks. No desktop drive (and especially no laptop drive) can do more than 200-300 random I/Os per second. As a result, the most this test should show is about 1MB/s. Instead we see about 49MB/s in both cases. The explanation is simple: all the data fit into the ZFS caches and were written almost sequentially when the files were closed. This is just not a disk subsystem test.
      11. Random write in 16 threads - because of the larger active data set, caching works less well, so the speeds are lower. They are still higher than physically possible, which means caching is still heavily involved.
      12. Random write in 4 threads by 128MB - as I have said, 25MB/s with legacy ATA can't be explained by anything except caching. Random write in 4 threads just can't be faster than random write in 16 threads in 11. This result is wrong by definition. Most probably something changed the cache hit ratio between the tests.
      13/14. Reading in 16 threads by 64MB and 256MB - the only reason the results of these two tests could differ is cache hits.
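      The physical ceiling used in item 10 is simple arithmetic: random requests per second times bytes per request. The sketch below assumes the 4K block size and the 200-300 IOPS figures quoted above, and uses decimal megabytes (10^6 bytes); the helper names are hypothetical.

```python
def random_io_ceiling_mb_s(iops, block_bytes):
    """Upper bound on throughput (decimal MB/s) when every request is a random I/O."""
    return iops * block_bytes / 1_000_000


def exceeds_physical_limit(measured_mb_s, iops, block_bytes):
    """True if a measured rate is higher than the drive alone could sustain."""
    return measured_mb_s > random_io_ceiling_mb_s(iops, block_bytes)


if __name__ == "__main__":
    BLOCK = 4096  # tiobench random-test block size, as stated in the post
    for iops in (200, 300):
        print(iops, round(random_io_ceiling_mb_s(iops, BLOCK), 2))
    # Measured random-write rates from items 10 and 12, checked against 300 IOPS.
    for measured in (49, 25):
        print(measured, exceeds_physical_limit(measured, 300, BLOCK))
```

      At 300 IOPS the ceiling is about 1.23 MB/s, so both the 49MB/s and the 25MB/s figures exceed it by more than an order of magnitude.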

      So my conclusion: these tests did not account for cache effects. If that was intentional, then this is not a comparison of ATA subsystems but of cache effectiveness. If it happened accidentally, then these results simply do not mean anything.


      • #13
        Some alternative benchmarks

        To further support my point, here are some of my own benchmark results. They were obtained on i386 9-CURRENT with 2GB of RAM. This memory-limited configuration was chosen intentionally to minimize cache effects and really compare the disk subsystems.

        Threaded I/O Tester v0.3.3:
        Test script:
        Legacy ATA results:
        CAM ATA results:
        The total data file size was set to 2GB to minimize cache effects. CAM ATA shows a 30-50% benefit in most of the numbers.

        RAID-test v1.2:
        To compare disk subsystem performance independently of file systems, here are benchmarks of legacy and CAM ATA with random read, write and mixed I/O requests of different sizes issued to the raw disk:
        Here you can see an almost twofold speedup on read requests. Write requests do not benefit because they are already absorbed by the enabled drive write cache.
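        A raw-disk random read test in the spirit of raidtest can be sketched as below. This is not raidtest itself, only an illustration assuming Python's os.pread on a Unix-like system; the function name and defaults are hypothetical. It issues fixed-size reads at random block-aligned offsets within the first `size` bytes of the target and reports operations per second and MB/s. Pointed at a raw device node (which typically requires root), it measures the disk; pointed at a regular file, the results will largely reflect the OS cache, which is exactly the effect discussed above.

```python
import os
import random
import time


def random_read_bench(path, size, block_size=4096, count=1000, seed=0):
    """Issue `count` reads of `block_size` bytes at random block-aligned offsets.

    `size` is the number of bytes of the target to sample from (for a disk,
    its media size). Returns (operations per second, decimal MB/s).
    """
    blocks = size // block_size
    if blocks == 0:
        raise ValueError("target smaller than one block")
    rng = random.Random(seed)  # fixed seed so runs use the same offset sequence
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for _ in range(count):
            offset = rng.randrange(blocks) * block_size  # aligned, as raw disks require
            os.pread(fd, block_size, offset)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    ops = count / elapsed if elapsed > 0 else float("inf")
    return ops, ops * block_size / 1_000_000
```

        This sketch issues one request at a time, so by itself it would not show an NCQ benefit; that requires several concurrent readers keeping the queue full.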


        • #14
          Some numbers about UFS

          I was asked to repeat the same tests with UFS, so here they are.

          First, I simply ran the same benchmarks on UFS. Here are the results:

          CAM ATA won, but it looks like 2GB of RAM is enough for UFS (unlike ZFS, especially on i386) to cache a significant amount of data in this situation, so these results cannot really be trusted.

          So I repeated the tests after removing 1GB of RAM:

          CAM ATA won again, but now with reasonable numbers.

          For completeness, I also repeated the test with a block size of 16K (the default UFS block size). This allows UFS to avoid read-modify-write operations on random writes:
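          Why matching the benchmark block size to the 16K UFS block avoids read-modify-write can be shown by checking which file-system blocks a write covers only partially: each such block has to be read, modified and written back. This is a simplified model that ignores UFS fragments and caching; the helper names are hypothetical.

```python
def partial_blocks(offset, length, fs_block):
    """Indices of file-system blocks that a write of `length` bytes at `offset`
    covers only partially (candidates for read-modify-write)."""
    if length <= 0:
        return []
    end = offset + length  # exclusive end of the write
    first = offset // fs_block
    last = (end - 1) // fs_block
    partial = []
    for blk in range(first, last + 1):
        blk_start = blk * fs_block
        blk_end = blk_start + fs_block
        # A block is partial if the write starts after its beginning or ends before its end.
        if offset > blk_start or end < blk_end:
            partial.append(blk)
    return partial


def needs_rmw(offset, length, fs_block=16 * 1024):
    """True if the write leaves any file-system block partially covered."""
    return bool(partial_blocks(offset, length, fs_block))
```

          For example, a 4K write at offset 0 leaves block 0 partially covered and so needs read-modify-write, while an aligned 16K write at offset 16384 does not.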

          Here it can be seen that multithreaded reading gets up to a twofold benefit
          from NCQ. Pure writes, covered by the disk write cache, are almost
          unaffected, consistent with the previous raidtest results.

          To Phoronix: in my tests I always try to validate and explain every aspect of the results. Until you do the same in your reviews, they won't be worth much.

          PS: Note that this system was not really suitable for ZFS, so the numbers can be compared only with special care and understanding. My goal was not to compare UFS and ZFS directly; they were designed for completely different environments, and each has its own benefits and requirements.