Can DragonFly's HAMMER Compete With Btrfs, ZFS?


  • #16
    Right, but the most important thing overall is to know WHAT you are benchmarking. Unless you know THAT, any benchmark used to test a specific thing is null and void and pseudoscientific. Wanna test caching? Fine, but then do so properly with different amounts of cache available to the VFS-cache subsystem. Wanna test gzip times? All good and fine, but do so properly with at least 2 different CPUs and GCC set at different optimization levels, as those affect gzip performance. Simply put, know WHAT you want to benchmark, then isolate THAT while keeping other things equal, and you have something that at least passes the low-water mark for validity.

    Originally posted by ciplogic View Post
    As far as I can tell it appears to be just better caching behavior. As this benchmark will likely not make the OS flush its cache, it may happen that things get "too fast".
    The issue is: if your application uses the same usage pattern, will it fly as fast?
    And if you don't know how your application exercises the system, then tossing a die is just as valid.

    Originally posted by ciplogic View Post
    My point was that anomalies always appear in benchmarking; also, as disk is two orders of magnitude slower than memory (and a random disk access even more than that), I think that is not a fault of the Phoronix suite. All Michael can do is run the tests and check whether there are statistical problems (which is a feature of PTS).
    Err, that is exactly where the bad reputation comes from. Anyone can run a benchmark (as is apparent at this site), but that does not a valid benchmark make! Like I said above, if you're out to benchmark the performance of a filesystem, then you need to make sure your tests actually measure that and not caching/disks/CPU etc. If you don't, then your testing is invalid in that context.
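
    To make that concrete, here is a minimal sketch (my own, not part of PTS; it assumes Linux, Python 3, and a hypothetical test file named testfile already sitting on the filesystem under test) showing how much of a naive "read benchmark" can really be a VFS-cache benchmark:

    # cache_isolation.py -- toy sketch: cold (cache-evicted) read vs. warm (cached) re-read.
    import os, time

    PATH = "testfile"  # hypothetical test file, ideally a few hundred MiB

    def read_all(path):
        total = 0
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(1 << 20)  # 1 MiB reads
                if not chunk:
                    return total
                total += len(chunk)

    def drop_cache(path):
        # Ask the kernel to evict this file's pages so the next read goes to the disk.
        fd = os.open(path, os.O_RDONLY)
        try:
            os.fsync(fd)  # flush any dirty pages first
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        finally:
            os.close(fd)

    drop_cache(PATH)
    t0 = time.monotonic(); size = read_all(PATH); cold = time.monotonic() - t0
    t0 = time.monotonic(); read_all(PATH); warm = time.monotonic() - t0
    print(f"{size / 2**20:.0f} MiB  cold: {cold:.2f}s  warm: {warm:.2f}s")

    If only the warm number gets reported, you benchmarked the cache subsystem, not the filesystem or the disk.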

    Comment


    • #17
      Well, there are numerous problems with the benchmark. Take blogbench, for example. Blogbench has simultaneous read and write threads, where the write activity creates an ever-increasing data set (starting at 0) and the read activity reads from that same data set. Thus if write performance is poor the data set simply does not grow large enough to blow out the system's filesystem buffer cache, and read performance will appear to be very high. If write performance is high then the data set will grow beyond what memory can cache, and read performance will wind up being very poor. So treating the numbers as separate entities and not being cognizant of whether the test blew out the buffer cache or not basically makes the results garbage.
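
      To illustrate that coupling, here is a rough toy sketch (mine, not blogbench itself; Python 3 on a Unix-like system, writing into a hypothetical scratch directory bench_data) that grows a data set while timing random re-reads; throughput stays near memory speed until the data set outgrows the buffer cache, then falls off a cliff:

      # workingset.py -- toy illustration (not blogbench): read speed vs. data set size.
      import os, random, time

      DIR = "bench_data"   # hypothetical scratch directory on the filesystem under test
      FILE_MB = 64         # each file is 64 MiB
      STEPS = 32           # grow the data set up to STEPS * FILE_MB

      os.makedirs(DIR, exist_ok=True)
      block = os.urandom(1 << 20)  # 1 MiB of incompressible data

      for step in range(1, STEPS + 1):
          # Grow the data set by one more file.
          with open(os.path.join(DIR, f"f{step:03d}"), "wb") as f:
              for _ in range(FILE_MB):
                  f.write(block)

          # Randomly re-read 256 MiB spread over everything written so far.
          t0 = time.monotonic()
          read = 0
          while read < 256 * (1 << 20):
              name = os.path.join(DIR, f"f{random.randint(1, step):03d}")
              with open(name, "rb") as f:
                  f.seek(random.randrange(FILE_MB) << 20)
                  read += len(f.read(1 << 20))
          dt = time.monotonic() - t0
          print(f"data set {step * FILE_MB:5d} MiB  reads {read / dt / 2**20:8.1f} MiB/s")

      Interpreting a read score without knowing which side of that cliff the run ended on is exactly what makes the numbers garbage.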

      Another very serious problem is when these benchmarks are run on filesystems with filesystem compression or de-dup. The problem is that most of these tests don't actually write any real data to the files. They write all zeros or some simple pattern that is trivially compressed and, poof, you are suddenly not testing filesystem or disk performance at all.
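
      A quick sanity check for that failure mode (again just a sketch of mine, with hypothetical file names): write the same volume of zeros and of random data, then compare both the elapsed time and the space actually allocated on disk.

      # compressible.py -- sketch: zeros vs. incompressible data on a compressing/de-duping fs.
      import os, time

      SIZE_MB = 512

      def write_file(path, chunk):
          t0 = time.monotonic()
          with open(path, "wb") as f:
              for _ in range(SIZE_MB):
                  f.write(chunk)
              f.flush()
              os.fsync(f.fileno())
          elapsed = time.monotonic() - t0
          blocks = os.stat(path).st_blocks  # 512-byte blocks actually allocated
          print(f"{path}: {elapsed:6.2f}s, {blocks * 512 / 2**20:7.1f} MiB on disk")

      write_file("zeros.bin", bytes(1 << 20))        # trivially compressible
      write_file("random.bin", os.urandom(1 << 20))  # incompressible

      # If zeros.bin finishes much faster and/or occupies far less space than random.bin,
      # the benchmark is measuring the compressor, not the disk.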

      A third problem, related in particular to transaction tests, is how often the benchmark program calls fsync() and what its expectations are versus what the filesystem actually does.
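
      As a simple illustration of why that matters (a sketch of mine, not any particular transaction benchmark): fsync()ing after every record versus once per batch can differ by orders of magnitude, and a filesystem that quietly weakens fsync semantics will look spectacular in the per-record case.

      # fsync_rate.py -- sketch: fsync per record vs. one fsync per batch.
      import os, time

      RECORDS = 1000
      payload = os.urandom(4096)

      def run(path, sync_every_record):
          t0 = time.monotonic()
          with open(path, "wb") as f:
              for _ in range(RECORDS):
                  f.write(payload)
                  if sync_every_record:
                      f.flush()
                      os.fsync(f.fileno())
              f.flush()
              os.fsync(f.fileno())
          return time.monotonic() - t0

      print(f"fsync per record : {run('txn_each.dat', True):7.2f}s")
      print(f"fsync per batch  : {run('txn_batch.dat', False):7.2f}s")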

      A fourth is, well, you do realize that HAMMER maintains a fine-grained history (30-60 second grain) and you can access a snapshot of the filesystem at any point in that history? The whole point of using the filesystem, apart from the instant crash recovery, is to have access to historical data, so it's kind of like comparing apples to oranges if you don't normalize the feature set.

      A fifth is the compiler, which is obvious in the gzip tests (which are CPU bound, NOT filesystem bound in any way).

      These problems are obvious just by looking at the crazy results that were posted, and the author should have realized this and tracked down the WHY. Benchmarks only work when you understand what they actually do.

      There are numerous other issues... whether the system was set to AHCI mode or not (DragonFly's AHCI driver is far better than its ATA driver). Whether the OS was tuned for benchmarking or for real-world activities with regard to how much memory the OS is willing to dedicate to filesystem caches. How often the OS feels it should sync the filesystem. Filesystem characteristics such as de-dup and compression and history. fsync handling. Safety considerations (how much backlog the filesystem or OS caches before it starts trying to flush to the media... more is not necessarily better in a production environment), and behavior in real load situations, which require system memory for things other than caching filesystem data. And I could go on.

      In short, these benchmarks are fairly worthless.

      Now HAMMER does have issues, but DragonFly also has solutions for those issues. In a real system where performance matters you are going to have secondary storage, such as a small SSD, and in DragonFly setting an SSD up with its swapcache to cache filesystem meta-data alongside the slower 'normal' 1-3TB HD(s) is kind of what HAMMER is tuned for. Filesystem performance testing on a laptop is a bit of an oxymoron since 99.999% of what you will be doing normally will be cached in memory anyway and the filesystem will be irrelevant. But on the whole our users like HAMMER because it operates optimally for most workloads, and being able to access live snapshots of everything going back in time however long you want to go (based on storage use versus how much storage you have) is actually rather important. Near-real-time mirroring streams to onsite and/or offsite backups, not to mention being able to run multiple mirroring streams in parallel with very low overhead, are also highly desirable. It takes a little tuning (e.g. there is no reason to keep long histories for /usr/obj or /tmp), but it's easy.

      -Matt

      Comment


      • #18
        The only reproducible way to perform a good benchmark is to trace all filesystem events (excluding the actual data content, if compression/deduplication is not used) on a real-world system (like monitoring the /home directory of a real desktop, or / of a real medium-sized mail server) for a long time (for example, one month), and then replay them quickly as a benchmark. This will include a very big mix of possible workloads: random reads and writes, large file writes and reads, parallel combinations of them, metadata operations, filesystem traversals, deletions in parallel with other operations, fragmentation of files and free space, data locality, waste of space, complex caching behaviour, etc.

        Simple microbenchmarks are good for filesystem developers because they can use them to infer what is going on in a particular part of the code (just like in science), but they are not any ultimate measure of quality or performance. They are only useful for improving code, not really for comparing multiple different filesystems.

        Most microbenchmarks are also repeated multiple times on a completely clean filesystem, which excludes lots of factors from the equation (simple and full caching, other operations on the same or other filesystems, fragmentation). So benchmarks often recreate the whole filesystem and drop all caches between runs, or delete everything on the filesystem (which does not need to be the same thing!). That at least recreates somewhat similar conditions. But how do you recreate the complex conditions of a desktop that was used for a few months? If one performs a benchmark in a subfolder of a filesystem and then deletes the files afterwards, it is highly probable that the end state will be far off from the beginning condition, so one cannot actually perform the benchmark again. It is also hard for another person to reproduce it on another box.

        The most robust way to fix these problems is to use accelerated ageing of the filesystem by replaying predetermined (recorded, a.k.a. traced) operations coming from a real workload. One can also prepare such a trace log with some information anonymized, like data contents and actual filenames (just make them of similar length and structure, so directory operations behave in a similar way as on the recorded system). Such logs would include all operations, including timestamps, pids, threads, locks, read, write, open, close, fsync, sync, unmount/mount/reboot, seek, tell, create, unlink, link, symlink, aio, O_DIRECT, O_SYNC, mmap, fadvise, fallocate, error conditions, etc.
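
        As a minimal sketch of the replay side (using a toy trace format of my own invention, not the format of the tools linked below): each record is a delta time, an operation, a path and a size, and the replayer either honours the recorded pacing or runs flat out for accelerated ageing.

        # replay.py -- toy trace replayer; hypothetical format: one "delta_s op path size" line per op.
        import os, sys, time

        def replay(trace_path, root, honor_timing=False):
            """Replay a simplified filesystem trace relative to directory `root`."""
            with open(trace_path) as trace:
                for line in trace:
                    delta_s, op, path, size = line.split()
                    if honor_timing:
                        time.sleep(float(delta_s))  # keep the recorded pacing
                    target = os.path.join(root, path.lstrip("/"))
                    if op == "write":
                        os.makedirs(os.path.dirname(target), exist_ok=True)
                        with open(target, "ab") as f:
                            f.write(os.urandom(int(size)))  # anonymized payload, same length
                    elif op == "read":
                        with open(target, "rb") as f:
                            f.read(int(size))
                    elif op == "unlink":
                        os.unlink(target)
                    elif op == "fsync":
                        with open(target, "rb+") as f:
                            os.fsync(f.fileno())
                    # a real replayer would also cover mkdir, rename, mmap, O_DIRECT, errors, ...

        if __name__ == "__main__":
            replay(sys.argv[1], sys.argv[2], honor_timing=False)  # flat out = accelerated ageing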

        There are numerous projects which provide such tools for Linux (and a few other systems) with very low performance overhead. They are very often used with network filesystems (like NFS), because tracing there is just equivalent to running a sniffer between client and server, removing the actual file data content, and appending to a (compressed) log. On a local filesystem one will need to use some generic monitoring layer (like perf, kprobes, dtrace, bio event monitoring) or modules designed for this. They can hook between userspace requests of all kinds involving the filesystem and store logs somewhere else for later inspection or replay. Other possible approaches include a stacked filesystem (in-kernel or FUSE) or a generic VFS API for this (which we do not have currently, AFAIK).

        One can also run these replays and stop them at predetermined points (or at the end), compare multiple filesystems (or the same one multiple times), and check (by direct comparison, or against saved checksums if the file content is also replayed) whether the content of the filesystem is the same, for regression testing, conformance testing and other purposes.

        The other kind of tracing is block device tracing, which is mostly useful for filesystem developers but can also be of great importance to users (especially when benchmarking part of a device, or when using multiple devices like RAID or zfs/btrfs). Simple access graphs (time vs. sector number), or just the cumulative sum of read and write requests, and of course IO/s and MB/s vs. time, can provide really interesting measures.
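
        A tiny sketch of that kind of summary, assuming the block trace has already been exported to a CSV of timestamp,rw,sector,nsectors rows (a hypothetical intermediate format of mine, not raw blktrace output):

        # blocksum.py -- sketch: per-second IO/s and MiB/s plus cumulative totals from a block-trace CSV.
        # Expected (hypothetical) input rows: timestamp_s,rw,sector,nsectors  with rw being R or W.
        import csv, sys
        from collections import defaultdict

        iops = defaultdict(int)        # second -> number of requests
        mibs = defaultdict(float)      # second -> MiB transferred
        cumulative = {"R": 0.0, "W": 0.0}

        with open(sys.argv[1], newline="") as f:
            for timestamp, rw, sector, nsectors in csv.reader(f):
                second = int(float(timestamp))
                mib = int(nsectors) * 512 / 2**20
                iops[second] += 1
                mibs[second] += mib
                cumulative[rw] += mib

        for second in sorted(iops):
            print(f"t={second:5d}s  {iops[second]:6d} IO/s  {mibs[second]:8.2f} MiB/s")
        print(f"cumulative: read {cumulative['R']:.1f} MiB, written {cumulative['W']:.1f} MiB")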


        For tracing, one should read this material:

        http://os.csie.ncku.edu.tw/drupal/si...1/membrane.ppt
        http://oss.csie.fju.edu.tw/~an96/dec10_replay.htm
        http://ccs.cs.stonybrook.edu/cip/wor...yfs-poster.pdf
        http://www.fsl.cs.sunysb.edu/docs/replayfs/index.html
        http://www.fsl.cs.sunysb.edu/~kolya/projects/

        Unfortunately most of them are somewhat old and need some small adjustments to work with the newest kernels, but they are very useful for benchmarking real-world filesystem operations.

        I hope Phoronix will start using these tools for more robust benchmarks. (One can actually do this easily, because a recorded trace log can be replayed repeatedly using userspace tools.)

        Comment


        • #19
          Wow, the last three posts really summed things up.

          Comment


          • #20
            Well, call me DIABLO, but everyone knows that other than the Fast File System on a RAID everything else is a roll of the DICE :wave: at dillon

            Comment


            • #21
              And that Linux kernel unpacking test is totally biased. Of course Linux is more optimized to unpack the Linux kernel than BSD is, duh. Doing it on BSD is comparing oranges to apples.

              Comment


              • #22
                Originally posted by misiu_mp View Post
                And that Linux kernel unpacking test is totally biased. Of course Linux is more optimized to unpack the Linux kernel than BSD is, duh. Doing it on BSD is comparing oranges to apples.
                I can't tell if you're joking or not...

                Comment


                • #23
                  Then I have succeeded.

                  Comment


                  • #24
                    Originally posted by misiu_mp View Post
                    Then I have succeeded.
                    I read your comment history and figured it out, but it was too late to edit my post. Good job, by the way.

                    Comment


                    • #25
                      Trimming for brevity (If you feel I've misrepresented comments, please advise).

                      Originally posted by baryluk View Post
                      The only reproducible way to perform a good benchmark is to trace all filesystem events
                      In general that is not needed, although yes, it would give consistent and absolutely reliable results. Further, it does become extremely useful if there are filesystem race conditions that you need to track down as well.


                      ...

                      Simple microbenchmarks are ... they are only useful for improving code, not really for comparing multiple different filesystems.
                      There are at least two fundamental classes of users: generalists, and people running custom, targeted loads. If you understand your workload, micro-benchmarks can serve as a coarse guide for your system. For example, if your load values integrity, fsync performance paired with journaling behavior gives you an indication of which filesystems to use and which ones to avoid.


                      ... If one performs a benchmark in a subfolder of a filesystem and then deletes the files afterwards, it is highly probable that the end state will be far off from the beginning condition, so one cannot actually perform the benchmark again. It is also hard for another person to reproduce it on another box.
                      ...
                      We are not talking about a couple of percent in these cases; we are talking about multiples or orders of magnitude in a lot of cases. Improving the absolute repeatability shouldn't lead to changes that large.

                      Comment


                      • #26
                        Hi, I assume you are Matt Dillon, the leader of the DragonFly BSD team; you're the domain expert on that platform.

                        In general, Michael is fairly religious about doing a default install - letting the decisions that are codified into the system stand, i.e. the decisions made by the developers on behalf of non-expert users.

                        Now, I believe that Michael would be willing to reconfigure a DragonFly BSD system to your specifications and re-run the same benchmarks - or even use your choice of operating system. Are you happy to do that? The only entry criterion is that the tuning guide is hosted and publicly accessible for others.

                        Further, if there are any extra tests or benchmarks that you would like to see, I doubt there would be any problem running those tests - or adding them to PTS.

                        Feel free to PM me, email me & michael (matthew at phoronix.com & michael at phoronix.com) or follow up on this thread.

                        Originally posted by dillon View Post
                        There are numerous other issues... whether the system was set to AHCI mode or not (DragonFly's AHCI driver is far better than its ATA driver). Whether the OS was tuned for benchmarking or for real-world activities with regard to how much memory the OS is willing to dedicate to filesystem caches. How often the OS feels it should sync the filesystem. Filesystem characteristics such as de-dup and compression and history. fsync handling. Safety considerations (how much backlog the filesystem or OS caches before it starts trying to flush to the media... more is not necessarily better in a production environment), and behavior in real load situations, which require system memory for things other than caching filesystem data. And I could go on.

                        In short, these benchmarks are fairly worthless.

                        Now HAMMER does have issues, but DragonFly also has solutions for those issues. In a real system where performance matters you are going to have secondary storage, such as a small SSD, and in DragonFly setting an SSD up with its swapcache to cache filesystem meta-data alongside the slower 'normal' 1-3TB HD(s) is kind of what HAMMER is tuned for. Filesystem performance testing on a laptop is a bit of an oxymoron since 99.999% of what you will be doing normally will be cached in memory anyway and the filesystem will be irrelevant. But on the whole our users like HAMMER because it operates optimally for most workloads, and being able to access live snapshots of everything going back in time however long you want to go (based on storage use versus how much storage you have) is actually rather important. Near-real-time mirroring streams to onsite and/or offsite backups, not to mention being able to run multiple mirroring streams in parallel with very low overhead, are also highly desirable. It takes a little tuning (e.g. there is no reason to keep long histories for /usr/obj or /tmp), but it's easy.

                        -Matt

                        Comment


                        • #27
                          Umm, why?

                          Originally posted by mtippett View Post
                          In general, Michael is fairly religious about doing a default install - letting the decisions that are codified into the system stand, i.e. the decisions made by the developers on behalf of non-expert users.
                          We can infer from this that Michael is fairly religious about creating worthless benchmarks that are clearly biased toward systems that are tuned to be fast-and-dangerous by default.

                          Also, please provide tuning information with the benchmarks so that people can make suggestions for future improvements, which many of your readers would be more than happy to do.

                          Comment


                          • #28
                            A fifth is the compiler, which is obvious in the gzip tests (which are CPU bound, NOT filesystem bound in any way).
                            Totally agree, but it is quite interesting to see that that particular test differed by 22% between the best and worst BSD results using the same compiler and CPU.

                            Now, if this is due to some filesystem saturating the CPU, that is still something that affects filesystem performance, at least on that particular hardware.
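
                            One way to separate the two effects (just a sketch of mine): time the compression entirely in memory with no filesystem involved; if the in-memory numbers also differ between systems, the gap is compiler/CPU, and only the remainder can be blamed on the filesystem.

                            # gzip_cpu.py -- sketch: measure gzip compression cost with the filesystem taken out.
                            import gzip, os, time

                            data = os.urandom(64 << 20)  # 64 MiB of input held entirely in memory

                            t0 = time.process_time()     # CPU time, so disk waits would not even count
                            compressed = gzip.compress(data, compresslevel=6)
                            cpu_s = time.process_time() - t0

                            print(f"compressed {len(data) >> 20} MiB -> {len(compressed) >> 20} MiB "
                                  f"in {cpu_s:.2f}s of CPU time")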

                            Comment


                            • #29
                              Originally posted by thesjg View Post
                              @ciplogic -- what you seem to be failing to understand is that the ZFS random write results aren't actually possible, there is something else going on. Without an explanation as to what else is going on or WHY they are possible, all of the results published in this article are rubbish. 100% meaningless. So sure, examining why errant results occur might not be his job, but if that's the case, and he lets the article exist as-is, he will be disseminating gross misinformation.

                              The credibility of phoronix is pretty poor already, I suspect they will simply let this be another nail in the coffin.
                              No, it is EASY to see what is going on. It is the same thing a lot of people have observed in the past:
                              ZFS cheats. It caches a lot, even when told not to, and flushes later, but returns immediately. If you run out of cache, ZFS will thrash your disk for ages, but until that point, ZFS benchmarking will report amazing numbers.
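
                              Whatever you call it, it is easy to see how unflushed write-back inflates a naive number (a generic sketch of mine, not tied to ZFS): time the buffered writes, then time the final fsync; a benchmark that stops the clock before the flush is mostly measuring RAM.

                              # writeback.py -- sketch: buffered writes look instant until you pay for the flush.
                              import os, time

                              f = open("burst.dat", "wb")
                              t0 = time.monotonic()
                              for _ in range(512):
                                  f.write(os.urandom(1 << 20))  # 512 MiB of buffered writes
                              f.flush()
                              write_s = time.monotonic() - t0

                              t0 = time.monotonic()
                              os.fsync(f.fileno())              # now actually push it to the media
                              f.close()
                              flush_s = time.monotonic() - t0

                              print(f"writes: {write_s:.2f}s   flush: {flush_s:.2f}s")
                              # Stopping the stopwatch after the writes but before the fsync mostly measures RAM/cache.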

                              I am surprised how quiet the DragonFly BSD people are. They are always claiming how fast their HAMMER is and how well it scales. Well, if the kernel can't even do SMP, I have my doubts about scalability.

                              Comment


                              • #30
                                I think giving these benchmarks another go against our recent DragonFly BSD 2.10 release is probably warranted.

                                Comment
