Large HDD/SSD Linux 2.6.38 File-System Comparison

  • #61
    Originally posted by energyman View Post
    Which means that ext4 with its 'sometimes a crash can mean original and destination are both 0' isn't good enough either
    That's due to buggy applications that don't use the POSIX interfaces correctly (like it or not, open(), close(), read(), write(), and fsync() are all POSIX interfaces and the need to use fsync() correctly goes back decades --- and the need to do this is true for all modern file systems). But that's another discussion/flame war....
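
    To illustrate the pattern being referred to (write the new data to a temporary file, force it to stable storage, then atomically rename it over the original), here is a minimal shell sketch; the file names are placeholders, and GNU dd's conv=fsync stands in for an explicit fsync() call:

    Code:
      dd if=newdata of=target.tmp conv=fsync   # write the new contents, flushed to disk before dd exits
      mv target.tmp target                     # mv uses rename(2), atomically replacing the original
    A crash between the two steps leaves either the old contents or the complete new contents, never a zero-length file.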



    • #62
      Originally posted by loonyphoenix View Post
      Ext3 has barrier=0 as default? Really? Seems strange.

      Isn't that a distro-specific thing, though, default mount options?
      The only distro that I know of that has barriers enabled by default on EXT3 is SuSE/openSUSE.



      • #63
        Originally posted by deanjo View Post
        The only distro that I know of that has barriers enabled by default on EXT3 is SuSE/openSUSE.
        I am fairly sure that Red Hat enabled barriers by default in RHEL 6, and although I'm not 100% sure, I believe recent Fedora releases have as well. My source on this is Ric Wheeler, formerly of EMC and now the file system manager at Red Hat.

        -- Ted
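
        As a point of reference, barriers on ext3 can be toggled per mount regardless of the distro default; a minimal example with a placeholder device name:

        Code:
          mount -t ext3 -o barrier=1 /dev/sdb1 /mnt/test
          # or persistently via an /etc/fstab entry:
          # /dev/sdb1  /mnt/test  ext3  barrier=1  0  2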



        • #64
          Originally posted by tytso View Post
          This probably explains why it still uses the debugfs hack and not a more general FIEMAP interface. So hopefully we can get this up on github, and people can help get the tool in shape so it can be used by folks such as your benchmarking operation. Some assembly will still be necessary, but it shouldn't be that much work.
          My view at this stage is that applying aging or similar approaches to put the system under test (SUT) into a particular state is, for the most part, outside the purview of PTS and belongs to SUT preparation. (Obviously a comparison between an aged and an unaged filesystem is a completely different issue, since the variant portion is then the aged filesystem itself.)

          One thing that I definitely need to give you guys kudos for is that you do document your hardware configurations for the System and Configuration Under Test, and you do strive for strong reproducibility. That's all good stuff.
          Thanks. Although I generally don't see the development community attempt to reproduce issues, that is a different issue.

          One of the things which they do which is incredibly helpful to file system developers is that they will do oprofile and (very important on larger CPU count machines) lockstat runs. Enabling oprofile and/or lockstat will of course skew the benchmark results, so they have to be done separately, and the performance results discarded, but the oprofile and lockstat information is very useful in showing what are the next things that can be optimized to further improve the file system.

          Another very useful analysis tool for understanding why the results are the way they are is blktrace. ...
          PTS has a MONITOR capability. It is generally used for monitoring load, temperature and so on. Hooking in other tools that add more value for developers should be fairly easy. Feel free to contact me at matthew at phoronix.com to discuss further.
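
          As a sketch of how that hook is driven today, monitoring is enabled through an environment variable at invocation time; the sensor names and test profile below are illustrative:

          Code:
            MONITOR=cpu.usage,cpu.temp phoronix-test-suite benchmark pts/fs-mark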



          • #65
            Originally posted by mtippett View Post
            My view at this stage is that applying aging or similar approaches to put the system under test (SUT) into a particular state is, for the most part, outside the purview of PTS and belongs to SUT preparation. (Obviously a comparison between an aged and an unaged filesystem is a completely different issue, since the variant portion is then the aged filesystem itself.)
            So my impression is that you have a framework which will iterate over a number of file systems, reformat the partition to file system $foo, mount the partition as $foo, and then run the suite of benchmarks. Do you not? Or are you doing this manually, by hand? If you do have such a framework, then the file system aging function needs to be inserted after the mkfs step. I'm not sure what is formally considered part of the PTS, and what is considered part of the test framework. Presumably whatever uploads the results to the open benchmarking web site is also part of the test framework, which I thought was part of PTS --- so I had assumed the mkfs was part of the PTS subsystem.
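
            A sketch of the kind of loop being described (not actual PTS code; the device, mount point, and benchmark driver are placeholders), with the aging hook slotted in after the mkfs step:

            Code:
              for fs in ext3 ext4 xfs btrfs; do
                  mkfs -t $fs /dev/sdb1            # force/confirmation flags omitted for brevity
                  # <-- file system aging step would be inserted here
                  mount -t $fs /dev/sdb1 /mnt/test
                  run-benchmark-suite /mnt/test    # placeholder for the actual benchmark run
                  umount /mnt/test
              done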

            Thanks. Although I generally don't see the development community attempt to reproduce issues, that is a different issue.
            I can't speak for others, but I haven't really taken much advantage of the Phoronix test suite because the signal-to-noise ratio has been too low. The focus on competition between file systems, as opposed to watching for regressions, isn't really useful for developers. (*Especially* when you're not filtering out things like barrier vs. nobarrier issues.) And when there have been fluctuations, there's been no attempt to explain why they might be happening. I suspect that for some, there has been some distaste over the sensationalism around the fsync() changes which were needed to protect data safety, and since there was clearly no understanding of what was happening, the writeup would sometimes assume that one-time changes would translate into long-term trends.

            Now, I don't blame you for that --- in the end, your primary responsibility is to the continued success of the commercial enterprise of this web site, which means if sensationalism drives web hits, then sensationalism it shall be.

            But the fact remains that developers are also extremely busy folks, and if they have to spend a huge amount of time figuring out what the results might mean, they're likely not going to bother. Developers also tend to prefer benchmarks which test specific parts of the file system, one at a time. This is why we tend to use benchmarks such as FFSB, with different profiles such as "large file create", "random writes", "random reads", "large sequential reads", etc. Another favorite benchmark is fs_mark, which tests the efficiency of the fsync() and journaling subsystems. I don't mind looking at the application-centric benchmarks, but I'm not likely to try to set them up. But if you give me lock_stat and oprofile runs, I'll very happily look at them, and discuss what the results might mean, and then work to improve those workloads as part of my future development efforts.
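
            For instance, a targeted fs_mark run exercising fsync-heavy small-file creation might look like the following (the parameters are illustrative):

            Code:
              fs_mark -d /mnt/test/fsmark -n 1000 -s 10240 -S 1   # 1000 files of 10 KB each, fsync before close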

            Eric Whitney at HP, who I've mentioned before, does all of these things, and he's been extremely helpful. He invests time to assist ext4 development, and in exchange, we help him out by figuring out why his 48-core system was crashing --- it turns out there was a bug in the block layer that his benchmarking efforts were tripping, and as a result of the collaboration, it will be fixed before 2.6.38 ships. Just today, we spent half of the ext4 weekly concall talking about his recent results testing patches that will be going into the 2.6.39 merge window: http://free.linux.hp.com/~enw/ext4/2.6.38-rc5/

            The bottom line is that benchmarking for the sake of improving the file system requires close cooperation with the developers. I'm not sure whether that's compatible with Phoronix's mission. If so, I'd be happy to work with you more closely. And if it's not Phoronix's cup of tea, that's OK. There is room for multiple different approaches to benchmarking. All I ask is that they not be too misleading, but that's more for the sake of not leading naive users down the primrose path.....



            • #66
              Originally posted by tytso View Post
              So my impression is that you have a framework which will iterate over a number of file systems, reformat the partition to file system $foo, mount the partition as $foo, and then run the suite of benchmarks. Do you not? Or are you doing this manually, by hand? If you do have such a framework, then the file system aging function needs to be inserted after the mkfs step. I'm not sure what is formally considered part of the PTS, and what is considered part of the test framework. Presumably whatever uploads the results to the open benchmarking web site is also part of the test framework, which I thought was part of PTS --- so I had assumed the mkfs was part of the PTS subsystem.
              Those steps are not part of the framework itself. My involvement is primarily Phoronix Test Suite & OpenBenchmarking, not Phoronix.com. I believe that Michael does the system prep manually. We do have the concept of a "context" for a test, which covers preparing either the system or the configuration under test, but that isn't fully fleshed out.

              It's impractical for PTS to include detailed system preparation steps within the suite itself. The preparation is intensely focused on whatever the configuration or variant part of the test run is. Just for filesystems, it could be mount options only, new vs. old (aged), alternate filesystems, or different kernels and their impact on a given fs. Obviously this is meaningless for, say, a compiler comparison. So it comes down to a routine similar to...

              1. Prepare System Under Test
              2. Prepare Configuration Under Test (optional if the variant part is really the System Under Test)
              3. Invoke "phoronix-test-suite benchmark <test>"
              4. Upload to OpenBenchmarking for further discussion
              5. Go to 1 for as many variants as you want.
              6. Upload full comparison to OpenBenchmarking
              7. If Michael is running the test, then generate an article.

              We have talked about a way to take a collection of contexts and call out to a locally configured script to put the system in each context for running the tests. That would effectively automate steps 2-6; it won't be considered part of PTS, but rather it will lower the manual effort for people doing broader comparisons (or for use in software development). My mental picture of a context file that might be useful for benchmarking is something like

              Code:
                <context-name> <context-information>
              You would then have a script that can take you to a particular context. So for filesystems you might have a file such as

              Code:
                 ext3-nobarrier  100GB-70%Cap-3%Frag-opts=nobarrier
                 ext3-barrier  100GB-70%Cap-3%Frag-opts=barrier
                 ext3-discard  100GB-70%Cap-3%Frag-opts=discard
              The person executing the comparison would need to write a script that is invoked as "set-context.sh 100GB-70%Cap-3%Frag-opts=nobarrier" which would then do the system preparation (100GB, 70% capacity, 3% fragmentation, mount opts=xxx).
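
              A minimal sketch of what such a set-context.sh could look like; the parsing scheme, device, and aging step are assumptions rather than an actual PTS interface:

              Code:
                #!/bin/sh
                # Usage: set-context.sh 100GB-70%Cap-3%Frag-opts=nobarrier
                CONTEXT="$1"
                SIZE=$(echo "$CONTEXT" | cut -d- -f1)    # e.g. 100GB
                CAP=$(echo "$CONTEXT"  | cut -d- -f2)    # e.g. 70%Cap
                FRAG=$(echo "$CONTEXT" | cut -d- -f3)    # e.g. 3%Frag
                OPTS=${CONTEXT##*opts=}                  # e.g. nobarrier
                mkfs.ext3 -q /dev/sdb1                   # recreate the filesystem (placeholder device)
                mount -o "$OPTS" /dev/sdb1 /mnt/test
                # fill to $CAP and fragment to $FRAG here with the aging tool of your choice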

              I assume that you can see that the same structure could easily be extended to do bisection across an ordered set of kernels or git commits.
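
              For example, a kernel bisection could reuse the same file format with git tags as the context information (the tags below are illustrative), with a set-context script that checks out and builds each one:

              Code:
                kernel-2.6.37      git=v2.6.37
                kernel-2.6.38-rc5  git=v2.6.38-rc5
                kernel-2.6.38-rc8  git=v2.6.38-rc8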

              I can't speak for others, but I haven't really taken much advantage of the Phoronix test suite because the signal-to-noise ratio has been too low. The focus on competition between file systems, as opposed to watching for regressions, isn't really useful for developers.
              Phoronix Test Suite is effectively an independent project that grew out of the personal discussions that Michael and I would have regarding the results being presented on Phoronix.com.

              Phoronix Test Suite itself is merely a test execution environment. The results it generates, and the feeding of that information into articles on Phoronix.com, are independent. I'm sure there is actually a lot of value that you could get out of the suite itself - from making available simplified repeatable test cases to monitoring updates to your code as you make them.

              ...

              Now, I don't blame you for that --- in the end, your primary responsibility is to the continued success of the commercial enterprise of this web site, which means if sensationalism drives web hits, then sensationalism it shall be.
              Again for the record, I am not involved in any direct way with Phoronix.com. My involvement is tangential, through Phoronix Test Suite and OpenBenchmarking. My day job is driving teams of engineers; it's just that I have a bent for seeing good engineering done, and Phoronix Test Suite is a way that I can help the industry.

              But the fact remains that developers are also extremely busy folks, and if they have to spend a huge amount of time figuring out what the results might mean, they're likely not going to bother. Developers also tend to prefer benchmarks which test specific parts of the file system, one at a time. This is why we tend to use benchmarks such as FFSB, with different profiles such as "large file create", "random writes", "random reads", "large sequential reads", etc. Another favorite benchmark is fs_mark, which tests the efficiency of the fsync() and journaling subsystems. I don't mind looking at the application-centric benchmarks, but I'm not likely to try to set them up. But if you give me lock_stat and oprofile runs, I'll very happily look at them, and discuss what the results might mean, and then work to improve those workloads as part of my future development efforts.
              This part is really hard: each developer has their own sub-component or subsystem that they care about, and for each of those there is a set of metrics that directly affects those systems. But there are a hell of a lot of subsystems that represent vastly different areas. So the middle ground is finding benchmarks and tests that serve as a canary in a coal mine to trigger the deeper digging. The deeper digging into a particular sub-domain marginalizes the other domains.

              That said, neither Michael nor I would shy away from deep-diving when the canary indicates that something is wrong. It is a two-way street: the integration of the tools and methodology for a domain of expertise needs leadership from outside (the domain experts), and the integration is where the biggest win is.

              The bottom line is that benchmarking for the sake of improving the file system requires close cooperation with the developers. I'm not sure whether that's compatible with Phoronix's mission. If so, I'd be happy to work with you more closely. And if it's not Phoronix's cup of tea, that's OK. There is room for multiple different approaches to benchmarking. All I ask is that they not be too misleading, but that's more for the sake of not leading naive users down the primrose path.....
              So long as the developers are engaged in looking at the problem, rather than blaming the tool, we've got no concerns working with any developer (be it under the OpenBenchmarking or the Phoronix banner).

              I don't do articles on Phoronix.com, but I do blog postings on OpenBenchmarking.org, so there are ways of getting messages out through that too.

              From this thread, some areas where PTS can immediately add value are:

              1. Distributed end-user testing - you can get people to run a single command to get consistent results from a broad set of users.
              2. Regression Management - We have trackers at http://phoromatic.com/kernel-tracker.php, and setting one up is _very_ easy. Currently that one watches the ubuntu-upstream-kernel builds, but it could easily do a git pull; make sort of cycle (see the sketch after this list). This is very interesting since you can have distributed systems that are used for testing.
              3. Reproducing scenarios - If an end user sees an issue with a particular behaviour, capturing a test case allows it to be more easily reproduced internally by developers.
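
              A minimal sketch of the git pull; make cycle mentioned in point 2 (the tree path, job count, and test profile are placeholders):

              Code:
                #!/bin/sh
                # hypothetical nightly regression cycle against an upstream kernel tree
                cd /usr/src/linux && git pull && make -j8
                # (install the new kernel and reboot into it before testing)
                phoronix-test-suite batch-benchmark pts/fs-mark   # run unattended and record results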

              One concrete area where we'd like to see input is suggestions for improvements in the test cases or benchmarks themselves. If there are suites of tests that characterize a filesystem's behaviour, integrating them isn't much of a problem.

