Originally posted by tytso:
It's impractical for PTS to include detailed system preparation steps within the suite itself. The preparation depends entirely on what the configuration, or the variant part of the test run, is. Just for filesystems, it could be mount options alone, new vs. aged filesystems, alternate filesystems, or different kernels and their impact on a filesystem. Obviously this is meaningless for, say, a compiler comparison. So it comes down to a routine similar to...
1. Prepare System Under Test
2. Prepare Configuration Under Test (optional if the variant part is itself the System Under Test)
3. Invoke "phoronix-test-suite benchmark <test>"
4. Upload to OpenBenchmarking for further discussion
5. Go to 1 for as many variants as you want.
6. Upload full comparison to OpenBenchmarking
7. If Michael is running the test, then generate an article.
We have talked about a way to take a collection of contexts and call out to a locally configured script that puts the system into each context before running the tests. That would effectively automate steps 2-6. It wouldn't be considered part of PTS itself, but it would lower the manual effort for people doing broader comparisons (or using the suite during software development). My mental picture of a context file that might be useful for benchmarking is something like the following; a rough driver-loop sketch follows the examples.
Code:
<context-name> <context-information>
Code:
ext3-nobarrier  100GB-70%Cap-3%Frag-opts=nobarrier
ext3-barrier    100GB-70%Cap-3%Frag-opts=barrier
ext3-discard    100GB-70%Cap-3%Frag-opts=discard
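As a rough illustration of the driver loop (not part of PTS; prep-context.sh, contexts.txt, and the pts/fio test name are all hypothetical placeholders):
Code:
#!/bin/sh
# Sketch: run one benchmark per context listed in contexts.txt, where
# contexts.txt uses the "<context-name> <context-information>" format
# shown above. prep-context.sh is a hypothetical local script that puts
# the system into the state described by the second field.
while read name info; do
    ./prep-context.sh "$info"                # steps 1-2: prepare system/configuration
    phoronix-test-suite benchmark pts/fio    # step 3 (placeholder test profile)
done < contexts.txt
# steps 4-6: upload individual results and the merged comparison to OpenBenchmarking.org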
I assume that you can see that the same structure could easily be extended to do bisection across an ordered set of kernels or git commits.
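For instance, an ordered kernel sweep might look like this (the version tags are purely illustrative):
Code:
kernel-2.6.32  kernel=v2.6.32
kernel-2.6.33  kernel=v2.6.33
kernel-2.6.34  kernel=v2.6.34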
I can't speak for others, but I haven't really taken much advantage of the Phoronix Test Suite because the signal-to-noise ratio has been too low. The focus on competition between file systems, as opposed to watching for regressions, isn't really useful for developers.
The Phoronix Test Suite itself is merely a test execution environment. The results that it generates, and the feeding of that information into articles on Phoronix.com, are independent of it. I'm sure there is actually a lot of value you could get out of the suite itself - from making simplified, repeatable test cases available to monitoring updates to your code as you make them.
...
Now, I don't blame you for that --- in the end, your primary responsibility is the continued success of the commercial enterprise of this web site, which means that if sensationalism drives web hits, then sensationalism it shall be.
But the fact remains that developers are also extremely busy folks, and if they have to spend a huge amount of time figuring out what the results might mean, they're likely not going to bother. Developers also tend to prefer benchmarks which test specific parts of the file system, one at a time. This is why we tend to use benchmarks such as FFSB, with different profiles such as "large file create", "random writes", "random reads", and "large sequential reads". Another favorite benchmark is fs_mark, which tests the efficiency of fsync() and the journaling subsystem. I don't mind looking at the application-centric benchmarks, but I'm not likely to try to set them up. But if you give me lock_stat and oprofile runs, I'll very happily look at them, discuss what the results might mean, and then work to improve those workloads as part of my future development efforts.
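As a sketch of that kind of targeted run (the mount point and parameter values below are placeholders, not tytso's actual settings):
Code:
# Hypothetical fs_mark invocation stressing fsync()/journal efficiency:
# 4 threads each creating 1000 64KB files on the test mount, with an
# fsync() before each close (-S 1).
fs_mark -d /mnt/scratch -t 4 -n 1000 -s 65536 -S 1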
That said, neither Michael nor I would shy away from deep-diving when the canary indicates that something is wrong. It is a two-way street: integrating the tools and methodology for a domain of expertise needs leadership from outside that domain, and the integration is where the biggest win is.
The bottom line is that benchmarking for the sake of improving the file system requires close cooperation with the developers. I'm not sure whether that's compatible with Phoronix's mission. If so, I'd be happy to work with you more closely. And if it's not Phoronix's cup of tea, that's OK. There is room for multiple different approaches to benchmarking. All I ask is that they not be too misleading, but that's more for the sake of not leading naive users down the primrose path...
I don't do articles on Phoronix.com, but I do blog postings on OpenBenchmarking.org, so there are ways of getting messages out through that channel too.
From this thread, some areas where PTS can immediately add value are:
1. Distributed end-user testing - you can get a broad set of users to run a single command and produce consistent results.
2. Regression management - we have trackers at http://phoromatic.com/kernel-tracker.php, and setting one up is _very_ easy. Currently that one watches the Ubuntu upstream kernel builds, but it could easily do a git pull; make sort of cycle (see the sketch after this list). This is very interesting since you can have distributed systems that are used for testing.
3. Reproducing scenarios - if an end user sees an issue with a particular behaviour, capturing a test case allows developers to reproduce it internally much more easily.
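A rough sketch of what that cycle might look like on a tracker node (all paths, the -j value, and the pts/fio test name are placeholder assumptions; this is not part of PTS or Phoromatic):
Code:
#!/bin/sh
# Hypothetical nightly regression loop: rebuild the latest kernel,
# install it, then run a fixed PTS benchmark against it after a reboot.
cd /usr/src/linux || exit 1
git pull
make -j4 && make modules_install && make install
# (reboot into the new kernel here - not shown)
phoronix-test-suite batch-benchmark pts/fio   # non-interactive run of a placeholder test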
One concrete area where we'd like input is suggestions for improvements to the test cases or benchmarks themselves. If there are suites of tests that characterize a filesystem's behaviour, integrating them isn't much of a problem.