Statistical Significance In Benchmark Results

Written by Michael Larabel in Phoronix on 25 September 2009 at 10:46 AM EDT.

For those of you following the development of Phoronix Test Suite 2.2 (codenamed "Bardu"), several new benchmarking features were pushed into its Git tree this week. The latest Phoronix Test Suite 2.2 code has better FreeBSD 8.0 compatibility and supports network proxies for its network communication, but the bigger addition is new support for ensuring that test results are statistically significant.

When a test profile is set to run multiple times, the Phoronix Test Suite can now compute the standard deviation across the test runs. If the standard deviation of the results exceeds a threshold (currently 3.50%, adjustable through the ~/.phoronix-test-suite/user-config.xml file), the Phoronix Test Suite automatically increases the number of times the test profile is run. The aim is to lower the standard deviation of the results so that the reported result is more reliable. There are also safeguards against running a test profile needlessly many times, for example when the standard deviation is not changing across the additional runs.
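The behavior described above can be sketched roughly as follows. This is a minimal Python illustration, not the Phoronix Test Suite's actual code: the function names, the `base_runs` and `max_extra_runs` parameters, and the exact stopping rule are assumptions made for the example, and the 3.50% threshold is interpreted here as the sample standard deviation expressed as a percentage of the mean.

```python
import statistics
from typing import Callable, List


def relative_std_dev_percent(results: List[float]) -> float:
    """Sample standard deviation as a percentage of the mean (assumed metric)."""
    if len(results) < 2:
        return 0.0
    mean = statistics.mean(results)
    if mean == 0:
        return 0.0
    return statistics.stdev(results) / abs(mean) * 100.0


def run_with_dynamic_count(
    run_once: Callable[[], float],
    base_runs: int = 3,               # hypothetical default run count
    threshold_percent: float = 3.50,  # threshold quoted in the article
    max_extra_runs: int = 5,          # hypothetical cap on additional runs
) -> List[float]:
    """Run a test, adding runs while the deviation stays above the threshold."""
    results = [run_once() for _ in range(base_runs)]
    previous = relative_std_dev_percent(results)
    extra = 0
    while previous > threshold_percent and extra < max_extra_runs:
        results.append(run_once())
        extra += 1
        current = relative_std_dev_percent(results)
        # Hypothetical safeguard: give up if the extra run did not
        # reduce the deviation at all.
        if current >= previous:
            break
        previous = current
    return results
```

With a noisy first batch such as 100, 100, 120, each additional run near 100 pulls the deviation down, so the loop keeps adding runs until it either drops under the threshold or hits the cap. A stable batch is left at the base run count.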

This behavior can be disabled entirely in the user-config.xml file with the DynamicRunCount tag, which sits in the Statistics area alongside StandardDeviationThreshold. There is also a LimitDynamicToTestLength option that keeps the feature from being applied to tests that take longer than a defined amount of time to run.
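How the three options might interact can be sketched in Python as below. Only the option names come from the text above. The class, the function, the default of 20 for LimitDynamicToTestLength, and the assumption that it is measured in minutes are hypothetical and are not taken from the Phoronix Test Suite itself.

```python
from dataclasses import dataclass


@dataclass
class StatisticsOptions:
    """Hypothetical in-memory view of the Statistics options."""
    dynamic_run_count: bool = True              # DynamicRunCount
    standard_deviation_threshold: float = 3.50  # StandardDeviationThreshold (percent)
    limit_dynamic_to_test_length: float = 20.0  # LimitDynamicToTestLength (value and unit assumed)


def may_add_runs(
    opts: StatisticsOptions,
    std_dev_percent: float,
    test_length_minutes: float,
) -> bool:
    """Decide whether another run should be scheduled for a test."""
    if not opts.dynamic_run_count:
        return False  # feature switched off entirely
    if test_length_minutes > opts.limit_dynamic_to_test_length:
        return False  # test is too long to repeat automatically
    return std_dev_percent > opts.standard_deviation_threshold
```

For example, `may_add_runs(StatisticsOptions(), 5.0, 2.0)` permits another run, while the same deviation on a test that takes longer than the limit does not.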

To sum it up: by default, if the Phoronix Test Suite notices that the results for a test profile are starting to deviate, it can automatically increase the number of times the test is run in the hope of producing more accurate results. This new support is available in a Git snapshot today and will be in Phoronix Test Suite 2.2 Alpha 3, to be released within the next week. Additional statistical and analytical features will also be coming to Phoronix Test Suite 2.2. To find out about some of the other features already available in 2.2 Bardu, read this news entry.
