Statistical Significance In Benchmark Results

Posted by Michael Larabel on September 25, 2009

For those following the development of Phoronix Test Suite 2.2 (codenamed "Bardu"), some new benchmarking features were pushed into its Git tree this week. The latest Phoronix Test Suite 2.2 code has better FreeBSD 8.0 compatibility and supports network proxies for its network communication, but the bigger addition is new support for ensuring that test results are statistically significant.

When a test profile is set to run multiple times, the Phoronix Test Suite can now compute the standard deviation across the test runs. If the standard deviation of the results exceeds a certain threshold (currently defined at 3.50%, but adjustable through the ~/.phoronix-test-suite/user-config.xml file), the Phoronix Test Suite will automatically increase the number of times the test profile is run. This is done in the hope of lowering the standard deviation so that the reported result is more reliable. There are also safeguards against needlessly running a test profile too many times, for example when the standard deviation is not changing.
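The idea can be illustrated with a short Python sketch. This is not the Phoronix Test Suite's own code. It assumes the percentage threshold is compared against the sample standard deviation expressed as a percentage of the mean, and it models only one safeguard, a hard cap on the number of runs. The names run_with_dynamic_count, run_once, initial_runs, and max_runs are hypothetical.

```python
import statistics


def relative_std_dev(results):
    """Sample standard deviation as a percentage of the mean (assumed metric)."""
    mean = statistics.mean(results)
    if mean == 0:
        return float("inf")  # cannot express deviation relative to a zero mean
    return statistics.stdev(results) / abs(mean) * 100.0


def run_with_dynamic_count(run_once, initial_runs=3, threshold_pct=3.5, max_runs=10):
    """Run a benchmark, adding runs while the results deviate too much.

    run_once      -- hypothetical callable returning one numeric result
    threshold_pct -- deviation limit in percent (3.50 is the default in the article)
    max_runs      -- hypothetical safeguard against running forever
    """
    results = [run_once() for _ in range(initial_runs)]
    while len(results) < max_runs and relative_std_dev(results) > threshold_pct:
        results.append(run_once())
    return results


if __name__ == "__main__":
    # Stand-in for a real benchmark: replay a fixed list of made-up results.
    fake_results = iter([100, 110, 90, 100, 100, 100, 100, 100, 100, 100])
    runs = run_with_dynamic_count(lambda: next(fake_results))
    print(len(runs), "runs,", round(relative_std_dev(runs), 2), "% deviation")
```

A stable test stops after its initial runs, while a noisy one keeps collecting results until the deviation drops below the threshold or the cap is reached.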

This feature can be disabled entirely through the user-config.xml file using the DynamicRunCount tag in the Statistics area, which is also where the StandardDeviationThreshold tag is found. There is also a LimitDynamicToTestLength option for not applying this feature to tests that take longer than a defined amount of time to run.
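As a rough illustration of how these three options might be read, here is a small Python sketch. Only the tag names DynamicRunCount, StandardDeviationThreshold, LimitDynamicToTestLength, and the Statistics area come from the description above. The surrounding XML layout, the example values, and the function name read_statistics_options are assumptions, and the unit of LimitDynamicToTestLength is not specified here.

```python
import xml.etree.ElementTree as ET

# Assumed layout: only the Statistics tag names are taken from the article.
# The enclosing elements and all values are hypothetical placeholders.
ASSUMED_CONFIG = """
<PhoronixTestSuite>
  <Options>
    <Statistics>
      <DynamicRunCount>TRUE</DynamicRunCount>
      <LimitDynamicToTestLength>20</LimitDynamicToTestLength>
      <StandardDeviationThreshold>3.50</StandardDeviationThreshold>
    </Statistics>
  </Options>
</PhoronixTestSuite>
"""


def read_statistics_options(xml_text):
    """Return the child tags of the first Statistics element as a dict of strings."""
    root = ET.fromstring(xml_text.strip())
    stats = root.find(".//Statistics")
    if stats is None:
        return {}
    return {child.tag: (child.text or "").strip() for child in stats}


if __name__ == "__main__":
    options = read_statistics_options(ASSUMED_CONFIG)
    enabled = options.get("DynamicRunCount", "").upper() == "TRUE"
    threshold = float(options.get("StandardDeviationThreshold", "3.50"))
    print("dynamic run count enabled:", enabled)
    print("threshold (%):", threshold)
```

Check the generated user-config.xml on your own system for the real structure before editing it.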

To sum it up: by default, if the Phoronix Test Suite notices that the results for any test profile are starting to deviate, it can automatically increase the number of times the test is run in the hope of producing more accurate results. This new support is available through a Git snapshot today and can be found in Phoronix Test Suite 2.2 Alpha 3, to be released within the next week. Additional statistical and analytical features will also be coming to Phoronix Test Suite 2.2. To find out about some of the other features already available in 2.2 Bardu, read this news entry.
