Statistical Significance In Benchmark Results

  • wizard69 (Senior Member) replied:
    Thanks for the info!

    As you probably know I'm new to this forum, so please excuse if the following has been covered or is out of context.

    What I want to know is what safeguards are in place to make sure that the latest Turbo Boost based processors are loaded to the point that thermal throttling is discovered. It is my position that tricking out a system for maximum performance is all well and good, but if those benchmark numbers don't translate into valid figures for normal implementations of a chip, then you haven't done your readers much of a favor.

    So let's say your benchmark runs a series of video encoding tasks, which ought to load the processor across all cores. Initially, for a small file, this may not stress the chip to the point that thermal throttling is noticeable. But what happens if we have something less than a high-performance cooler and suboptimal thermal conditions, something that reflects most home systems?

    I ask this because of the Intel-based rebuttal to your earlier Lynnfield tests. I'm certain that the BIOS issue was real, after all this is a brand-new product, but I have to wonder about the differences in the results, which really don't make sense. It makes me wonder if the processors might have been sitting under a huge air conditioner, as this would likely keep the cores running at higher clock rates.

    I bring this up because we really haven't had a processor quite like this on the market in the past. Thus it is hard to offer a clear picture of what one can expect out of Lynnfield under non-optimal conditions. The sad thing is we are talking about big differences in performance based on how well the chip can cool itself over time. So it would make sense to test a given processor with a variety of heat-removal capabilities to see just how much of a regression we will see with those different heatsinks. A simple question might be: how long does Intel's stock cooler allow a Lynnfield to benefit from Turbo Boost over the course of a long video encode, in a room free of air conditioning?
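
    A crude version of the experiment proposed above can be scripted: sample each core's clock speed once a second while a long encode runs, and watch when the Turbo Boost frequencies fade. The following is only an illustrative sketch (not anything from PTS), reading the standard Linux cpufreq sysfs files; the encode job itself would be running in another terminal.

```python
# Sketch: log per-core clock speeds over time, to see how long Turbo
# Boost holds up under a sustained load (e.g. a long video encode
# running in another terminal). Not part of the Phoronix Test Suite.
import glob
import time

def read_freqs_mhz(paths):
    """Parse cpufreq sysfs values (kHz as text) into MHz."""
    return [int(open(p).read()) / 1000 for p in paths]

def log_freqs(samples=60, interval=1.0):
    """Print one line of per-core MHz per interval."""
    paths = sorted(glob.glob(
        "/sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq"))
    for _ in range(samples):
        print(" ".join(f"{f:7.1f}" for f in read_freqs_mhz(paths)))
        time.sleep(interval)
```

    Watching those numbers sag back toward the base clock would show directly how much Turbo Boost headroom a given cooler leaves over a long run.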

    As you can see, I'm puzzled by what sort of performance a person making an average investment in Lynnfield would get. Even with Intel's tests, which in some cases I find bogus, it looks like an AMD chip works just as well.


    Dave


  • ssam (Senior Member) replied:
    Thanks, great addition.


  • krazy (Senior Member) replied:
    Hi Michael.

    Have you considered adding some kind of ANOVA function to the PTS? Having a confidence interval (95% or something) on each graph would be very useful, I think.

    For example, in the BFS article, while you imply that BFS is faster for PHP compilation, I suspect that the difference is statistically insignificant, and BFS cannot really be said to be faster with any reasonable confidence.

    For an example of the sort of analysis I mean, see here.
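
    For what it's worth, the basic calculation asked for above doesn't need a full ANOVA package. Here is a minimal sketch of a 95% confidence interval on a benchmark mean, my own illustration rather than PTS code, using only Python's standard library, hard-coded Student's t critical values for small run counts, and made-up timings for the PHP-compile example:

```python
# Sketch: 95% confidence interval for a benchmark's mean time.
# T_975 holds two-sided 95% Student's t critical values by degrees
# of freedom (only small sample counts covered, which fits benchmarking).
import statistics

T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}

def ci95(samples):
    """Return (mean, half-width) of a 95% CI for the mean."""
    n = len(samples)
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)        # sample standard deviation
    half = T_975[n - 1] * sd / n ** 0.5   # t * s / sqrt(n)
    return mean, half

# Three runs each of a hypothetical PHP-compile benchmark (seconds):
m1, h1 = ci95([71.2, 70.8, 71.5])   # e.g. CFS
m2, h2 = ci95([70.6, 71.1, 70.9])   # e.g. BFS
print(f"CFS: {m1:.2f} ± {h1:.2f} s, BFS: {m2:.2f} ± {h2:.2f} s")
# If the two intervals overlap heavily, the "win" is not significant.
```

    With only three runs the t multiplier is large (4.303), so intervals like these overlap unless the difference between schedulers is substantial, which is exactly the caution being suggested.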


  • Jono (Junior Member) replied:
    Originally posted by phoronix
    but larger than that is new support for ensuring test results are statistically significant. When any test profile is set to run multiple times, the Phoronix Test Suite is now capable of computing the standard deviation between each of the test runs...
    I just registered for these forums so I could say: "Thank you!". This can add some real meaning to the Phoronix test results, rather than only giving a feel of what might be going on.

    One thing to be careful of when increasing the number of runs is the difference between statistical significance and practical significance. Given enough runs, every comparison will become statistically significant - but a statistically significant difference of 0.5% is of no practical significance (there's usually not much point in scoring a "win" for an application or device by such a small amount, even if it is a real difference).
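
    That point can be made concrete with a little arithmetic. In a two-sample comparison with equal run counts and equal noise, the t statistic is diff / (sd * sqrt(2/n)), so it grows like sqrt(n) for any fixed difference. A sketch with made-up numbers (my own illustration):

```python
# Sketch: for a fixed tiny effect (a 0.5% mean difference with 1%
# run-to-run noise), the t statistic grows like sqrt(n), so enough
# runs make any real difference "statistically significant".
def t_statistic(mean_diff, sd, n):
    # two-sample t with equal n and equal sd: diff / (sd * sqrt(2/n))
    return mean_diff / (sd * (2 / n) ** 0.5)

mean_diff = 0.5   # 0.5% difference between the two systems
sd = 1.0          # 1% run-to-run standard deviation
for n in (3, 10, 100, 1000):
    print(f"n={n:4d}  t={t_statistic(mean_diff, sd, n):6.2f}")
# t reaches 2.0 (roughly the 95% threshold) at exactly n = 32 here,
# even though a 0.5% difference rarely matters in practice.
```
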

    Anyway, I'll say thanks again. Winner of best feature award for sure.


  • Michael (Phoronix) replied:
    Originally posted by chaos386
    Excellent addition. Will a feature also be added to put error bars on the graphs, so the final standard deviation is visible on the charts?
    You can view the spread right now (and for the past months) using "phoronix-test-suite analyze-all-runs <result>". For the Adobe SWF/Flash renderer, I may end up writing support so that this information is built into the graph itself and can be displayed on mouse-over, or when clicking a button or something else, such as when results are displayed on Phoronix.com.


  • chaos386 (Phoronix Member) replied:
    Excellent addition. Will a feature also be added to put error bars on the graphs, so the final standard deviation is visible on the charts?


  • phoronix (Administrator) started the topic Statistical Significance In Benchmark Results

    Phoronix: Statistical Significance In Benchmark Results

    For those of you following the developments of Phoronix Test Suite 2.2 (codenamed "Bardu"), some new benchmarking features were pushed into its Git tree this week. The latest Phoronix Test Suite 2.2 code now has better FreeBSD 8.0 compatibility and support for network proxies with network communication, but larger than that is new support for ensuring test results are statistically significant. When any test profile is set to run multiple times, the Phoronix Test Suite is now capable of computing the standard deviation between each of the test runs...

    http://www.phoronix.com/vr.php?view=NzU2MA
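
    The core of the feature described above, computing the spread across repeated runs and deciding whether a result is trustworthy, can be sketched in a few lines. This is an illustration of the idea only, not PTS code, and the 3.5% threshold is an arbitrary choice of mine:

```python
# Sketch: compute the standard deviation across repeated runs and flag
# the result if the relative spread exceeds a threshold (here 3.5%,
# an illustrative value, not PTS's actual policy).
import statistics

def check_spread(runs, max_rel_dev=0.035):
    mean = statistics.fmean(runs)
    sd = statistics.stdev(runs)
    rel = sd / mean                      # coefficient of variation
    return mean, sd, rel <= max_rel_dev  # True if result looks stable

runs = [152.3, 149.8, 151.1]             # e.g. three encode times, seconds
mean, sd, ok = check_spread(runs)
print(f"mean={mean:.1f}s sd={sd:.2f}s {'ok' if ok else 'rerun needed'}")
```

    A harness built this way could also keep adding runs until the relative deviation drops below the threshold, rather than using a fixed run count.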