Originally posted by tomtomme
What is completely opaque is that the mean value is not calculated from the values published in the article, but from **all** results (see the "raw" benchmark values).
This also leads to some "interesting" effects. Consider two benchmarks A and B, where A has e.g. 2 sub-benchmarks (CPUs, block sizes, TCP/UDP, whatever) while B has only 1. If candidate 1 scores twice as well as candidate 2 in both parts of A, and candidate 2 scores twice as well in B, candidate 1 comes out the clear winner: pow(2 * 2 * 0.5, 1/3) = 1.26. By an "appropriate" choice of benchmarks you can make the geometric mean reflect mostly one scenario.
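The arithmetic above can be checked with a few lines of Python (the ratios are the hypothetical 2x / 0.5x results from the example, not real benchmark data):

```python
import math

# Candidate 1 vs. candidate 2, per sub-benchmark:
# benchmark A contributes two 2x wins, benchmark B one 0.5x loss.
ratios = [2.0, 2.0, 0.5]

# Geometric mean = n-th root of the product of n ratios.
geomean = math.prod(ratios) ** (1 / len(ratios))
print(round(geomean, 2))  # 1.26 - A's extra sub-benchmark dominates
```

Because each sub-benchmark enters the product once, a benchmark split into more sub-results gets proportionally more weight in the final geometric mean.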
The geometric mean also absorbs even obvious outliers - check the values from the recent 10GBit/s network benchmarks:
Have a look at the "Ethr" TCP connections/s results: for two of them the deviation is almost as large as the mean value itself, and three times as large relative to the mean as for any other candidate. A clear sign of a broken benchmark setup.
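A simple way to spot such results is the coefficient of variation (stdev divided by mean); the numbers below are made up to mimic the pattern described, not the actual Ethr results:

```python
import statistics

# Hypothetical connections/s runs: one candidate with a huge spread,
# one with a tight spread.
runs = {
    "broken_run": [1000, 50],        # stdev on the order of the mean
    "normal_run": [980, 1010, 995],  # stdev a tiny fraction of the mean
}

for name, vals in runs.items():
    cv = statistics.stdev(vals) / statistics.mean(vals)
    print(name, round(cv, 2))
```

A coefficient of variation near 1 means the measurement noise is as large as the measured value, so averaging such a result into a geometric mean just launders the outlier.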