The Performance Impact To AMD Zen 2 Compiler Tuning On GCC 9 + Znver2


  • #21
    Originally posted by Michael View Post
    arithmetic mean would be inaccurate when there are different scales/units of measurement involved.
    Or you could just normalize by some baseline machine.

    The bigger issue with units is that some units are a rate, while others are time. A CPU that's twice as fast will have benchmarks with a time that's 50% of the baseline. However, the rate-based benchmarks will be 2x. The two should have the same effect on the result, although they won't.

    I'd convert all rate-based benchmark results to a time, which is a nice, linear measure. After computing the average, you can take its inverse to get an estimate of how much faster (or slower) the test subjects are than the baseline.

    Also, I believe median is a useful way to gauge the "typical" speedup.
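    That approach can be sketched in a few lines of Python. The benchmark names and numbers below are made up for illustration; the point is the order of operations: convert rates to times, normalize, average, then invert at the end.

    ```python
    import statistics

    # Hypothetical results: times in seconds (lower is better) and one
    # rate in ops/sec (higher is better), for a subject vs. a baseline.
    baseline = {"compile": 120.0, "encode": 45.0, "throughput": 880.0}
    subject  = {"compile": 100.0, "encode": 40.0, "throughput": 1050.0}
    rate_based = {"throughput"}  # which benchmarks report a rate

    def as_time(name, value):
        # Convert rate-based results to a time-like measure (1/rate),
        # so every benchmark is on a "lower is better" linear scale.
        return 1.0 / value if name in rate_based else value

    # Normalize each time against the baseline, then average the ratios.
    ratios = [as_time(k, subject[k]) / as_time(k, baseline[k]) for k in baseline]
    mean_ratio = statistics.mean(ratios)

    speedup = 1.0 / mean_ratio                  # invert only at the end
    typical = 1.0 / statistics.median(ratios)   # median as a "typical" speedup
    ```

    Note that the inversion happens once, on the averaged ratio, not on each individual result before averaging.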
    Last edited by coder; 11 July 2019, 11:45 PM.

    Comment


    • #22
      Originally posted by grigi View Post
      arithmetic mean gives more weight to larger values. So if one benchmark emits seriously large numbers, it is going to dominate and turn everything else into noise. It is surely the worst to use.
      You could trim the outliers.
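      For instance, a minimal trimmed-mean sketch (the scores here are made up):

      ```python
      def trimmed_mean(values, trim=1):
          # Drop the `trim` smallest and largest results, then average the rest.
          kept = sorted(values)[trim:len(values) - trim]
          return sum(kept) / len(kept)

      scores = [98, 101, 99, 100, 5000]   # one wild outlier
      print(trimmed_mean(scores))          # averages 99, 100, 101 -> 100.0
      ```

      Trimming only suppresses isolated outliers, though; it does not fix the scale-dominance problem when one benchmark's unit is simply much larger than the others.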

      Comment


      • #23
        Originally posted by carewolf View Post
        Please don't use -march=x86-64
        He should use whatever most distros use.

        Comment


        • #24
          Originally posted by coder View Post
          He should use whatever most distros use.
          Which is nothing. You don't specify an architecture if you want the generic architecture.

          Comment


          • #25
            Originally posted by carewolf View Post

            Which is nothing. You don't specify an architecture if you want the generic architecture.
            Which is what you need if you want your distribution to run on any AMD64 machine from past years. This is why I suggested that hot-spot SIMD-vectorizing JIT compilation, even for C and C++, might be a much more performant and universal setup: https://www.youtube.com/watch?v=-VZmXO381HQ

            Comment


            • #26
              Originally posted by thebear View Post
              Here's a paper on why the geometric mean is to be preferred over the arithmetic mean (I'm not familiar with the journal so I cannot speak to its peer-review process, though):

              Philip J. Fleming and John J. Wallace, "How not to lie with statistics: the correct way to summarize benchmark results", Communications of the ACM, Volume 29, Issue 3, March 1986, pp. 218-221.

              Using the arithmetic mean to summarize normalized benchmark results leads to mistaken conclusions that can be avoided by using the preferred method: the geometric mean.

              or
              https://www.cse.unsw.edu.au/~cs9242/...Wallace_86.pdf
              That paper was a nice and definitive explanation as to why the geometric mean is what we should use.

              However, considering that the desirable outcomes of some benchmarks are on an inverted scale (operations per unit time vs. time per operation), the latter should be inverted (1/time) in order to be representative of the performance.

              Shouldn't that be made clear in the presentation of the geometric mean?
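              The paper's core point can be demonstrated with a small numeric sketch (the two machines and their times below are hypothetical): the arithmetic mean of normalized results can rank two machines differently depending on which one is used as the baseline, while the geometric mean ranks them consistently.

              ```python
              import math

              # Hypothetical times in seconds (lower is better) for two
              # machines, A and B, on two benchmarks.
              A = [1.0, 10.0]
              B = [5.0, 5.0]

              def norm(x, base):
                  # Normalize each result against a chosen baseline machine.
                  return [xi / bi for xi, bi in zip(x, base)]

              def amean(x):
                  return sum(x) / len(x)

              def gmean(x):
                  return math.prod(x) ** (1 / len(x))

              # Arithmetic mean of normalized times: the winner flips with the baseline.
              a_vs_a, b_vs_a = amean(norm(A, A)), amean(norm(B, A))  # A looks faster
              a_vs_b, b_vs_b = amean(norm(A, B)), amean(norm(B, B))  # B looks faster

              # Geometric mean: the A/B ratio is the same under either baseline.
              g_ratio_base_a = gmean(norm(A, A)) / gmean(norm(B, A))
              g_ratio_base_b = gmean(norm(A, B)) / gmean(norm(B, B))
              ```

              With baseline A the arithmetic mean favors A (1.0 vs. 2.75); with baseline B it favors B (1.1 vs. 1.0). The geometric-mean ratio is identical under both baselines, which is exactly the property the paper argues for.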

              Comment


              • #27
                Originally posted by Djhg2000 View Post
                However, considering that the desirable outcomes of some benchmarks are on an inverted scale (operations per unit time vs. time per operation), the latter should be inverted (1/time) in order to be representative of the performance.
                I think you've got it backwards. You want to average the times.

                Consider the case where you run a test 3 times and get results of 3, 4, and 5 seconds. The mean is 4 seconds, which equates to an average throughput of 0.25 ops/sec.

                However, if you average the rates, then you get a mean rate of 0.26111 ops/sec, incorrectly suggesting that 3.1333 ops were completed in the combined 12 seconds of the 3 trials.
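                A quick sketch reproducing the arithmetic above:

                ```python
                # Three trials of the same test, taking 3, 4, and 5 seconds.
                times = [3.0, 4.0, 5.0]

                mean_time = sum(times) / len(times)        # 4.0 seconds
                rate_from_mean_time = 1.0 / mean_time      # 0.25 ops/sec

                # Averaging the per-trial rates instead overstates the throughput:
                rates = [1.0 / t for t in times]
                mean_rate = sum(rates) / len(rates)        # ~0.26111 ops/sec
                implied_ops = mean_rate * sum(times)       # ~3.1333 ops in 12 s, not 3
                ```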

                Comment


                • #28
                  Originally posted by coder View Post
                  I think you've got it backwards. You want to average the times.

                  Consider the case where you run a test 3 times and get results of 3, 4, and 5 seconds. The mean is 4 seconds, which equates to an average throughput of 0.25 ops/sec.

                  However, if you average the rates, then you get a mean rate of 0.26111 ops/sec, incorrectly suggesting that 3.1333 ops were completed in the combined 12 seconds of the 3 trials.
                  No, if we want larger numbers to be better then we need to use 1/time. The resulting number has no useful unit and is only valid for relative comparisons within the same set of benchmarks anyway, but all the results need to tend towards the same limit (0 or infinity) for better performance.

                  Consider this example: benchmark BA gives a score based on the number of frames processed in a given time span, benchmark BB gives the time it took to process a set of frames, and benchmark BC gives a score in an arbitrary unit where higher is better. Now, if we use the geometric mean, we get the final score S = (BA*BB*BC)^(1/3). Three machines (MA, MB, MC) are benchmarked and produce the following results:
                  Benchmark   MA    MB    MC
                  BA          100   99    101
                  BB          10    9     300
                  BC          500   510   490
                  Notice that machine MC is a very poor fit for benchmark BB. Using just those numbers yields a final score of:
                  Score            MA       MB       MC
                  S_pure           79.370   76.880   245.78
                  Here, it looks like machine MC is clearly the winner, but the individual benchmarks tell us that shouldn't be the case. If we instead use 1/BB when calculating the final score (and thus make all results proportional with respect to the desirable outcome), we get:
                  Score            MA       MB       MC
                  S_proportional   17.100   17.769   5.4844
                  Now the score properly reflects how machine MC performed horribly in benchmark BB.

                  I hope this makes my point a bit more clear.
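                  A small sketch reproducing the example's numbers (same values as in the tables above):

                  ```python
                  # Raw benchmark results for the three hypothetical machines.
                  machines = {
                      "MA": {"BA": 100, "BB": 10,  "BC": 500},
                      "MB": {"BA": 99,  "BB": 9,   "BC": 510},
                      "MC": {"BA": 101, "BB": 300, "BC": 490},
                  }

                  def gmean3(a, b, c):
                      # Geometric mean of three values.
                      return (a * b * c) ** (1 / 3)

                  # Naive score: BB (a time, lower is better) taken as-is.
                  s_pure = {m: gmean3(r["BA"], r["BB"], r["BC"])
                            for m, r in machines.items()}

                  # Proportional score: BB inverted so all inputs are "higher is better".
                  s_prop = {m: gmean3(r["BA"], 1 / r["BB"], r["BC"])
                            for m, r in machines.items()}
                  ```

                  Running this reproduces the flip: MC wins the naive score purely because its huge BB time inflates the product, and loses once BB is inverted.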

                  Comment


                  • #29
                    Originally posted by Djhg2000 View Post
                    No, if we want larger numbers to be better then we need to use 1/time.
                    No, I want more accurate numbers.

                    I think my example was pretty damn clear that averaging rates is incorrect. If you want to convert it to an average rate, you can do that at the end.

                    Comment


                    • #30
                      Originally posted by coder View Post
                      No, I want more accurate numbers.

                      I think my example was pretty damn clear that averaging rates is incorrect. If you want to convert it to an average rate, you can do that at the end.
                      But that's what we're discussing, isn't it?

                      Comment
