The Five Stages of Benchmark Loss

skywarp04 replied

06 March 2010, 11:41 AM
Originally posted by dashcloud View Post

Does the same sort of thing happen when Windows benchmarks like 3DMark or other suites are run?

Are you kidding me? You should read some of the comments after a site posts benchmarks where one video card or CPU performs consistently better than another. A lot of the article writers know that a flame war is going to happen, so they will end the article with something like, "let the flame wars begin." It always amazes me how emotional people get about flaws being found in something. They always take it as a personal attack. That attitude probably comes about from the trolls who do try to use flaws and regressions as personal attacks. That's a whole other problem.

Stuff like this always reminds me of when I used to work on cars. People were always so dedicated to their brands: Ford vs Chevy, Honda vs Toyota. I admit that I'm partial to Fords and older Dodge products, but I'm not delusional enough to believe that they make perfect vehicles. I like those companies because of their history and I tend to like their styling better, but every model made by every manufacturer has some flaws, and every manufacturer has some model that has more than an acceptable number of flaws. They are all mechanical and made by humans; they ALL break. My favorite argument to listen to was how much better Toyotas were than all other manufacturers. I like Toyota, but I could, at that time, list off 10 to 15 major flaws found in various models. People just need to learn to accept the fact that everything has flaws, man-made or not, but especially man-made items. There is no such thing as the perfect OS, software, car, coffee maker, whatever. The important thing when flaws or regressions are found is that someone takes a serious look at them, tries to reproduce them, and, if they are reproducible, fixes them. Why does that have to involve personal attacks?
dashcloud replied

25 February 2010, 06:28 PM
Does the same sort of thing happen when Windows benchmarks like 3DMark or other suites are run?
mtippett replied

24 February 2010, 04:17 PM
Originally posted by sabriah View Post

I have seen that you have contacted developers in the case of regressions, and that is a key point. Always report bugs, as otherwise developers don't know they are there. They don't read every site.

Also, I think it is important to make these reports through the main bug report channels. Whether that is the distro's channels or the affected application's channels should not be an issue, as in a perfect world the bug report would percolate to the correct maintainer. However, I am afraid many bug reports stop somewhere along the line.

Also, with several distro trees nested inside each other, some issues may have been patched in only one branch/fork; think of the sequence Debian>Ubuntu>Mint. What happens with a bug report to Ubuntu? Will it reach Debian and/or Mint? We can only hope that all distros will forward bug reports to the actual source and not make a fix peculiar to their domain.

Thanks for your pdf and ogg!

Okay. So for a particular issue (KVM SQLITE results 100s of times faster than) I did that. Due to the developers not wanting to get to stage 4, it became a pain point.

I contacted the primary projects involved (SQLITE, Ubuntu, KVM, QEMU). KVM was blaming Ubuntu, blaming Phoronix, blaming QEMU. SQLITE was not interested in a simplistic benchmark.

I raised a Launchpad issue as a cover for the work and actively asked questions to piece together what was occurring. The KVM developers, who simply didn't believe that they could be at fault, actually went and closed the issue as being not Ubuntu's, KVM's or QEMU's problem.

In the end, cooler heads prevailed and a KVM patch was applied to alleviate the situation, but the effort to get bugs filed and communicate with the teams was, quite frankly, a waste of time and effort. The numbers stood, the benchmark stood, and a fix was made. Cost to me was about 30 or so emails, personal attacks and days of wasted effort lodging bugs and arguing details that were consequently closed by holier-than-thou developers. I did try both ways in this case: I asked politely on mailing lists, and I raised bugs as requested.

The reality of the situation is that if the affected parties aren't willing to get to the analysis stage, then there is virtually no point in filing a bug without a receptive developer.

As I mentioned in the talk, there is no reason that a lot of this should be surprising to the developers of a project. The tests Phoronix uses are consistent and trivial to run, but the results are rejected by the affected projects far too often. Often it takes a Slashdot response to get people's attention.

Regards,

Matthew
mtippett replied

24 February 2010, 04:02 PM
Originally posted by dashcloud View Post

Thanks for the Ogg Vorbis audio- I listened to the whole thing, and it was quite good.
Thanks for the slides as well.

NP - If there are any other topics of interest, I am more than happy to submit a paper to conferences when there is community interest.
sabriah replied

24 February 2010, 01:02 AM
Originally posted by Michael View Post

It's simply not feasible to always dig to the bottom of every single regression found by myself. There would rarely ever be a new Phoronix article published due to the immense amount of time required. When regressions are found, the community and particularly the project responsible for the regression are more easily able to analyze what happened.

I have seen that you have contacted developers in the case of regressions, and that is a key point. Always report bugs, as otherwise developers don't know they are there. They don't read every site.

Also, I think it is important to make these reports through the main bug report channels. Whether that is the distro's channels or the affected application's channels should not be an issue, as in a perfect world the bug report would percolate to the correct maintainer. However, I am afraid many bug reports stop somewhere along the line.

Also, with several distro trees nested inside each other, some issues may have been patched in only one branch/fork; think of the sequence Debian>Ubuntu>Mint. What happens with a bug report to Ubuntu? Will it reach Debian and/or Mint? We can only hope that all distros will forward bug reports to the actual source and not make a fix peculiar to their domain.

Thanks for your pdf and ogg!
dashcloud replied

23 February 2010, 10:00 PM
Thanks for the Ogg Vorbis audio- I listened to the whole thing, and it was quite good.
Thanks for the slides as well.
mtippett replied

22 February 2010, 04:39 PM
Originally posted by bnolsen View Post

Easy stuff for programmers to remember and get their egos out of the way:

- All code sucks
- It's "the" code, not "my" or "your" code.

Only a few programmers really stand out and most of that remainder disqualify themselves because of the ownership issues above. People need to get over themselves in general, especially programmers.

I wouldn't put it so strongly, but yes, in my professional domain, I tend to use inanimate terms around components: it's "the kernel component that's broken", not "your kernel that's broken". It goes a long way in removing ego and personality from the discussion.

In a lot of cases, the industry and the community are wonderful at their own component, but have too little awareness of the system that they are part of. That matters particularly in the community, where you have possibly tens to hundreds of variants floating around.

The issue for me is the automatic dismissal, rather than asking questions or floating assumptions or causes. Automatically saying "the other team must have built it wrong" is not helpful at all.
bnolsen replied

22 February 2010, 04:17 PM
Easy stuff for programmers to remember and get their egos out of the way:

- All code sucks
- It's "the" code, not "my" or "your" code.

Only a few programmers really stand out and most of that remainder disqualify themselves because of the ownership issues above. People need to get over themselves in general, especially programmers.
Michael replied

22 February 2010, 10:56 AM
Originally posted by Jonno View Post

The PTS is great, no arguments there, but unfortunately most articles on phoronix.com stop after comparing, and leave the contrasting to the comments. And as you have noticed, those often stop at stage 2 or 3. I think it would be a great benefit if more of the phoronix.com articles continued all the way to stage 4.

It's simply not feasible to always dig to the bottom of every single regression found by myself. There would rarely ever be a new Phoronix article published due to the immense amount of time required. When regressions are found, the community and particularly the project responsible for the regression are more easily able to analyze what happened.
Jonno replied

22 February 2010, 10:49 AM
Having only looked at the slides (not listened to the audio yet, don't have an hour to kill right now) I must say that this looks great. Hopefully some more awareness of this can get a larger part of the community to stage 4 and 5.

However, I think phoronix.com is as much a part of the problem as it is a part of the solution. The PTS is great, no arguments there, but unfortunately most articles on phoronix.com stop after comparing, and leave the contrasting to the comments. And as you have noticed, those often stop at stage 2 or 3. I think it would be a great benefit if more of the phoronix.com articles continued all the way to stage 4.

In fact, it is the analysis (stage 4) that makes me prefer lwn.net over phoronix.com, even though the news covered by phoronix.com is closer to what I'm actually interested in...