The Five Stages of Benchmark Loss


  • The Five Stages of Benchmark Loss

    Phoronix: The Five Stages of Benchmark Loss

    This weekend at the Southern California Linux Expo in Los Angeles, Matthew Tippett and I presented a talk entitled Five Stages of Benchmark Loss: PTS and You. In this hour-long talk, we covered Linux benchmarking, what has been learned over the years of benchmarking at Phoronix, the Phoronix Test Suite, and the five stages that users and developers generally go through when they lose out on benchmarking results. For those that were unable to attend this event, here are the slides and recordings.

    http://www.phoronix.com/vr.php?view=14609

  • #2
    Originally posted by phoronix
    Phoronix: The Five Stages of Benchmark Loss
    The extracted audio of the talk can be found in MP3 and Ogg formats.
    http://www.phoronix.com/vr.php?view=14609
    Kudos and thanks a lot


    • #3
      BTW why don't you transcode the video into a smaller file to share it?
      Better low quality than no video at all


      • #4
        WTF! This can't be! There's clearly something flawed in your methodology... but on a further analysis, I accept the correctness of your results.


        • #5
          Why don't they just use BitTorrent to host the videos?

          And all this talk about finding a "winner", and PTS still doesn't have a summary graph showing the "winner".


          • #6
            To the Phoronix team,

            You have my sincere appreciation for providing the slides and audio in open formats! I have downloaded both and look forward to the video. Please consider providing the video in an unencumbered format (e.g. Theora) as well. Again, thank you very much for supporting open formats and setting a positive example for content providers!


            • #7
              Having only looked at the slides (not listened to the audio yet, don't have an hour to kill right now) I must say that this looks great. Hopefully some more awareness of this can get a larger part of the community to stage 4 and 5.

              However, I think phoronix.com is as much a part of the problem as it is a part of the solution. The PTS is great, no arguments there, but unfortunately most articles on phoronix.com stop after comparing, and leave the contrasting to the comments. And as you have noticed, those often stop at stage 2 or 3. I think it would be a great benefit if more of the phoronix.com articles continued all the way to stage 4.

              In fact, it is the analysis (stage 4) that makes me prefer lwn.net over phoronix.com, even though the news covered by phoronix.com is closer to what I'm actually interested in...


              • #8
                Originally posted by Jonno
                The PTS is great, no arguments there, but unfortunately most articles on phoronix.com stop after comparing, and leave the contrasting to the comments. And as you have noticed, those often stop at stage 2 or 3. I think it would be a great benefit if more of the phoronix.com articles continued all the way to stage 4.
                It's simply not feasible for me to always dig to the bottom of every single regression I find. There would rarely ever be a new Phoronix article published due to the immense amount of time required. When regressions are found, the community and particularly the project responsible for the regression are more easily able to analyze what happened.
                Michael Larabel
                http://www.michaellarabel.com/


                • #9
                  Easy stuff for programmers to remember and get their egos out of the way:

                  - All code sucks
                  - It's "the" code, not "my" or "your" code.

                  Only a few programmers really stand out and most of that remainder disqualify themselves because of the ownership issues above. People need to get over themselves in general, especially programmers.


                  • #10
                    Originally posted by bnolsen
                    Easy stuff for programmers to remember and get their egos out of the way:

                    - All code sucks
                    - It's "the" code, not "my" or "your" code.

                    Only a few programmers really stand out and most of that remainder disqualify themselves because of the ownership issues above. People need to get over themselves in general, especially programmers.
                    I wouldn't put it so strongly, but yes, in my professional domain I tend to use inanimate terms around components: it's "the kernel component that's broken", not "your kernel that's broken". It goes a long way toward removing ego and personality from the discussion.

                    In a lot of cases, the industry and the community are wonderful at their own component, but have too little awareness of the system that they are part of. Particularly in the community, you have possibly tens to hundreds of variants floating around.

                    The issue for me is the automatic dismissal, rather than asking questions or floating assumptions or causes. Automatically saying "the other team must have built it wrong" is completely unhelpful.


                    • #11
                      Thanks for the Ogg Vorbis audio. I listened to the whole thing, and it was quite good.
                      Thanks for the slides as well.


                      • #12
                        Originally posted by Michael
                        It's simply not feasible for me to always dig to the bottom of every single regression I find. There would rarely ever be a new Phoronix article published due to the immense amount of time required. When regressions are found, the community and particularly the project responsible for the regression are more easily able to analyze what happened.
                        I have seen that you have contacted developers in the case of regressions, and that is a key point. Always report bugs, as otherwise developers don't know they are there. They don't read every site.

                        Also, I think it is important to make these reports through the main bug report channels. Whether that is the distro's channels or the affected application's channels should not be an issue, as in a perfect world the bug report would percolate to the correct maintainer. However, I am afraid many bug reports stop somewhere along the line.

                        Also, with several distro trees nested inside each other, some issues may have been fixed in only one branch/fork; think of the sequence Debian>Ubuntu>Mint. What happens with a bug report to Ubuntu: will it reach Debian and/or Mint? We can only hope that all distros will forward bug reports to the actual source and not make a fix peculiar to their domain.

                        Thanks for your pdf and ogg!


                        • #13
                          Originally posted by dashcloud
                          Thanks for the Ogg Vorbis audio. I listened to the whole thing, and it was quite good.
                          Thanks for the slides as well.
                          NP - If there are any other topics of interest, I am more than happy to submit a paper to conferences when there is community interest.


                          • #14
                            Originally posted by sabriah
                            I have seen that you have contacted developers in the case of regressions, and that is a key point. Always report bugs, as otherwise developers don't know they are there. They don't read every site.

                            Also, I think it is important to make these reports through the main bug report channels. Whether that is the distro's channels or the affected application's channels should not be an issue, as in a perfect world the bug report would percolate to the correct maintainer. However, I am afraid many bug reports stop somewhere along the line.

                            Also, with several distro trees nested inside each other, some issues may have been fixed in only one branch/fork; think of the sequence Debian>Ubuntu>Mint. What happens with a bug report to Ubuntu: will it reach Debian and/or Mint? We can only hope that all distros will forward bug reports to the actual source and not make a fix peculiar to their domain.

                            Thanks for your pdf and ogg!
                            Okay. So for a particular issue (KVM SQLite results 100s of times faster than) I did exactly that. Because the developers did not want to get to stage 4, it became a pain point.

                            I contacted the three primary projects involved (SQLite, Ubuntu, KVM, QEMU). KVM was blaming Ubuntu, blaming Phoronix, blaming QEMU. SQLite was not interested in a simplistic benchmark.

                            I raised a Launchpad issue as a cover for the work and actively asked questions to piece together what was occurring. The KVM developers, who simply didn't believe that they could be at fault, actually went and closed the issue as being not Ubuntu's, KVM's or QEMU's problem.

                            In the end, cooler heads prevailed and a KVM patch was applied to alleviate the situation, but the effort to get bugs filed and communicate with the teams was, quite frankly, a waste of time and effort. The numbers stood, the benchmark stood, and a fix was made. Cost to me was about 30 or so emails, personal attacks and days of wasted effort lodging bugs and arguing details that were consequently closed by holier-than-thou developers. I did try both ways in this case - I asked politely on mailing lists, I raised bugs as requested.

                            The reality of the situation is that if the affected parties aren't willing to get to the analysis stage, then there is virtually no point in filing a bug without a receptive developer.

                            As I mentioned in the talk, there is no reason that a lot of this should be surprising to the developers of a project. The tests Phoronix uses are consistent, the tests are trivial to run, but the results are rejected by the affected projects far too often. Often it takes a Slashdot response to get people's attention.

                            Regards,

                            Matthew


                            • #15
                              Does the same sort of thing happen when Windows benchmarks like 3DMark or other suites are run?
