Linux 2.6.24 Through Linux 2.6.33 Benchmarks


  • #11
    Originally posted by kraftman View Post
    Usually those tests are meaningless. If there's an intentional change in the kernel, it's described here as a regression. For example, a file system may have a different mode set as the default in a newer kernel, and this can have a big impact on some benchmarks - PostgreSQL etc. The Apache benchmark is meaningless, because it's not done properly.
    A regression is an unexpected change in behavior. If the kernel developers make a change in one area and are not expecting a behavior change in other areas, then those other areas have regressed.

    Remember that file system tuning (turning features on and off) is a specialist skill. Most people are very wary of making changes they feel uninformed about when their data may be at risk. The maintainers of the filesystems and the distros that package them are the ones who control the default behavior.
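
    As a sanity check when comparing kernels, it can also help to record which filesystem and mount options were actually in effect during each run. Here is a minimal sketch of one way to do that, assuming Linux with a readable /proc/mounts; the PostgreSQL data directory below is only an example:

    import os

    def mount_info(path):
        """Return (device, mountpoint, fstype, options) for the mount containing path."""
        path = os.path.realpath(path)
        best = None
        with open("/proc/mounts") as mounts:
            for line in mounts:
                device, mountpoint, fstype, options = line.split()[:4]
                # Keep the longest mountpoint that is a prefix of the path.
                if path == mountpoint or path.startswith(mountpoint.rstrip("/") + "/"):
                    if best is None or len(mountpoint) > len(best[1]):
                        best = (device, mountpoint, fstype, options)
        return best

    # Example: log the options the benchmark volume was mounted with.
    print(mount_info("/var/lib/postgresql"))

    If two kernels mount the same filesystem with different default options, that shows up right next to the numbers instead of being guessed at afterwards.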

    I'd like you to expand on your "not done properly" if you could.



    • #12
      Originally posted by Xheyther View Post
      I can't believe such big regressions, like the Apache one, were left untouched and unresolved across so many releases. I don't understand either why the PostgreSQL performance regression made it through the rc phase... Either you have just proved to the world that the kernel development model is flawed, or there is a flaw in the tests you ran. In either case, some analysis and explanation would have been nice alongside the results.
      It depends on what people are watching for. As mentioned in this thread, all that has been shown is that the ab benchmark, as currently set up, is extremely sensitive (mind you, in this case in a good way) to the changes going on in the kernel.

      This doesn't show that anything is flawed. In any system, not all metrics go up monotonically. You make improvements in one area that degrade another. You just want the "average" experience to be on an upward trend.
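
      To make that "average" concrete, one rough approach is to express each benchmark as a ratio of new over old and summarise with a geometric mean, so a single large swing doesn't dominate. A small sketch in Python; the figures are invented placeholders, not real results:

      import math

      # Higher is better for all three metrics here (requests/s, TPS, MB/s).
      old = {"apache": 12000.0, "postgresql": 420.0, "compress": 95.0}
      new = {"apache": 9000.0, "postgresql": 400.0, "compress": 110.0}

      ratios = {name: new[name] / old[name] for name in old}
      for name, ratio in sorted(ratios.items()):
          print(f"{name:12s} {ratio - 1:+7.1%}")

      # Geometric mean of the ratios gives the "average experience" trend.
      geo_mean = math.exp(sum(math.log(r) for r in ratios.values()) / len(ratios))
      print(f"geometric mean ratio: {geo_mean:.3f}")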

      Remember that Linux covers everything from embedded systems through to big-iron servers. Being a generalist is *really* hard to do.



      • #13
        Thanks for these tests!

        I think it is extremely valuable to have these tests in public.

        It may cause hiccups in some camps, but that is what they're good for: stopping hiccups!



        • #14

          I appreciate the great job Phoronix does on reporting news in the Linux community, but I find that the benchmarking articles could be much better. I don't need someone to show me a graph and then list the statistics in the text below the graph. The graph shows the statistics already. These articles fail to draw any real conclusions about the results. Rather than saying "these numbers went down, these numbers went up, and these numbers stayed the same," Phoronix should look into *why* changes occur. I'm not saying that you have to research every regression you find, but at least put a little effort into finding a couple of real interesting development notes to provide some solid information along with the figures.



          • #15
            Originally posted by maccam94 View Post
            I appreciate the great job Phoronix does on reporting news in the Linux community, but I find that the benchmarking articles could be much better. I don't need someone to show me a graph and then list the statistics in the text below the graph. The graph shows the statistics already. These articles fail to draw any real conclusions about the results. Rather than saying "these numbers went down, these numbers went up, and these numbers stayed the same," Phoronix should look into *why* changes occur. I'm not saying that you have to research every regression you find, but at least put a little effort into finding a couple of real interesting development notes to provide some solid information along with the figures.
            Agreed. If it takes too much time, perhaps someone else out there could chip in - you make the graphs and raise some questions, and someone else, maybe someone who works on these software projects, can explain.



            • #16
              Originally posted by atmartens View Post
              Agreed. If it takes too much time, perhaps someone else out there could chip in - you make the graphs and raise some questions, and someone else, maybe someone who works on these software projects, can explain.
              Benchmarking loss (which is really what we are talking about here) stomps on egos pretty hard. Most times I have reached out proactively, the response from the developers has been quite painful.

              Realistically, reaching out to developers individually does usually get better traction. That involves going around the project's due process of posting to a list. That raises the bar to getting to the bottom of the problem even further.

              I guess that discussing the likely impacted area is possibly the next increment in analysis, but even then you will still have the people turning around and saying "you have no idea what you are talking about, stop spreading FUD".

              Not an easy problem to fix, but *very* costly to make valuable.



              • #17
                Originally posted by maccam94 View Post
                I appreciate the great job Phoronix does on reporting news in the Linux community, but I find that the benchmarking articles could be much better. I don't need someone to show me a graph and then list the statistics in the text below the graph. The graph shows the statistics already. These articles fail to draw any real conclusions about the results. Rather than saying "these numbers went down, these numbers went up, and these numbers stayed the same," Phoronix should look into *why* changes occur. I'm not saying that you have to research every regression you find, but at least put a little effort into finding a couple of real interesting development notes to provide some solid information along with the figures.
                That's the problem with regressions. If the developers knew a change was going to cause a performance delta, then it shouldn't be a surprise (performance went down as expected due to this change). The issue is that most of the time a performance change (good or bad) is a confluence of other issues which don't always make sense even to the developers working on the component themselves.

                In an environment that is poor in testing and benchmarking (most open-source projects), getting a hypothesis raised and validated is almost impossible. What makes it even worse is that a lot of people have a huge personal investment in a project, and telling them that they have broken it or that it is slow cuts straight through the ego.



                • #18
                  Originally posted by maccam94 View Post
                  I don't need someone to show me a graph and then list the statistics in the text below the graph. The graph shows the statistics already.
                  This is standard practice.

                  All scientific journals require this - tell the readers in words what the graphs and tables say anyway.

                  The benefit is also that the results become searchable through search engines.





                  • #19
                    Originally posted by mtippett View Post
                    Benchmarking loss (which is really what we are talking about here) stomps on egos pretty hard. Most times I have reached out proactively, the response from the developers has been quite painful.

                    Realistically, reaching out to developers individually does usually get better traction. That involves going around the project's due process of posting to a list. That raises the bar to getting to the bottom of the problem even further.

                    I guess that discussing the likely impacted area is possibly the next increment in analysis, but even then you will still have the people turning around and saying "you have no idea what you are talking about, stop spreading FUD".

                    Not an easy problem to fix, but *very* costly to make valuable.
                    A couple of options you could take are:
                    • Correlate tests to the subsystems they stress, then do a quick search through their changelogs/bug reports (a rough sketch of this follows at the end of this post).
                    • Don't focus on the fact that the numbers changed; ask why the results of a benchmark might have changed with default settings. Developers might be interested to explain how they found a clever new way to boost performance, or that performance has decreased in order to increase safety.
                    • Invite developers to comment on the results before posting them.

                    Yes, these options would require some extra work, but I think your readers (and the development community) would really appreciate it.
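
                    A rough sketch of the first option, assuming a local clone of the kernel git tree (here in a directory called linux) with the release tags present; the test-to-directory mapping is only a guess, not an authoritative list:

                    import subprocess

                    # Very rough guesses at which kernel directories each test leans on.
                    SUBSYSTEMS = {
                        "apache": ["net", "fs"],
                        "postgresql": ["fs", "mm", "block"],
                    }

                    def commits_touching(repo, old_tag, new_tag, paths):
                        out = subprocess.run(
                            ["git", "-C", repo, "log", "--oneline", f"{old_tag}..{new_tag}", "--"] + paths,
                            capture_output=True, text=True, check=True)
                        return out.stdout.splitlines()

                    for test, paths in SUBSYSTEMS.items():
                        commits = commits_touching("linux", "v2.6.32", "v2.6.33", paths)
                        print(f"{test}: {len(commits)} commits touched {', '.join(paths)}")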



                    • #20
                      Originally posted by maccam94 View Post
                      A couple of options you could take are:
                      • Correlate tests to the subsystems they stress, then do a quick search through their changelogs/bug reports.
                      • Don't focus on the fact that the numbers changed; ask why the results of a benchmark might have changed with default settings. Developers might be interested to explain how they found a clever new way to boost performance, or that performance has decreased in order to increase safety.
                      • Invite developers to comment on the results before posting them.

                      Yes, these options would require some extra work, but I think your readers (and the development community) would really appreciate it.
                      I'll let Michael make comments on the reporting.

                      My view is that the impact of different subsystems is heavily dependent on the interactions between different parts of the system. In a lot of cases the changelogs may give an indication, but it would usually take domain expertise in that subsystem to be able to correlate the two.

                      I agree that more information would be useful, but I am not sure a detailed analysis of each regression would add that much. I'd expect the collective intelligence of the forums would have more luck crowd-sourcing the trigger than Michael or I would have digging deep.

                      The numbers are there, the tests are there and the kernels are there. If anyone is willing to dig deep to understand the difference, I would be very interested to know how far they get. There is no barrier for anyone to reproduce the results and Fight the Good Fight to understand what is going on.
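
                      For anyone who does take it up, one way to dig is to let git bisect walk the history between a good and a bad release, driven by a helper script that builds the kernel, runs the benchmark, and exits non-zero when the result falls below a threshold. A sketch follows; the helper script name is hypothetical, and automating the boot into each test kernel is the hard part left to the reader:

                      import subprocess

                      def bisect_regression(repo, good_tag, bad_tag, test_script):
                          git = lambda *args: subprocess.run(["git", "-C", repo, *args], check=True)
                          git("bisect", "start")
                          git("bisect", "bad", bad_tag)
                          git("bisect", "good", good_tag)
                          # git checks out successive revisions and runs the script:
                          # exit 0 = good, 125 = skip this revision, anything else = bad.
                          git("bisect", "run", test_script)
                          git("bisect", "reset")

                      # bisect_regression("linux", "v2.6.24", "v2.6.25", "./build_and_bench.sh")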

                      Any takers?

                      Matthew

