Linux 2.6.24 Through Linux 2.6.33 Benchmarks


  • #21
    if only Linus used Phorx Test

    I read posts from kernel devs. that state they want test results to see improvements, failures, etc. Well here is real data.

    It would be nice if you didn't have to recompile in order to change most of the kernel parameters: #cpus, hi-mem, timer freq., dynamic ticks, cpuarch.

    Check out amd64. I regress

    Comment


    • #22
      Originally posted by mtippett View Post
      The numbers are there, the tests are there and the kernels are there. If anyone is willing to dig deep to understand the difference, I would be very interested to know how far they get.
      Do what you are doing, publish the numbers.

      One thing though. If it is only one app which sees a regression it might be that that particular app is doing something wrong. If you have a number of apps which regress on the same kernel, then it may well be a kernel regression.
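The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the scores, the kernel labels, the 5% threshold and the "at least two apps" cut-off are all assumptions, not anything PTS defines.

```python
def regressed_apps(results, baseline, candidate, threshold=0.05):
    """Return the apps whose score dropped by more than `threshold`
    (as a fraction of the baseline score) between two kernels.
    Assumes higher scores are better."""
    apps = []
    for app, scores in results.items():
        old, new = scores[baseline], scores[candidate]
        if old > 0 and (old - new) / old > threshold:
            apps.append(app)
    return sorted(apps)


def classify(results, baseline, candidate, threshold=0.05, min_apps=2):
    """One regressing app points at the app; several point at the kernel.
    `min_apps` is an arbitrary cut-off chosen for this sketch."""
    apps = regressed_apps(results, baseline, candidate, threshold)
    if not apps:
        return "no regression", apps
    if len(apps) < min_apps:
        return "suspect the application", apps
    return "suspect the kernel", apps


# Hypothetical scores, invented purely for illustration.
sample = {
    "apache": {"kernel_a": 100.0, "kernel_b": 60.0},
    "sqlite": {"kernel_a": 50.0, "kernel_b": 49.0},
    "gzip": {"kernel_a": 20.0, "kernel_b": 20.1},
}

print(classify(sample, "kernel_a", "kernel_b"))
```

With more apps in the table, the same check becomes more convincing, which is the argument for running many short benchmarks.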

      As benchmark time is limited, I would use as many PTS benchmarks as possible, but not run each for a long time. Instead of 3-minute runs for five applications, one could use 30-second runs for 30 applications; both would be 900 seconds of benchmarking.

      Is this feasible?
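The time-budget trade-off can be checked with a small Python sketch. The function names are made up for this example, and the 900-second budget is taken from the post above.

```python
def total_seconds(run_seconds, num_tests):
    """Total wall-clock time when every test runs for `run_seconds`."""
    return run_seconds * num_tests


def per_test_seconds(budget_seconds, num_tests):
    """Run length per test when a fixed budget is split evenly."""
    if num_tests <= 0:
        raise ValueError("num_tests must be positive")
    return budget_seconds / num_tests


budget = 900  # seconds, the figure used in the post
print(total_seconds(30, 30))          # 30 tests at 30 s each
print(per_test_seconds(budget, 5))    # run length if only 5 tests share the budget
print(per_test_seconds(budget, 30))   # run length if 30 tests share the budget
```

Whether 30-second runs are long enough to give stable numbers is a separate question that this arithmetic does not answer.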

      Comment


      • #23
        Originally posted by mtippett View Post
        Skip to the next response to that thread..

        I turned on apache, and played with ab a bit, and yup, ab is a hog, so
        any fairness hurts it badly. Ergo, running ab on the same box as
        apache suffers with CFS when NEW_FAIR_SLEEPERS are turned on. Issuing
        ab bandwidth to match its 1:N pig nature brings throughput right back.




        Remember that you can't test everything, and testing in the obvious path will usually result in flat lines - since they represent the 95% path.

        As indicated above, what has been identified is that in some scenarios CFS completely tanks. The ab is just a tool to make this visible.
        In this scenario with fair sleepers enabled, yes. However, this scenario is detached from reality unless someone runs ab and Apache on a single machine, which is not recommended.* I think it's natural that the scheduler is tuned to perform well in real situations. So, what's the point of this benchmark?

        As per usual, if there is any benchmark which you believe provides a suitable equivalent scenario but is more "correct", please tell us.
        Maybe replace "correct" with "more meaningful". The problem is that I'm not sure what an equivalent scenario to this benchmark could be, and if there's no such scenario, this benchmark means: we've got different results in an Apache benchmark run on the same machine, which isn't recommended. Btw., what scenario were you talking about? One like *?

        Comment


        • #24
          Originally posted by mtippett View Post
          A regression is an unexpected change in behavior. If the kernel developers make a change in one area, and they are not expecting the behavior change in other areas, those areas have regressed.
          If a developer decides to change the default file system mode to some other, it's not a regression, because it is an expected change in the file system behavior (it is also known that it will affect some benchmarks). Michael isn't a dev, is he?

          I'd like you to expand on your "not done properly" if you could.
          The recommended way is to run ab on a different machine, so that's why I consider that it wasn't done properly or, if you like, that this benchmark is strange in my opinion.

          Comment


          • #25
            Originally posted by sabriah View Post
            This is standard practice.

            All scientific journals require this - tell the readers in words what the graphs and tables say anyway.

            The benefit is also that the results become searchable through search engines.


            .
            No they don't.

            Scientific journals require authors to describe in words what figures and tables show AND to draw from those numbers a valuable conclusion (something that isn't done here, obviously). If you don't, you get your paper rejected.

            Comment


            • #26
              Originally posted by Xheyther View Post
              No they don't.

              Scientific journals require authors to describe in words what figures and tables show AND to draw from those numbers a valuable conclusion (something that isn't done here, obviously). If you don't, you get your paper rejected.
              I agree with what you say about the AND.

              BUT, and the but is big, here we are talking about Phoronix's role as a whistleblower. They didn't write the code, and debugging someone else's code is a nightmare, even for Freddy on Elm Street.

              I never expect them to identify the pivotal change in the code. Heck, even deciding which of several possible layers (e.g. app or kernel) is responsible can be worse than difficult.

              However, I do think the ones to draw the valuable conclusions you mention from the numbers presented at Phoronix should be the developers. Who else can interpret them with a comparatively minimal effort, and solve them?

              Showing the world system-based regressions is one of several important ways to catch bugs, and I applaud Phoronix for doing this task.

              I also realize that their use of default settings is a pragmatic choice, not suitable to all practices. But, tweaked settings rapidly enter the inescapable permutation hell; in how many ways can you fine tune web servers and databases?! Which is the least silly setting? Well, the default, because that is the one everyone has access to.


              .

              Comment


              • #27
                Originally posted by kraftman View Post
                If a developer decides to change the default file system mode to some other, it's not a regression, because it is an expected change in the file system behavior (it is also known that it will affect some benchmarks). Michael isn't a dev, is he?
                It's a game of whack-a-mole. You make a change with an expected mole to be whacked. Once the change is made, three unexpected moles pop up.

                Industry metrics formal a formal (testing, QA, etc) environment with

                The recommended way is to run ab on a different machine, so that's why I consider that it wasn't done properly or, if you like, that this benchmark is strange in my opinion.
                For determining the expected performance of apache, yes, I agree that you should have ab and the server on different machines. But remember that we are not testing the apache installation. The component under test is the kernel in this instance, or at the very least different hardware.

                What we are showing is that there is a synthetic load that is strongly affected by the kernel changes. If we called it "pig-test" the results would still be the same.

                Comment


                • #28
                  Originally posted by mtippett View Post
                  But remember that we are not testing the apache installation. The component under test is the kernel in this instance, or at the very least different hardware.
                  Right. This makes some things clear.

                  Comment


                  • #29
                    Originally posted by mtippett View Post
                    I'll let Michael make comments on the reporting.

                    My view is that the impact of different subsystems is heavily dependent on the interactions between different parts of the system. In a lot of cases, the changelogs may hint at the cause, but it would usually take domain expertise in that subsystem to be able to correlate the two.
                    Here's a good example

                    So Phoronix pointed out we have a KMS/DRI2 3D regression vs UMS/DRI1, two things sprang to mind wrt to places we lost some of this so I did a quick benchmark on my laptop. Thinkpad T60P, rv530 FireGL V5200, Intel Core Duo T2500 CPU 2Ghz, 1600x1200 LCD F-12 mostly. So I ran openarena with Eric…


                    Dave, a veritable graphics guru, had to ponder and run further benchmarks. And even then he still has concerns about what and where the real tradeoffs will be. Understanding the reason for a regression is absolutely a specialty. Making sure the tests allow for easy analysis is probably the primary area where we can add value.

                    All in all, what a good regression benchmark needs to have is sensitivity to different areas of the system under test. A targeted benchmark for making a purchase decision is a whole different ball game. I am sure Michael is open to targeting some runs to particular areas, if they are of general interest.

                    Unfortunately, choosing your kernel and filesystem for peak web server performance isn't really what the general populace is interested in.

                    Comment


                    • #30
                      Originally posted by mtippett View Post
                      But remember that we are not testing the apache installation.
                      [...]
                      What we are showing is that there is a synthetic load that is strongly affected by the kernel changes. If we called it "pig-test" the results would still be the same.
                      I do not agree. What is interesting to see is how the kernel behaves under *REAL* loads, everything else is useless because developing a kernel is a continuous trade-off between different load scenarios. Who cares if the kernel performs badly in an unreal load scenario?
                      ## VGA ##
                      AMD: X1950XTX, HD3870, HD5870
                      Intel: GMA45, HD3000 (Core i5 2500K)

                      Comment
