Linux 2.6.24 Through Linux 2.6.33 Benchmarks


  • #31
    Originally posted by darkbasic View Post
    I do not agree. What is interesting to see is how the kernel behaves under *REAL* loads; everything else is useless, because developing a kernel is a continuous trade-off between different load scenarios. Who cares if the kernel performs badly in an unreal load scenario?
    Give an indication of some real loads that can act as a suitable benchmark. I am genuinely interested.

    The only real, real loads are what people use from day to day. That day-to-day usage in most cases varies across a large set of loads. Mixed-use benchmarking is almost impossible to do (you can't make everyone happy). Having tests that serve as proxies for pathological cases is actually quite important for people who need to take first steps in filesystem selection.

    Under real loads, most filesystems perform reasonably well; that's a fact. The differences come in with data integrity, failures, and the way the filesystems respond to pathological scenarios. Put another way, you characterize the dimensions of a filesystem and then you can make some assumptions about how it will react to your real-world load. My real-world load is vastly different from your real-world load.

    For instance, the Apache benchmark gives you an indication of bandwidth-bound scenarios. The SQLite test gives you an indication of sync-bound scenarios. Your real-world scenario is going to be a blend of particular workloads.
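
    As a rough sketch of the difference (purely illustrative, not one of the actual test profiles; the scratch file path and sizes are made up), the second dd run below forces a flush after every 4 KiB write, so it is bounded by disk flush latency rather than by bandwidth:
    # buffered writes, one flush at the end (bandwidth-bound)
    dd if=/dev/zero of=/tmp/ddtest bs=4k count=2000 conv=fdatasync
    # a flush after every single write (sync-bound, much like a naive SQLite insert loop)
    dd if=/dev/zero of=/tmp/ddtest bs=4k count=2000 oflag=dsync
    rm /tmp/ddtest
    dd prints the achieved rate at the end of each run; the gap between the two numbers is roughly the gap between the two classes of workload.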

    Of course we are assuming that people who have a particular singularly focused scenario can test their scenario. But these scenarios are very broad and by their nature specific to individuals.

    Matthew



    • #32
      There is no such thing as a 'real world benchmark'. That's pure nonsense. But let me put my science hat on (yay, it fits). When a scientist measures the temperature of boiling water under certain conditions and excitedly shouts "100 degrees!", it doesn't mean that the temperature of your boiling water at home will necessarily be the same. You may be living well above sea level, for instance. Or you may very well have less than ideal conditions (certainly worse than those in a lab) and use crap water, crap thermometers, have impurities in your container, etc. So the scientist is not saying that the temperature of boiling water at your place is always and exactly 100 degrees.

      Nobody would ever complain about a chemistry book not including the boiling point of water under a real world scenario (whatever that might be). That's not to say that the lab data isn't useful: if you suddenly found your pasta boiling at 60 or 170 degrees, you would have reason to think that there is something wrong with your kitchen.

      Or think about materials testing. How are you going to check that this new cool alloy is up to the job? Perhaps you should build some buildings and force people to live in them for 25 years, just to see how it goes under a "real world scenario". Or maybe construct some passenger aircraft and sell them to British Airways to test them, again under "real loads". Of course not. You design some reproducible test that you can use to compare the properties of different materials against each other, or perhaps against different batches of the same one. And even if these loads do not particularly resemble the intended use of the material, they are extremely useful for drawing conclusions from the results.

      "Argh, this batch of unobtainium sucks!"
      "Boah, Linus really fucked up this release!"



      • #33
        Originally posted by mtippett View Post
        The only real, real loads are what people use from day to day. That day-to-day usage in most cases varies across a large set of loads. Mixed-use benchmarking is almost impossible to do (you can't make everyone happy).
        I agree.

        Originally posted by mtippett View Post
        Give an indication of some real loads that can act as a suitable benchmark. I am genuinely interested.
        Some of the benchmarks you chose are already quite good in my opinion, but the Apache benchmark, as you did it, is not indicative of any real world scenario; it is not an *apache* benchmark but a "let's see how the scheduler behaves in a crappy load scenario" benchmark.

        This is misleading, so please do it in a meaningful manner or, if you think that
        Originally posted by mtippett View Post
        Having tests that serve as proxies for pathological cases is actually quite important
        rename it to avoid misunderstandings.

        Personally, I'd like to see a real apache test.

        Also, I'd *really* like to see a benchmark of linux-2.6.33 compiled with gcc versus linux-2.6.33 compiled with icc. LMbench's numbers are useless to me on their own, but they are very impressive and promising for an interesting real world benchmark (context switching is something like 5x faster).
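
        For anyone curious about what that number actually measures: LMbench's context-switch test can be run on its own, assuming the lmbench binaries are installed (the invocations below are typical ones; the process counts and working-set sizes are just examples):
        # 2 processes passing a token back and forth, 0 KB working set each
        lat_ctx -s 0 2
        # 8 processes, 16 KB working set each (more cache pressure per switch)
        lat_ctx -s 16 8
        It reports the time per context switch in microseconds, which is the figure a gcc vs. icc comparison would be made on.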

        P.S.
        Keep up your good work!
        ## VGA ##
        AMD: X1950XTX, HD3870, HD5870
        Intel: GMA45, HD3000 (Core i5 2500K)



        • #34
          I have no problem with using a synthetic benchmark like that, but it should be described as such. Look at all the articles Phoronix has written, and they all talk about how kernels have poor Apache performance, with no mention that it isn't actually testing Apache.

          If he called it the fsync test and talked about how some kernels have poor sync performance, that would be completely different. But 99% of people who just drive by these articles without looking into the details would get the completely wrong impression, and that should change.



          • #35
            PostgreSQL benchmarking results

            At first glance one can be forgiven for thinking that a kernel regression caused the change in the pgbench numbers we see here. But that's not really the case. The throughput in transactions per second is limited by the ability of the IO subsystem to commit changes to disk. While pgsql, properly tuned, can gang a few transactions together into one commit, the throughput tends to be about the same as the number of rotations per second of the hard drive underneath the database.

            For a pg database with a single 7200 RPM drive, that's about 120 commits per second. Now, if the disk subsystem is lying, then it's possible that when pg issues an fsync command, some part of the OS / filesystem / driver layer / hard drive chain lies and says "sure thing, captain" when it hasn't actually committed the change to the hard drive. Assuming the test machine for these benchmarks has a single hard drive spinning in the 7200 to 10000 RPM range, it should be impossible to get more than one or two hundred tps out of the system, as the drives just aren't fast enough to do this.
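
            The arithmetic behind that ceiling is simple (a quick sketch; the drive speeds below are the usual examples, not measurements from the test machine in the article):
            # best case for an honest single drive: one commit per platter rotation,
            # so max tps is roughly rpm / 60
            for rpm in 5400 7200 10000 15000; do
                echo "$rpm rpm -> ~$((rpm / 60)) tps"
            done
            # 5400 -> 90, 7200 -> 120, 10000 -> ~166, 15000 -> 250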

            As an example, on my laptop I run pgbench against the internal 5400 RPM hard drive. With a 5400 RPM drive, the maximum tps I should expect is just under 100 or so. With the write cache turned on, like so:
            sudo hdparm -W1 /dev/sda
            /usr/lib/postgresql/8.3/bin/pgbench -c 5 -t 1000

            I get:
            tps = 327.928589 (excluding connections establishing)


            which is about 3.5 times too high. Now, I turn off the write cache:
            sudo hdparm -W0 /dev/sda
            /usr/lib/postgresql/8.3/bin/pgbench -c 5 -t 1000
            tps = 90.460715 (excluding connections establishing)

            which is about right for 5400 rpm.

            Looking at the numbers in the benchmarks here, with a throughput of 158, I'm gonna guess that 2.6.33 is doing the right thing (i.e. not lying about disk commits) and that it's on a 10k RPM drive, which has a max throughput of about 166 tps. I wouldn't be surprised if the benchmark used to run on a 15k drive, where max throughput is about 250 tps.



            • #36
              As is, these benchmarks are rather meaningless. For the most part, all you've shown is that the kernel configuration defaults have moved more toward a desktop usage pattern and data safety without a UPS, rather than a server usage pattern with proper power protection.

              What would actually be interesting is a comparison of all kernels with proper configuration options given their intended use.

              Test Apache and PostgreSQL with a kernel configuration designed for a server workload.

              Test desktop apps with a kernel configuration designed for a desktop usage pattern.

              No one who pays any attention to performance sets up a LAMP server with a default kernel configuration.
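
              Whether that tuning happens at kernel build time or at runtime, the kind of thing I mean looks roughly like this (purely illustrative; the device, mount point, and values are hypothetical, and the right choices depend entirely on the hardware and workload):
              # pick an I/O scheduler suited to server loads instead of the default
              echo deadline > /sys/block/sda/queue/scheduler
              # let more dirty data accumulate before forcing writeback
              sysctl -w vm.dirty_ratio=40 vm.dirty_background_ratio=10
              # trade some crash safety for speed (ext3/ext4), assuming a UPS and battery-backed cache
              mount -o remount,noatime,barrier=0 /srv/data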



              • #37
                Originally posted by xianthax View Post
                No one who pays any attention to performance sets up a LAMP server with a default kernel configuration.
                Out of interest, where are the recommended server configuration instructions? You imply that every LAMP server administrator is going to have more or less the same config. I don't believe there is a secret brotherhood that ensures, through verbal tradition, that every administrator has the same information. Hence, there have to be a few online references on how to do it right.

                I am sure a default vs best-of-class server config vs best-of-class desktop config would be interesting.

                Are you willing to step forward and declare yourself the configurator of the best-of-class server config? Is there any reader interested in the best-of-class desktop config?

                Nothing is sacred; feel free to define and contribute more correct server-oriented tests if you have the interest.

                Matthew

                PS: I have had the same discussion with multinational corporations about more or less the same thing. I don't think they produced a document in the end either.



                • #38
                  Originally posted by Sxooter View Post
                  At first glance one can be forgiven for thinking that a kernel regression caused the change in the pgbench numbers we see here. But that's not really the case.

                  ...

                  I'm gonna guess that 2.6.33 is doing the right thing (i.e. not lying about disk commits) and that it's on a 10k RPM drive, which has a max throughput of about 166 tps.

                  ...
                  Thanks for the analysis. I would like to make sure the definition of regression is clear:
                  An unexpected or unplanned change in behaviour.

                  Now, in reality, the expectation will be completely dependent on individual context. One person's "what the..." moment will be another's "that's logical" moment.



                  • #39
                    Originally posted by mtippett View Post
                    Thanks for the analysis. I would like to make sure the definition of regression is clear:
                    An unexpected or unplanned change in behaviour.

                    Now, in reality, the expectation will be completely dependent on individual context. One person's "what the..." moment will be another's "that's logical" moment.
                    Exactly.

                    To me, regression implies that something has gotten worse, not better. In this case, the real regression was in the kernel that suddenly seemed much faster but was in fact playing loose with data integrity. From a DBA perspective, a kernel that suddenly stops paying attention to fsync or barrier calls is a regression, because it can make you lose data you thought was safely stored away.



                    • #40
                      Originally posted by Sxooter View Post
                      Exactly.

                      To me, regression implies that something has gotten worse, not better. In this case, the real regression was in the kernel that suddenly seemed much faster but was in fact playing loose with data integrity. From a DBA perspective, a kernel that suddenly stops paying attention to fsync or barrier calls is a regression, because it can make you lose data you thought was safely stored away.
                      Okay. A dictionary definition of a regression is:
                      A change from a more developed to a less developed state.

                      As this applies to software engineering, it doesn't quite fit, since within software you don't have monotonic improvement as a system is developed and maintained.

                      This, coupled with a general lack of tests and with orthogonal constraints (performance vs. integrity, etc.), means you can't just say "if it's worse, it's a regression".

                      As a result of this ambiguity, any change that is unexpected will usually point to a lack of planning or awareness of impact. The KVM-guest-faster-than-host SQLite test was a great example of this: the performance increase (argued by some as not a regression) came with a decrease in integrity (obviously a regression for some), which showed that the change was incomplete (the actual regression).

                      So rather than always getting into an argument, it's a question of whether the change is expected, i.e. someone has a git checkin that explicitly says "trade off performance for data integrity". Usually, however, a checkin like that is unlikely to go in without a week-long flamewar, due *only* to the fact that there is a tradeoff explicitly highlighted in the checkin.

                      Ergo, there are *LOTS* of regressions, and regressions can be great or devastating depending on your point of view.

                      Matthew

