Linux 2.6.24 Through Linux 2.6.33 Benchmarks


  • #31
    Originally posted by darkbasic View Post
    I do not agree. What is interesting to see is how the kernel behaves under *REAL* loads, everything else is useless because developing a kernel is a continuous trade-off between different load scenarios. Who cares if the kernel performs badly in an unreal load scenario?
    Give an indication of some real loads that can act as a suitable benchmark. I am genuinely interested.

    The only real, real loads are what people use from day to day. That day-to-day usage in most cases varies across a large set of loads. Mixed-use benchmarking is almost impossible to do (can't make everyone happy). Having tests that serve as proxies for pathological cases is actually quite important for people who need to take their first steps in filesystem selection.

    Under the real loads that users actually run, most filesystems perform reasonably well; that's a fact. The differences show up in data integrity, in failure handling, and in the way the filesystems respond to pathological scenarios. Put differently: you characterize the dimensions of a filesystem, and then you can make some assumptions about how it will react to your real-world load. My real-world load is vastly different from your real-world load.

    For instance, the Apache benchmark gives you an indication of bandwidth-loaded scenarios, and the SQLite test gives you an indication of sync-bound scenarios. Your real-world scenario is going to be a blend of particular workloads.

    Of course, we are assuming that people with a particular, singularly focused scenario can test that scenario themselves. But such scenarios are very broad and, by their nature, specific to individuals.

    Matthew

    Comment


    • #32
      There is no such thing as a 'real world benchmark'. That's pure nonsense. But let me put my science hat on (yay, it fits). When a scientist measures the temperature of boiling water under certain conditions and excitedly shouts "100 degrees!", it doesn't mean that the temperature of your boiling water at home will necessarily be the same. You may be living well above sea level, for instance. Or you may very well have less than ideal conditions (certainly worse than those in a lab) and use crap water, crap thermometers, have impurities in your container, etc. So the scientist is not saying that the temperature of boiling water at your place is always exactly 100 degrees. Nobody would ever complain about a chemistry book not including the boiling point of water under a real world scenario (whatever that might be). That's not to say that the lab data isn't useful: if you suddenly found your pasta boiling at 60 or 170 degrees, you would have reason to think that there is something wrong with your kitchen.

      Or think about materials testing. How are you going to check that this cool new alloy is up to the job? Perhaps you should build some buildings and force people to live there for 25 years, just to see how it goes under a "real world scenario". Or maybe construct some passenger aircraft and sell them to British Airways to test them, again under "real loads". Of course not. You design some reproducible test that you can use to compare the properties of different materials against each other, or perhaps against different batches of the same one. And even if these loads do not particularly resemble the intended use of the material, they are extremely useful for drawing conclusions from the results.

      "Argh, this batch of unobtainium sucks!"
      "Boah, Linus really fucked up this release!"

      Comment


      • #33
        Originally posted by mtippett View Post
        The only real, real loads are what people use from day to day. That day-to-day usage in most cases varies across a large set of loads. Mixed-use benchmarking is almost impossible to do (can't make everyone happy).
        I agree.

        Originally posted by mtippett View Post
        Give an indication of some real loads that can act as a suitable benchmark. I am genuinely interested.
        Some of the benchmarks you chose are already quite good in my opinion, but the Apache benchmark, as you ran it, is not indicative of any real world scenario: it is not an *Apache* benchmark but a "let's see how the scheduler behaves under a crappy load" benchmark.

        This is misleading, so please run it in a meaningful way or, if you think that
        Originally posted by mtippett View Post
        Having tests that serve as proxies for pathological cases is actually quite important
        rename it to avoid misunderstandings.

        Personally, I'd like to see a real Apache test.

        Also, I'd *really* like to see a benchmark of linux-2.6.33 compiled with gcc versus linux-2.6.33 compiled with icc. LMbench's numbers are useless to me on their own, but they are very impressive and promising material for an interesting real world benchmark (context switching is something like 5x faster).

        P.S.
        Keep up your good work!
        ## VGA ##
        AMD: X1950XTX, HD3870, HD5870
        Intel: GMA45, HD3000 (Core i5 2500K)

        Comment


        • #34
          I have no problem with using a synthetic benchmark like that, but it should be described as such. Look at all the articles Phoronix has written: they all talk about how kernels have poor Apache performance, with no mention that the test isn't actually exercising Apache.

          If he called it the fsync test and talked about how some kernels have poor sync performance, that would be completely different. But the 99% of people who just drive by these articles without looking into the details would get a completely wrong impression, and that should change.

          Comment


          • #35
            PostgreSQL benchmarking results

            At first glance one can be forgiven for thinking that a kernel regression caused the change in the pgbench numbers we see here. But that's not really the case. The transactions-per-second throughput is limited by the ability of the IO subsystem to commit changes to disk. While pgsql, properly tuned, can gang a few transactions together into one commit, the throughput tends to track the number of rotations per second of the hard drive underneath the database.

            For a pg database with a single 7200 RPM drive, that's about 120 commits per second. Now, if the disk subsystem is lying, then it's possible that when pg issues an fsync command, some part of the OS / file system / driver layer / hard drive chain lies and says "sure thing captain" when it hasn't actually committed the change to the hard drive. Assuming the test machine for these benchmarks has a single hard drive spinning in the 7200 to 10,000 RPM range, it should be impossible to get more than one or two hundred tps out of the system, as the drives just aren't fast enough.
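
            To make the arithmetic explicit, here's a minimal sketch of that rotation-bound ceiling (my illustration, assuming at best one commit per platter rotation):

            # Rough ceiling on synchronous commits/sec for a single spinning disk,
            # assuming each committed transaction waits for one full rotation.
            def max_tps(rpm):
                return rpm / 60.0  # rotations per second == commits per second

            for rpm in (5400, 7200, 10000, 15000):
                print(f"{rpm:>5} RPM -> ~{max_tps(rpm):.0f} tps")
            # prints ~90, ~120, ~167 and ~250 tps respectively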

            As an example, on my laptop I run pgbench against the internal 5400 RPM hard drive, so the maximum tps I should expect is just under 100 or so. With the write cache turned on, like so:
            sudo hdparm -W1 /dev/sda
            /usr/lib/postgresql/8.3/bin/pgbench -c 5 -t 1000

            I get:
            tps = 327.928589 (excluding connections establishing)


            which is about 3.5 times too high. Now I turn off the write cache:
            sudo hdparm -W0 /dev/sda
            /usr/lib/postgresql/8.3/bin/pgbench -c 5 -t 1000
            tps = 90.460715 (excluding connections establishing)

            which is about right for 5400 RPM.

            Looking at the numbers in the benchmarks here, with a throughput of 158, I'm gonna guess that 2.6.33 is doing the right thing (i.e. not lying about disk commits) and that it's on a 10k RPM drive, which has a max throughput of about 166 tps. I wouldn't be surprised if the benchmark used to run on a 15k drive, where max throughput is about 250 tps.

            Comment


            • #36
              As is, these benchmarks are rather meaningless. For the most part, all you've shown is that the kernel configuration defaults have moved toward a desktop usage pattern and data safety without a UPS, rather than a server usage pattern with proper power protection.

              What would actually be interesting is a comparison of all kernels, each configured appropriately for its intended use.

              Test Apache and PostgreSQL with a kernel configuration designed for a server workload.

              Test desktop apps with a kernel configuration designed for a desktop usage pattern.

              No one who pays any attention to performance sets up a LAMP server with a default kernel configuration.
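
              Purely as an illustration (not something from the article), here's a sketch of how one might diff the scheduling-related options between two kernel configs. CONFIG_PREEMPT* and CONFIG_HZ* are typical examples of options that differ between desktop and server builds; the file paths are hypothetical:

              import re

              # Parse CONFIG_FOO=value lines from a kernel .config file.
              def load_config(path):
                  opts = {}
                  with open(path) as f:
                      for line in f:
                          m = re.match(r"(CONFIG_\w+)=(.*)", line.strip())
                          if m:
                              opts[m.group(1)] = m.group(2)
                  return opts

              # Print options (matching the given prefixes) that differ.
              def diff_configs(desktop_path, server_path,
                               prefixes=("CONFIG_PREEMPT", "CONFIG_HZ")):
                  desktop, server = load_config(desktop_path), load_config(server_path)
                  for key in sorted(set(desktop) | set(server)):
                      if key.startswith(prefixes) and desktop.get(key) != server.get(key):
                          print(f"{key}: desktop={desktop.get(key, 'unset')} "
                                f"server={server.get(key, 'unset')}")

              diff_configs("/boot/config-desktop", "/boot/config-server")  # hypothetical paths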

              Comment


              • #37
                Originally posted by xianthax View Post
                No one who pays any attention to performance sets up a LAMP server with a default kernel configuration.
                Out of interest, where are the recommended server configuration instructions? You imply that every LAMP server administrator is going to have more or less the same config. I don't believe there is a secret brotherhood that ensures, through verbal tradition, that every administrator has the same information. Hence, there must be a few online references on how to do it right.

                I am sure a default vs best-of-class server config vs best-of-class desktop config would be interesting.

                Are you willing to step forward and declare yourself the configurator of the best-of-class server config? Is there any reader interested in the best-of-class desktop config?

                Nothing is sacred; feel free to define and contribute more correct server-oriented tests if you have the interest.

                Matthew

                PS: I have had the same discussion with multinational corporations about more or less the same thing. I don't think they produced a document in the end either.

                Comment


                • #38
                  Originally posted by Sxooter View Post
                  At first glance one can be forgiven for thinking that a kernel regression caused the change in the pgbench numbers we see here. But that's not really the case.

                  ...

                  I'm gonna guess that 2.6.33 is doing the right thing (i.e. not lying about disk commits) and that it's on a 10k RPM drive, which has a max throughput of about 166 tps.

                  ...
                  Thanks for the analysis. I would like to make sure the definition of regression is clear.
                  An unexpected or unplanned change in behaviour
                  Now, in reality, the expectation is completely context dependent for each individual. One person's "what the..." moment will be another's "that's logical" moment.

                  Comment


                  • #39
                    Originally posted by mtippett View Post
                    Thanks for the analysis. I would like to make sure the definition of regression is clear.
                    An unexpected or unplanned change in behaviour
                    Now, in reality, the expectation is completely context dependent for each individual. One person's "what the..." moment will be another's "that's logical" moment.
                    Exactly.

                    To me, regression implies that something has gotten worse, not better. In this case, the real regression was in the kernel that suddenly seemed much faster but was in fact playing loose with data integrity. From a DBA's perspective, a kernel that suddenly stops honoring fsync or barrier calls is a regression, because it can make you lose data you thought was safely stored away.
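
                    For the curious, a minimal sketch (my illustration, not something from the thread) of the contract a database relies on here: a record only counts as committed once fsync() returns, so any layer that fakes fsync silently voids that guarantee:

                    import os

                    def durable_append(path, record):
                        # The record only counts as committed once fsync() returns;
                        # if any layer below fakes the fsync, "committed" data can
                        # still vanish on power loss -- the failure mode a DBA fears.
                        fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
                        try:
                            os.write(fd, record)
                            os.fsync(fd)  # ask the whole stack to reach stable storage
                        finally:
                            os.close(fd)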

                    Comment


                    • #40
                      Originally posted by Sxooter View Post
                      Exactly.

                      To me, regression implies that something has gotten worse, not better. In this case, the real regression was in the kernel that suddenly seemed much faster but was in fact playing loose with data integrity. From a DBA's perspective, a kernel that suddenly stops honoring fsync or barrier calls is a regression, because it can make you lose data you thought was safely stored away.
                      Okay. A dictionary definition of a regression is
                      A change from a more developed to less developed state
                      Applied to software engineering, that doesn't quite work, since software does not improve monotonically as a system is developed and maintained.

                      Couple this with a general lack of tests and with orthogonal constraints (performance vs. integrity, etc.), and you can't apply a simple "if it's worse, it's a regression" rule.

                      As a result of this ambiguity, an unexpected change usually signals a lack of planning or awareness of impact. The KVM-guest-faster-than-host SQLite test was a great example of this: the performance increase (argued by some not to be a regression) revealed a decrease in integrity (obviously a regression for some), which in turn showed that the change was incomplete (the actual regression).

                      So rather than always getting into an argument, it's a question of whether the change is expected, i.e. someone has a git checkin that explicitly says "trade off performance for data integrity". Usually, however, a checkin like that is unlikely to go in without a week-long flamewar, due *only* to the fact that the tradeoff is explicitly highlighted in the checkin.

                      Ergo, there are *LOTS* of regressions, and regressions can be great or devastating depending on your point of view.

                      Matthew

                      Comment


                      • #41
                        Originally posted by mtippett View Post
                        Out of interest, where are the recommended server configuration instructions? You imply that every LAMP server administrator is going to have more or less the same config. I don't believe there is a secret brotherhood that ensures, through verbal tradition, that every administrator has the same information. Hence, there must be a few online references on how to do it right.

                        I am sure a default vs best-of-class server config vs best-of-class desktop config would be interesting.

                        Are you willing to step forward and declare yourself the configurator of the best-of-class server config? Is there any reader interested in the best-of-class desktop config?

                        Nothing is sacred; feel free to define and contribute more correct server-oriented tests if you have the interest.

                        Matthew

                        PS: I have had the same discussion with multinational corporations about more or less the same thing. I don't think they produced a document in the end either.
                        No, I'm certainly not an expert on kernel configuration, nor do I claim to be. There also is no "best in class" configuration; the ideal configuration settings depend on the usage pattern. In fact, this is an area where the test suite could provide some serious benefit: running a spectrum of server/desktop tests on a kernel while varying kernel configuration options could be very valuable.

                        That all being said, there certainly is a documented starting place: the kernel configuration settings of common server distros. The test suite could prove very valuable in improving their default configuration options.

                        The point of my post is that this article provides no actionable information. It isn't going to make anyone choose a particular kernel to gain a performance increase, as most of the performance differences come down to kernel defaults rather than increased or decreased efficiency in code paths.

                        Comment


                        • #42
                          I used to be an avid reader of Phoronix benchmarks... I found them very interesting. My current opinion is that they're not very serious/reliable, to say the least.

                          Anyway, regarding those "regressions" in .33 with PostgreSQL see:

                          http://www.mail-archive.com/pgsql-pe.../msg34841.html

                          "The pgbench TPS figure Phoronix has been reporting has always been a fictitious one resulting from unsafe write caching. With 2.6.32 released with ext4 defaulting to proper behavior on fsync"

                          Comment


                          • #43
                            Originally posted by rekiah View Post
                            I used to be an avid reader of Phoronix benchmarks... I found them very interesting. My current opinion is that they're not very serious/reliable, to say the least.

                            Anyway, regarding those "regressions" in .33 with PostgreSQL see:

                            http://www.mail-archive.com/pgsql-pe.../msg34841.html

                            "The pgbench TPS figure Phoronix has been reporting has always been a fictitious one resulting from unsafe write caching. With 2.6.32 released with ext4 defaulting to proper behavior on fsync"
                            Again, the test is there to measure a system; pgbench is just something that exercises some paths, and the regression is simply a change. It has become generally known that the cost of the fsync implementation is critical for a lot of the database tests.

                            The TPS figure is the TPS figure for databases running on those systems. As Greg mentions in the first part of his thread, "Some interesting benchmark news today suggests a version of ext4 that might actually work for databases is showing up in early packaged distributions": <phoronix reference removed>

                            The PTS benchmarks are being tracked by them, and it appears that, although it isn't widely communicated, most filesystems posting high performance numbers on these tests are misconfigured. That is the critical part of the value the PTS tests add. Sxooter's post in this thread reinforces this idea too.

                            Perhaps statements should be made asserting that anything higher than 100 to 400 tps in either the SQLite or pgbench testing indicates that the system is not suitable for database use. That would be interesting; a rough sanity check along those lines is sketched below. I assume the first such statement would get half a dozen highly colorful threads started by those who don't understand that fast isn't really fast.
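
                            As an illustration only (a hypothetical check, not anything PTS implements), such an assertion could compare the measured tps against the rotation-bound ceiling discussed earlier in the thread:

                            # Hypothetical sanity check: flag pgbench/SQLite results that
                            # exceed what a single spinning disk can physically sustain.
                            def looks_like_unsafe_caching(measured_tps, drive_rpm):
                                ceiling = drive_rpm / 60.0   # one commit per rotation, at best
                                return measured_tps > ceiling * 1.2  # margin for queueing

                            print(looks_like_unsafe_caching(328, 5400))  # True: cache is lying
                            print(looks_like_unsafe_caching(90, 5400))   # False: plausible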

                            I don't think that Michael has ever made assertions about *right* or *wrong* implementations in the way that Greg has. Michael is merely reporting that the delta is present. Each reader will need to dig into the relevance of the change.

                            Comment


                            • #44
                              Originally posted by xianthax View Post
                              No, I'm certainly not an expert on kernel configuration, nor do I claim to be. There also is no "best in class" configuration; the ideal configuration settings depend on the usage pattern.

                              ...
                              The underlying issue is that *no-one* stands forward willing to state what is and what is not tuned. If *no-one* stands forward, and *no-one* documents it, then we are left with the desktop-configured defaults. The only group that has shown any interest in coming forward and contributing to the configuration is the desktop crowd. (Although I would expect the default kernel config to be more server-oriented, I don't have a best practice to measure that against.)

                              If people want to be involved in a kernel configuration bake-off, please come forward.

                              Comment


                              • #45
                                Originally posted by mtippett View Post
                                ... Michael is merely reporting that the delta is present. Each reader will need to dig into the relevance of the change.
                                +1
                                I agree completely with your post. Deltas are important, since they sometimes reveal bugs (not always in the new code; sometimes actually in the old code).


                                Originally posted by mtippett View Post
                                .... (Although I would expect the default kernel config to be more server-oriented, I don't have a best practice to measure that against.) ....
                                I remember reading some time ago that the kernel in fact ships by default optimized for _desktop_ usage (well, actually, _optimized_ is a strong word). And if you think about it, it makes sense: server folks will want to tune their systems based on their _load_ patterns, since I don't think there is a single configuration that produces the _best_ results for all load patterns.

                                Comment
