
The Huge Disaster Within The Linux 2.6.35 Kernel


  • #91
    Why does Mike have to send this info by e-mail instead of redirecting kernel developers to this page?

    This is 2010, and we have web pages for something. It's not 1985 when e-mail and newsgroups were THE ONLY PRACTICAL way to communicate.


    • #92
      Originally posted by deanjo View Post
      The fact that it is lightly understood is very disturbing. Pretty much every development project that I have worked with, however, has had a regression tracking setup, so I'm not sure if it's so much "the industry" or more just the case in an open development model.
      I have lots of personal thoughts on this, I won't go into it here.

      It depends on what people term regression tracking. I have seen a lot of incarnations, from "we test now and then" to "we have a tool". Regression management, in my mind, has the following aspects.

      o A way of detecting the regression
      o A way of determining the regression cause (down to the code change)
      o A policy about how to deal with the regressions

      I've seen people get some way, but few get fully there.
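
      To make the first aspect (detection) concrete, here is a minimal sketch, not how any particular test suite does it. The function name, the 10% tolerance and the "higher score is better" convention are all assumptions made for illustration.

```python
def find_regressions(baseline, current, tolerance=0.10):
    """Return {test_name: fractional_change} for tests that got worse.

    Assumes higher scores are better (hypothetical convention) and that
    a drop larger than `tolerance` (10% by default) counts as a regression.
    """
    regressions = {}
    for name, base_score in baseline.items():
        # Skip tests that were not rerun or have an unusable baseline.
        if name not in current or base_score <= 0:
            continue
        change = (current[name] - base_score) / base_score
        if change < -tolerance:
            regressions[name] = change
    return regressions


if __name__ == "__main__":
    # Made-up numbers, purely to show the call.
    old = {"pbzip2": 100.0, "fs": 200.0}
    new = {"pbzip2": 60.0, "fs": 195.0}
    print(find_regressions(old, new))
```

      This only covers detection. Finding the cause and deciding what to do about it are the parts where most setups stop short.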

      And some of us appreciate that. It seems however, at least judging from a lot of the comments in this thread, if something bad is mentioned against the "church" all of a sudden you become public enemy #1 in their eyes.
      All part of the ecosystem. That's why I wrote the presentation for SCALE8x. As long as there are sane heads amongst the crowd then I am fine with putting myself out there for the attacks.

      Have you guys approached the larger projects out there about setting up any facility like this? It seems to me that if Intel/AMD can donate hardware for build services, FTP mirrors, etc. for distros, they surely can offer some boxes to set up such a facility for the kernel devs to use.
      Yes, we are in discussion with a few projects. A lot of it comes down to how the project can work with us and if they perceive the need.


      • #93
        I have now read most of the posts on this. I am sorry, Michael: I do enjoy your site and I do agree that it's your job to report on these problems, but does it really have to be with a title like that?

        I am not saying you should not report it, but report it with a more truthful title like "Performance Regression in pre-RC Linux 2.6.35 Kernel".

        Sorry, but I'm really disappointed in Phoronix for this form of hype journalism.


        • #94
          Originally posted by Michael View Post
          That though wouldn't go towards addressing the fundamental problem that this article is about: how such a glaringly severe regression can be pulled into the tree in the first place and then live there for days. Improving the status quo is what this article is intended to be about more than this bug per se.
          I'm seeing links to info that it's a "bug" in udev, or that udev was relying on something that wasn't stable. Anyway, most of the LKML discussion goes way over my head. I think a bit of care in testing to ensure that the system is truly idle before running the tests would go a long way. A simple top and iotop should have shown if something was using way too much CPU (udev), or if something was making a lot of I/O requests. If nothing was using CPU, a disk benchmark and a FS benchmark would have helped pinpoint the problem. Some of these tests, pbzip2 for example, can be very CPU and disk intensive at the same time.
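
          A rough sketch of the "is the box actually idle?" check, in case it helps. It uses the 1-minute load average as a crude stand-in for eyeballing top and iotop, so it will not tell you which process is busy. The function name and the 0.05-per-CPU threshold are my own assumptions, and os.getloadavg() is only available on Unix-like systems.

```python
import os


def system_is_idle(load_1min, cpu_count, max_load_per_cpu=0.05):
    """Return True if the 1-minute load average suggests an idle machine.

    The per-CPU threshold is an arbitrary illustrative value, not a
    recommendation; tune it for the hardware being tested.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return (load_1min / cpu_count) <= max_load_per_cpu


if __name__ == "__main__":
    # Read the real values on a Unix-like system before starting a run.
    load_1min = os.getloadavg()[0]
    cpus = os.cpu_count() or 1
    if system_is_idle(load_1min, cpus):
        print("System looks idle, OK to start benchmarks")
    else:
        print("Load %.2f on %d CPUs: something is busy" % (load_1min, cpus))
```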

          As for how something like this gets into mainline: Linus doesn't/shouldn't need to know how all software interacts with the kernel. I thought most of these merges were done mostly dry, and that the -git and -RC phases were there to find and handle regressions like this. These used to be handled in X.Y where Y was odd. Anyone using -git or -RC on a production system is asking for this stuff to bite them.

          As for the tone of the article, a link to a bug report/message on LKML, with a link back to Phoronix to show the results of the regression, would have helped. Also, I thought the test suite could do an automatic bisect and rerun the tests between any 2 commits. It would be nice if it could be told "start here, run these tests, end there, step N commit(s) at a time". Although that may tie it too much to git.
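
          For what it's worth, the two strategies I mean look roughly like this. This is only a sketch over a plain list of commit IDs, not the test suite's actual bisect feature. The names first_bad_commit, sample_commits and is_bad are made up, and it assumes the history is linear and that once a commit is bad, every later commit is bad too.

```python
def first_bad_commit(commits, is_bad):
    """Binary search for the first bad commit.

    `commits` is ordered oldest to newest, commits[0] is known good and
    commits[-1] is known bad. `is_bad` is a caller-supplied function
    (hypothetical here) that builds, benchmarks and judges one commit.
    """
    lo, hi = 0, len(commits) - 1  # lo is always good, hi is always bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


def sample_commits(commits, step):
    """Pick every `step`-th commit, always including the last one."""
    if step < 1:
        raise ValueError("step must be at least 1")
    picked = commits[::step]
    if commits and picked[-1] != commits[-1]:
        picked.append(commits[-1])
    return picked


if __name__ == "__main__":
    history = ["c%d" % i for i in range(10)]
    # Pretend the regression landed in c6 (made-up data).
    print(first_bad_commit(history, lambda c: int(c[1:]) >= 6))
    print(sample_commits(history, 4))
```

          The binary search needs far fewer test runs, while the fixed step gives a trend line across the whole range, which may be the more useful picture for a benchmark graph.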


          • #95
            To those that are saying it's due to a udev issue, that is not for certain yet. Linus is looking into this regression now.
            Michael Larabel


            • #96
              Yep. This is a true disasticle alright.


              • #97
                It would be cool to monitor memory usage as well. Ever since 2.6.33 my memory usage has increased by 300% compared to 2.6.22. I can't bisect it either, since early builds of 2.6.33-rc1 panic on me.


                • #98
                  Originally posted by s4e8 View Post
                  After reading that conversation, it sounds like this issue is being addressed.


                  • #99
                    Just a few observations...
                    • The gist of the information in the article was relevant. A large kernel performance regression is a notable issue to be dealt with.
                    • The kernels in question were pre RC kernel snapshots pulled from GIT, and thus under heavy development.
                    • Some of the terminology in the article can be seen as inflammatory.
                    • BTRFS is also not production ready. Filesystems in particular will often suffer steep regressions as necessary data protection mechanisms are implemented.
                    • Before Phoronix, I'm not aware of anybody doing real-world, comprehensive benchmarking against Linux that would quickly identify performance regressions such as this in a relatively concise manner.
                    • The author does not appear to have filed a bug or pinged the kernel mailing list to bring up the issue.
                    • A few of the Phoronix readers who happened on this article have researched the issue themselves and found that the kernel devs are already aware of the issue and are working to identify and correct it.

                    On one hand, I appreciate the fact that Michael took the time to write this article. The information is ultimately useful, and things like this are good to be informed about. On the other hand, the target audience eludes me. Was it intended for less technical people? My fear would be that such folks might read it and think, "Oh no, Linux performs like crap!" In fact, this is not really as pertinent as some of the hyperbolic language makes it out to be -- it is, as I mentioned, a pre-RC kernel under heavy development. If not for Linux kernel n00bs, is it for technical folks who can reproduce it and fix the issue? In the latter case, why not provide more information about the test stack (kernel configs, dmesg output, list of running processes, information on the base OS, etc)? More technical information would have been useful.

                    In summary, my personal thanks to Michael for a) the research and b) the article. However, I also feel some of the criticisms in this thread are valid. Maybe you should choose your words a bit more carefully in the future?


                    • More testing sounds nice in general, but this is Linux we're talking about; the sheer speed and scale of the process confounds extensive generalized testing being applied at the level of individual developers and individual changes. While you're waiting for your authoritative testsuite results, a dozen more patches will probably be submitted, and after you submit your patch, even more patches will be submitted. How do you know that those patches don't expose an otherwise concealed bug in your patch (or vice versa)? It seems like the best you can hope for is to demonstrate that your patch doesn't break anything yet. I don't see how the "regressions never get in" standard suggested by the article happens without a precipitous drop in the rate of patches. I also don't see how the current process is really failing to address this regression. Sure, it would be nice if it got fixed instantly, but fixes aren't always as simple as reverting a patch. If you apply patch X and get errors from subsystem Y when running application Z, how do you know what the actual problem is without further study and testing? Maybe reverting the patch just covers the mine for another developer to step on later.