
The Huge Disaster Within The Linux 2.6.35 Kernel


  • #61
    Originally posted by rekiah View Post
    Question: did YOU? "Other people," eh?
    We don't have:

    the dmesg output
    the .config
    output of lspci
    access to the hardware to test possible fixes



    • #62
      Originally posted by rekiah View Post
      Question: did YOU? "Other people," eh?
      I have reported enough stuff in the past. And spent much time testing fixes. Did you?

      Michael has the hardware and the setup. It is his job to report it and his duty to test the patches.

      That is the deal.



      • #63
        Originally posted by FireBurn View Post
        We don't have:

        the dmesg output
        the .config
        output of lspci
        access to the hardware to test possible fixes
        You do have hardware to test it on your own. Hell, they even have these neat little things called VMs nowadays...



        • #64
          I already thought it had gone way too far into drama territory when the article mentioned benchmarks on Btrfs.



          • #65
            Originally posted by Michael View Post
            That though wouldn't go towards addressing the fundamental problem that this article is about: how such a glaringly severe regression can be pulled into the tree in the first place and then live there for days. Improving the status quo is what this article is intended to be about more than this bug per se.
            Well, if you'd done just a little more research you'd have found the LKML thread mentioned earlier in this forum thread, and you'd have seen that it might not even be a kernel bug but a udev bug...

            The more spectacular the news an article brings, the more effort you'll have to put in to make sure you've got your facts straight.

            You're kind of falling flat on your face right now, and you're pissing off a lot of kernel people. Not very wise.



            • #66
              TBH I don't see what all the fuss is about. If you don't like Phoronix, you don't have to read any of the articles on it. If you're complaining because you want the articles to improve, maybe you should volunteer to write them, or be an editor.

              It's no more Michael's responsibility to report regressions than it is any of ours. This is open source; people do what they want. His job is essentially to do what he wants and post it on Phoronix. That happens to include benchmarking the kernel and posting the results. Now, it's great that he managed to get so many people's attention; it's certainly what he wanted, and anybody getting really pissed off here is just getting pwned by his mastery. Why don't YOU report the bug then? I'm not, because I can't be arsed, and if you're not, you'd better admit it right now: you're a lazy ass, because you could've done it by now.

              I'm glad that he at least reported it on Phoronix; it's better than not doing anything at all, right? Clearly Michael wanted to highlight the deficiencies in how things are submitted and admitted into the kernel. For those of you who say "you should know it's not RC yet, so bugs aren't to be fixed yet," well, this article just proves that this is how kernel developers think the process should work. He leaves it up to US to decide if that's how kernel development should work. Just because he's doing that, you shouldn't call him a n00b and say Phoronix has found a new low. At the end of the day, Phoronix is just another news site trying to grab readers, to feed people like Michael.

              Whether Michael decides to interfere with the development or not, it shouldn't matter; Phoronix is supposed to be a Linux news site with a tendency to talk about the performance of hardware and the software it runs. Stop being so self-righteous, people (well, the people who are being self-righteous).

              Punk'd.



              • #67
                LKML posting

                Originally posted by s4e8 View Post
                yeah, I reported that. It's quite bad indeed. I first thought it was something related to udev and kernel changes, but after upgrading to a newer build of udev (in Fedora 14/rawhide) it still remained.

                It would be nice to know whether these tests were run while checking if udev or other processes were pegging the CPU.
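                A quick, generic way to check that (a hypothetical sketch; any top(1)-style view of per-process CPU usage would do) is to sort all processes by CPU share while the benchmark runs:

```shell
# List the ten most CPU-hungry processes (procps `ps` assumed, as on
# Fedora); a runaway udevd would sit at the top of this list.
ps -eo pcpu,pid,comm --sort=-pcpu | head -n 10
```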

                Shawn.



                • #68
                  Originally posted by spstarr View Post
                  yeah, I reported that. It's quite bad indeed. I first thought it was something related to udev and kernel changes, but after upgrading to a newer build of udev (in Fedora 14/rawhide) it still remained.

                  It would be nice to know whether these tests were run while checking if udev or other processes were pegging the CPU.

                  Shawn.
                  Replying to myself is fun. They will revert the change.



                  • #69
                    Originally posted by Xanikseo View Post
                    TBH I don't see what all the fuss is about. If you don't like Phoronix, you don't have to read any of the articles on it. If you're complaining because you want the articles to improve, maybe you should volunteer to write them, or be an editor.

                    It's no more Michael's responsibility to report regressions than it is any of ours. This is open source; people do what they want. His job is essentially to do what he wants and post it on Phoronix. That happens to include benchmarking the kernel and posting the results. Now, it's great that he managed to get so many people's attention; it's certainly what he wanted, and anybody getting really pissed off here is just getting pwned by his mastery. Why don't YOU report the bug then? I'm not, because I can't be arsed, and if you're not, you'd better admit it right now: you're a lazy ass, because you could've done it by now.

                    I'm glad that he at least reported it on Phoronix; it's better than not doing anything at all, right? Clearly Michael wanted to highlight the deficiencies in how things are submitted and admitted into the kernel. For those of you who say "you should know it's not RC yet, so bugs aren't to be fixed yet," well, this article just proves that this is how kernel developers think the process should work. He leaves it up to US to decide if that's how kernel development should work. Just because he's doing that, you shouldn't call him a n00b and say Phoronix has found a new low. At the end of the day, Phoronix is just another news site trying to grab readers, to feed people like Michael.

                    Whether Michael decides to interfere with the development or not, it shouldn't matter; Phoronix is supposed to be a Linux news site with a tendency to talk about the performance of hardware and the software it runs. Stop being so self-righteous, people (well, the people who are being self-righteous).

                    Punk'd.
                    Well said. For all of those saying he should file a bug report or post results on LKML: why couldn't the devs simply bookmark the results page on Phoromatic? Really, if they can check a mailing list, they are just as capable of clicking a web link to see the daily results.



                    • #70
                      Come on guys, let's settle down. Michael may have exaggerated a bit, but now all of you are doing ten times worse. All of you have your own reasons to say what you said, and Michael has his reasons to write the article as he did.

                      Let's move forward and just hope the kernel ends up in better shape.



                      • #71
                        Michael, the best place to publish your results is LKML, not your website.



                        • #72
                          It's funny that phoronix.com benchmarks a system without noticing a runaway process. It's known that the commit
                          a7cf414 anon_inode: set S_IFREG on the anon_inode
                          breaks inotify, causing udevd to loop endlessly.
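                          For what it's worth, backing out a bad commit like that is a one-liner with `git revert`. A minimal sketch in a throwaway repository (the repo, file names, and commit messages here are invented for illustration; in the real tree the equivalent would be reverting the offending commit):

```shell
# Demonstrate `git revert` in a scratch repository: the bad commit is
# undone by a new commit, so history is preserved rather than rewritten.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email demo@example.com
git config user.name demo

echo ok > file;     git add file; git commit -qm "good change"
echo broken > file; git add file; git commit -qm "bad change"

# Revert the most recent commit without opening an editor
git revert --no-edit HEAD >/dev/null

cat file            # the pre-regression contents are back
git log --oneline   # newest entry is the revert commit
```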



                          • #73
                            Originally posted by zoomblab View Post
                            Thing is, bugs like this shouldn't be allowed to enter into the main repository. Again, proper practices and test procedures...
                            Bullshit. It's just the first RC, and Linux RCs are like alpha versions in other projects. Maybe nobody "allowed" this bug into the main repository; maybe it simply wasn't known?

                            Originally posted by bulletxt View Post
                            You guys seem more afraid of Michael words rather than a real possible Linux kernel regression. I know it's not even an RC, but how many of you would put 100$ on a table saying that the regression will be fixed(if it actually can) by the final realease??
                            I won't.
                            If this is not an "intended" regression like in the Ext4 case, I'll put down even $1000. If it's intended, it's probably something we want.

                            Originally posted by rekiah View Post
                            +1

                            "We're getting bloated, yes it's a problem," Torvalds said, "I'd love to say we have a plan. I mean, sometimes it's a bit sad and we're definitely not the streamlined hyper-efficient kernel that I had envisioned 15 years ago. The kernel is huge and bloated, and our icache footprint is scary. I mean, there is no question about that. And whenever we add a new feature, it only gets worse."

                            You're a moron. You don't know what caused this regression, and it's also only the first RC... The kernel is much faster in many areas than before, it's very easy to cut out unneeded parts so that you run only a few percent of the entire kernel, the core kernel is very slim compared to others, and a running Linux kernel is much smaller than the Windows ones.

                            Btw, what is this article for? To show there are regressions in the first RC? How is this different from before? Is it something strange or unnatural that Phoronix has to scream about? Why didn't you scream about slow OS X graphics performance (in the final version, not the first RC!)? PTS is something great, but it depends on how it is used and how the results are interpreted. In this case, if the change makes data safer etc., then what's wrong?

                            THE HUGE DISASTER WITH THE PHORONIX TITLES :P



                            • #74
                              In the open source, distributed software development process, there is necessarily a tradeoff between code exposure (to the general public) and the quality of that code. There are two mindsets about this:

                              1) Once you (the developer) think your code is correct, give it a little spin on your system. If it seems to work, let others try it. This method can be summarized by "release early, release often", and dates back to the birth of the Free Software movement.

                              The advantage of this method is that a large pool of people have a chance to test your work early, providing feedback from environments you can't possibly replicate: different hardware, different time zones, different peripherals, and so on and so forth. Since the chance that at least one of these environments will have a problem with your code is rather high, it's best to have these issues exposed as early as possible.

                              The disadvantage of this method is that people who test your code are actually going to hit these bugs in their environment. As a developer with finite time and usually a relatively small number of environments in which to test your software, you are not particularly blameworthy for the fact that others encountered these issues. Your tacit agreement with the community -- at least from your perspective -- is that the community will help you test your code, because you don't have the resources to do it yourself. You release your code expecting that your users are expecting bugs. (And replace "bugs" with any type of defect; performance regressions are a subset of defects.)

                              2) The second philosophy is one that gained popularity in the proprietary world. The goal here is to ensure that no one except the original developer ever sees defects in the code. The developer may be a single person, or it may be an organization. Usually, the original developer has to be an entire corporation in order for this philosophy to be followed to its full extent: the number of possible system configurations, and the number of testing man hours, in a large corporation, is sufficient to properly exercise the software and find the issues.

                              The situation here, though, is different if you try to execute methodology 2) without the backing of a large corporation or your own personal QA team. The developer himself, if he is an organization of one person, is forced to try and verify his software as many ways as possible before exposing it to the public. If his goal is not to release it for wide testing, but to release stable, polished software, then he must do the polishing all by himself.

                              The advantage of this method is that users and testers of the software encounter a minimum of frustrating bugs. An example of software that follows this development methodology is Windows. When you try a Windows beta release, it may not be perfect, but the worst of the issues have already been shaken out by developers doing the work for you. By releasing it to you, they mostly want your qualitative feedback; they are not actively trying to involve you in the debugging process (or, at least, they hope they won't have to).

                              The disadvantage of methodology 2) is that the amount of time required to perform the verification and testing increases exponentially as the size of the development team decreases. Since many patches in a Free Software project come from many individual development teams of 1 person, that means the one person has to buy a Mac Mini, a Dell Vostro, a Compaq Presario, an Asus eeePC, a Lenovo ThinkPad, and on and on (to cover all possible hardware configurations), then fly around the world on a plane with each of their hardware configurations and test the software with both AC and DC outlets and in different time zones and humidity conditions. Now obviously I am exaggerating a bit here, but you can see the impracticality of this proposition.

                              What it seems Michael is proposing is that we add "yet another layer" to the sequence of tests that every developer -- individual or otherwise -- must go through in order to have their code released to be tested publicly. This is a departure from methodology 1), and an advance toward methodology 2).

                              Let's take stock of the current checks that a Linux kernel engineer must make before their patch is committed to Linus' tree:

                              1. The checkpatch.pl script, which performs automated analysis on the patch to spot glaring errors in submission.
                              2. The engineer must submit their patch to their lieutenant, who reviews the patch and has to sign off on it.
                              3. The kernel has very robust tracing and debugging facilities, as well as run-time checks, that can optionally be enabled. These facilities will complain loudly if they detect a problem. They impose a performance penalty, so they are turned off in release kernels; but usually, kernel hackers enable these when testing their patches.
                              4. Many kernel engineers are employed at an organization that has a license to Coverity Prevent, or some other static analyzer. These engineers might run their patches through Coverity before committing, to see if they've made any logic errors that can be automatically detected by pattern matching.

                              Michael now thinks we need to add a fifth step -- before you can commit your patch, you need to make sure it doesn't cause any performance regressions.

                              Unfortunately, this step opens up a massive can of worms that can never be exhaustively verified. Performance is very often not a binary (0 or 1) decision: a particular patch might cause a performance regression under some conditions, but not under others. It might even increase performance in some scenarios, and dramatically decrease performance in others.

                              The problem with performance is that it has to be evaluated empirically, with actual running hardware, in a real world scenario. The amount of work required to theoretically evaluate performance in a project as complex as the Linux kernel, is simply intractable. Anyone who took an algorithms course at university knows how to determine the worst case runtime of a for loop; but when you start involving all the exotic diversity of the modern hardware stack, of which the kernel primarily deals in, performance is almost always measured by observation, not by mathematics.
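                              That empirical nature is exactly why tooling like `git bisect run` exists: you hand it a benchmark script and let it hunt down the offending commit by measurement. A self-contained sketch in a throwaway repository (the repo, scripts, and 2-second threshold are invented for illustration; on a real kernel tree the script would build and benchmark each candidate commit):

```shell
# Track down a performance regression automatically with `git bisect run`.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email demo@example.com
git config user.name demo

# Baseline: the "benchmark" finishes instantly
printf '#!/bin/sh\nexit 0\n' > bench.sh
chmod +x bench.sh
git add bench.sh; git commit -qm "fast baseline"
good=$(git rev-parse HEAD)

# Two unrelated commits, then one that makes the benchmark slow
for i in 1 2; do echo "$i" > "f$i"; git add "f$i"; git commit -qm "change $i"; done
printf '#!/bin/sh\nsleep 2\nexit 0\n' > bench.sh
git add bench.sh; git commit -qm "introduce slowdown"

# The bisect script fails (non-zero exit) whenever the benchmark is too slow
cat > check.sh <<'EOF'
#!/bin/sh
start=$(date +%s); ./bench.sh; end=$(date +%s)
[ $((end - start)) -lt 2 ]
EOF
chmod +x check.sh

git bisect start HEAD "$good"
culprit=$(git bisect run ./check.sh | sed -n 's/ is the first bad commit$//p' | head -n 1)
git bisect reset
git log -1 --format=%s "$culprit"
```

Bisection narrows the search to the one commit whose benchmark run crosses the threshold, without a human re-testing each revision by hand.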

                              The sheer quantity of computing resources required to check every patch against a wide variety of hardware rivals the infrastructure of Google. Except, rather than having a bunch of nearly-identical servers, the point of this hypothetical "testing grid" would be to allow users to upload a particular patch, and have the entire grid perform performance tests that exercise that code, under as many unique environments as possible.

                              While you might say "YEAH! Let's do it!" you have to remember that hardware, unlike software, costs real-world dollars and cents. Each system costs money to produce, money (in manpower) to configure, and money to pay for its energy use and climate control. This kind of project is mostly incompatible with the ways and means of a Free Software community, because our community simply does not have the resources to commit to such a project.

                              Unless you're Intel or IBM or Red Hat, I guess. So maybe Michael's secret pipe dream is to convince the big players in the Linux scene to give him millions of dollars to build a testing grid for the Linux kernel (or maybe Free Software in general) in some California datacenter. If he pulls it off, it might genuinely help improve the quality of the commits.

                              But that's still working under methodology 2), which, when you really think about it, completely ignores the most efficient testing scheme possible -- a testing scheme that already exists and is deployed in the wild: that of the community, using their own commodity desktops and laptops and servers, to test the software.

                              In other words, why pay all that money to build a professional datacenter for testing, when the community already has the hardware we need (for all intents and purposes) to perform the testing ourselves? What could possibly be more diverse than a random sampling of people from around the globe with varying interests, philosophies and fields of study? We already own computers to get our work done; and it only takes a little while to test the code people put out.

                              Except that the granularity of our testing is necessarily coarser than the dream "testing grid". Our computers are not dedicated testing machines; we use them for real work from day to day. So when we have spare time -- every once in a while -- we test points in the development cycle where the developers think they've eliminated the most obvious bugs. They then set the code loose in the wild, for everyone to test on our decentralized, diversified testing grid.

                              I guess my point is two-fold: first, stop thinking like Microsoft; and second, who cares if bugs/regressions/etc. make it into the mainline branch? The Linux QA department hasn't even had a chance to look at that code yet! The QA folks -- the community with our decentralized testing grid -- haven't even had a chance to test the code until RC1, and you're already slamming us for doing a piss-poor job.

                              It's like if you were a carpenter and you were about to hammer a nail into a piece of wood, and you lifted your arm halfway up in the air to get ready to bring the hammer down, and I said, "AH! AH! AH! the angle on your arm is off by 15 degrees, you're going to drive the nail in crooked!" -- wouldn't that annoy you? You might say, "But I haven't even hit the nail yet!" -- that's what the Linux testers and developers would say to your over-dramatized article, Michael.



                              • #75
                                I agree with this article on one count: this isn't an "RC"! A release candidate means the product has already been beta tested and there will be no more changes except in the extremely unlikely event that a bug is found. RC1 is NOT a release candidate! There is absolutely no chance that RC1 will go gold with little to no tweaking. Hell, even additional feature patches are accepted post-RC1, as with DRM. The testing model for the kernel is mislabeled, at minimum.

