When Open-Source Graphics Drivers Break

  • When Open-Source Graphics Drivers Break

    Phoronix: When Open-Source Graphics Drivers Break

    This morning I wrote about the troublesome experience of Intel Sandy Bridge graphics under Ubuntu 11.04, as the packages found in the Natty repository are outdated and contain only the initial "SNB" support. In the mainline upstream code, Sandy Bridge support is much better: performance is faster and there are other new features (e.g. VA-API encode). Except that in the past week, the Intel SNB Linux code temporarily broke hard...

  • #2
    While the automated testing is outstanding at catching elementary problems with getting a basic desktop booted and running Compiz, it doesn't do much to spot regressions in more sophisticated uses, such as video decode/encode, the GLSL compiler, etc.

    For instance, I have been trying to track down an r600g regression where the kernel hard-locks while running Imprudence (or Second Life, or any other viewer derived from the Linden Lab code). The problem isn't evident in other advanced 3D applications, so it must be the particular calls this application makes, or the way it interleaves them, that exposes the bug.

    Unfortunately, that too is a regression: the application worked extremely well back in January and early February. It's an obvious regression because the Mesa 7.10 stable branch behaves correctly while 7.11-dev exposes the problem. I've already reported it on the FDO Bugzilla, attempted to bisect it, and provided the results of my attempt, but none of the developers have taken it any further in a week. The most embarrassing part is that the application in question is Free Software, so there's no reason someone experienced in OpenGL couldn't write a piglit test case to make sure this app keeps working. (NB: I am not an OpenGL programmer (yet).)
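
    For reference, a bisection like that can be driven automatically with "git bisect run" plus a small script. The sketch below is only illustrative -- the configure flags, the smoke-test command, and the timeout are hypothetical stand-ins, and a true kernel hard-lock still needs manual intervention:

```python
#!/usr/bin/env python3
# Illustrative driver for "git bisect run" inside a Mesa checkout.
# git bisect's convention: exit 0 = good, exit 1 = bad, exit 125 = skip.
# The build command, test command, and timeout are hypothetical stand-ins.
import subprocess
import sys

def sh(cmd, timeout=None):
    """Run a shell command; return its exit status, or None if it timed out."""
    try:
        return subprocess.call(cmd, shell=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None

# Build the revision that git bisect has checked out; if it doesn't even
# build, ask git bisect to skip this commit rather than mark it good or bad.
if sh("./autogen.sh --with-gallium-drivers=r600 && make -j4") != 0:
    sys.exit(125)

# Run the reproducer: a stand-in script that launches the viewer against the
# freshly built libraries and exits non-zero on a crash or rendering error.
# A hang is treated as "bad" via the timeout.
status = sh("./imprudence-smoke-test.sh", timeout=300)
sys.exit(0 if status == 0 else 1)
```

    With that saved as, say, bisect-driver.py, running "git bisect start <bad-commit> <good-commit>" and then "git bisect run python3 bisect-driver.py" walks the history automatically and skips commits that do not build.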

    I haven't seen such widespread regressions cropping up since the 2007-2008 era, when the whole stack went through massive growing pains from integrating Gallium3D, KMS, DRI2 and GEM all around the same time. 2009 and 2010 were, relatively speaking, uneventful in the way of regressions. But the regressions are back in 2011, this time without any architectural overhaul to blame, and it seems the developers are getting sloppy.

    The only solution I've found for my problem so far is to keep using Mesa 7.10.x, which has its own set of problems with other apps I use. So I basically have to choose between 7.10.x, which runs Imprudence just fine, and 7.11-dev, which breaks Imprudence but runs other applications faster and with fewer bugs (for example, Unigine engine titles).

    My points?

    1. I've seen the graphics developer community put out much higher-quality releases with fewer bugs in past years, and suddenly things are starting to fall apart. Why? Can this be improved, to get the regression rate back to where it used to be? 2010 was a high point for Mesa and the graphics stack, I think, but we're back in unstable waters in 2011.

    2. There are some applications which, while quite popular and open source, simply don't get tested. This may be due in part to the fact that the overwhelming majority of these apps' users take it for granted that Mesa won't work with their app, so they install a proprietary driver instead. But we need to change that perception and increase the number of users who can run without a proprietary driver. To do that, we first have to stop shipping regressions like this, and the easiest way to start is to write a piglit test case that captures something unique and potentially breakable that the application does (see the sketch below).
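
    A real piglit test case would be written as a shader_test file or a C program in the piglit tree; purely to illustrate the idea, here is a standalone Python/PyOpenGL sketch of the sort of check such a test would encode -- feed the driver a shader like the ones the application actually ships and make sure it still compiles and links. The fragment shader below is a hypothetical stand-in, not lifted from any real viewer:

```python
# Not an actual piglit test (those live in the piglit tree as shader_test
# files or C programs); just a Python/PyOpenGL sketch of the same idea.
# The fragment shader is a hypothetical stand-in for one the app ships.
import sys
from OpenGL.GL import *      # requires the PyOpenGL package
from OpenGL.GLUT import *    # GLUT is only used here to obtain a GL context

FRAGMENT_SRC = """
varying vec2 vary_texcoord0;
uniform sampler2D diffuseMap;
void main() {
    gl_FragColor = texture2D(diffuseMap, vary_texcoord0);
}
"""

def main():
    # Any GL call needs a current context, so create a tiny GLUT window first.
    glutInit(sys.argv)
    glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE)
    glutInitWindowSize(64, 64)
    glutCreateWindow(b"app-shader-smoke-test")

    shader = glCreateShader(GL_FRAGMENT_SHADER)
    glShaderSource(shader, FRAGMENT_SRC)
    glCompileShader(shader)
    if not glGetShaderiv(shader, GL_COMPILE_STATUS):
        print("FAIL: compile error:", glGetShaderInfoLog(shader))
        sys.exit(1)

    program = glCreateProgram()
    glAttachShader(program, shader)
    glLinkProgram(program)
    if not glGetProgramiv(program, GL_LINK_STATUS):
        print("FAIL: link error:", glGetProgramInfoLog(program))
        sys.exit(1)

    print("PASS: the driver accepted the application's shader")

if __name__ == "__main__":
    main()
```

    A check like this would not catch a kernel hard-lock, but it is the kind of thing that flags a GLSL-compiler regression the moment it lands.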

    Intel users: don't worry, you aren't the only ones feeling the burn of regressions.

    • #3
      Originally posted by allquixotic
      My points?

      1. I've seen the graphics developer community put out much higher-quality releases with fewer bugs in past years, and suddenly things are starting to fall apart. Why? Can this be improved, to get the regression rate back to where it used to be? 2010 was a high point for Mesa and the graphics stack, I think, but we're back in unstable waters in 2011.

      2. There are some applications which, while quite popular and open source, simply don't get tested. This may be due in part to the fact that the overwhelming majority of these apps' users take it for granted that Mesa won't work with their app, so they install a proprietary driver instead. But we need to change that perception and increase the number of users who can run without a proprietary driver. To do that, we first have to stop shipping regressions like this, and the easiest way to start is to write a piglit test case that captures something unique and potentially breakable that the application does.
      Distilled down to a few pithy words, aren't you just saying that 7.11 unreleased work-in-progress is not as stable as 7.10 released code?

      Regressions *are* going to happen between releases. An increasing number of them can be caught by automated testing, but for the foreseeable future there are going to be problems which are only visible to end users. Those should be reported and bisected, as you have done, but I'm not sure why you are expecting a fix within a week. The question is whether regressions are fixed by the release timeframe, isn't it?

      I understand and agree that one bug can hide others so letting the code quality degrade seriously between releases is likely to cause pain at the end, but I don't think that's the scenario you are describing here.

      • #4
        Originally posted by bridgman
        Distilled down to a few pithy words, aren't you just saying that 7.11 unreleased work-in-progress is not as stable as 7.10 released code?
        In the case of the Second Life viewer and its variants, I think you almost have to use either an unreleased version of Mesa or version 7.8 (or binary blobs, or a different OS). Intel's new GLSL compiler introduced a regression that caused the program to crash if you enabled shaders, and it took until a month or two ago for someone to actually fix it.

        Originally posted by bridgman
        Regressions *are* going to happen between releases. An increasing number of them can be caught by automated testing, but for the foreseeable future there are going to be problems which are only visible to end users. Those should be reported and bisected, as you have done, but I'm not sure why you are expecting a fix within a week. The question is whether regressions are fixed by the release timeframe, isn't it?
        I'm not holding my breath.

        • #5
          Originally posted by bridgman
          Distilled down to a few pithy words, aren't you just saying that 7.11 unreleased work-in-progress is not as stable as 7.10 released code?
          No, in fact I spend very little time actually using the tagged releases or stable release branches. I use them for testing purposes as a reference point, but I live on git master most of the time. So I have a vague sense of how much breakage to expect in master, how long it should live there, and how long it should take before work starts on a fix.

          I'm not saying master should always be 100% stable, just that I have an idea in my mind of what a "typical" git master looks like in terms of stability (at least in 2009-2010), and we've drifted a significant step away from that now.

          Originally posted by bridgman
          Regressions *are* going to happen between releases. An increasing number of them can be caught by automated testing, but for the foreseeable future there are going to be problems which are only visible to end users. Those should be reported and bisected, as you have done, but I'm not sure why you are expecting a fix within a week. The question is whether regressions are fixed by the release timeframe, isn't it?
          The ones only visible to end users are the ones that really drag down the quality of a release, because they affect real-world apps, and most of the time they don't get reported -- or if they do, there isn't enough information to develop even a trivial fix, so the bug sits there far longer than it should. Having a release that passes all your automated tests but doesn't run any real-world applications is completely useless. Of course that's hyperbole, and it's never the case that a Mesa release doesn't run *any* real-world apps, but it's better if it runs most or all of them instead of just the select few that the developers test (OpenArena and Compiz, I'm looking at you).

          I guess the case of 7.10.x is a fairly good example of a release that *does* in fact run real-world applications (at least most of the ones I've tested), but I feel like the onus is on the user community to ensure that each successive release will continue to uphold that quality. And right now I'm not seeing it for 7.11, and the same problems have persisted for several months now.

          Originally posted by bridgman
          I understand and agree that one bug can hide others so letting the code quality degrade seriously between releases is likely to cause pain at the end, but I don't think that's the scenario you are describing here.
          I do see two interacting bugs, with some overlap as I try different historical git revisions: one where there's a huge memory leak, and one where you get a kernel hard-lock. Since they overlap and interfere with one another, it's really hard to pin down exactly which problem occurs at which revision. I haven't isolated either definitively; I only have some indicators (I've included that information in the bug report).

          Also, today I found that Minecraft soft-crashes on 7.11-dev, which it doesn't on 7.10.x. So I guess I need to bisect that, too. Can you see where I'm going with this? I'm an engineer and an open-source advocate, but I really don't have the resources to invest all this time and energy in making sure that the apps I use will keep working with successive versions of Mesa. From Mesa's perspective, I'm really just a regular user with some extra technical knowledge. I don't mind contributing bug reports to help out, but I feel quite alone when I'm the first to report issues in major open-source applications. I have to ask myself, "if I don't report this bug, is the quality of the next Mesa release going to be hurt by this regression remaining in the software?" And the answer is "most probably, yes" -- because as far as I can tell, no other users (much less the developers) are doing the same kind of testing I'm doing on common open source and commercial applications.

          And it's not any kind of special testing, either. Just download the app and run it. Does it render correctly without crashing? (Yes/no.) Very simple. But if it's up to me to do this for the dozen or so apps that I seem to be the only one running on the Mesa stack, that's a very large burden to place on me, considering that developing Mesa is not the focus of my open-source contribution, and I'd prefer to invest my time in projects where I (1) have significant domain knowledge, and (2) have an assigned maintainership responsibility, so I know that users are counting on me. I have neither of those things for Mesa.
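
          For what it's worth, even that download-and-run pass can be half scripted. Below is a rough Python sketch of the idea; the Mesa build path, the application commands, and the timeout are all hypothetical placeholders, and it only answers "did the app survive for a while without crashing", so rendering correctness still needs a human in front of the screen:

```python
#!/usr/bin/env python3
# Rough sketch: launch a handful of applications against a locally built
# Mesa and record whether each one survives a short run without crashing.
# The library path and the commands below are hypothetical placeholders.
import os
import subprocess

MESA_LIBS = "/home/me/mesa/lib"   # hypothetical local Mesa build directory

APPS = {
    "imprudence": ["./imprudence"],
    "minecraft":  ["java", "-jar", "minecraft.jar"],
    "openarena":  ["openarena"],
}

def runs_ok(cmd, seconds=60):
    """True if the command starts and neither crashes nor exits non-zero
    within `seconds`; a run still going at the timeout counts as OK."""
    env = dict(os.environ)
    env["LD_LIBRARY_PATH"] = MESA_LIBS                 # pick up the new libGL
    env["LIBGL_DRIVERS_PATH"] = MESA_LIBS + "/dri"     # and the new DRI drivers
    try:
        return subprocess.run(cmd, env=env, timeout=seconds).returncode == 0
    except subprocess.TimeoutExpired:
        return True    # still running after the timeout: at least it didn't crash
    except OSError:
        return False   # not installed, or failed to start at all

for name, cmd in APPS.items():
    print(name, "OK" if runs_ok(cmd) else "CRASHED/FAILED")
```

          A harness like this would flag a plain crash such as the Minecraft case automatically; a kernel hard-lock, of course, takes the whole harness down with it.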
