Intel Workaround For Graphics Driver Regression: "The Platform Problem Going Crazy"

  • #11
    Intel has a history of sabotaging their own graphics drivers. I have a thread about lockups on 5.4+ that was only fixed "silently" in 5.7 for Liquorix: https://techpatterns.com/forums/about2775.html

    Looks like whatever the Intel team is doing, they're unable to reproduce issues reliably that customers get on their laptops/desktops. Until they fix that, expect more of this BS.

    I'm happy that Dave Airlie put his foot down for this blatant inability to focus on the root cause. Maybe this will change Intel's behavior going forward if they know what to expect.

    • #12
      Originally posted by intelfx
      I don’t understand. Each of these patches is an improvement in itself, regardless of the regression, right? Each of these patches has a commit message that makes sense. And what happens is the DRM maintainer basically says “no, the kernel will not get these performance improvements, never ever”.

      WTF?
      That's why I asked for benchmark or power consumption results. Refactoring the driver with no real-world benefit while increasing complexity isn't beneficial to the long-term health of the driver.


      • #13
        Originally posted by intelfx
        I don’t understand. Each of these patches is an improvement in itself, regardless of the regression, right? Each of these patches has a commit message that makes sense. And what happens is the DRM maintainer basically says “no, the kernel will not get these performance improvements, never ever”.

        WTF?
        That's not really how it goes, at least from my understanding.

        Intel has a regression somewhere. Instead of finding it, they push optimizations to overcome the regression, so nobody notices.

        The thing is there is some refactoring being done which might make understanding, tracking, finding and fixing the regression harder.

        What should be done instead is:
        1. Investigate the regression
        2. Understand it
        3. Fix it
        4. Only THEN do you propose new optimizations on top of the fix.

        If somebody has a better grasp of the situation, feel free to correct me. But this is what I understood.

        • #14
          Originally posted by damentz
          Looks like whatever the Intel team is doing, they're unable to reproduce issues reliably that customers get on their laptops/desktops. Until they fix that, expect more of this BS. I'm happy that Dave Airlie put his foot down for this blatant inability to focus on the root cause. Maybe this will change Intel's behavior going forward if they know what to expect.
          I can't fully follow the patches, but weren't they just meant as a fixup that also happens to offset the performance loss by being beneficial overall? So technically they aren't related to the regression at all? Finding and fixing the regressions in the first place would be great for sure, but I don't see a connection between the two; it seems to me like a separate effort. Maybe it just wasn't a great idea for Chris to bring that up at all, since it looks like concealing these unfixed regressions. Now he got a NAK from Dave, which could have been avoided by better phrasing of his intentions and motivations.

          • #15
            Originally posted by intelfx
            I don’t understand. Each of these patches is an improvement in itself, regardless of the regression, right? Each of these patches has a commit message that makes sense. And what happens is the DRM maintainer basically says “no, the kernel will not get these performance improvements, never ever”.

            WTF?
            Not necessarily. Intel generally isn't one of the culprits, but a number of contributors have discovered that their number of commits to the kernel (and other projects) gives them a sort of notoriety. What follows is a stream of low-quality and often untested commits that break functionality or regress performance in certain areas. One recent example is a long-standing commit, made by an alleged commit stats gamer, that completely broke the netatalk (AppleTalk) module used by vintage Mac enthusiasts. The discussion on commit stats is in comments.

            The Linux kernel code base has grown so much that quite a bit of it goes largely untested or inadequately tested before it's pushed out to the wide world. Linus himself can't do more than make sure everything compiles. He just doesn't have the time or hardware to thoroughly audit and test every single line of committed code. Some of that work is farmed out to responsible parties, like the graphics system informal commit team. At some point, you have to take the person's word for it that the code is necessary and doesn't break or slow things down. Repeated screw-ups will eventually come out, but by then the damage is already done. Automated testing and analysis can only go so far. In the end, gaming the system (for Internet points) and a lack of testing time and capability are causing quality problems. This doesn't even touch the problem of a lack of skilled and experienced eyes needed to check every piece of open source code that goes into the typical Linux distro. As the recent GRUB2 debacle illustrates, many bugs and vulnerabilities were discovered once skilled people took a look at the code, not just the BootHole issue.

            • #16
              Intel has a regression somewhere. Instead of finding it, they push optimizations to overcome the regression, so nobody notices.
              Likely some manager enforces agreed procedures, and this is easier than arguing.

              Is it really Intel only? Or did they notice it just for their workloads?

              The thing is there is some refactoring being done which might make understanding, tracking, finding and fixing the regression harder.
              Does it make it harder? I mean the regression happened in the past and can be found there.

              What should be done instead is:
              1. Investigate the regression
              2. Understand it
              3. Fix it
              4. Only THEN do you propose new optimizations on top of the fix.
              Yeah well, what do you think would have happened if the developer had stated other reasons?

              I don't know the quality of the patches, but that should be the major criterion, along with whether the trade-off between readability/maintainability and performance is acceptable.

              • #17
                Michael, should the following say 5.8 rather than 5.7?

                To that patch series was then DRM subsystem maintainer David Airlie of Red Hat asking what introduced the regressions in Linux 5.7 and whether they are documented. As well, whether the regression is noticeable just to benchmarks or applications, etc.
                Last edited by bridgman; 05 August 2020, 03:09 PM.

                • #18
                  Originally posted by airlied

                  That's why I asked for benchmark or power consumption results. Refactoring the driver with no real-world benefit while increasing complexity isn't beneficial to the long-term health of the driver.

                  I get the feeling that Chris kind of wanted the patches to be rejected, but couldn't say so for obvious reasons.

                  • #19
                    Michael

                    Typo:

                    This this increased complication of the driver to offset the regression is now under the microscope.

                    • #20
                      Originally posted by damentz
                      Intel has a history of sabotaging their own graphics drivers. I have a thread about lockups on 5.4+ that was only fixed "silently" in 5.7 for Liquorix: https://techpatterns.com/forums/about2775.html

                      Looks like whatever the Intel team is doing, they're unable to reproduce issues reliably that customers get on their laptops/desktops. Until they fix that, expect more of this BS.

                      I'm happy that Dave Airlie put his foot down for this blatant inability to focus on the root cause. Maybe this will change Intel's behavior going forward if they know what to expect.
                      Kernel maintainers putting their foot down and being rude like this could make companies like Intel less likely to support Linux and other open source OSes in the future. Desktop Linux is at best maybe 5% of their users. At some point, companies like Intel and Nvidia will just decide that Linux is not worth their time to support.
