Valve Is Sponsoring More CI Testing For The Open-Source Radeon Linux Graphics Driver

  • Valve Is Sponsoring More CI Testing For The Open-Source Radeon Linux Graphics Driver

    Phoronix: Valve Is Sponsoring More CI Testing For The Open-Source Radeon Linux Graphics Driver

    As good news not only to future Steam Deck users but all Linux gamers making use of the Mesa open-source graphics drivers, Valve is sponsoring additional continuous integration (CI) testing of Mesa commits...

  • #2
    Speaking of quality testing, I've noticed flickering which gets worse over time on my Vega 56 with Freesync on my 165 Hz display in recent weeks (https://gitlab.freedesktop.org/drm/amd/-/issues/1876). Unfortunately that is not something CI can catch, right?!

    • #3
      Originally posted by ms178 View Post
      Speaking of quality testing, I've noticed flickering which gets worse over time on my Vega 56 with Freesync on my 165 Hz display in recent weeks (https://gitlab.freedesktop.org/drm/amd/-/issues/1876). Unfortunately that is not something CI can catch, right?!
      Kernel regression testing seems to be kind of a sore spot, especially when it comes to drivers. Companies break too much in their drivers after mainlining them (of course also when not mainlining them), and then you are stuck with the issue when sticking to that kernel version. Something needs to change here in a fundamental way, but I'm not optimistic it would be realistic to expect that to happen any time soon.

      • #4
        Yeah, kernel driver regressions are not specific to AMDGPU. I heard kernel 5.15 was completely unusable for Intel Xe graphics users because it caused constant GPU hangs right after boot. Issues like these make me wonder whether the drivers were even tested at all. I mean, how can a serious bug like this be missed? On the other hand, issues like these are much less common if the hardware isn't new.

        • #5
          Originally posted by ms178 View Post
          Speaking of quality testing, I've noticed flickering which gets worse over time on my Vega 56 with Freesync on my 165 Hz display in recent weeks (https://gitlab.freedesktop.org/drm/amd/-/issues/1876). Unfortunately that is not something CI can catch, right?!
          Display testing is the hardest problem I have faced in my CI life. Completely solving it requires making dedicated hardware for it, and alternatives never cover 100% of the cases. Google made such hardware years and years ago, named Chamelium and we integrated it in IGT when I was working at Intel. Nothing fundamental has changed this though :s

          • #6
            Here is a bit more context surrounding this series. So far, what you see is just Mesa CI, but we are moving on to testing other components of the graphics stack (DXVK, VKD3D, and maybe amdgpu in the future). We are also doing game testing using tracing tools, but the run time is prohibitive for Mesa CI unless we add hundreds of machines. So more work is definitely needed there.

            If you are interested in learning more about the work we are doing for Valve, you may follow these links:

            - Blog posts: My series on how to set up a bare-metal CI system and Charlie's iPXE article
            - Talks:
              - XDC2021 - Making bare-metal CI accessible: Where you can see some of the machines exposed by this merge request
              - LPC2021 - Bare-metal testing using containers: Where we explain how we create and deploy our test environment
              - FOSDEM 2022 (upcoming): Where we will explain how to get a fully-reproducible infrastructure by netbooting containers directly over the internet
            - Source code: Our infra, our initramfs that boots containers.

            As you can see, we mostly spoke about the infrastructure, as this is what we have spent most of our time on until now. Now that it is settling down, unit testing of more components and game testing are ramping up, and we will make sure to share our research, which will hopefully catch most regressions before any user is affected.

            Anyway, thanks for the shoutout of Charlie's MR, Michael!

            • #7
              Originally posted by MuPuF View Post

              Display testing is the hardest problem I have faced in my CI life. Completely solving it requires making dedicated hardware for it, and alternatives never cover 100% of the cases. Google made such hardware years and years ago, named Chamelium and we integrated it in IGT when I was working at Intel. Nothing fundamental has changed this though :s
              Thanks for the insights, I had thought that it would be quite involved, as there are thousands of displays and GPU models out there, the test matrix must be huge. I hope Chamelium or alternative implementations are widely used in the industry to catch issues like this sooner. I am glad for any effort for the health of my eyes.

              • #8
                Originally posted by MuPuF View Post
                Display testing is the hardest problem I have faced in my CI life. Completely solving it requires making dedicated hardware for it, and alternatives never cover 100% of the cases. Google made such hardware years and years ago, named Chamelium and we integrated it in IGT when I was working at Intel. Nothing fundamental has changed this though :s
                And the complete dedicated hardware for certifying HDMI and DisplayPort output: none of it exists as fully open hardware, and it's expensive, so it's great that a partial solution like Chamelium exists.

                Originally posted by ms178 View Post
                Thanks for the insights, I had thought that it would be quite involved, as there are thousands of displays and GPU models out there, the test matrix must be huge. I hope Chamelium or alternative implementations are widely used in the industry to catch issues like this sooner. I am glad for any effort for the health of my eyes.
                A lot of people think of the thousands of displays as the problem, but that is really a solved problem; the item you need to solve the problem is the problem. If you have an unlimited budget, you can drop 50k on proper output certification hardware plus a 15k-a-year subscription. This is able to check timings and basically emulate every display ever made. Remember, all displays have to be certified to use the DisplayPort or HDMI logo. So thousands of individual displays can basically be replaced by a single box, if you can afford the box. Of course, with a 50k box you don't want a short or something ruining it. Some of these boxes for older standards do turn up on the second-hand market, without the update subscription.

                The reality here is that a lot of companies making GPUs (or anything with DisplayPort or HDMI) don't have the space for thousands of monitors, yet they still need to certify that the output of their GPUs is to specification in order to use the logo. Likewise, those making monitors have to use the box to certify that their monitor is to specification in order to use the logo, and they don't have room for thousands of machines to test every single GPU either. So this area is highly regulated.

                Chamelium does not check as much as the proper certification boxes, but it's a lot cheaper. Some parties use IP KVMs as well; this is not as broad testing, but it is about cost. EDID emulators and devices using relays to replicate connecting and disconnecting are found in some of those IP KVM setups that different groups use, like Canonical/Ubuntu.

                Of course that leaves all the different GPUs and other hardware issues.

                For the monitor part of this problem, what is needed is a better output certification device that is open hardware this time, and hopefully something under 1000 dollars apiece. But as those making hardware will tell you, this is a very hard problem needing a lot of investment.

                With this problem we are very much where we were with OpenGL and the Khronos Group when the conformance test suites were for paid-up members only, except this is with HDMI and DisplayPort, and you need special hardware. Display output testing would be a lot simpler if the testing hardware and conformance suite that those making monitors and GPUs use were more available and cost-effective to have.

                • #9
                  Originally posted by oiaohm View Post

                  And the complete dedicated hardware for certifying HDMI and DisplayPort output: none of it exists as fully open hardware, and it's expensive, so it's great that a partial solution like Chamelium exists.
                  Indeed, and funnily enough, the people making the hardware for DP certification are physically close to me, and we asked them if we could have open tests targeting their hardware, but they were not thrilled by the idea. To them, what they deliver is a package (HW + SW), and they did not look kindly on our work with IGT :s

                  In the end, we just improved the Chamelium support a lot. It allows checking most things, save DP-MST (limited testing possible), newer versions of DP/HDMI and HDCP.

                  I was considering making a new board myself, as a hobby project but sourcing existing DP/HDMI receivers is not the easiest due to HDCP. In the end, the best thing may be for us to use FPGAs as an HDMI/DP input, but as you say, this is years of work for a competent engineer!

                  • #10
                    Originally posted by ms178 View Post
                    Speaking of quality testing, I've noticed flickering which gets worse over time on my Vega 56 with Freesync on my 165 Hz display in recent weeks (https://gitlab.freedesktop.org/drm/amd/-/issues/1876). Unfortunately that is not something CI can catch, right?!
                    Could also be cable quality?

                    Have a gander at this Level1Techs video; it outlines an often forgotten point of weakness:
                    Wendell is back at the white board, this time to talk about Displayports! Texas Instruments PDF: https://www.ti.com/tool/TIDA-01620#design-products**********...

                    Last edited by Entzilla; 24 January 2022, 03:49 AM.
