Radeon Gallium3D Picks Up A Nice Performance Optimization For iGPU/dGPU PRIME Setups

  • Quackdoc
    Senior Member
    • Oct 2020
    • 5112

    #31
    Originally posted by PCJohn View Post

    Please, enlighten me on this. I thought until today that MUX setups are present only on the most expensive solutions, like Dell Precision laptops, and that you can switch between outputs only in the BIOS at boot time, not in your OS. It can be used to get the outputs driven by Nvidia (instead of Intel) if you, for instance, have stereoscopic glasses and need to have them controlled by Nvidia directly.
    Originally posted by darkbasic View Post

    I was not aware of this limitation... Nowadays muxes are pretty common in high end gaming laptops and I thought it was possible to switch outputs on the fly in mux setups.
    What's the point of the iGPU if you can't switch outputs on the fly? Are you sure of this?

    You can switch the MUX from within the OS on some laptops; Nvidia calls it Advanced Optimus, but traditional MUXes need a reboot. The majority of laptops work by using PRIME (or whatever the Windows equivalent is). On older gaming laptops a MUX used to be really common, but that was replaced by Optimus because most people didn't like rebooting, so they left it on the dGPU, which kills the battery. Now, because of the performance hit, Nvidia is migrating to "Advanced Optimus", which is, as I said, a MUX without the reboot.



    • agd5f
      AMD Graphics Driver Developer
      • Dec 2007
      • 3939

      #32
      For MUXes to work smoothly at runtime, you need to teach the compositor (or compositors on Linux since there are a lot of them) to properly handle them. Then the compositor can switch the MUX to the rendering GPU when starting a full screen application and then switch it back when the application finishes.

      On Linux today you can already change the MUX at runtime, but since compositors don't know how to handle them, you have to restart your compositor.
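Purely as an illustration of the policy described above, here is a minimal, self-contained sketch of that control flow. Nothing in it is a real compositor, kernel, or vendor API; the Gpu/Mux classes and the fullscreen hooks are hypothetical stand-ins.

```python
# Illustrative sketch only: switch the MUX to the rendering GPU when a
# full screen app starts, and hand the panel back to the iGPU when it exits.
from dataclasses import dataclass

@dataclass
class Gpu:
    name: str

@dataclass
class Mux:
    active: Gpu
    def select(self, gpu: Gpu):
        # Hypothetical MUX control; a real compositor would talk to the
        # platform/firmware interface here.
        print(f"MUX: panel now driven by {gpu.name}")
        self.active = gpu

def on_fullscreen_enter(mux: Mux, render_gpu: Gpu, dgpu: Gpu):
    # App renders on the dGPU: scan out directly from it, no copy back.
    if render_gpu is dgpu:
        mux.select(dgpu)

def on_fullscreen_exit(mux: Mux, igpu: Gpu):
    # Back to the desktop: composite and scan out on the iGPU again.
    mux.select(igpu)

igpu, dgpu = Gpu("iGPU"), Gpu("dGPU")
mux = Mux(active=igpu)
on_fullscreen_enter(mux, render_gpu=dgpu, dgpu=dgpu)
on_fullscreen_exit(mux, igpu=igpu)
```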


      • PCJohn
        Phoronix Member
        • Nov 2016
        • 64

        #33
        Thanks guys! Very helpful.
        Interesting that my Quadro RTX 3000 (a GeForce RTX 2070 equivalent) does not have this Advanced Optimus; the MUX has to be switched in the BIOS instead.


        • darkbasic
          Senior Member
          • Nov 2009
          • 3088

          #34
          Originally posted by agd5f View Post
          For MUXes to work smoothly at runtime, you need to teach the compositor (or compositors on Linux since there are a lot of them) to properly handle them. Then the compositor can switch the MUX to the rendering GPU when starting a full screen application and then switch it back when the application finishes.

          On Linux today you can already change the MUX at runtime, but since compositors don't know how to handle them, you have to restart your compositor.
          Can you confirm what Linus said about modern high end muxless setups (a 10% performance penalty) or do you think it can be done better?
          ## VGA ##
          AMD: X1950XTX, HD3870, HD5870
          Intel: GMA45, HD3000 (Core i5 2500K)


          • PCJohn
            Phoronix Member
            • Nov 2016
            • 64

            #35
            Originally posted by darkbasic View Post

            Can you confirm what Linus said about modern high end muxless setups (a 10% performance penalty) or do you think it can be done better?
            You can think of it as something eating your bandwidth (and adding latency), but it does not necessarily eat your compute resources (if the copy uses asynchronous data transfer hardware). A typical Full HD screen takes 1920*1080*60*4 = ~500 MiB/s. If the card is connected through PCI Express 3.0 x16, the interface has a throughput of about 15 GiB/s, so we are at about 3% of the PCI Express bandwidth. Not much to slow down your computer, but it is 3% all the same. And the time to transfer one frame is about 0.5 ms. Not a big lag, but it is 0.5 ms... I am not a hardware expert, I have just always thought about it this way.
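The arithmetic from that post, gathered in one place. The assumptions are the same ones stated there: 4 bytes per pixel and roughly 15 GiB/s usable on a PCIe 3.0 x16 link.

```python
# Back-of-the-envelope numbers for copying a 1080p60 desktop across PCIe.
width, height, refresh_hz, bytes_per_pixel = 1920, 1080, 60, 4
pcie3_x16_gib_s = 15  # assumed usable bandwidth of a PCIe 3.0 x16 link

frame_mib = width * height * bytes_per_pixel / 2**20        # ~7.9 MiB per frame
stream_mib_s = frame_mib * refresh_hz                       # ~475 MiB/s sustained
link_share = stream_mib_s / (pcie3_x16_gib_s * 1024)        # ~3% of the link
frame_copy_ms = frame_mib / (pcie3_x16_gib_s * 1024) * 1000 # ~0.5 ms per frame

print(f"{frame_mib:.1f} MiB/frame, {stream_mib_s:.0f} MiB/s, "
      f"{link_share:.1%} of the link, {frame_copy_ms:.2f} ms/frame")
```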


            • agd5f
              AMD Graphics Driver Developer
              • Dec 2007
              • 3939

              #36
              Originally posted by darkbasic View Post

              Can you confirm what Linus said about modern high end muxless setups (a 10% performance penalty) or do you think it can be done better?
              I'm not familiar with the particular laptop in the video (it was an Intel + Nvidia system IIRC) and how optimized it is, but as PCJohn noted, it adds at least one extra copy; possibly more depending on how optimized the particular system is. Who does the copy (iGPU vs dGPU), whether the copy is asynchronous with other rendering or not, the size and depth of the surface being copied, the bandwidth of the system memory and the PCIe link, and what else is going on in the system all play a part.


              • darkbasic
                Senior Member
                • Nov 2009
                • 3088

                #37
                Originally posted by PCJohn View Post
                Typical FullHD screen takes 1920*1080*60*4 = ~500MiB/s
                "Typical" is becoming something like 4K@144Hz or 1080p@240Hz in high end gaming laptops. I can't believe it's just a bandwidth issue because that would be easily fixed by switching to PCI Express 4.0 16x.
                ## VGA ##
                AMD: X1950XTX, HD3870, HD5870
                Intel: GMA45, HD3000 (Core i5 2500K)
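The same back-of-the-envelope math applied to those higher-end display modes. Assumptions again: 4 bytes per pixel, roughly 15 GiB/s usable on PCIe 3.0 x16 and roughly twice that on PCIe 4.0 x16.

```python
# Sustained copy bandwidth for the "high end" modes mentioned above,
# as a share of an assumed usable PCIe link bandwidth.
modes = {"1080p @ 240 Hz": (1920, 1080, 240), "4K @ 144 Hz": (3840, 2160, 144)}
links_gib_s = {"PCIe 3.0 x16": 15, "PCIe 4.0 x16": 30}  # assumed usable rates

for name, (w, h, hz) in modes.items():
    gib_s = w * h * 4 * hz / 2**30  # 4 bytes per pixel
    for link, bw in links_gib_s.items():
        print(f"{name}: {gib_s:.1f} GiB/s = {gib_s / bw:.0%} of {link}")
```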


                • agd5f
                  AMD Graphics Driver Developer
                  • Dec 2007
                  • 3939

                  #38
                  Originally posted by darkbasic View Post
                  "Typical" is becoming something like 4K@144Hz or 1080p@240Hz in high end gaming laptops. I can't believe it's just a bandwidth issue because that would be easily fixed by switching to PCI Express 4.0 16x.
                  That is really all there is to it. Initially there were often two copies involved, because you had to get the rendered frame from VRAM on one GPU to VRAM (or carve-out) on the other via system memory. Peer-to-peer DMA on PCIe is only really starting to take off now, and even now there isn't a good generic way to determine which platforms support it. All Zen CPUs support it; Intel is a bit more quirky depending on which root ports the devices are attached to. Non-x86 is completely undefined at the moment as far as I know.
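A rough sketch of the one-copy vs two-copy cost described above, for a single 4K frame. The bandwidth figures (~15 GiB/s across PCIe 3.0 x16, ~40 GiB/s for system memory) are illustrative assumptions, not measurements.

```python
# Per-frame cost of the two paths: dGPU VRAM -> system memory -> iGPU
# carve-out (two copies), versus peer-to-peer DMA straight to the iGPU's
# memory (one copy).  Assumed bandwidths, purely illustrative.
frame_mib = 3840 * 2160 * 4 / 2**20   # ~31.6 MiB per 4K frame at 4 B/pixel
pcie_gib_s, sysmem_gib_s = 15, 40

two_copy_ms = (frame_mib / (pcie_gib_s * 1024)
               + frame_mib / (sysmem_gib_s * 1024)) * 1000
one_copy_ms = frame_mib / (pcie_gib_s * 1024) * 1000

print(f"two copies ~{two_copy_ms:.1f} ms/frame, P2P ~{one_copy_ms:.1f} ms/frame")
```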


                  • Ph42oN
                    Junior Member
                    • Sep 2020
                    • 8

                    #39
                    Does this require anything else to work besides installing Mesa 21.3? I currently have 21.3.1 installed, and my laptop (2500U + RX 560X) still seems to take a bigger performance hit on Linux than my desktop RX 480 does.

