RadeonSI Lands Regression Fix For ~10x Higher CPU Usage For Some Games


  • RadeonSI Lands Regression Fix For ~10x Higher CPU Usage For Some Games

    Phoronix: RadeonSI Lands Regression Fix For ~10x Higher CPU Usage For Some Games

    Merged one month ago was RadeonSI enabling by default its optimization to replace uniforms with literals inside shaders. This uniform inlining helped with SPECViewPerf and other workloads, but it turns out that in the process it sharply drove up CPU usage when running some games...

    https://www.phoronix.com/scan.php?pa...-Many-Variants
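    To picture what the article describes: the driver takes a shader that reads a uniform, substitutes the observed value in as a literal, and compiles the specialized source as a fresh variant. A rough illustration of the idea in Python (the shader source and `inline_uniform` helper are hypothetical, not Mesa code):

```python
# Hypothetical sketch of driver-side uniform inlining: a uniform the driver
# has observed to be constant is replaced by its literal value, producing a
# specialized shader variant that then has to be recompiled.
import re

FRAG_SRC = """
uniform float u_alpha_cutoff;
void main() {
    if (gl_FragColor.a < u_alpha_cutoff) discard;
}
"""

def inline_uniform(src: str, name: str, value: float) -> str:
    """Drop the uniform declaration and substitute the literal value."""
    src = re.sub(rf"uniform\s+float\s+{name}\s*;\s*\n", "", src)
    return re.sub(rf"\b{name}\b", repr(value), src)

specialized = inline_uniform(FRAG_SRC, "u_alpha_cutoff", 0.5)
print(specialized)  # the uniform is gone; the branch now compares against 0.5
```

    The catch the thread goes on to discuss: every new observed value means another compile, whereas updating the uniform is nearly free.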

  • #2
    So probably the correct way to look at this issue is: SPECViewPerf is poorly written and overuses uniforms where literals would do.

    Comment


    • #3
      Recompiling and re-uploading whole shaders for performance reasons instead of just updating Uniforms sounds pretty counter-intuitive to me. Apart from badly written synthetic benchmarks I can't think of anything that would not suffer severely from the huge overhead of such an "optimization".

      Comment


      • #4
        Originally posted by microcode View Post
        So probably the correct way to look at this issue is: SPECViewPerf is poorly written and overuses uniforms where literals would do.
        Without looking at the code, I wouldn't go that far. More likely, since it's been around a long time despite being maintained, it's doing stuff modern drivers no longer do or consider deprecated, and drivers support that kind of thing on legacy code paths because they have to support old OpenGL codebases indefinitely, like CAD packages. SpecViewPerf is supposed to be a synthetic benchmark designed to see how well those old programs will run.

        Comment


        • #5
          Edit - actually I may have misunderstood. Just going to delete this since I'm not sure.
          Last edited by smitty3268; 10 August 2021, 01:45 AM.

          Comment


          • #6
            Originally posted by soulsource View Post
            Recompiling and re-uploading whole shaders for performance reasons instead of just updating Uniforms sounds pretty counter-intuitive to me. Apart from badly written synthetic benchmarks I can't think of anything that would not suffer severely from the huge overhead of such an "optimization".
            That. This is literally the *point* of uniforms!

            I agree: the "optimization" is completely nonsensical. This is 100% wrong, and pure "benchmark cheating" with no real-world benefit at all. (With, as shown, actually a real-world negative instead - without counting the additional TB of disk space the cached shaders would use).

            This "throttling" hack is also utterly braindead. Piling hacks on top of hacks is how you get unmaintainable code. Stupid stuff like this is what will cause some group of DK idiots to call out Mesa as "old, badly designed, etc." 2 or 3 years from now and split off to write "a new driver base without all the legacy cruft, because THIS time we'll do everything right"...

            Comment


            • #7
              Originally posted by arQon View Post

              That. This is literally the *point* of uniforms!

              I agree: the "optimization" is completely nonsensical. This is 100% wrong, and pure "benchmark cheating" with no real-world benefit at all. (With, as shown, actually a real-world negative instead - without counting the additional TB of disk space the cached shaders would use).

              This "throttling" hack is also utterly braindead. Piling hacks on top of hacks is how you get unmaintainable code. Stupid stuff like this is what will cause some group of DK idiots to call out Mesa as "old, badly designed, etc." 2 or 3 years from now and split off to write "a new driver base without all the legacy cruft, because THIS time we'll do everything right"...
              Well, I just had a discussion with coworkers about this, and it turns out that in the case of uber-shaders this could in theory yield improvements - for instance, if the uniform enables/disables branches that can then be optimized away. However, the decision of which uniforms to inline must be smart, meaning only those that change very infrequently and ideally only have a fixed set of possible values.
              The smarter approach, however, would be for the developers of the uber-shader to let the application code generate the variants instead of relying on driver hacks...
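              The application-side approach described above is commonly done with preprocessor defines: the app compiles one variant per feature combination at load time, rather than the driver specializing on uniform values behind its back. A hypothetical sketch (the feature names and helper are made up for illustration):

```python
# Hypothetical sketch of application-side variant generation: instead of a
# runtime uniform toggling a branch, each feature combination is baked in
# with a #define and compiled as its own shader at load time.
from itertools import product

UBER_SRC = """
#if USE_FOG
    color = mix(color, fog_color, fog_factor);
#endif
#if USE_SHADOWS
    color *= shadow_term;
#endif
"""

def generate_variants(src: str, features: list[str]) -> dict[tuple, str]:
    """Prepend one #define header per on/off combination of the features."""
    variants = {}
    for bits in product((0, 1), repeat=len(features)):
        header = "".join(f"#define {f} {b}\n" for f, b in zip(features, bits))
        variants[bits] = header + src
    return variants

variants = generate_variants(UBER_SRC, ["USE_FOG", "USE_SHADOWS"])
print(len(variants))  # 4 variants for 2 boolean features
```

              The variant count is explicit and bounded by the app's own feature set, so there is no surprise recompile at draw time.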

              Comment


              • #8
                Originally posted by soulsource View Post
                and it turns out that in the case of uber-shaders this could in theory yield improvements.
                Sure - but that's "in theory", and also only as long as you don't care that you have to stall everything yet again to compile variant #2704 of the same shader. (Now sort-of somewhat mitigated by shader caches, but still far from ideal).
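                The cap being complained about can be pictured as a bounded variant cache: specialize per observed uniform value up to some limit, then fall back to the generic shader that reads the uniform at run time. A hypothetical Python sketch (the cap of 4 and all names are made up, not Mesa's actual implementation):

```python
# Hypothetical sketch of throttled specialization: compile a specialized
# variant per uniform value only up to a cap, then fall back to the one
# generic shader that reads the uniform at run time.
class VariantCache:
    def __init__(self, max_variants: int):
        self.max_variants = max_variants
        self.variants = {}     # uniform value -> "compiled" variant
        self.compiles = 0      # how many recompiles were paid for

    def get_shader(self, uniform_value: float) -> str:
        if uniform_value in self.variants:
            return self.variants[uniform_value]       # cache hit, no compile
        if len(self.variants) < self.max_variants:
            self.compiles += 1                        # simulate a recompile stall
            self.variants[uniform_value] = f"specialized({uniform_value})"
            return self.variants[uniform_value]
        return "generic"                              # over the cap: stop specializing

cache = VariantCache(max_variants=4)
shaders = [cache.get_shader(v) for v in (0.1, 0.2, 0.3, 0.4, 0.5, 0.1)]
print(cache.compiles)  # 4 compiles; 0.5 fell back to the generic shader
```

                If the uniform cycles through many values, every new one below the cap still costs a compile, which is the recompile-stall pattern being objected to here.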

                The problem is that as soon as you get into "it must be smart enough to...", you are *absolutely guaranteeing* that that piece of code will be an unmaintainable mess as soon as the guy who wrote it leaves (or even sooner, if he doesn't touch it for a couple of years and then has to go back to it), and it will only get worse from there.

                > The smarter approach, however, would be for the developers of the uber-shader to let the application code generate the variants instead of relying on driver hacks...

                Indeed. (Not to mention, having had to work on such "cosmic" shaders in the past, there are other reasons to avoid them whenever possible, though in fairness there are times you don't really have much choice).

                But this is *exactly* the kind of crap that goes on with Windows drivers already. Per-game hacks to win benchmarks with is what's taken drivers from 200K to 500MB in the past ten years - along with then having to produce release-day driver hacks to even run some games *at all*, because the last 7 layers of hacks in there happen to break when used the "wrong" way.

                This is a very slippery slope, and the purely benchmark-driven change that started this is the first step onto it. This "throttling" hack is the second step, and now Mesa no longer has a foot on stable ground. If the next change in this area isn't "revert all this garbage entirely", there's no doubt at all that it will end up being hacked on more and more with every release from now until doomsday as it inevitably chases bug after bug after "tuning change" as new games / benchmarks / etc wax and wane in popularity, until the stack of band-aids on it reaches all the way into space.

                Comment


                • #9
                  Originally posted by arQon View Post
                  But this is *exactly* the kind of crap that goes on with Windows drivers already. Per-game hacks to win benchmarks with is what's taken drivers from 200K to 500MB in the past ten years - along with then having to produce release-day driver hacks to even run some games *at all*, because the last 7 layers of hacks in there happen to break when used the "wrong" way.

                  This is a very slippery slope, and the purely benchmark-driven change that started this is the first step onto it. This "throttling" hack is the second step, and now Mesa no longer has a foot on stable ground. If the next change in this area isn't "revert all this garbage entirely", there's no doubt at all that it will end up being hacked on more and more with every release from now until doomsday as it inevitably chases bug after bug after "tuning change" as new games / benchmarks / etc wax and wane in popularity, until the stack of band-aids on it reaches all the way into space.
                  You need to sell hardware. Most people don't buy cards that have simple drivers with no driver optimizations. They buy cards that run the apps they use fast.

                  Comment


                  • #10
                    Originally posted by agd5f View Post
                    You need to sell hardware. Most people don't buy cards that have simple drivers with no driver optimizations. They buy cards that run the apps they use fast.
                    Yeah, I know.

                    Even with that in play though, I still think this is a terrible idea. The problem here isn't the "optimization" per se; it's that the optimization can be, inevitably will be, and in this case already has been, counter-productive in some contexts.

                    This is the sort of thing that, ironically, may be best handled by exactly the sort of per-exe hacks that the Windows drivers depend on - but my point is that the "base" code was hacked specifically to win a benchmark, then hacked again to cap at a random number of variants (leading to multiple rounds of gratuitous recompiles) because that first hack backfired. I have a hard time seeing that as how this should have been handled, at any point in that chain.
                    (IMO, of course. If you want to defend it, I'll defer to your judgement).

                    Comment
