Mesa 19.0 Can Cut In Half The Amount Of Memory For Team Fortress 2


  • #11
    Half the RAM = double the hats.


    • #12
      Would be nice to see what kind of performance impact this is going to have as, IIRC, the reason they added shader caching in the first place was that games were recompiling shaders in the middle of gameplay, leading to some not-so-nice stuttering. Also, the initial startup time of Deus Ex: Mankind Divided before shader caching was kind of ridiculous, and this is obviously a less-than-welcome change for those who weren't particularly impressed by it.

      If I'm right in my suspicions, then this is basically a textbook case of stepping over dollars to pick up dimes.


      • #13
        This makes a ton of sense. I believe it cuts memory usage a ton and actually should not meaningfully increase compile times, either. To me, it makes the most sense to consider the changes in the reverse order.

        Here's a detailed explanation that should be mostly accurate:

        OpenGL applications call glCompileShader() to build shaders for each stage (vertex, fragment, ...), then later link them together via glLinkProgram(), giving us the complete picture. Typically, not a lot happens at CompileShader time: it's all front-end parsing, plus some basic optimizations to clean up and shrink the program. Most of the real optimization work happens at link time, when we have the complete picture and know exactly what's needed and what isn't. In Mesa, NIR optimizations don't happen until LinkProgram time. Drivers may also generate assembly at link time, which involves a bit of guesswork about the eventual GL state. If all else fails, drivers can generate shaders at draw time, based on the actual GL state.

        With shader disk caching, we cache the final end result, so no compilation needs to occur. This means the final assembly from the driver for each variant, as well as the fully-linked NIR. But we don't actually cache the GLSL IR for the individual pre-linked shader stages. We do, however, cache whether each shader compiled or not.

        Assuming everything is in the cache, the steps look something like this:
        1. glCompileShader(GL_VERTEX_SHADER) -> check cache based on SHA of the shader source, realize that we successfully compiled it in the past - don't actually bother compiling.
        2. glCompileShader(GL_FRAGMENT_SHADER) -> ditto
        3. glCreateProgram / glAttachShader - associate all the individual shaders into a program...
        4. glLinkProgram - check the cache based on the SHAs of all the included shaders, realize we've seen this combo before, load the final linked NIR from disk.
        5. glDraw* - load driver variants from disk
        So in the ideal case, we don't compile anything. (If the app attaches those shaders in a different never-before-seen combination, we might have to recompile the individual shaders from source - but this is so rare that it's not worth bothering with. We'd just waste your disk space for no benefit.)
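
The steps above can be sketched as a toy model. This is only an illustration under stated assumptions, not Mesa's actual code: `ToyCache`, `run_once`, and the dictionary layout are hypothetical names, the "cache" is an in-memory object standing in for the on-disk one, and the toy records the "this shader compiled" flag immediately at compile time.

```python
import hashlib


def sha(text):
    """SHA-1 of a string, standing in for the hash of the shader source."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


class ToyCache:
    """Hypothetical in-memory stand-in for the on-disk shader cache."""

    def __init__(self):
        self.known_good = set()  # SHAs of sources that compiled before
        self.programs = {}       # combined key -> fake linked result
        self.compiles = 0        # number of real compiles performed

    def compile_shader(self, source):
        key = sha(source)
        if key in self.known_good:
            # Steps 1-2: we know it compiled before, so skip the work.
            return {"key": key, "source": source, "ir": None}
        self.compiles += 1
        self.known_good.add(key)
        return {"key": key, "source": source, "ir": "ir:" + key[:8]}

    def link_program(self, shaders):
        # Step 4: the program key is derived from the SHAs of all stages.
        key = sha("+".join(s["key"] for s in shaders))
        if key in self.programs:
            return self.programs[key]
        # Never-before-seen combination: stages whose compile was skipped
        # have no IR, so they must be recompiled from source.
        for s in shaders:
            if s["ir"] is None:
                self.compiles += 1
                s["ir"] = "ir:" + s["key"][:8]
        self.programs[key] = "linked:" + key[:8]
        return self.programs[key]


def run_once(cache, vs_source, fs_source):
    """One 'application run': compile two stages, then link them."""
    vs = cache.compile_shader(vs_source)
    fs = cache.compile_shader(fs_source)
    return cache.link_program([vs, fs])


VS = "void main() { /* vertex */ }"
FS = "void main() { /* fragment */ }"

cache = ToyCache()
first = run_once(cache, VS, FS)
cold_compiles = cache.compiles                  # first run does the real work
second = run_once(cache, VS, FS)
warm_compiles = cache.compiles - cold_compiles  # second run compiles nothing
```

In this toy, the cold run performs two compiles and the warm run performs none. Linking a known vertex shader with a new fragment shader forces the skipped vertex shader to be recompiled, which mirrors the rare fallback described in the parenthetical above.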

        Tim's second change is a simple one. In the past, we would write the "this shader compiled" info for each individual shader at link time. But we can instead write it at CompileShader time (doing the same thing earlier). In the case he mentions, it sounds like the game calls CompileShader for all shaders it might use, but then only calls LinkProgram on the shader combinations it actually wants, based on the current graphical settings. So we would do all the work to compile them, and then fail to record that we succeeded. This meant we'd compile them on every startup, instead of just the first one. Recording the info earlier lets us avoid this, and is the obvious right thing to do.
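
A rough way to see the effect of recording the result earlier is to count compiles per startup. This is a minimal sketch with made-up numbers: `simulate_startups` is a hypothetical helper, and the game is assumed to compile ten shaders but link only three of them.

```python
def simulate_startups(all_shaders, linked_shaders, record_at_compile, startups=3):
    """Return the number of real compiles performed on each startup."""
    known_good = set()  # persists across startups, like the disk cache
    compiles_per_startup = []
    for _ in range(startups):
        compiles = 0
        for shader in all_shaders:
            if shader in known_good:
                continue  # cached "this shader compiled" flag: skip
            compiles += 1
            if record_at_compile:
                known_good.add(shader)  # new behaviour: record right away
        # Old behaviour: the flag is only written for shaders that reach a link.
        for shader in linked_shaders:
            known_good.add(shader)
        compiles_per_startup.append(compiles)
    return compiles_per_startup


all_shaders = [f"shader_{i}" for i in range(10)]
linked = all_shaders[:3]  # the game only links a subset

old = simulate_startups(all_shaders, linked, record_at_compile=False)
new = simulate_startups(all_shaders, linked, record_at_compile=True)
print(old)  # [10, 7, 7]
print(new)  # [10, 0, 0]
```

With the flag written only at link time, the seven shaders that are never linked get compiled again on every startup. Writing it at compile time drops that to zero after the first run.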

        So then, about the first change. In the past, we tried avoiding even basic optimizations at CompileShader time, to reduce the amount of time we spent there, at the expense of more memory usage (which can definitely matter for 32-bit apps). But now, with Tim's second change in place, we won't even bother to execute CompileShader at all if it's cached. So, this only matters at all on the first run - at which point, you probably want the actual optimization to happen, and are already paying the compile time cost. Plus, ideally Steam's cache pre-seeding will save you from paying that cost at all, making it really not matter.

        That's probably more gritty details than you wanted, but again - simple obvious change from Tim, with a fantastic impact. Solid work as usual.
        Free Software Developer .:. Mesa and Xorg
        Opinions expressed in these forum posts are my own.


        • #14
          Originally posted by Kayden
          That's probably more gritty details than you wanted
          No, that was great!

          Thank you very much for this, and for writing it in a way that's very clear to people not too aware of graphics.


          • #15
            Originally posted by faph
            Not impressed...
            Who cares about 800 MB of RAM usage??? (if it were VRAM that would be another story...)

            Loading a savegame in BATTLETECH takes at least 30-40 seconds, so thanks to this update I can expect to wait even longer?
            If I'm not mistaken, and unless BATTLETECH does something really strange, the shaders should get cached, and therefore you won't see a difference (except maybe after updates of the game or the graphics drivers).


            • #16
              Originally posted by debianxfce

              I am with you, just running Chromium can use 1 GB of RAM. Sacrificing 20 seconds of loading time for 200 MB is stupid. Deus Ex is for sale, and these patches are the final countdown for me not buying that game. TF2 is a boring game, but it is from Valve, so better games have less value. They should make Portal 2 community test chamber downloads work faster.
              32-bit games have memory limits; this helps address real-world crashes. The extra 20 seconds would only be on the very first run of Deus Ex; after that, things will be cached. It's also possible that my second patch eliminates any regression in compile time (I didn't check). Also, if you are using Steam you will likely never even see that extra 20 seconds, because Steam will get the precompiled shaders for you.
              Last edited by tarceri; 20 January 2019, 11:07 PM.


              • #17
                Originally posted by tarceri

                32-bit games have memory limits; this helps address real-world crashes. The extra 20 seconds would only be on the very first run of Deus Ex; after that, things will be cached. It's also possible that my second patch eliminates any regression in compile time (I didn't check). Also, if you are using Steam you will likely never even see that extra 20 seconds, because Steam will get the precompiled shaders for you.
                Except that, given how strict the cache's versioning algorithm is, we probably keep invalidating the cache basically every few weeks.
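
A toy illustration of why a strict version check keeps invalidating the cache. It assumes, which the thread does not spell out, that the cache key mixes a driver build identifier with the shader source. `cache_key` and the build strings are hypothetical and do not reflect Mesa's real key layout.

```python
import hashlib


def cache_key(driver_build_id, shader_source):
    """Hypothetical key: hash of the driver build id plus the shader source."""
    h = hashlib.sha1()
    h.update(driver_build_id.encode("utf-8"))
    h.update(b"\0")  # separator so the two fields cannot run together
    h.update(shader_source.encode("utf-8"))
    return h.hexdigest()


cache = {}
source = "void main() {}"

# Populate the cache with the driver build that compiled the shader.
cache[cache_key("build-A", source)] = "compiled binary"

hit_same_build = cache_key("build-A", source) in cache  # same driver: hit
hit_new_build = cache_key("build-B", source) in cache   # updated driver: miss
```

Under this assumption, every driver update changes every key, so all previously cached entries become unreachable even when the shader source is unchanged.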


                • #18
                  It would seem that the TF2 team are in on this. Today's patch notes include:

                  Improved memory usage on OS X and Linux systems
                  • This should reduce the occurrence of "Out of memory or address space" errors on high texture quality settings
                  • In particular, drastically improved memory usage for Linux users using mesa prior to mesa 19.0
                  http://www.teamfortress.com/post.php?id=47741

                  Nice.


                  • #19
                    Originally posted by HenryM
                    It would seem that the TF2 team are in on this. Today's patch notes include:

                    Nice.
                    So now the patch can be reverted to improve Deus Ex loading time?


                    • #20
                      tarceri, from a quick test it seems like it also helps Cemu.
                      Before, I had to restart BOTW a few times when recompiling all shaders, as otherwise it would run out of memory, and I have 16 GB...
                      Now it seems I only have to restart once and that's enough.
                      I don't know if the problem is in Cemu, Mesa or Wine, but it'd be awesome to never have to restart it and for Cemu to use the same amount of memory with a cold or warm cache.
