Offchip Tessellation Lands In Mesa For RadeonSI Gallium3D


  • #11
    Originally posted by slacka View Post
    Does this or could it boost Nouveau too? Are the patches AMD specific?
    Sounded to me like this is about taking advantage of hardware improvements introduced in RadeonSI GPUs.



    • #12
      BNieuwenhuizen I have some problems with the terminology here. What does "offchip" mean?
      Do you now use the actual hardware tessellation engines? How did tessellation work before, using geometry shaders running on the CUs (= "onchip"?)?

      Originally posted by nanonyme View Post
      Sounded to me like this is about taking advantage of hardware improvements introduced in RadeonSI GPUs.
      GeForces actually do have more/stronger hardware tessellation units. But of course, that's extra work and not covered by this patch series.



      • #13
        Originally posted by juno View Post
        BNieuwenhuizen I have some problems with the terminology here. What does "offchip" mean?
        It usually means that the operation is done by the GPU but using off-chip memory. Off-chip memory is slower than on-chip caches but probably faster than VRAM, so you have to keep that in mind when using it.



        • #14
          SOM (with a 280X):

          Before the patch:
          Min: 14.21
          Average: 34.31
          Max: 75.89

          After:
          Min: 21.89
          Average: 39.70
          Max: 86.46

          The average doesn't improve as much as I'd hoped, but the minimum no longer drops below 20, which is great. With the patched version I didn't notice a slowdown during the explosion in the benchmark, whereas I did before.



          • #15
            Originally posted by juno View Post
            BNieuwenhuizen I have some problems with the terminology here. What does "offchip" mean?
            Do you now use the actual hardware tessellation engines? How did tessellation work before, using geometry shaders running on the CUs (= "onchip"?)?
            There are two parts to this. The first is that a subgroup can process 64 VS/TCS invocations at the same time. However, it was previously set to process just one patch per workgroup, which resulted in only ~3 invocations per subgroup in most games. Increasing this is a very significant performance boost for games.

            However, if you have very large tessellation factors (for reference, this starts to matter at e.g. tessmark >= 32x, while I don't think heaven uses tess factors >= 16x) this actually slowed stuff down.

            To understand why, we have to look at TCS->TES I/O. All TCS outputs previously got passed to the TES in LDS memory. This is per-CU memory, i.e. "onchip" storage. The result is that if we run a TCS subgroup on a certain CU, we have to schedule all TES subgroups of the corresponding patches on the same CU. With large tess factors we have many more TES subgroups than TCS subgroups, and the CU might still have unrelated subgroups running, so that may take a long time to finish. This results in an imbalance between CU workloads. I'm not completely sure what all the bottlenecks here are. It might just be that there aren't enough TCS subgroups to keep all CUs busy, or there might be some limitations in the queue lengths of the tessellation hardware.

            Anyway, if we pass the TCS->TES values through VRAM (in practice the data will be in L2, though), we can schedule the TES subgroups on different CUs, which solves the imbalance. Most importantly, this stops tessmark from getting slower due to the earlier change, and on VI, together with some of the VI-specific changes, it also improves performance a bit.



            • #16
              Thank you for explaining that, Bas. It's much clearer now!
