RadeonSI Gallium3D Re-Enables SDMA For Sea Islands, Carrizo

  • #11
    The best way to remember all those K-names is not to memorize them at all; just look at a map of those rivers/areas.

    • #12
      I wonder what the benefits of SDMA are compared to the DMA engines in previous Radeon GPUs.
      E.g. even on an HD7750 I can do texture uploads with almost 0% CPU overhead.

      Can SDMA and the 3D engine work simultaneously?

      • #13
        Originally posted by Linuxhippy
        I wonder what the benefits of SDMA are compared to the DMA engines in previous Radeon GPUs.
        E.g. even on an HD7750 I can do texture uploads with almost 0% CPU overhead.

        Can SDMA and the 3D engine work simultaneously?
        SDMA is similar to the DMA engines in previous ASICs. It's more of a redesign at the hardware level than a major change in functionality. Both SDMA and the previous DMA engines are independent and can run asynchronously with the other engines. The DMA engines are optimized for transfers between VRAM and system memory.

        • #14
          Originally posted by atomsymbol
          Code:
          $ glretrace -b MadMax.trace |& sort | uniq -c | sort -h -r
          Rendered 1454 frames in 12.8516 secs, average of 113.138 fps
          
          3395 cik_sdma_copy_texture: copy_width=1, copy_height=1
          These small copies (1x1, 2x2 etc.) look bad. Shouldn't it be easier to put some write packets into the command buffer? At these sizes, tiling should be irrelevant anyway...

          • #15
            Originally posted by brent

            These small copies (1x1, 2x2 etc.) look bad. Shouldn't it be easier to put some write packets into the command buffer? At these sizes, tiling should be irrelevant anyway...
            Both the gfx and DMA engines are programmed via command buffers. For transfers, the DMA engine often has less overhead because all of the state is encompassed directly in the transfer packet, while using the gfx engine requires setting up the entire 3D pipeline. That said, there are cases where it makes more sense to use the gfx engine due to the overhead of cross-engine synchronization.
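
To make that trade-off concrete, here is a toy cost model in Python. All numbers are made-up placeholders and the names (`CostModel`, `transfer_cost`, `pick_engine`) are hypothetical; this is not how radeonsi decides, only an illustration of why a cheap, self-contained DMA packet can still lose to the gfx engine once a cross-engine wait is added.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    # Hypothetical costs in arbitrary units; NOT measurements of any Radeon GPU.
    gfx_setup: float = 50.0   # programming 3D pipeline state before a gfx blit
    dma_packet: float = 5.0   # one self-contained DMA copy packet
    cross_sync: float = 80.0  # gfx ring waiting on the DMA ring to finish
    per_kib: float = 1.0      # data movement, assumed equal on both engines


def transfer_cost(engine: str, size_kib: float, gfx_needs_result: bool,
                  m: CostModel = CostModel()) -> float:
    """Toy estimate of one copy's cost on the given engine."""
    move = m.per_kib * size_kib
    if engine == "gfx":
        # Same ring as the consumer, so no cross-engine synchronization.
        return m.gfx_setup + move
    if engine == "dma":
        # Cheap packet, but a gfx consumer must wait for the DMA ring.
        return m.dma_packet + (m.cross_sync if gfx_needs_result else 0.0) + move
    raise ValueError(f"unknown engine: {engine!r}")


def pick_engine(size_kib: float, gfx_needs_result: bool,
                m: CostModel = CostModel()) -> str:
    """Choose the cheaper engine under the toy model."""
    return min(("dma", "gfx"),
               key=lambda e: transfer_cost(e, size_kib, gfx_needs_result, m))


print(pick_engine(4, gfx_needs_result=False))  # dma
print(pick_engine(4, gfx_needs_result=True))   # gfx
```

Because the per-KiB term is identical for both engines here, only the fixed costs decide the outcome; a real driver weighs many more factors.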

            • #16
              SDMA brings:
              - faster TexImage and ReadPixels performance for CIK
              - much better GPU offloading (PRIME) performance if the dGPU is CIK.
              (VI can't use SDMA in most cases, because SDMA doesn't support delta color compression (DCC))

              The next step is to port the CIK SDMA blit code to SI, so that SI will get the same benefits as CIK. I don't know if people are interested in that.
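
The rule described above can be sketched as a tiny dispatch function. This is a hypothetical illustration, not radeonsi code: the family strings, the `surface_has_dcc` flag, and the assumption that DCC is the deciding factor on VI are simplifications of the post.

```python
def choose_copy_engine(family: str, surface_has_dcc: bool) -> str:
    """Hypothetical engine choice for a VRAM <-> system RAM texture copy."""
    if family == "CIK":
        # CIK (Sea Islands) has the SDMA blit path; DCC is a VI-era feature,
        # so the flag is ignored here.
        return "sdma"
    if family == "VI":
        # Assumption: SDMA cannot handle DCC-compressed surfaces, so those use gfx.
        return "gfx" if surface_has_dcc else "sdma"
    if family == "SI":
        # Per the post, the CIK SDMA blit code has not been ported to SI yet.
        return "gfx"
    raise ValueError(f"unknown family: {family!r}")


for fam, dcc in [("CIK", False), ("VI", True), ("VI", False), ("SI", False)]:
    print(fam, dcc, choose_copy_engine(fam, dcc))
```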

              • #17
                Originally posted by marek
                SDMA brings:
                - faster TexImage and ReadPixels performance for CIK
                - much better GPU offloading (PRIME) performance if the dGPU is CIK.
                (VI can't use SDMA in most cases, because SDMA doesn't support delta color compression (DCC))

                The next step is to port the CIK SDMA blit code to SI, so that SI will get the same benefits as CIK. I don't know if people are interested in that.
                I thought DCC is for GPU<->VRAM communication, and DMAs are for VRAM<->system RAM?!

                BTW: Vulkan does have different queue types: graphics (handled by the GCP/ME), compute (handled by the ACEs/MECs) and transfer (handled by the DMAs), right?! I don't think anything similar exists in GL, so how could older/OGL games even benefit? Are transfer tasks submitted to the DMAs by the driver and then processed asynchronously?

                I don't own one, but I'm sure support for SI would be highly appreciated. However, personally, I'd rather see some progress on opening OCL/Vk...

                • #18
                  Originally posted by juno
                  I thought DCC is for GPU<->VRAM communication, and DMAs are for VRAM<->system RAM?!..
                  Yep. The GPU stores compressed content in VRAM, which can't be used directly by the CPU. You move data between VRAM and system RAM for one of two reasons: the CPU needs to access it, or you ran out of VRAM and need to free up space.

                  If the CPU needs to access it, then the data in VRAM needs to be decompressed before it can be used... so being able to decompress while transferring is a Good Thing. I think Marek is saying that on VI a decompress-while-transfer needs to be done on GPU.

                  Originally posted by juno
                  BTW: Vulkan does have different queue types: graphics (handled by the GCP/ME), compute (handled by the ACEs/MECs) and transfer (handled by the DMAs), right?! I don't think anything similar exists in GL, so how could older/OGL games even benefit? Are transfer tasks submitted to the DMAs by the driver and then processed asynchronously?
                  The OpenGL calls Marek mentioned (glReadPixels, glTexImage2D) typically involve implicit, synchronous transfers as part of processing the API calls. The glReadPixels call typically needs to pull VRAM data down to system memory, while glTexImage2D typically needs to push texture information up to VRAM.

                  I imagine the TexImage call could defer the transfer and execute it asynchronously, as long as the next draw call confirmed that the transfer was complete before executing the draw, but in the case of ReadPixels I believe the data needs to be CPU-accessible as soon as the call returns.
                  Last edited by bridgman; 26 October 2016, 09:25 PM.
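
The deferral idea above can be modelled with a single-worker thread pool standing in for an in-order DMA queue and `Future` objects standing in for fences. `ToyContext` and its methods are hypothetical names for this sketch, not a driver API; the only GL behaviour it relies on is that the caller may reuse its memory once the TexImage call returns, so the data is snapshotted first.

```python
from __future__ import annotations

from concurrent.futures import Future, ThreadPoolExecutor


class ToyContext:
    """Toy GL-like context: one worker thread plays an in-order DMA queue,
    and each Future returned by submit() plays the role of a fence."""

    def __init__(self) -> None:
        self._dma = ThreadPoolExecutor(max_workers=1)  # single queue, FIFO order
        self._vram: dict[str, bytes] = {}              # texture name -> "VRAM" contents
        self._pending: dict[str, Future] = {}          # texture name -> fence of last upload

    def tex_image(self, name: str, data: bytes | bytearray) -> None:
        # The caller may reuse `data` as soon as this returns, so copy it now.
        snapshot = bytes(data)
        # Queue the upload and return without waiting (the deferred part).
        # Overwriting an older fence is safe: the queue is in order, so the
        # newest upload finishing implies the older ones finished too.
        self._pending[name] = self._dma.submit(self._vram.__setitem__, name, snapshot)

    def _wait(self, name: str) -> None:
        fence = self._pending.pop(name, None)
        if fence is not None:
            fence.result()  # block until the queued upload has landed

    def draw(self, textures: list[str]) -> list[bytes]:
        # A draw must not sample a texture whose upload is still in flight.
        for name in textures:
            self._wait(name)
        return [self._vram[name] for name in textures]

    def read_pixels(self, name: str) -> bytes:
        # Synchronous: the copy back to system memory must be finished
        # before the call returns, so we wait on it here.
        self._wait(name)
        return self._dma.submit(lambda: bytes(self._vram[name])).result()

    def close(self) -> None:
        self._dma.shutdown(wait=True)


ctx = ToyContext()
pixels = bytearray(b"\x10\x20\x30")
ctx.tex_image("wall", pixels)  # returns without waiting for the copy
pixels[0] = 0xFF               # safe: the upload uses a snapshot
frame = ctx.draw(["wall"])     # waits on the fence before "drawing"
ctx.close()
```

The asymmetry matches the post: `tex_image` can return before the copy lands, while `read_pixels` has to block until the data is back in system memory.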

                  • #19
                    Originally posted by marek
                    SDMA brings:
                    - faster TexImage and ReadPixels performance for CIK
                    - much better GPU offloading (PRIME) performance if the dGPU is CIK.
                    Better to say "for CIK dGPUs", since I see no improvement on the Kabini/Kaveri Rivers... here and there it even looks slightly worse.

                    • #20
                      DCC is a color buffer compression scheme. It actually uses more memory but saves bandwidth. It is a similar idea to Z-buffer compression, CMASK for fast color clear, and FMASK for multi-sample compression.
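
As a toy analogy for "more memory, less bandwidth", here is a per-tile fast-clear flag loosely in the spirit of CMASK. The layout and the `FastClearSurface` class are hypothetical and do not reflect the real hardware metadata formats or how DCC compresses data; the point is only that a small metadata array lets a clear skip writing the color buffer, and that a reader of the raw tiles that ignores the metadata (such as a CPU mapping) would see stale data until the surface is resolved.

```python
class FastClearSurface:
    """Toy color buffer with one 'fast-cleared' flag per tile.

    Hypothetical layout, loosely in the spirit of CMASK fast clears; it does
    not model the real hardware metadata or DCC's compression scheme."""

    def __init__(self, num_tiles: int, tile_bytes: int) -> None:
        self.tile_bytes = tile_bytes
        self.tiles = [bytes(tile_bytes) for _ in range(num_tiles)]  # color data
        self.cleared = [False] * num_tiles  # extra metadata: memory on top of the buffer
        self.clear_value = 0
        self.bytes_written = 0              # color-data writes, a stand-in for bandwidth

    def fast_clear(self, value: int) -> None:
        # Touch only the metadata; no color data is written.
        self.clear_value = value
        self.cleared = [True] * len(self.tiles)

    def write_tile(self, i: int, data: bytes) -> None:
        if len(data) != self.tile_bytes:
            raise ValueError("tile size mismatch")
        self.tiles[i] = bytes(data)
        self.cleared[i] = False
        self.bytes_written += self.tile_bytes

    def read_tile(self, i: int) -> bytes:
        # A reader that honors the metadata rebuilds cleared tiles from the
        # clear value; reading self.tiles alone would return stale data.
        if self.cleared[i]:
            return bytes([self.clear_value]) * self.tile_bytes
        return self.tiles[i]


surf = FastClearSurface(num_tiles=1024, tile_bytes=256)
surf.fast_clear(0x80)
print(surf.bytes_written)  # 0: a full clear would have written 1024 * 256 bytes
```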
