Announcement

**dungeon** · 26 October 2016, 10:36 AM

The best way to remember all those K-things is not to remember them at all.

Just look at a map of those rivers/areas.

**Linuxhippy** · 26 October 2016, 10:54 AM

I wonder what the benefits of SDMA compared to the DMA engines in previous Radeon-GPUs are.
E.g. even on an HD7750 I can do texture uploads with almost 0% CPU overhead.

Can SDMA and the 3D engine work simultaneously?

**agd5f** · 26 October 2016, 11:09 AM

Originally posted by Linuxhippy

I wonder what the benefits of SDMA compared to the DMA engines in previous Radeon-GPUs are.
E.g. even on an HD7750 I can do texture uploads with almost 0% CPU overhead.

Can SDMA and the 3D engine work simultaneously?

SDMA is similar to the DMA engines in previous ASICs. It's more of a redesign at the hw level than a major change in functionality. Both SDMA and the previous DMA engines are independent and can run asynchronously with other engines. The DMA engines are optimized for transfers between VRAM and system memory.

**brent** · 26 October 2016, 11:23 AM

Originally posted by atomsymbol

Code:

$ glretrace -b MadMax.trace |& sort | uniq -c | sort -h -r
Rendered 1454 frames in 12.8516 secs, average of 113.138 fps

3395 cik_sdma_copy_texture: copy_width=1, copy_height=1

These small copies (1x1, 2x2 etc.) look bad. Shouldn't it be easier to put some write packets into the command buffer? At these sizes, tiling should be irrelevant anyway...
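To see how much of that traffic is made up of tiny copies, a small filter over the raw glretrace output can bin them by pixel count. This is only a sketch: it assumes the driver prints one `cik_sdma_copy_texture: copy_width=W, copy_height=H` line per copy, exactly in the format shown in the output above. The `summarize_copies` function name and the 16-pixel `TINY_PIXELS` cutoff are hypothetical, chosen just for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count "cik_sdma_copy_texture" copies (format assumed to
# match the glretrace output above) as tiny vs. larger by width*height.
# TINY_PIXELS is an arbitrary cutoff, overridable from the environment.
TINY_PIXELS=${TINY_PIXELS:-16}

summarize_copies() {
  awk -v tiny="$TINY_PIXELS" '
    /cik_sdma_copy_texture:/ {
      w = 0; h = 0
      for (i = 1; i <= NF; i++) {
        # Fields look like "copy_width=1," and "copy_height=1"; adding 0
        # converts the leading digits and drops the trailing comma.
        if ($i ~ /^copy_width=/)  { split($i, a, "="); w = a[2] + 0 }
        if ($i ~ /^copy_height=/) { split($i, a, "="); h = a[2] + 0 }
      }
      if (w * h <= tiny + 0) small++; else large++
    }
    END { printf "tiny=%d larger=%d\n", small + 0, large + 0 }'
}
```

Feed it the unsorted output, e.g. `glretrace -b MadMax.trace 2>&1 | summarize_copies` (`2>&1 |` is the POSIX spelling of bash's `|&`). Running it before `uniq -c` matters, because each input line is counted as one copy.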

**agd5f** · 26 October 2016, 11:35 AM

Originally posted by brent

These small copies (1x1, 2x2 etc.) look bad. Shouldn't it be easier to put some write packets into the command buffer? At these sizes, tiling should be irrelevant anyway...

Both the gfx and DMA engines are programmed via command buffers. For transfers, the DMA engine often has less overhead because all of the state is encompassed directly in the transfer packet, while using the gfx engine requires setting up the entire 3D pipeline. That said, there are cases where it makes more sense to use the gfx engine due to the overhead of cross-engine synchronization.

**marek** · 26 October 2016, 03:37 PM

SDMA brings:
- faster TexImage and ReadPixels performance for CIK
- much better GPU offloading (PRIME) performance if the dGPU is CIK.
(VI can't use SDMA in most cases, because SDMA doesn't support delta-color compression)

The next step is to port the CIK SDMA blit code to SI, so that SI will get the same benefits as CIK. I don't know if people are interested in that.

**juno** · 26 October 2016, 08:44 PM

Originally posted by marek

SDMA brings:
- faster TexImage and ReadPixels performance for CIK
- much better GPU offloading (PRIME) performance if the dGPU is CIK.
(VI can't use SDMA in most cases, because SDMA doesn't support delta-color compression)

The next step is to port the CIK SDMA blit code to SI, so that SI will get the same benefits as CIK. I don't know if people are interested in that.

I thought DCC is for GPU<->VRAM communication, and DMAs are for VRAM<->system RAM?!

BTW: Vulkan does have different queue types: graphics (handled by GCP/ME), compute (handled by ACEs/MECs) and transfer (handled by DMAs), right?! I don't think anything similar exists in GL, so how could older/OGL games even benefit? Are transfer tasks submitted to the DMAs by the driver and then processed asynchronously?

I don't own one, but I'm sure support for SI would be highly appreciated.

However, personally, I'd rather see some progress in opening OCL/Vk...

**bridgman** · 26 October 2016, 09:16 PM

Originally posted by juno

I thought DCC is for GPU<->VRAM communication, and DMAs are for VRAM<->system RAM?!..

Yep - the GPU stores compressed content in VRAM, which can't be used directly by the CPU. You move data between VRAM and system RAM for one of two reasons: the CPU needs to access it, or you ran out of VRAM and need to free up space.

If the CPU needs to access it, then the data in VRAM needs to be decompressed before it can be used... so being able to decompress while transferring is a Good Thing. I think Marek is saying that on VI a decompress-while-transfer needs to be done on the GPU.

Originally posted by juno

BTW: Vulkan does have different queue types: graphics (handled by GCP/ME), compute (handled by ACEs/MECs) and transfer (handled by DMAs), right?! I don't think anything similar exists in GL, so how could older/OGL games even benefit? Are transfer tasks submitted to the DMAs by the driver and then processed asynchronously?

The OpenGL calls Marek mentioned (glReadPixels, glTexImage2D) typically involve implicit, synchronous transfers as part of processing the API calls. The glReadPixels call typically needs to pull VRAM data down to system memory, while glTexImage2D typically needs to push texture information up to VRAM.

I imagine the TexImage call could defer the transfer and execute it asynchronously, as long as the next draw call confirmed that the transfer was complete before executing the draw, but in the case of ReadPixels I believe the data needs to be CPU-accessible as soon as the call returns.

**dungeon** · 27 October 2016, 04:57 AM

Originally posted by marek

SDMA brings:
- faster TexImage and ReadPixels performance for CIK
- much better GPU offloading (PRIME) performance if the dGPU is CIK.

Better to say CIK dGPUs, since I see no gain on the Kabini/Kaveri Rivers... here and there it even looks slightly worse.

**nhaehnle** · 27 October 2016, 06:35 AM

DCC is a color buffer compression scheme. It actually uses more memory but saves bandwidth. It is a similar idea to Z-buffer compression, CMASK for fast color clear, and FMASK for multi-sample compression.

RadeonSI Gallium3D Re-Enables SDMA For Sea Islands, Carrizo