Earlier this week AMD released code to support the asynchronous DMA engines. This support spans the Radeon HD 2000 series through the very latest Radeon HD 7000 series graphics hardware. The DRM side of this code will be merged into the Linux 3.8 kernel, while the accompanying user-space code is still forthcoming.
Christian König of AMD described the async DMA code as follows: "The async DMA can do copy/moves independent of the shader engine. So while the shader part of the GPU is busy with the rendering we can still upload new data with the DMA at the same time. Additional to that it is quite a bit more efficient than the shader engine when you just want to copy some data from A to B, or just clear a specific region of memory (memcpy/memset). So using it should result in some very nice performance improvements for certain use cases, but what Alex has released is just kernel part of the implementation, and even that is missing the CS checker. It will probably just take some more time till mesa really picks that up."
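To make the overlap König describes a little more concrete, here is a rough timing sketch in Python. It is not driver code and does not reflect how the Radeon DRM actually schedules work; it is a toy model that assumes each frame needs one upload followed by one render, and that a separate DMA engine can upload the next frame's data while the shader engine renders the current one. The function names and the per-frame costs are hypothetical and are not measurements.

```python
def serial_time(uploads, renders):
    """Total time when the shader engine also performs every copy,
    so uploads and rendering can never overlap."""
    return sum(uploads) + sum(renders)


def overlapped_time(uploads, renders):
    """Total time when a separate DMA engine uploads frame i+1
    while the shader engine renders frame i (simplified lockstep model)."""
    total = uploads[0]  # the first upload has nothing to overlap with
    for i, render in enumerate(renders):
        # Upload for the next frame, if there is one, runs concurrently.
        next_upload = uploads[i + 1] if i + 1 < len(uploads) else 0
        total += max(render, next_upload)
    return total


# Hypothetical per-frame costs in arbitrary time units (assumed, not measured).
uploads = [2, 2, 2]
renders = [5, 5, 5]

print(serial_time(uploads, renders))      # 21
print(overlapped_time(uploads, renders))  # 17
```

In this model the benefit comes only from hiding uploads behind rendering. König's second point, that the DMA engine is also more efficient at plain copies and clears than the shader engine, would show up as smaller upload costs and is not modeled here.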
Aaron Watry, the Phoronix reader who carried out the simple Radeon Gallium3D sub-allocator tests using the Phoronix Test Suite and OpenBenchmarking.org, has now begun testing the Radeon DRM code for this async DMA engine support.
Using an AMD Radeon HD 6850 graphics card on the open-source Linux graphics driver stack, Aaron is reporting a nearly 10x performance improvement in Unigine Heaven as a result of this new code for Linux 3.8.
He notes these early results in this forum post and says he will be conducting additional tests soon. For those wanting to see the initial performance data of this new code from AMD, the data is within the async_dma_tests result file.