Marek Continues Improving Radeon Performance
Marek Olšák, the student developer from Europe who has independently made significant contributions to Mesa/Gallium3D and particularly to the open-source AMD Radeon graphics drivers, is continuing to do more. Last week he wrote two more performance patches aimed at narrowing the gap between the open-source driver and AMD's proprietary Catalyst driver, following some disappointing results in a Phoronix article. He also enabled 2D color tiling for the more recent Radeon graphics hardware on this open-source driver, another performance win.
Pushed to Mesa's mainline Git repository last night was a new patch by Marek that adds in-place depth buffer decompression and texturing from tiled depth buffers. His patch explains:
The decompression is done in-place and only the compressed tiles are decompressed. Note: R6xx-R7xx can do that only with Z16 and Z32F.
The texture unit is programmed to use non-displayable tiling and depth ordering of samples, so that it can fetch the texture in the native DB format.
The latest version of the libdrm surface allocator is required for stencil texturing to work. The old one didn't create the mipmap tree correctly. We need a separate mipmap tree for stencil, because the stencil mipmap offsets are not really depth offsets/4.
There are still some known bugs, but this should save some memory and it also improves performance a little bit in Lightsmark (especially with low resolutions; tested with Radeon HD 5000).
Saving on memory while also being able to improve the performance a bit is certainly much appreciated.
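Marek's remark that stencil mipmap offsets "are not really depth offsets/4" can be illustrated with a toy model. This is not the libdrm surface allocator: it assumes a made-up rule that every mip level is padded up to a hypothetical 256-byte boundary, and it compares a 32-bit-per-pixel depth tree with an 8-bit stencil tree. Under that assumption the small stencil levels stop shrinking once they hit the padding floor, so dividing the depth offsets by four only works for the larger levels.

```python
# Toy model: why a stencil mipmap tree cannot be derived as depth offsets / 4.
# ASSUMPTION: each mip level is padded to a hypothetical ALIGN_BYTES boundary.
# Real R600-class tiling and alignment rules are more involved than this.
ALIGN_BYTES = 256  # hypothetical per-level alignment


def align_up(n, a):
    """Round n up to the next multiple of a."""
    return (n + a - 1) // a * a


def mip_offsets(width, height, bytes_per_pixel, levels):
    """Return the byte offset of each mip level in a packed mipmap tree."""
    offsets = []
    offset = 0
    for level in range(levels):
        w = max(width >> level, 1)
        h = max(height >> level, 1)
        offsets.append(offset)
        offset += align_up(w * h * bytes_per_pixel, ALIGN_BYTES)
    return offsets


depth = mip_offsets(64, 64, 4, 7)    # 32 bits per pixel of depth
stencil = mip_offsets(64, 64, 1, 7)  # 8 bits per pixel of stencil

for level, (d, s) in enumerate(zip(depth, stencil)):
    print(f"level {level}: depth/4 = {d // 4:5d}, stencil = {s:5d}")
```

In this model the two columns agree for levels 0 to 3 and then diverge, which is the kind of mismatch that makes a separately computed stencil tree necessary.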
The Radeon Mesa support now requires libdrm 2.4.40, which was released yesterday, for its stencil mipmap allocator for combined depth-stencil buffers.
Also of interest are comments Marek recently made in the forums, where he says "we're fighting a battle we can't win" when it comes to competing with the Catalyst driver on performance:
I expected worse results after seeing the bug report about Unigine Heaven. Anyway, we don't have many options at the moment (I see only one: reverting the commit). The mechanism that decides where buffers are placed (VRAM or GTT) and which buffers are moved when we start to run out of memory must be overhauled. This is a bigger project and I don't have time for it right now. The kernel DRM interface might need some changes. We also need good tools to detect bottlenecks and a good GPU resource monitor. Right now if you run out of GPU memory, there's no easy way to know and definitely no way to know what is eating the memory. We're mostly blind right now.
However, we're fighting a battle we can't win. S3TC textures need 4x to 8x less memory and would help a lot with this problem. Any driver with S3TC support has a great advantage over a driver without one.
We could also cheat by using the BC7 format for plain RGBA8 textures. That would be a win if we implemented the BC7 encoding on the GPU.
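The "4x to 8x" figure follows from the block sizes of the compressed formats. S3TC's BC1 (DXT1) packs each 4x4 texel block into 8 bytes and BC3 (DXT5) into 16 bytes, and BC7 also uses 16 bytes per 4x4 block, whereas uncompressed RGBA8 takes 4 bytes per texel. The sketch below just does that arithmetic for a single 2048x2048 level; the texture size is an arbitrary example and mipmaps are ignored.

```python
# Memory footprint of one texture level: uncompressed RGBA8 vs. block formats.
RGBA8_BYTES_PER_TEXEL = 4

# Bytes per 4x4 texel block for each block-compressed format.
BLOCK_BYTES = {
    "BC1 (DXT1)": 8,
    "BC3 (DXT5)": 16,
    "BC7": 16,
}


def blocks(n):
    """Number of 4-texel blocks needed to cover n texels (rounded up)."""
    return (n + 3) // 4


def texture_bytes(width, height, fmt):
    """Size in bytes of a single width x height level in the given format."""
    if fmt == "RGBA8":
        return width * height * RGBA8_BYTES_PER_TEXEL
    return blocks(width) * blocks(height) * BLOCK_BYTES[fmt]


W = H = 2048
raw = texture_bytes(W, H, "RGBA8")
ratios = {fmt: raw // texture_bytes(W, H, fmt) for fmt in BLOCK_BYTES}

print(f"RGBA8: {raw // 2**20} MiB")
for fmt, ratio in ratios.items():
    size = texture_bytes(W, H, fmt)
    print(f"{fmt}: {size // 2**20} MiB ({ratio}x smaller)")
```

The same arithmetic explains why encoding plain RGBA8 textures as BC7 would help: each one would occupy a quarter of its original VRAM footprint, at the cost of the encoding step Marek suggests running on the GPU.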