Marek Olšák has landed another improvement to Mesa's Radeon Gallium3D R600 driver that can boost OpenGL performance in certain situations for this open-source AMD Linux driver while also reducing memory usage.
Marek Olšák, the student developer from Europe who has independently made significant contributions to Mesa/Gallium3D, and particularly to the open-source AMD Radeon graphics drivers, is continuing to do more. Last week he worked out two more performance patches to close the open-source driver's gap with AMD's proprietary Catalyst driver, following some disappointing performance results in a Phoronix article. Last week he also enabled 2D color tiling for more recent Radeon graphics hardware on this open-source driver, another performance win.
Pushed to Mesa's mainline Git repository last night was a new patch by Marek that adds in-place depth buffer decompression and texturing with the depth buffer tiling. His patch message explains:
The decompression is done in-place and only the compressed tiles are decompressed. Note: R6xx-R7xx can do that only with Z16 and Z32F.
The texture unit is programmed to use non-displayable tiling and depth ordering of samples, so that it can fetch the texture in the native DB format.
The latest version of the libdrm surface allocator is required for stencil texturing to work. The old one didn't create the mipmap tree correctly. We need a separate mipmap tree for stencil, because the stencil mipmap offsets are not really depth offsets/4.
There are still some known bugs, but this should save some memory and it also improves performance a little bit in Lightsmark (especially with low resolutions; tested with Radeon HD 5000).
Saving memory while also improving performance a bit is certainly appreciated.
The Radeon Mesa support now requires libdrm 2.4.40, which was released yesterday, for the stencil mipmap allocator for combined depth-stencil buffers.
Some might also be interested in comments Marek made recently in the forums, where he says "we're fighting a battle we can't win" when it comes to competing with the Catalyst driver on performance.
I expected worse results after seeing the bug report about Unigine Heaven. Anyway, we don't have many options at the moment (I see only one: reverting the commit). The mechanism that decides where buffers are placed (VRAM or GTT) and which buffers are moved when we start to run out of memory must be overhauled. This is a bigger project and I don't have time for it right now. The kernel DRM interface might need some changes. We also need good tools to detect bottlenecks and a good GPU resource monitor. Right now if you run out of GPU memory, there's no easy way to know and definitely no way to know what is eating the memory. We're mostly blind right now.
However, we're fighting a battle we can't win. S3TC textures need 4x to 8x less memory and would help a lot with this problem. Any driver with S3TC support has a great advantage over a driver without one.
We could also cheat by using the BC7 format for plain RGBA8 textures. That would be a win if we implemented the BC7 encoding on the GPU.
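The "4x to 8x less memory" figure for S3TC follows directly from the standard block sizes: DXT1 stores a 4x4 pixel block in 8 bytes and DXT5 in 16 bytes, versus 4 bytes per pixel for uncompressed RGBA8; BC7, which Marek mentions as a possible cheat for plain RGBA8 textures, also uses 16 bytes per 4x4 block. A quick back-of-the-envelope check (the 2048x2048 texture size is made up for illustration):

```python
# S3TC/BC7 memory footprint vs. uncompressed RGBA8.
# Block sizes are the standard ones; the texture size is arbitrary.

def texture_bytes(width, height, bytes_per_block, block_dim=4):
    """Size of one mip level in a 4x4-block-compressed format."""
    blocks_w = (width + block_dim - 1) // block_dim
    blocks_h = (height + block_dim - 1) // block_dim
    return blocks_w * blocks_h * bytes_per_block

w, h = 2048, 2048
rgba8 = w * h * 4                # uncompressed: 4 bytes per pixel
dxt1 = texture_bytes(w, h, 8)    # DXT1: 8 bytes per 4x4 block
dxt5 = texture_bytes(w, h, 16)   # DXT5: 16 bytes per 4x4 block
bc7 = texture_bytes(w, h, 16)    # BC7: also 16 bytes per 4x4 block

print(rgba8 // dxt1)  # → 8
print(rgba8 // dxt5)  # → 4
print(rgba8 // bc7)   # → 4
```

That 8x (DXT1) to 4x (DXT5/BC7) spread is exactly the range quoted above, which is why a driver with S3TC support has such an advantage when VRAM starts running out.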