RadeonSI Primitive Culling Yields Mixed Benchmark Results
Yesterday's patches introducing RadeonSI primitive culling via async compute showed promising initial results, at least for the ParaView workstation application. I've been running tests of this new functionality since then and have some initial numbers to share on Polaris and Vega.
Tests were run with Radeon RX 590 and RX Vega 64 graphics cards, using the latest Mesa Git branch of Marek's that provides this primitive culling implementation. That Mesa build was compiled against LLVM 9.0 SVN, which is a requirement (or at least the very latest LLVM 8.0 release state); otherwise this functionality will not work. It also depends upon the AMDGPU DRM-Next material in the kernel, so I was running a fresh kernel build off Alex Deucher's latest code branch.
While Marek's early results were quite enticing, unfortunately, at least in my tests so far the gains have been more minimal. That's after confirming the primitive culling bits are indeed being enabled.
ParaView was indeed a significant beneficiary of this code, which for non-Pro GPUs needs to be enabled via the AMD_DEBUG=pd environment variable.
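For reference, enabling the culling path on non-Pro cards looks roughly like this. This is a minimal sketch: the application names are purely illustrative, and only the AMD_DEBUG=pd flag itself comes from the patches.

```shell
# Enable RadeonSI's compute-based primitive culling for a single run.
# "paraview" is just an example OpenGL application; substitute any GL app.
AMD_DEBUG=pd paraview

# Or export the flag so every GL application in the session picks it up:
export AMD_DEBUG=pd
```

Since it is a debug-style environment variable, it can be toggled per-application, which makes A/B benchmarking of the feature straightforward.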
But for the OpenGL games tested, at least with this batch of hardware, there weren't any real performance changes to note.
For Mad Max and Unigine, if anything, the performance was trending lower.
The synthetic Plot3D test saw a big performance hit with the feature enabled.
So it looks like this primitive culling feature for RadeonSI is still very much a work in progress. Results from others on the Mesa mailing list and in the forums have also been mixed. Marek is already working on some follow-up improvements to this feature, so hopefully the situation will improve. As it matures, I'll do some more testing.