RadeonSI Picks Up Primitive Culling With Async Compute For Performance Wins
The 26 patches use async compute to perform primitive culling before the vertex shader stage. This work yields performance improvements for workloads that submit a lot of geometry that ends up being invisible. The code is stable and passes nearly all conformance tests, and it works from GCN 1.1 through Radeon VII.
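For readers unfamiliar with the idea, here is a rough CPU-side sketch of what "primitive culling" means: throwing away triangles that cannot contribute any visible pixels before the rest of the pipeline spends time on them. This is not the RadeonSI code, which runs in a compute shader on the GPU. The function names, the counter-clockwise front-face convention, and the use of 2D coordinates in a [-1, 1] viewport are all assumptions made for illustration.

```python
# Illustrative sketch only: hypothetical helper names, not RadeonSI's implementation.
# Assumes 2D post-projection vertex positions, a visible range of [-1, 1] on each
# axis, and counter-clockwise winding for front-facing triangles.

def signed_area(a, b, c):
    """Twice the signed area of triangle abc; positive when counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def is_culled(tri):
    """Return True if the triangle cannot produce any visible pixels."""
    a, b, c = tri
    # Back-facing (negative area) or degenerate (zero area) triangles are dropped.
    if signed_area(a, b, c) <= 0:
        return True
    # Triangles entirely off one side of the viewport are dropped.
    for axis in (0, 1):
        if all(v[axis] < -1 for v in tri) or all(v[axis] > 1 for v in tri):
            return True
    return False


def cull(triangles):
    """Keep only the triangles that survive culling."""
    return [t for t in triangles if not is_culled(t)]


triangles = [
    ((0, 0), (1, 0), (0, 1)),      # front-facing and on screen: kept
    ((0, 0), (0, 1), (1, 0)),      # back-facing: culled
    ((0, 0), (0.5, 0.5), (1, 1)),  # zero area: culled
    ((2, 0), (3, 0), (2, 1)),      # entirely off screen: culled
]
visible = cull(triangles)
print(len(visible))
```

The more of a scene's triangles fall into the culled cases, the more work is saved downstream, which is why geometry-heavy workloads with lots of hidden primitives stand to benefit the most.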
Marek provided some results using the ParaView workstation software we often use for benchmarking. He commented, "As you can see in the results marked (ENABLED) in the picture below, it destroys our competition (The GeForce results are from a Phoronix article from 2017, the latest ones I could find):"
For now this optimization is enabled for all professional/workstation graphics cards, as Marek hasn't had the time to benchmark games. Depending upon feedback, he might enable the code for all GPUs and possibly add per-game whitelisting if this ends up hurting some titles.
The patch series can be found here. It looks like I have some more interesting benchmarks to add to my TODO list this week! Yes, I will look at the gaming performance and more. Even if this ends up being relevant only to workstation workloads, it is still great for increasing the attractiveness of Mesa to users who traditionally use the Radeon Software "PRO" OpenGL driver.