Blumenkrantz Optimizes Mesa Vulkan Submission Merging - Some Test Cases Improve 1000%+
Mike Blumenkrantz, part of Valve's stellar Linux graphics driver team, has pulled off another impressive optimization of Mesa's Vulkan driver code, one that benefits multiple drivers and hardware vendors.
Blumenkrantz recently explored Mesa's handling of Vulkan queues and why it can end up being quite slow. The Vulkan API allows an array of command buffer submissions to be submitted all at once, but Mesa's current vkQueueSubmit handling splits the batched submits up and still submits each one individually rather than submitting the entire array at once. Submitting individually means more submission overhead as well as more memory allocation overhead.
Reworking the Vulkan queue submission code, adding threaded waits, and merging queue submissions where possible to reduce synchronization overhead adds up to a big efficiency win. Making it all the more exciting is that this happens within Mesa's common Vulkan run-time code rather than in driver-specific code.
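To make the idea of merging concrete, here is a toy Python model of a batched submission. It is only a sketch and not Mesa's code: the `Submit` class, the function names, and the merge rule (fold an entry into the previous one only when the previous entry signals nothing and the next one waits on nothing) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Submit:
    """Hypothetical stand-in for one entry of a batched queue submission."""
    cmd_buffers: list
    waits: list = field(default_factory=list)    # semaphores waited on before execution
    signals: list = field(default_factory=list)  # semaphores signaled after execution


def split_submits(batch):
    """Baseline: every entry of the batch becomes its own driver submission."""
    return [Submit(list(s.cmd_buffers), list(s.waits), list(s.signals)) for s in batch]


def merge_submits(batch):
    """Fold an entry into the previous one when no sync sits between them.

    Assumed rule for this sketch only: merge if the previous entry signals
    nothing and the next entry waits on nothing. Command order is preserved.
    """
    merged = []
    for s in batch:
        prev = merged[-1] if merged else None
        if prev is not None and not prev.signals and not s.waits:
            prev.cmd_buffers.extend(s.cmd_buffers)
            prev.signals.extend(s.signals)
        else:
            merged.append(Submit(list(s.cmd_buffers), list(s.waits), list(s.signals)))
    return merged


batch = [
    Submit(["cb0"]),
    Submit(["cb1"]),
    Submit(["cb2"], signals=["sem_a"]),
    Submit(["cb3"], waits=["sem_a"]),
    Submit(["cb4"]),
]

print(len(split_submits(batch)), "driver submissions without merging")  # 5
print(len(merge_submits(batch)), "driver submissions with merging")     # 2
```

In this model the five entries collapse into two driver submissions, with the semaphore dependency between cb2 and cb3 marking the only point where a split is kept. The conditions a real driver has to check before merging are more involved than this.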
With the pending improvements, in some vkOverhead benchmark test cases the RADV driver on RDNA3 GPUs can be ~1000% faster for command submission, the Lavapipe software driver saw gains of 1000~3000% in some cases, the Intel ANV Vulkan driver with Arc Graphics (DG2) was up to 5000% faster, and the Qualcomm Adreno TURNIP driver also posted results some 3000~4000% faster.
The code to optimize submission merging for the Mesa Vulkan drivers is now under review via this Mesa MR. More details on the relentless optimizing work by Blumenkrantz can be found via this blog post.