Marek Olšák Lands Support In Mesa 24.2 To Vectorize IO In The GLSL Linker
Well-known AMD Mesa developer Marek Olšák shows no signs of running out of ways to optimize OpenGL support within the Mesa/Gallium3D driver stack. More than a decade after joining AMD, and more than a decade and a half after first getting involved with Mesa as a student developer, Marek still isn't slowing down with his performance optimizations and new features to benefit the open-source Radeon Linux graphics drivers.
The latest work being merged today for next quarter's Mesa 24.2 release is the ability to vectorize IO within the GLSL linker. This is done as part of the common NIR and GLSL code within Mesa. Marek explained in the merge request:
We generally want vectorized IO coming out of the GLSL linker. Since the linker scalarizes for nir_opt_varyings, it had no way to re-vectorize IO until now.
nir_opt_vectorize_io:
- vectorizes lowered input/output loads and stores
- vectorizes low and high 16-bit loads and stores by merging them into a single 32-bit load or store (except load_interpolated_input, which has to keep bit_size=16)
- performs DCE of output stores that overwrite the previous value by writing into the same slot and component.
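As a rough illustration of the 16-bit case in the list above, the Python sketch below packs a low and a high 16-bit value into one 32-bit word and splits it again. This is not Mesa code: the function names are hypothetical, and the layout (low half in bits 0-15, high half in bits 16-31) is an assumption made for the example.

```python
def pack_16bit_halves(lo: int, hi: int) -> int:
    """Combine two 16-bit values into one 32-bit word.

    Assumed layout (illustrative only): lo in bits 0-15, hi in bits 16-31.
    """
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def unpack_16bit_halves(word: int) -> tuple[int, int]:
    """Split a 32-bit word back into its (lo, hi) 16-bit halves."""
    return word & 0xFFFF, (word >> 16) & 0xFFFF
```

The point of merging is that one 32-bit access replaces two 16-bit ones. The quoted text notes that load_interpolated_input is excluded because it has to keep bit_size=16.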
Vectorization is only local within basic blocks. No vectorization occurs across basic block boundaries, barriers (only TCS outputs), emits (only GS outputs), and output load <-> output store dependencies.
All loads and stores must be scalar before the pass. 64-bit loads and stores are forbidden.
For each basic block, the time complexity is O(n*log(n)) where n is the number of IO instructions within that block.
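The description above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the actual nir_opt_vectorize_io implementation: `ScalarStore`, `Barrier`, `vectorize_stores` and `vectorize_block` are hypothetical names, only output stores are modeled, and only contiguous components of the same slot are merged. Sorting the surviving stores is what gives the O(n*log(n)) cost per block in this sketch; the real pass may reach that bound differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalarStore:
    slot: int       # hypothetical output slot index
    component: int  # 0..3 for x, y, z, w
    value: str      # name of the value being stored


class Barrier:
    """Stands in for anything vectorization must not cross (illustrative)."""


def vectorize_stores(stores):
    """Merge the scalar stores of one barrier-free run into vector stores.

    Returns a list of (slot, first_component, [values]) tuples.
    """
    # DCE: a later store to the same (slot, component) overwrites earlier ones.
    last = {}
    for s in stores:
        last[(s.slot, s.component)] = s.value

    # Sort by (slot, component) so mergeable stores become neighbors.
    vectors = []
    for slot, comp in sorted(last):
        prev = vectors[-1] if vectors else None
        if prev and prev[0] == slot and prev[1] + len(prev[2]) == comp:
            prev[2].append(last[(slot, comp)])  # extend the current vector
        else:
            vectors.append((slot, comp, [last[(slot, comp)]]))
    return vectors


def vectorize_block(instrs):
    """Vectorize stores within one basic block, never across a Barrier."""
    result, pending = [], []
    for ins in instrs:
        if isinstance(ins, Barrier):
            result.extend(vectorize_stores(pending))
            result.append(ins)
            pending = []
        else:
            pending.append(ins)
    result.extend(vectorize_stores(pending))
    return result
```

For example, five scalar stores to slot 0 components x, y, x (again), z and slot 1 component z collapse into one three-component store for slot 0 and one scalar store for slot 1, with the first write to slot 0 component x dropped as dead.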
No performance metrics were shared as part of the GLSL linker change to (re)vectorize the lowered IO. Those interested can find all the details in the merge request, which adds 600+ lines of new code to Mesa 24.2-devel.