Tom Stellard, the former Google Summer of Code student who worked on ATI R300 GLSL compiler improvements and a new register allocator, has been looking into Radeon OpenCL support now that he is employed by AMD. However, Tom is working on other open-source Radeon projects too. Recently he improved the R300g driver's instruction scheduler to make better use of the texture semaphore.
As he mentioned on his personal blog late last month, "The texture semaphore is used by instructions that need to read texture data to tell the ALU to delay execution until the desired texture data has been fetched from the texture unit. Previously in the r300g compiler, all instructions were using this semaphore, so even instructions that didn't need texture data were waiting for it to be fetched. With these improvements, we are able to prefetch texture data by placing instructions that don't depend on texture data directly after texture look ups, so they execute while the data is being fetched. This should lead to some performance improvements for certain kinds of shaders. In Lightsmark, there is one shader in particular that really benefits from this optimization, and I'm getting about a 33% speed up in overall FPS, with these new changes on my RV515."
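To make the idea in that quote concrete, here is a toy cycle-count model in Python. It is only a sketch and not the r300g compiler's code: the Instr class, the cycles() function, the one-instruction-per-cycle issue rule, and the four-cycle TEX_LATENCY are all assumptions made for illustration. It compares a program where every ALU instruction waits on outstanding texture fetches against one where only the instructions that read texture data wait.

```python
from dataclasses import dataclass

TEX_LATENCY = 4  # hypothetical number of cycles for a texture fetch


@dataclass
class Instr:
    name: str            # name of the result this instruction produces
    is_tex: bool = False
    reads: tuple = ()    # names of earlier results this instruction reads


def cycles(program, all_wait):
    """Count cycles for an in-order program issuing one instruction per cycle.

    all_wait=True  : every ALU instruction stalls until all outstanding
                     texture fetches are done (the old behaviour described
                     in the quote).
    all_wait=False : an instruction stalls only on the results it reads.
    """
    clock = 0        # next free issue slot
    tex_done = 0     # cycle when all outstanding texture fetches are complete
    ready_at = {}    # result name -> cycle at which it becomes available
    for ins in program:
        if all_wait and not ins.is_tex:
            start = max(clock, tex_done)
        else:
            start = max([clock] + [ready_at[r] for r in ins.reads])
        if ins.is_tex:
            ready_at[ins.name] = start + TEX_LATENCY
            tex_done = max(tex_done, ready_at[ins.name])
        else:
            ready_at[ins.name] = start + 1
        clock = start + 1
    return max(clock, tex_done)


# One texture look-up followed by two ALU instructions that do not need
# the texel, then one ALU instruction that does.
program = [
    Instr("tex0", is_tex=True),
    Instr("a"),
    Instr("b", reads=("a",)),
    Instr("c", reads=("tex0",)),
]

print(cycles(program, all_wait=True))   # 7
print(cycles(program, all_wait=False))  # 5
```

In this made-up program the two independent ALU instructions fill the slots that would otherwise be spent waiting on the fetch. Real shaders and real hardware latencies will of course behave differently.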
While it is part of the R300 Gallium3D driver, this work is only relevant to the ATI Radeon X1000 (R500) series. With his report of such large performance gains in shader-heavy OpenGL workloads such as Lightsmark, I couldn't help but run some benchmarks as soon as I returned from Oktoberfest.
The instruction scheduler enhancements for the texture semaphore have not yet been merged to Mesa master. For now they live in the "tex-sem" branch of Stellard's personal Mesa Git repository on FreeDesktop.org, but they will hopefully be merged to mainline Mesa in the near future. This texture semaphore work also hooks into a new debugging environment variable, RADEON_TEX_GROUP, which sets the maximum number of texture look-ups to submit concurrently. The default is eight, but Tom says the best-performing number may differ depending upon the application and graphics processor.
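As a rough illustration of how a limit like this could behave, here is a small Python sketch. It is not the driver's code: only the RADEON_TEX_GROUP name and the default of eight come from the description above, while the max_tex_group() and group_lookups() helpers and the fallback to the default on invalid values are assumptions.

```python
import os

DEFAULT_TEX_GROUP = 8  # default stated in the article


def max_tex_group(environ=os.environ):
    """Read RADEON_TEX_GROUP, falling back to the default.

    Falling back on missing, non-numeric, or non-positive values is an
    assumption of this sketch, not documented driver behaviour.
    """
    raw = environ.get("RADEON_TEX_GROUP")
    if raw is None:
        return DEFAULT_TEX_GROUP
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_TEX_GROUP
    return value if value > 0 else DEFAULT_TEX_GROUP


def group_lookups(lookups, limit):
    """Split texture look-ups into groups of at most `limit` entries."""
    return [lookups[i:i + limit] for i in range(0, len(lookups), limit)]


# Hypothetical shader with ten texture look-ups and a limit of four.
lookups = [f"tex{i}" for i in range(10)]
groups = group_lookups(lookups, max_tex_group({"RADEON_TEX_GROUP": "4"}))
print([len(g) for g in groups])  # [4, 4, 2]
```

Since the best value depends on the application and GPU, trying several values of the variable for the same workload is the obvious way to find a good setting.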
Stellard's tex-sem branch also offers a few other improvements, such as a smarter instruction scheduler and the re-enabling of the register rename pass to enhance all compiler optimizations. It is interesting work for this open-source Gallium3D driver targeting older Radeon hardware.
For this article I compared the performance of Tom Stellard's tex-sem branch of Mesa against mainline Mesa as of 6 October 2011. The latest Linux 3.1 kernel as of the same date was used. Swap buffers wait was also disabled via xorg.conf, while color tiling is already enabled by default for the R500 series.
The graphics cards tested were an ATI Radeon X1800XL, ATI Radeon X1800XT, and ATI Radeon X1950PRO. Unfortunately, last month I gave my X1300PRO graphics card to Martin Graesslin (the KDE KWin maintainer), as he doesn't have any R300/400/500 class hardware and is working to debug some R300g driver issues with the KWin compositing window manager. This instruction scheduler testing is therefore limited to just three higher-end R500 GPUs.