The new performance-enhancing Mesa patch is by Kenneth Graunke and is titled "i965: Implement CopyTexSubImage2D via BLORP (and use it by default)."
CopyTexSubImage2D is now implemented via BLORP rather than the BLT engine to work around two limitations of the latter: the BLT engine can currently only blit X-tiled buffers, and it cannot blit between buffers that use different tiling modes.
For end-users of Intel hardware on Linux, the most visible result is that the PlaneShift MMORPG is much faster. Until now the game has run at around one frame per second on Intel hardware while consuming nearly the entire CPU, because it calls CopyTexSubImage2D on Y-tiled depth buffers whenever a player casts a spell, and those copies fell back to slow, CPU-intensive paths. This is especially painful in a massively multiplayer game, where players cast spells all the time.
Additionally, the Xonotic first-person shooter running with 4x MSAA was measured to be about 6.35% faster as a result of this single Mesa patch.
The patch, which adds just over 100 lines of new code to the Mesa i965 DRI driver, is currently available on the mailing list. It should reach mainline Mesa, hopefully in time for next month's Mesa 9.1 release.
The BLT engine has many limitations. Currently, it can only blit X-tiled buffers (since we don't have a kernel API to whack the BLT tiling mode register), which means all depth/stencil operations get punted to meta code, which can be very CPU-intensive.
Even if we used the BLT engine, it can't blit between buffers with different tiling modes, such as an X-tiled non-MSAA ARGB8888 texture and a Y-tiled CMS ARGB8888 renderbuffer. This is a fundamental limitation, and the only way around that is to use BLORP.
Previously, BLORP only handled BlitFramebuffer. This patch adds an additional frontend for doing CopyTexSubImage. It also makes it the default. This is partly to increase testing and avoid hiding bugs, and partly because the BLORP path can already handle more cases. With trivial extensions, it should be able to handle everything the BLT can.
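The path selection described above can be illustrated with a minimal C sketch. This is not the code from the patch: the enum names, the `blt_can_copy` and `choose_copy_path` helpers, and the `blorp_can_handle` flag are all hypothetical, and the fallback order after BLORP (BLT first, then meta) is an assumption. The sketch only models the two BLT limitations as stated in the text.

```c
#include <stdbool.h>

/* Tiling modes named in the text; this enum is hypothetical, not Mesa's. */
enum tiling { TILING_X, TILING_Y };

/* Copy paths mentioned in the text; names are hypothetical. */
enum copy_path { PATH_BLORP, PATH_BLT, PATH_META };

/* Models the BLT limits as described: only X-tiled buffers can be
 * blitted, which also rules out any copy between different tiling modes. */
static bool blt_can_copy(enum tiling src, enum tiling dst)
{
    return src == TILING_X && dst == TILING_X;
}

/* Prefer BLORP (the new default). The fallback order below is an
 * assumption for illustration: BLT if it can do the copy, else the
 * CPU-intensive meta path. */
static enum copy_path choose_copy_path(enum tiling src, enum tiling dst,
                                       bool blorp_can_handle)
{
    if (blorp_can_handle)
        return PATH_BLORP;
    if (blt_can_copy(src, dst))
        return PATH_BLT;
    return PATH_META;
}
```

Under these assumptions, a copy between two Y-tiled depth buffers goes to BLORP when BLORP can handle it. Without BLORP it falls to the meta path, which matches the PlaneShift behaviour described in the text.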
This helps PlaneShift massively, which tries to CopyTexSubImage2D between depth buffers whenever a player casts a spell. Since these are Y-tiled, we hit meta and software ReadPixels paths, eating 99% CPU while delivering ~1 FPS. This is particularly bad in an MMO setting because people cast spells all the time.
It also helps Xonotic in 4X MSAA mode. At default power management settings, I measured a 6.35138% +/- 0.672548% performance boost (n=5).
No Piglit regressions on Ivybridge. I have not tested Sandybridge.