How Valve Made L4D2 Faster On Linux Than Windows

Valve's big SIGGRAPH Linux presentation centered around their OpenGL work and how the Linux version is faster than the Windows version -- for both the OpenGL and Direct3D renderers.
Valve is supposed to be posting the slides (presumably on their blog) and hopefully a video will emerge, but since there's nothing yet, it will probably take until next week before they share anything. As a result, below are the pertinent details they shared from their slides that were presented last night at SIGGRAPH Los Angeles. With SIGGRAPH being an industry-leading graphics conference, most of the talk was filled with low-level technical details. Enjoy!
- The presentation was made by Rich Geldreich of Valve and entitled "Left 4 Dead 2 Linux: From 6 to 300 FPS in OpenGL."
- The Source Engine with OpenGL is on average about 11% faster with OpenGL than Direct3D 9 on a NVIDIA GeForce GTX 680. It's believed that another 5% higher performance for GL is still easily obtainable by reducing overhead in their Direct3D -> OpenGL layer.
- Yes, the way the Source Engine is hitting on OpenGL right now is through a non-deferring, locally-optimizing abstraction layer to basically convert their longstanding Direct3D calls into OpenGL. However, it's not the same way that Wine does Direct3D to OpenGL conversion. The Source Engine targets a D3D9-like API with extensions that translates GL calls dynamically. This also works for Shader Model 2.0b with Shader Model 3.0 support coming soon. Valve's implementation is nearly a 1:1 mapping between D3D and GL concepts.
- The overhead attributed to the Direct3D to OpenGL translation is about 50/50 split between CPU cycles spent calling GL vs. translation overhead. For single-threaded graphics drivers on the other hand, it's about 80% in GL and 20% translation overhead. But again, even with this extra layer, OpenGL is faster. NVIDIA's proprietary Linux team has done a lot of work with their driver's multi-threading abilities.
- Valve's worked with all major vendors (Intel, AMD, and NVIDIA) for improved driver support and optimizations. Valve's Linux team originally had "little practical OpenGL experience." Their process came down to devising/conducting experiments, test results with known workloads, refining/updating mental model of system's behavior, repeat. The goal was to account for every micro-second spent in the Direct3D to OpenGL layer and render thread.
- Interpreting the experimental results were a bit challenging with the game being multi-threaded, the driver's server thread is invisible to most profiling tool, and the Source Engine is extremely configurable/scalable.
- RAD Game Tools' Telemetry was used a lot plus a custom batch trace recording mode for analyzing their translation layer. Telemtry offers cross-platform performance visualization systems via a visualizer app, run-time component, and server.
- Some of Valve's optimizations made so far is multi-threading support with the GL mode, removing most calls to glXMakeCurrent, pthreads usage fixes, reducing translation overhead by rewriting the hottest D3D->GL code paths, improved dirty range tracking, added separate uniform array for bone matrices, dynamic buffer updating improvements, and compiler optimizations. The compiler improvements were building the game/engine with -ffast-math and removing -fPIC.
- Extra details not covered during the presentation will be shared via their Linux blog in the coming days.
103 Comments