email@example.com concerning WoW and this patchset. Please post there, I don't want to establish a Wine support forum at Phoronix.
If you want to use __GL_THREADED_OPTIMISATIONS with my patchset you don't have to apply that hack. Due to a problem unrelated to __GL_THREADED_OPTIMISATIONS I'm currently using glBufferSubData for buffer uploads instead of glMapBufferRange. Just turn of CSMT in the registry.
A good start would be to quantify the performance difference between wined3d and gallium-nine with reproducible benchmarks and then isolate where the performance difference is coming from. And that means not just "it reduces overhead, so it's faster", but something like "There's CPU-side overhead in module X, and GPU-side overhead because shader instructions A, B and C are inefficiently handled by module Y".
If it turns out that there's a fundamental advantage to a gallium state tracker, and that it's not just working around some bugs in Mesa and Wine that could be fixed with e.g. a better GLSL compiler or adding one or two focused GL extensions to support some d3d-isms better the next task is finding a stable API exposed by gallium-nine and used by Wine.
Matteo has done some testing with gallium-nine on r600g. If I remember correctly, he saw a moderate performance gain (~15% or something), but attributed most of that to insufficient optimization in the driver's GLSL compiler. I'll ask him to make sure my memory serves me right.
Damn, it took me ~ 40 minutes to compire from AUR.
An update to my previous post: We managed to make gallium-nine + r600g work with StarCraft 2, but not some other game we were actually interested in (sorry, confidential). At lowest settings the game was CPU limited and saw an increase from 60 to 100 fps (wined3d vs gallium-nine). At higher settings (GPU limited) the performance was exactly the same.
Further profiling suggested that the difference came from GLSL constant updating. That's a well-known problem. If the application needs only Shader Model 2 support or you have an Nvidia card then ARB shaders can give you quite a boost. Otherwise we hope that GL_ARB_uniform_buffer_object will help once we use it - but so far we didn't get around to implementing that. Besides constant updating Mesa spent a lot of time in some texture update function (_mesa_update_texture) when using wined3d but not gallium-nine. We did not investigate why or what could we or Mesa could do about it. A fairly new GL extension, GL_ARB_texture_storage, aims at reducing some texture management overhead, but we have to restructure our texture-surface relationship a bit before we can properly use it. That's some restructuring we'll have to do for d3d10/11 anyway.
Well i have seen some Gallium-nine videos months ago running different games and it seems to run quite well:
Nvidia GTX 670
Modern Warfare 3 <- no re-clock (is running ( i think ) at almost 1.5/10 of it's max clock speed and the game is at highest settings !!)
GTA IV <- Same as above, no re-clock :S
Crysis 2 <- No re-clock
AMD HD 5770
Crysis 2 <- dynamic power management "ON"
It has surpassed my expectations, well done.
The problem with modern cpus, is that they cannot improve per core performance fast enough to keep up with gpu performance. So anything that reduces cpu strain will be vastly important from now on. That is why AMD introduced MANTLE afterall.
So, Wine could use anything that can improve its performance. A D3D state tracker can eliminate this overhead altogether if properly coded. Although it is not multiplatform, it could provide a big boost for gallium users. Especially with modern games. Imagine games like Rome 2 total war. I am willing to bet a huge sum of money that Wine will face a tremendous challenge in trying to match its Windows performance...
did more testing and i'm yet to see any of my games to crash. although, once i saw the GPU bottleneck comment, i went and disabled double buffering in TR2013
wine-1.7.10 => min 30.9 fps, with double buffering min 25.9
command stream version => min 48.9/max 72.3, with double buffering 30.1/60
looking at reported fps when running on windows with same GPU as me, it's about 80-90% there. that's freaking awesome change.
now, question from complete n00b on directx department. is this able to be reused in DX10/11 and how much different those 2 are?