**efikkan** · 31 May 2012, 02:40 PM

Originally posted by entropy

IIRC, elanthis mentioned several times the limitations of OpenGL (being a state machine) concerning proper multithreading.

OpenGL implementations on both Windows and Linux support multiple contexts and context sharing, much like Direct3D. The drivers, however, are single-threaded, so OpenGL has no disadvantage here. Some news was posted a while ago about an NVIDIA patent for multiple streams to the GPU, so this might show up in future generations (post-Maxwell).

Anyway, how is the lack of "proper" multithreading slowing down applications? Do you really think 8 CPU cores could boost the performance of your GPU? Calls to the GPU, whether in OpenGL or Direct3D, come in two types: load and render. Load calls are bandwidth-limited, so issuing them from multiple threads would not help performance. For a single viewport, multiple render threads would not help much either. Modern, optimized SM4+ code should be GPU-limited, so multiple threads would not give any speedup here.

So you are thinking: I want one thread for rendering and one for loading. Ideally, your speedup would then equal the amount of GPU idle time saved. Depending on usage, this might give a theoretical speedup of around 10-20%. But there is one condition: the data your loading thread is modifying cannot be in use by the rendering thread.

However, GK110 and CUDA 5 introduce direct streaming to the GPU without the CPU. Hopefully this will be included in future OpenCL specifications.
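The condition on the loading thread (it must never touch data the rendering thread is using) can be sketched without any graphics API at all. The snippet below is only an illustration under assumptions: plain Python lists stand in for GPU resources, and the names `run_frames`, `free`, and `ready` are made up for this example. Two buffers are passed back and forth through queues, so the loader only writes to a buffer the renderer has explicitly released.

```python
import queue
import threading


def run_frames(num_frames):
    """Render from one buffer while a loader thread fills the other."""
    buffers = [[], []]      # hypothetical stand-ins for two GPU-side resources
    free = queue.Queue()    # renderer -> loader: buffers that are safe to overwrite
    ready = queue.Queue()   # loader -> renderer: buffers that are fully loaded
    free.put(0)
    free.put(1)
    rendered = []

    def loader():
        for frame in range(num_frames):
            idx = free.get()              # wait until the renderer released a buffer
            buffers[idx] = [frame] * 4    # "load" the data for this frame
            ready.put(idx)                # hand the filled buffer to the renderer
        ready.put(None)                   # signal that no more frames will arrive

    def renderer():
        while True:
            idx = ready.get()
            if idx is None:
                break
            rendered.append(sum(buffers[idx]))  # "draw" using the loaded data
            free.put(idx)                       # release only after drawing is done

    threads = [threading.Thread(target=loader), threading.Thread(target=renderer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return rendered


print(run_frames(3))  # prints [0, 4, 8]
```

With two buffers, the loader can prepare the next frame while the current one is being drawn, which is where the idle time would be recovered. In a real renderer the hand-off would presumably involve shared contexts and fence objects rather than Python queues.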

**sylware** · 31 May 2012, 02:57 PM

Well... my understanding is that you must be very careful with the render task and the load task. You need to keep the VRAM bandwidth and the caches available to the render task for it to be as efficient as possible. Loading without the CPU means DMA, done with shader programs and/or discrete DMA engines... but the PCI-E controller on the GPU board would then issue memory write requests and disturb the scarce VRAM bandwidth and caches of the render task.

**entropy** · 31 May 2012, 03:09 PM

I shouldn't write on elanthis' behalf.
My technical knowledge concerning this topic is rather limited.

Nevertheless, this is what elanthis wrote in the forum.

Unigine Engine Looks To Wasteland 2 - Phoronix Forums

http://phoronix.com/forums/showthread.php?70380-Unigine-Engine-Looks-To-Wasteland-2&p=259004#post259004

Originally posted by elanthis

The core code is likely identical in terms of acceleration.

The differences between GL and D3D for performance stem from GL's state model, the difficulty of threading it properly, and the extra checks that have to be run on many object uses because of the highly mutable object model. The latter at least is being slowly fixed with each successive version of GL (e.g., ARB_texture_storage), but has a long way to go. The threading problem cannot be fixed without literally scrapping and redesigning the API, as the fundamental problem is that the API expects and requires magic global hidden state (which can be thread-local, but that is not free), and in the short term requires scrapping and redesigning WGL and GLX (the changes from GL3 made it much better, but still far from perfect). The GL state model is just utter shit and needs to be shot in the face six times with a high-powered rifle; there's no fixing it, only throwing it away and starting over. The API is simply trash, and even Khronos knows that fact (hence the Longs Peak fiasco). They just aren't willing to do anything about it; they introduce things like Core profile that break back-compat in little minor ways that barely affect anything at all while refusing to just introduce a revised API that breaks things in larger but actually useful ways.

The biggest problem with GL as an app developer is that -- on Windows -- the drivers are simply buggy and unstable. I still run into frequent driver crashes or just crazy performance problems that are simply bugs. The problems usually get fixed (though a few really bad long-term bugs haven't been fixed even after two years on NVIDIA's drivers) eventually, but the releases that fix one set of bugs inevitably just cause more.

Don't even get me started on what a horrifically bad shading language GLSL is, either. It's only just becoming sane with GLSL 4.20, which means you can't actually use any of its features since most of us need to target GL 3.1 hardware (Intel) or GL 3.2 operating systems (OS X) or just stick to GLSL|ES 2.0 (iOS, Android, NativeClient).
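To make the "magic global hidden state" point in the quote a bit more concrete, here is a rough sketch in plain Python. It is not OpenGL code: `Context`, `make_current`, `bind_texture`, and `bind_texture_explicit` are hypothetical names invented for this illustration. The first style keeps a hidden per-thread "current context", so every call has to look it up and fails on a thread where nothing was made current. The second style simply passes the object in.

```python
import threading

# Hidden per-thread "current context", in the spirit of a GL-style API.
_current = threading.local()


class Context:
    """Hypothetical context object holding one piece of mutable state."""

    def __init__(self, name):
        self.name = name
        self.bound_texture = None


def make_current(ctx):
    """Attach ctx to the calling thread only."""
    _current.ctx = ctx


def bind_texture(tex):
    """Implicit-state call: looks up the thread-local context on every call."""
    ctx = getattr(_current, "ctx", None)
    if ctx is None:
        raise RuntimeError("no current context on this thread")
    ctx.bound_texture = tex


def bind_texture_explicit(ctx, tex):
    """Explicit-object call: no hidden lookup, works from any thread."""
    ctx.bound_texture = tex
```

Calling `bind_texture` from a worker thread that never called `make_current` raises the `RuntimeError`, while `bind_texture_explicit` works anywhere as long as the caller handles synchronization itself. The per-call lookup in the first style is a small-scale version of the "not free" cost mentioned in the quote.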

**efikkan** · 31 May 2012, 03:15 PM

I have found that in most cases games are not hitting a VRAM bandwidth bottleneck. This is relatively easy to check by overclocking the memory bus and measuring the impact on overall performance. Hint: look at the ratio of processing performance to memory bandwidth on the previous-generation GPUs from NVIDIA.
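The overclocking check can be reduced to a single ratio: how much of the memory clock increase shows up as a frame rate increase. The helper below is only a sketch; the function name `bandwidth_sensitivity` and all numbers in the example are hypothetical, not measurements.

```python
def bandwidth_sensitivity(base_fps, oc_fps, base_mem_mhz, oc_mem_mhz):
    """Return the fraction of a memory clock increase that shows up as FPS.

    Close to 1.0 suggests the workload is memory-bandwidth bound;
    close to 0.0 suggests the bottleneck is somewhere else.
    """
    if oc_mem_mhz == base_mem_mhz:
        raise ValueError("memory clock must change between the two runs")
    fps_gain = oc_fps / base_fps - 1.0
    clock_gain = oc_mem_mhz / base_mem_mhz - 1.0
    return fps_gain / clock_gain


# Hypothetical numbers: a 10% memory overclock that yields only 1% more FPS.
print(round(bandwidth_sensitivity(60.0, 60.6, 1000.0, 1100.0), 3))  # prints 0.1
```

A result around 0.1, as in this made-up example, would mean only a tenth of the extra bandwidth turned into performance, which is consistent with the game not being bandwidth limited.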

BTW: In my opinion the core profile could have been dropped. There is no performance gain in it. One mobile profile (ES) and one full profile should be enough.

OpenGL ES 3.0 Will Be Here This Summer