Mesa Threaded OpenGL Dispatch Finally Landing, Big Perf Win For Some Games
-
Originally posted by funfunctor
This has got nothing to do with anything else but GL specifically.
Of course, you were correct to point out that the single-threaded nature of OpenGL does not generally apply to GPUs, but is specific to the OpenGL API itself.
Last edited by indepe; 06 February 2017, 03:24 AM.
-
I did some tests; for many apps (csgo, glxgears, gputest), mesa_glthread does not work:
Code:
_mesa_glthread_init
_mesa_glthread_destroy

Talos, Ultra settings (avg / max / min FPS):

                      ren_bMultiThreadedRendering=1   ren_bMultiThreadedRendering=0
mesa_glthread=true    44.5 / 61.4 / 33.9              43.9 / 60.4 / 32.6
mesa_glthread=false   45.5 / 62.1 / 33.3              43.7 / 59.9 / 32.4
-
Originally posted by indepe
I know. What I am saying is that, in theory, instead of "Stash the GL calls in a batchbuffer" you could also "Stash the Vulkan calls in a batchbuffer" or "Stash the printf calls in a batchbuffer". Actually, the latter is something I was planning to do (or at least try out) the coming week, even though printf is not a single-threaded API. (Or at least the printf implementation which I am using for debugging purposes, doesn't mix output from concurrent calls.)
Of course, you were correct to point out that the single-threaded nature of OpenGL does not generally apply to GPUs, but is specific to the OpenGL API itself.
-
Originally posted by funfunctor
Ah, I see where your misunderstanding is coming from now. Right, so with Vk you don't need to do that, as Vk already has things like command buffers, where you can configure as many concurrent streams as you like and handle all of that yourself. This is what is meant by Vk being "low level": you, as the user, become responsible for setting up the threads, buffers and whatever else, packing them with data and sending them on their way. Hope this helps without being too technical; let me know if you still don't understand and I can explain more deeply.
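The contrast with the glthread approach is that in the Vulkan-style model each thread records into its own command list with no locking at all, and the application then chooses the submission order itself. A rough sketch of that shape (all names here, `cmdlist_t`, `record_thread`, etc., are invented stand-ins, not the real Vulkan API):

```c
#include <pthread.h>

#define LIST_CAP 64
#define NTHREADS 4

typedef struct {
    int cmds[LIST_CAP];
    int count;
} cmdlist_t;

/* Per-thread recording: no mutex needed, each list is private
   to its recording thread (the analogue of one command buffer
   per thread). */
static void *record_thread(void *arg) {
    cmdlist_t *list = arg;
    for (int i = 0; i < 8; i++)
        list->cmds[list->count++] = i;   /* "vkCmd..." stand-in */
    return NULL;
}

/* "Queue submit": the application controls execution order by the
   order in which it hands the finished lists over. */
long submit_lists(cmdlist_t *lists, int n) {
    long executed = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < lists[i].count; j++)
            executed += lists[i].cmds[j] + 1;  /* pretend execution */
    return executed;
}

long run_vk_style_demo(void) {
    cmdlist_t lists[NTHREADS] = {0};
    pthread_t threads[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, record_thread, &lists[i]);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    return submit_lists(lists, NTHREADS);
}
```

The key design difference from the glthread sketch: recording is lock-free because nothing is shared until submission, which is exactly the responsibility a "low level" API pushes onto the application.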
-
It often seems that the open source AMD drivers are bottlenecked by the CPU. I could see how this feature may substantially improve performance, particularly in games that use post-processing.
Originally posted by atomsymbol
In my opinion, glthread (if enabled from the command line) will be slowing down a large number of OpenGL apps this year (2017), and this issue won't be resolved until 2018+.
On-disk cache is much closer to being capable of working as expected/intended in year 2017 than glthread (mareko).
-
Originally posted by atomsymbol
Well. I compiled https://cgit.freedesktop.org/~mareko/mesa/?h=glthread, ran a game, and observed a performance decrease of up to 60%. I am not claiming that glthread doesn't benefit some other games.
In general, the only thing that decides whether multi-threaded code performing a task is faster than single-threaded code performing the same task is whether it can be determined in advance that the former is faster than the latter. If it cannot be determined that it is faster, it may just as well be slower.
Last edited by schmidtbag; 06 February 2017, 12:34 PM.
-
Originally posted by schmidtbag
Have you narrowed down that glthread was the issue?
Understood, but this is why I think there would be either a negligible performance improvement or a drastic one. The article states that the GL calls are queued, implying that even though the threads are processed in parallel, they're still meant to be executed in a specific order; it doesn't imply they're dependent on each other. In other words, if the GL calls were spread across multiple threads that all had to complete a single image, there could be drastic performance decreases, because all the threads would be working toward a single task: if one thread weren't done, all the others would have to wait for it, which hurts performance. But since the calls are queued, that suggests the threads are not explicitly dependent on each other, in which case there should be little to no decrease in performance, while any increase would depend on how much each CPU core is bottlenecked.
Here is some feedback from my quick test.
On PCSX2 (a PS2 emulator), I noticed that synchronization badly impacts performance. In my case, the syncs are mostly related to texture transfers (CPU->GPU) and the clear-buffer functions. Strangely, I didn't notice anything related to BufferSubData*, but I guess it behaves the same. Those functions trigger a sync because of their pointer parameter. However, a texture transfer can use a PBO, in which case it isn't a real pointer, and clear takes a pointer to a color, hence a small payload (the worst case is likely around 16/32 bytes). IMHO it can surely be inlined/memcpy'd in the GL dispatcher (otherwise, the old GL2 clear API is sync-free).
I hacked the code to remove the sync on texture transfers and got a major speed boost. I didn't count the number of draw calls or the sync ratio, but I suspect the performance impact depends on how the syncs are distributed. Unlike my case, I guess Borderlands 2 uploads/clears buffers/textures/uniforms at the start of the frame, which means various small syncs at the start of the frame (which might be optimized as a spin lock); the hot rendering loop might then be sync-free, hence the speed boost. To conclude, based on my single test case, the current state of the code isn't yet optimal, which might explain why few apps see any perf improvement so far. But the potential is here.
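The sync-vs-inline trade-off described above can be sketched as follows: when a marshalled call takes a caller-owned pointer, the dispatcher must either wait until the worker has consumed the data (a sync) or memcpy a small payload, like a clear color, into the batch so the caller's buffer can be reused immediately. All names and the 32-byte threshold here are assumptions for illustration, not Mesa's actual dispatcher code:

```c
#include <string.h>
#include <stddef.h>

#define INLINE_LIMIT 32   /* e.g. a clear color: at most 16/32 bytes */

typedef struct {
    unsigned char payload[INLINE_LIMIT];
    const void *external;  /* set only when we could not inline      */
    size_t size;
    int needs_sync;
} marshalled_call_t;

/* Returns 1 if the caller must synchronize before reusing `data`,
   0 if the payload was copied into the batch and `data` is free to
   be modified immediately. */
int marshal_pointer_arg(marshalled_call_t *c, const void *data, size_t size) {
    c->size = size;
    if (size <= INLINE_LIMIT) {
        memcpy(c->payload, data, size);  /* copied: no sync needed   */
        c->external = NULL;
        c->needs_sync = 0;
    } else {
        c->external = data;              /* worker reads it later    */
        c->needs_sync = 1;               /* caller must wait (sync)  */
    }
    return c->needs_sync;
}
```

Under this model, a clear color (16 bytes) is inlined and costs nothing extra, while a large texture upload keeps pointing at caller memory and forces the sync the post describes; routing the transfer through a PBO sidesteps the problem because the "pointer" is then just an offset into a GPU-owned buffer.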