The Linux Kernel's Scheduler Apparently Causing Issues For Google Stadia Game Developers
Originally posted by xorbe:
I read the whole linked article with interest, and it was a pretty good examination. Would be interesting to know the technical root cause. Perhaps Linux yield() actually blocks a thread from being activated again for a timeslice even if there's nothing to run. There's a difference between "go do something else and come back to me" and "I don't have anything to do."
Originally posted by PuckPoltergeist:
I've had a look, and on my system I have more than 40 kworker threads. And that's not even half of all kernel threads. So yes, there are always far more threads fighting for CPUs.
Seems like the implementation of this program evolved on the Windows kernel, and now the developers are discovering that the Linux kernel does not behave the same way. All scheduler algorithms represent some compromise; in the case of the Windows kernel, low latency seems to be achieved at the expense of poor efficiency for low-priority threads. That is fine for games, where the scheduling of only a handful of threads matters, but disappointing for a busy software developer's desktop.
Most software developers are loath to re-architect threading, but doing so often makes it possible to reduce an application's sensitivity to thread scheduling.
I have written a couple of animating programs on Linux, and a glitch-free solution I found was to have one high-priority thread (call it the animator thread) blit all components of the next frame to the off-screen buffer. When this operation is complete, the thread sleeps until just before the graphics card signals VBLANK. The animator thread then wakes, polls for VBLANK to begin, performs the screen-buffer switch, and restarts the animation cycle by generating the next frame in the off-screen buffer.
Other, lower-priority threads prepare the various components of the next frame, which the animator thread subsequently aggregates.
Last edited by g04th3rd3r; 02 January 2020, 05:27 PM.
Well, I hope they don't switch the backend of Stadia and move from Linux to some Windows crap.
Hopefully open source wins out over this problem in the end.
It does sound like developers are struggling to figure out how to work with Stadia, however.
IMO it's Google's fault for not having a 6-12 month public trial run with Stadia, maybe a free trial option where you can freely play any one game a month or something. The whole platform needs time to bake in the sun.
I read the whole linked article with interest, and it was a pretty good examination. Would be interesting to know the technical root cause. Perhaps Linux yield() actually blocks a thread from being activated again for a timeslice even if there's nothing to run. There's a difference between "go do something else and come back to me" and "I don't have anything to do."
Originally posted by PuckPoltergeist:
Or the process was simply preempted by the OS. Yes, there may be other processes ready to run that don't care about this spinlock; the process trying to take the lock will sleep nevertheless. So yes, a design bug. Userspace spinlocks can't work!
For others in this thread who have no experience programming such things: the specific problem with using a spinlock here is that the thread spent all of its allotted schedule time spinning on the lock while it was held by another thread. If a mutex had been used instead, the thread would have been put to sleep immediately and woken immediately (or close to it) by the kernel when the other thread released the lock.
What I wonder is why this behaviour happens to work on the Windows scheduler. I know Windows has priority boosting, where threads of an "active" process have their priority increased, and perhaps that is what prevents his thread from being preempted here.
Or the quantum is larger/different on Windows, so that with the timings in this particular scenario his waiting thread is back from preemption by the time the lock is released.
In his benchmark data he sees a maximum of 0.x ms of idle time on Windows for his userspace spinlock, which should be impossible with normal preemptive scheduling anyway unless the thread has 100% exclusive use of the CPU core, so something very strange is going on here.
edit: could he be running his Ubuntu in a VM like VirtualBox, perhaps?
Last edited by F.Ultra; 02 January 2020, 03:46 PM.
Well, I don't want to sound like a broken record, but the Shadow of the Tomb Raider port by Feral heavily utilizes multiple threads that are dependent on each other.
When I force a low resolution and GPU effects so that an NPC crowd scene is entirely CPU-bound (i.e. the CPU never waits for the GPU), I get 90% total CPU usage reported on the 6700K (four cores + HTT, so eight threads almost fully utilized).
Windows DX12: ~102fps
Linux Vulkan port: ~94fps
Also, frame-time consistency is absolutely comparable, with no noticeable spikes/stalls (the game has a mostly minor stalling issue when loading certain new areas, which happens the same on Windows and Linux).
So, how bad can the issue actually be if the developer knows what he or she is doing?
Originally posted by davibu:
What I really don't understand is people writing "I have 40 worker threads on a quad core and everything is fine", whereas this is about a single application with multiple concurrent threads working in real time, which is a whole other type of problem than having a few worker threads for a non-realtime workload...
I really don't get some people here.
If this spinlock behaviour is slow, it should be optimized; especially if Windows is able to work faster with spinlocks, it should be obvious that there is headroom to improve.
What I really don't understand is people writing "I have 40 worker threads on a quad core and everything is fine", whereas this is about a single application with multiple concurrent threads working in real time, which is a whole other type of problem than having a few worker threads for a non-realtime workload...
Even if your problem is a real-time problem with concurrent threads, the question remains whether the scheduler is a non-negligible bottleneck for it. If it is, then yes, your problem is comparable; otherwise it isn't.