The Linux Kernel's Scheduler Apparently Causing Issues For Google Stadia Game Developers


  • #41
    I really don't get some people here.
    If this spinlock feature is slow, it should be optimized; especially if Windows is able to handle spinlocks faster, it should be obvious that there is headroom to improve.

    What I really don't understand are people writing "I have 40 worker threads on a quad core and everything is fine", whereas this focuses on a single application which has multiple concurrent threads and works in real time. That is a whole other type of problem than having a few worker threads for a non-realtime workload...
    Even if your problem is a realtime one with concurrent threads, the question still remains whether your scheduler is a non-negligible bottleneck. If it is, then yes, your problem is comparable; otherwise it isn't.



    • #42
      Originally posted by davibu View Post
      What I really don't understand are people writing "I have 40 worker threads on a quad core and everything is fine", whereas this focuses on a single application which has multiple concurrent threads and works in real time. That is a whole other type of problem than having a few worker threads for a non-realtime workload...
      Sure, if you can't read, you won't understand. Read the postings, read the blog and come back with specific questions. Your post is simply some more bullshit.



      • #43
        Well, I don't want to sound like a broken record, but the Shadow of the Tomb Raider port by Feral heavily utilizes multiple threads which are dependent on each other.
        When I force a low resolution and minimal GPU effects so that an NPC crowd scene is entirely CPU bound (i.e. the CPU never waits for the GPU), I get 90% total CPU usage reported on the 6700K (four cores + HT, so eight threads almost fully utilized).
        Windows DX12: ~102fps
        Linux Vulkan port: ~94fps
        Frametime consistency is also absolutely comparable, with no noticeable spikes/stalls (the game has a mostly minor stalling issue when loading certain new areas, which happens the same on Windows and Linux).

        So, how bad can the issue actually be if the developer knows what he/she's doing?



        • #44
          Originally posted by PuckPoltergeist View Post

          Or the process was simply preempted by the OS. Yes, there may be other processes ready to run that don't care about this spinlock. The process that tries to get the lock will sleep nevertheless. So yes, a design bug. Userspace spinlocks can't work!
          Most definitely it was preempted, considering the description.

          For others in this thread who have no experience with programming such things: the specific problem with using a spinlock here is that the thread spent all its allotted schedule time spinning on the lock while it was held by another thread. If a mutex had been used instead, the thread would have been put to sleep immediately and woken immediately (or close to it) by the kernel when the other thread released the lock.

          What I wonder is why this behaviour happens to work on the Windows scheduler. I know that Windows has priority boosting, where threads of an "active" process have their priority increased, and perhaps this is what prevents his thread from being preempted here.

          Or the quantum is larger/different on Windows, so that with the timings in this particular scenario his waiting thread is back from preemption by the time the lock is released.

          In his benchmark data he sees a maximum of 0.x ms idle time on Windows for his userspace spinlock, which should be impossible with normal preemptive scheduling anyway unless the thread has 100% exclusive use of the CPU core, so there is something very strange going on here.

          edit: could he be running his Ubuntu in a VM like VirtualBox, perhaps?
          Last edited by F.Ultra; 01-02-2020, 03:46 PM.



          • #45
            I read the whole linked article with interest, and it was a pretty good examination. Would be interesting to know the technical root cause. Perhaps Linux yield() actually blocks a thread from being activated again for a timeslice even if there's nothing to run. There's a difference between "go do something else and come back to me" and "I don't have anything to do."



            • #46
              Well, I hope they don't switch the backend of Stadia and move from Linux to some Windows crap.
              Hopefully open source wins this one in the end.
              It does sound like developers are struggling to understand how to develop for Stadia, however.

              IMO it's Google's fault for not having a 6-12 month public trial run with Stadia, maybe a free trial option where you can freely play any one game a month or something. The whole platform needs time to bake in the sun.



              • #47
                Seems like the implementation of this program evolved on the Windows kernel, and now the developers discover that the Linux kernel does not behave the same way. All scheduler algorithms represent some compromise; in the case of the Windows kernel, low latency seems to be achieved at the expense of poor efficiency for low priority threads. This is fine for games where the scheduling for only a handful of threads matters, but disappointing for a busy software developer's desktop.

                Most software developers are loath to rearchitect threading, but in doing so it is often possible to reduce the sensitivity of your app to thread scheduling.

                I have written a couple of animating programs on Linux, and a glitch-free solution I found was to have one high priority thread (call it the animator thread) blit all components of the next frame to the offscreen buffer. When this operation is complete, the thread sleeps until right before the graphics card indicates VBLANK. Now the animator thread wakes and polls for the VBLANK to begin, then performs the screen buffer switch and restarts the animation cycle by generating the next frame in the off-screen buffer.

                Other lower priority threads are used to prepare the various components of the next frame, which will be subsequently aggregated by the animator thread.
                Last edited by g04th3rd3r; 01-02-2020, 05:27 PM.



                • #48
                  Originally posted by PuckPoltergeist View Post
                  I've had a look and on my system I've more than 40 kworker threads. And this is not even the half of all kernel threads. So yes, there are always way more threads that will fight for CPUs.
                  And how many of those 40 threads actually have any work to do 95% of the time?



                  • #49
                    Originally posted by xorbe View Post
                    I read the whole linked article with interest, and it was a pretty good examination. Would be interesting to know the technical root cause. Perhaps Linux yield() actually blocks a thread from being activated again for a timeslice even if there's nothing to run. There's a difference between "go do something else and come back to me" and "I don't have anything to do."
                    sched_yield() on Linux basically returns immediately if there are no other threads to run right now. I've used it extensively in low-latency applications and have benchmarked it to take just a few cycles when no other thread is waiting to run.



                    • #50
                      Originally posted by schmidtbag View Post
                      One of the great things about Linux being open-source and Google having practically infinite money to dump onto problems, there's a good chance they might be able to fix up the scheduler so this is no longer an issue. And if they do, who knows, maybe this will yield some performance improvements in other applications. I figure robotics applications would benefit from a more optimized scheduler.
                      Except the low-latency vs. throughput trade-off cannot be fixed. Maybe on a quantum computer, but on traditional PCs, that's a no-go.

