The Linux Kernel's Scheduler Apparently Causing Issues For Google Stadia Game Developers


  • #61
    Originally posted by MadCatX

    The remark about stuttering video playback raised my eyebrows too. I thought this problem had been solved years ago with cgroups, and I haven't seen it myself for a very long time. I can easily watch videos and listen to music while compiling code on all CPU cores. There must be something else going on here.
    I think this GNOME developer meant that GNOME itself may also affect the end result somehow. Several people did point out to the blogger who wrote the article that his methodology is wrong, though.


    • #62
      Originally posted by Volta

      I think this GNOME developer meant that GNOME itself may also affect the end result somehow. Several people did point out to the blogger who wrote the article that his methodology is wrong, though.
      I understand the GNOME dev's remark. If the problem is indeed with GNOME's behavior in situations with highly threaded workloads, it's logical to assume that I wouldn't run into it because I run KDE. It's not just the methodology, it's his overall approach to the problem. For instance, from the follow-up discussion it's apparent that he benchmarked the different schedulers with different kernel versions and possibly with different configs. That alone makes the results next to worthless. In one case he seems to have forgotten to rebuild his NVIDIA driver blob, so GNOME probably fell back to software rendering through llvmpipe. Hogging an already busy CPU with OpenGL rendering jobs every time a piece of the screen needs to be repainted certainly can't be good.


      • #63
        MadCatX Exactly. And it didn't stop him from saying the Linux scheduler is bad based on nothing but his flawed methodology. Furthermore, he didn't provide the kernel configuration, which is crucial in such cases. I can easily assume Google developers are really that bad. PS. Linux has very good tracing tools, but it seems he didn't bother to use them.


        • #64
          Originally posted by Volta
          Furthermore, he didn't provide the kernel configuration, which is crucial in such cases.
          It was Ubuntu 18.04 (4.15.0-72-generic), as stated in the comment section. It was also suspected that a kernel with CONFIG_PREEMPT_NONE was benchmarked. So no, it's not the Linux scheduler that's bad, but a distribution kernel config that doesn't fit the needs.
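
For anyone who wants to check which preemption model their own kernel was built with, here is a minimal Python sketch. It assumes the build config is available at `/boot/config-<release>` or `/proc/config.gz`, which depends on the distribution; the helper names `read_kernel_config` and `summarize` are made up for illustration. It also reports `CONFIG_HZ`, the timer frequency that comes up later in the thread.

```python
import gzip
import os
import platform
import re

# Usual locations of the running kernel's build config (assumed, distro-dependent).
CONFIG_PATHS = [f"/boot/config-{platform.release()}", "/proc/config.gz"]


def read_kernel_config(paths=CONFIG_PATHS):
    """Return the kernel build config as text, or None if no file is readable."""
    for path in paths:
        if not os.path.exists(path):
            continue
        opener = gzip.open if path.endswith(".gz") else open
        try:
            with opener(path, "rt") as fh:
                return fh.read()
        except OSError:
            continue
    return None


def summarize(config_text):
    """Report the preemption model and timer frequency found in config text."""
    enabled = set(re.findall(r"^(CONFIG_PREEMPT(?:_\w+)?)=y$", config_text, re.M))
    if "CONFIG_PREEMPT" in enabled:
        model = "full"
    elif "CONFIG_PREEMPT_VOLUNTARY" in enabled:
        model = "voluntary"
    elif "CONFIG_PREEMPT_NONE" in enabled:
        model = "none"
    else:
        model = "unknown"
    hz = re.search(r"^CONFIG_HZ=(\d+)$", config_text, re.M)
    return {"preempt": model, "hz": int(hz.group(1)) if hz else None}


if __name__ == "__main__":
    text = read_kernel_config()
    print(summarize(text) if text is not None else "kernel config not found")
```

Kernels built with dynamic preemption can switch models at boot, so the build config may not tell the whole story there.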


          • #65
            Originally posted by PuckPoltergeist

            It was Ubuntu 18.04 (4.15.0-72-generic), as stated in the comment section. It was also suspected that a kernel with CONFIG_PREEMPT_NONE was benchmarked. So no, it's not the Linux scheduler that's bad, but a distribution kernel config that doesn't fit the needs.
            This probably explains everything. I can't imagine how it's even possible for a developer to be so lame.


            • #66
              Originally posted by PuckPoltergeist

              It was Ubuntu 18.04 (4.15.0-72-generic), as stated in the comment section. It was also suspected that a kernel with CONFIG_PREEMPT_NONE was benchmarked. So no, it's not the Linux scheduler that's bad, but a distribution kernel config that doesn't fit the needs.
              Also, Ubuntu 18's systemd configuration was broken (in my opinion). I had wondered why my Ryzen 3900X would bog down while compiling big Android Studio projects on Ubuntu, but not on Fedora. It seems that changing the Ubuntu /etc/systemd/system.conf settings to enable accounting for most of the resources fixed it for me. You really do want the cgroups to share out the CPU by group, not by individual thread.
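
The post does not say which settings were changed. As a rough sketch, assuming they were the `Default*Accounting=` options in the `[Manager]` section of `/etc/systemd/system.conf`, the following Python snippet reports which of them are explicitly set. The function name `accounting_settings` is made up for illustration.

```python
import configparser
import os

# Accounting options assumed to be the ones meant above (from systemd-system.conf).
ACCOUNTING_KEYS = (
    "DefaultCPUAccounting",
    "DefaultBlockIOAccounting",
    "DefaultMemoryAccounting",
    "DefaultTasksAccounting",
)


def accounting_settings(conf_text):
    """Map each accounting option to True/False, or None if it is not set."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep systemd's CamelCase option names
    parser.read_string(conf_text)
    return {
        key: parser.getboolean("Manager", key, fallback=None)
        for key in ACCOUNTING_KEYS
    }


if __name__ == "__main__":
    path = "/etc/systemd/system.conf"
    if os.path.exists(path):
        with open(path) as fh:
            for key, value in accounting_settings(fh.read()).items():
                print(f"{key}: {'(compiled-in default)' if value is None else value}")
    else:
        print(f"{path} not found")
```

A value of `None` only means the option is commented out or absent in that file; drop-in files and the compiled-in defaults are not inspected here.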


              • #67
                Originally posted by PuckPoltergeist

                What do "locks up" and "heavy thread activity" mean? I'm running a compile job with make -j20, listening to music and writing this answer at the moment. Everything runs smoothly. So how is the scheduler crappy here?
                I see you desktop boys are still at it...

                One recent example that happened at work was a Linux board that went AWOL due to high load. We couldn't even ssh into it reliably because it was "so busy". Compiling leaves a LOT of idle time slots that other work can fill. It's a very lightweight load that doesn't properly utilize the whole system.

                On the desktop side I have definitely had situations in which my mouse was lagging, sound got choppy and I couldn't continue watching YouTube. One was while compiling, but with another runaway process on top. I think compiling alone is just not "good enough" to cause this.

                It MIGHT be related to IO as well but why should 3rd party IO cause issues with something that's not even using the disk/resource? Ask yourself that...


                • #68
                  Originally posted by Almindor

                  I see you desktop boys are still at it...

                  One recent example that happened at work was a Linux board that went AWOL due to high load. We couldn't even ssh into it reliably because it was "so busy". Compiling leaves a LOT of idle time slots that other work can fill. It's a very lightweight load that doesn't properly utilize the whole system.
                  If so, you won't get 100% CPU utilization.

                  On the desktop side I have definitely had situations in which my mouse was lagging, sound got choppy and I couldn't continue watching YouTube. One was while compiling, but with another runaway process on top. I think compiling alone is just not "good enough" to cause this.
                  Either this was IO-related (see below) or, again, a kernel config not suited for responsiveness. If your kernel is configured with a 1000Hz timer and preemption enabled, this scenario (lagging input) is nearly impossible. Are you running a self-built/configured kernel or a standard one from your distribution?

                  It MIGHT be related to IO as well but why should 3rd party IO cause issues with something that's not even using the disk/resource? Ask yourself that...
                  If your tasks aren't using any disk resources, the most reasonable explanation is swapping. And yes, this really is a problem in the Linux kernel, one that is already known and being worked on. If your system is running out of free RAM, it will start swapping heavily. This causes heavy CPU and IO load, because the kernel is constantly paging out and back in. I have DoSed my own system more than once by running too many compile tasks in parallel. The system becomes unresponsive then.
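
One way to see whether a machine is actually swapping while it feels unresponsive is to watch the `pswpin`/`pswpout` counters in `/proc/vmstat`. The Python sketch below samples them twice and prints the difference; the helper names and the one-second interval are arbitrary choices for illustration.

```python
import os
import time


def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Return (pages swapped in, pages swapped out) between two samples."""
    return (
        after.get("pswpin", 0) - before.get("pswpin", 0),
        after.get("pswpout", 0) - before.get("pswpout", 0),
    )


def sample(path="/proc/vmstat"):
    """Read and parse the current counters from the given file."""
    with open(path) as fh:
        return parse_vmstat(fh.read())


if __name__ == "__main__":
    if os.path.exists("/proc/vmstat"):
        first = sample()
        time.sleep(1)  # arbitrary sampling interval
        swapped_in, swapped_out = swap_delta(first, sample())
        print(f"pages swapped in: {swapped_in}, out: {swapped_out} (last second)")
    else:
        print("/proc/vmstat not available on this system")
```

Sustained non-zero values in both directions during a stall would be consistent with the paging back and forth described above.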

                  So if you're running a desktop kernel (1000Hz timer, CONFIG_PREEMPT=y), you won't need RT scheduling. I don't even have the cgroup config from systemd active (https://www.phoronix.com/forums/foru...27#post1149827) and I can't trigger any lag. It really looks like every case of bad scheduler behaviour comes down to a wrong config.
                  Last edited by PuckPoltergeist; 03 January 2020, 12:35 PM. Reason: fixed the link


                  • #69
                    Originally posted by Zan Lynx

                    Also, Ubuntu 18's systemd configuration was broken (in my opinion). I had wondered why my Ryzen 3900X would bog down while compiling big Android Studio projects on Ubuntu, but not on Fedora. It seems that changing the Ubuntu /etc/systemd/system.conf settings to enable accounting for most of the resources fixed it for me. You really do want the cgroups to share out the CPU by group, not by individual thread.
                    I can't confirm this. I have none of these options enabled in my systemd config, and the only way to bring my system down is swapping (see above).


                    • #70
                      Originally posted by Volta

                      I can't imagine how it's even possible for a developer to be so lame.
                      Because he doesn't know as much as he believes. There are many comments on the blog post correcting his view of correct locking. And he carries expectations from his experience with Windows that simply don't fit the distribution he's using. I think he's a bad developer, because he doesn't look at the tools. He doesn't really understand what he's doing. Instead of reading and understanding, he's complaining.

                      edit:
                      Oh, and he has a very very big ego: https://isocpp.org/blog/2017/03/i-wr...malte-skarupke
                      Last edited by PuckPoltergeist; 03 January 2020, 02:21 PM.
