Red Hat Has Been Working On "stalld" As A Thread Stall Detector + Booster

  • Red Hat Has Been Working On "stalld" As A Thread Stall Detector + Booster

    Phoronix: Red Hat Has Been Working On "stalld" As A Thread Stall Detector + Booster

    Red Hat engineers in recent weeks began working on a new project called "starved" though recently renamed to "stalld". The stalld service is for serving as a Linux thread stall detector...

    http://www.phoronix.com/scan.php?pag...Red-Hat-stalld

  • #2
    Wouldn't that just increase the thrashing?


    • #3
      Originally posted by carewolf
      Wouldn't that just increase the thrashing?
      No, that's in windows, silly.


      • #4
        Originally posted by eydee

        No, that's in windows, silly.
        Happens all the time on Linux. It is why a thread would be stalled in the first place. In fact, Linux is much worse at recovering from it.


        • #5
          At least it sounds like a very nice service to generate alerts on systems that need optimising / investigating.


          • #6
            Originally posted by carewolf
            Wouldn't that just increase the thrashing?
            I'd guess it depends on whether the thread is stalling because of stupid scheduling or simply because there is too much work?


            • #7
              I guess I don't see the point here. If the thread is normal priority then it is competing against other threads of similar priority. The Linux CPU scheduler seems to handle this pretty well. If it is a low priority thread then I want it to stall. [email protected] should always yield time to my compile job, which should always yield time to my GUI.

              Taking a thread and boosting its priority is cheating the system. If that thread was important then why wasn't it already higher priority? For example, pulseaudio threads use rtkit to get real-time priorities necessary for clean audio handling.


              • #8
                Ahh, just read the code repository. This stalld is for KERNEL threads only, nothing to do with userspace. It seems to me the same question though: why not just set these kernel threads with this DEADLINE setting as the default? That way they'd always get a minimum amount of runtime. If that's what they need then give it to them.


                • #9
                  I think the repo information is out of date. Stalld just looks at thread state for threads that have been on a run-queue for longer than a specified threshold. If one is found, it is given a temporary boost by changing it to a SCHED_DEADLINE policy, letting it run for the specified boost period (usually 10-20us) and then returning it to its original scheduling policy. The current version doesn't distinguish between kernel and user threads.

                  This starvation issue is mainly caused by people running polling-mode applications using SCHED_FIFO priorities, which hog the CPU and prevent other threads from running on it. That's fine except when the kernel needs something to run specifically on that CPU, like an RCU thread or a kworker. Stalld prevents the eventual flood of stall tracebacks in the system log and the slow grinding-to-a-halt of the system.
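
The detect, boost, and restore cycle described in this post can be sketched roughly as below. This is only an illustration of the control flow, not stalld's actual code: the `RunnableThread` record, the `find_starving` and `boost_once` helpers, the 30-second threshold, and the two callbacks are all hypothetical names and values chosen for the example. The callbacks stand in for the real policy change and the real wait, so nothing here touches the scheduler.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RunnableThread:
    tid: int
    policy: str            # current scheduling policy name, e.g. "SCHED_OTHER"
    runnable_since: float  # hypothetical timestamp (seconds) when it last became runnable


def find_starving(threads: List[RunnableThread], now: float,
                  threshold_s: float) -> List[RunnableThread]:
    """Return threads that have waited on a run-queue for at least threshold_s."""
    return [t for t in threads if now - t.runnable_since >= threshold_s]


def boost_once(thread: RunnableThread,
               set_policy: Callable[[int, str], None],
               wait_for_boost: Callable[[], None]) -> None:
    """Switch one thread to SCHED_DEADLINE for a while, then restore its policy."""
    original = thread.policy
    set_policy(thread.tid, "SCHED_DEADLINE")
    try:
        wait_for_boost()  # a real daemon would wait out the boost duration here
    finally:
        # Always put the original policy back, even if the wait fails.
        set_policy(thread.tid, original)


# Stand-in callbacks that only record what would have been done.
log = []
threads = [
    RunnableThread(tid=101, policy="SCHED_OTHER", runnable_since=0.0),
    RunnableThread(tid=102, policy="SCHED_OTHER", runnable_since=55.0),
]
starving = find_starving(threads, now=60.0, threshold_s=30.0)
for t in starving:
    boost_once(t, lambda tid, pol: log.append((tid, pol)), lambda: None)
```

With these made-up samples only thread 101 has waited past the threshold, so the log records one switch to SCHED_DEADLINE followed by one switch back to its original policy.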


                  • #10
                    This daemon was created to avoid the starvation of threads caused by a CPU-intensive thread using the FIFO/RR scheduler. These CPU-intensive threads are set up as FIFO/RR tasks to prevent OS noise from kernel threads. However, some kernel threads need to run every now and then, so this setup creates a starvation problem.

                    The obvious first solution would be to boost all threads to a higher priority. However, if all kernel threads were set to SCHED_DEADLINE or SCHED_FIFO/RR with a priority higher than the "cpu-intensive" task, they could all arrive at once, causing prolonged OS noise. Think about a system with a threaded softirq, a kworker, and an RCU thread all running at once: together they would create a longer stretch of noise in the "cpu-intensive" thread. (Also, kworkers are set as CFS by default and are dynamically created, so it is tricky to set them to a higher priority "by hand".)

                    stalld tries to mitigate these two problems: it boosts one starving thread at a time, giving it a chance to run while avoiding the chaining of tasks, and it bounds the noise using the SCHED_DEADLINE reservation.
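
To put rough numbers on the argument above, here is a small arithmetic sketch. Every figure in it is an assumption made up for illustration: the per-thread run times, the 20us boost runtime (taken from the "10-20us" mentioned earlier in the thread), and the one-second reservation period. The helper names are hypothetical too. It compares the back-to-back interruption when several higher-priority threads wake together against the longest single interruption when only one thread is boosted at a time under a capped reservation.

```python
from typing import List


def deadline_utilization(runtime_us: float, period_us: float) -> float:
    """Fraction of one CPU that a runtime/period reservation may consume."""
    if period_us <= 0 or runtime_us < 0 or runtime_us > period_us:
        raise ValueError("need 0 <= runtime_us <= period_us and period_us > 0")
    return runtime_us / period_us


def chained_noise_us(needs_us: List[float]) -> float:
    """Interruption length if every thread outranks the busy task and all wake together."""
    return sum(needs_us)


def one_at_a_time_noise_us(needs_us: List[float], boost_runtime_us: float) -> float:
    """Longest single interruption when one thread is boosted at a time,
    with each boost capped at boost_runtime_us."""
    return max(min(need, boost_runtime_us) for need in needs_us)


# Assumed figures: three starving threads wanting 15, 40 and 10 microseconds.
needs = [15.0, 40.0, 10.0]
chained = chained_noise_us(needs)
bounded = one_at_a_time_noise_us(needs, boost_runtime_us=20.0)
share = deadline_utilization(runtime_us=20.0, period_us=1_000_000.0)
```

Under these assumptions the chained case interrupts the busy thread for 65us in one stretch, while the one-at-a-time case never interrupts it for more than 20us at once, and the reservation caps the boosted thread at a tiny fraction of the CPU.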
