Linux 5.5's Scheduler Sees A Load Balancing Rework For Better Perf But Risks Regressions
Ingo Molnar sent in the kernel's scheduler changes along with the other material he is overseeing for Linux 5.5. This next version of the Linux kernel brings a rework of the Completely Fair Scheduler's load balancing logic. The rework helps at least some workloads, but a change this intrusive carries the risk of regressions.
The rework of the CFS load balancing logic was pursued by engineers at Linaro and Arm, among other organizations, after they found poor task placement with the current algorithm. A large clean-up ensued, and after several rounds of revisions they hope to have addressed all regressions. Ingo does acknowledge the risk of some fallout from this invasive change: "The load-balancing rework is the most intrusive change: it replaces the old heuristics that have become less meaningful after the introduction of the PELT metrics, with a grounds-up load-balancing algorithm. As such it's not really an iterative series, but replaces the old load-balancing logic with the new one. We hope there are no performance regressions left - but statistically it's highly probable that there *is* going to be some workload that is hurting from these changes. If so then we'd prefer to have a look at that workload and fix its scheduling, instead of reverting the changes."
When testing on a dual quad-core ARM64 system, they found improvements ranging from less than 1% to upwards of 10% for the Hackbench scheduler test. On a 224-core ARM64 server, the gains ranged from less than 1% to 12% with Hackbench, and up to 33% with Dbench. More numbers and details are available via the v4 patch revision.
We'll be running our own Linux 5.5 scheduler tests after the merge window closes next week.
That makes this scheduler rework one of the more interesting low-level changes so far in the Linux 5.5 merge window.