Linux 5.5's Scheduler Sees A Load Balancing Rework For Better Perf But Risks Regressions

The rework of the CFS load balancing logic was pursued by engineers at Linaro and Arm, among other organizations, after they found poor task placement with the current algorithm. A large clean-up ensued, and after several rounds of revisions they hope they have addressed all regressions. Ingo Molnar does acknowledge the risk of some fallout from this invasive change: "The load-balancing rework is the most intrusive change: it replaces the old heuristics that have become less meaningful after the introduction of the PELT metrics, with a grounds-up load-balancing algorithm. As such it's not really an iterative series, but replaces the old load-balancing logic with the new one. We hope there are no performance regressions left - but statistically it's highly probable that there *is* going to be some workload that is hurting from these changes. If so then we'd prefer to have a look at that workload and fix its scheduling, instead of reverting the changes."
When testing on a dual quad-core ARM64 system, they found the performance improvements ranged from less than 1% to upwards of 10% for the Hackbench scheduler test. On a 224-core ARM64 server, the improvements ranged from less than 1% to 12% with Hackbench and reached up to 33% with Dbench. More numbers and details are available via the v4 patch revision.
We'll be running our own Linux 5.5 scheduler tests after the merge window closes next week.
This scheduler rework is one of the more interesting low-level changes so far in the Linux 5.5 merge window.