Sched_ext Scheduler Idle Selection Being Extended For LLC & NUMA Awareness

Written by Michael Larabel in Linux Kernel on 28 October 2024 at 10:12 AM EDT.
While the sched_ext extensible scheduler code was merged for Linux 6.12, work on sched_ext itself is not over. New patches this weekend continue the work on NUMA awareness for its default idle selection policy, while similar work on CPU last level cache (LLC) awareness is slated for the upcoming Linux 6.13 cycle.

Queued last week within sched_ext.git's "for-6.13" branch is a patch that introduces LLC awareness to the default idle selection policy by leveraging the Linux kernel's scheduler topology information. As the patch describes:
"This allows schedulers using the built-in policy to make more informed decisions when selecting an idle CPU in systems with multiple LLCs, such as NUMA systems or chiplet-based architectures, and it helps keep tasks within the same LLC domain, thereby improving cache locality.

For efficiency, LLC awareness is applied only to tasks that can run on all the CPUs in the system for now. If a task's affinity is modified from user space, it's the responsibility of user space to choose the appropriate optimized scheduling domain."
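
To illustrate the restriction described in that quote, here is a minimal Python sketch rather than the actual kernel code. The function name `pick_idle_in_llc` and the `llc_of`, `idle`, and `allowed` parameters are hypothetical and only model the idea: the LLC-aware step is attempted only for tasks allowed on every CPU, and it looks for an idle CPU sharing the previous CPU's LLC.

```python
from typing import Dict, Optional, Set


def pick_idle_in_llc(prev_cpu: int,
                     llc_of: Dict[int, int],
                     idle: Set[int],
                     allowed: Set[int]) -> Optional[int]:
    """Return an idle CPU in prev_cpu's LLC domain, or None.

    llc_of maps each CPU id to a (hypothetical) LLC domain id.
    """
    all_cpus = set(llc_of)

    # Assumption taken from the quoted text: LLC awareness is applied
    # only to tasks that can run on all the CPUs in the system.
    if allowed != all_cpus:
        return None

    # Idle CPUs that share the last level cache with prev_cpu.
    candidates = {c for c in idle if llc_of[c] == llc_of[prev_cpu]}
    if not candidates:
        return None

    # Staying on the same CPU keeps the most cache state; otherwise
    # take the lowest-numbered candidate (an arbitrary tie-break).
    return prev_cpu if prev_cpu in candidates else min(candidates)
```

In this model, a task pinned to a subset of CPUs gets `None` from the LLC step, which mirrors the quoted point that user space is responsible for choosing an appropriate domain once it changes a task's affinity.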

That LLC awareness for sched_ext is in turn set to be introduced with Linux 6.13. Andrea Righi of NVIDIA wrote that support.


Andrea Righi has also been working on adding NUMA awareness to the default idle selection code. That code is still undergoing review, but the latest version was posted Sunday to the Linux kernel mailing list. It extends the built-in idle CPU selection policy to prioritize CPUs within the same NUMA node. Righi explains in that patch:
"With this change applied, the built-in CPU idle selection policy follows this logic:

- always prioritize CPUs from fully idle SMT cores,
- select the same CPU if possible,
- select a CPU within the same LLC domain,
- select a CPU within the same NUMA node.

Both NUMA and LLC awareness features are enabled only when the system has multiple NUMA nodes or multiple LLC domains.

In the future, we may want to improve the NUMA node selection to account the node distance from prev_cpu. Currently, the logic only tries to keep tasks running on the same NUMA node. If all CPUs within a node are busy, the next NUMA node is chosen randomly."
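
The priority order quoted above can be modeled with a short Python sketch. This is an illustration under stated assumptions, not the kernel implementation: `select_idle_cpu` and the `core_of`, `llc_of`, and `node_of` maps are hypothetical names, and where the patch says the next NUMA node is chosen randomly, this sketch simply falls back to the lowest-numbered idle CPU so the result is deterministic.

```python
from typing import Dict, Optional, Set


def select_idle_cpu(prev_cpu: int,
                    core_of: Dict[int, int],
                    llc_of: Dict[int, int],
                    node_of: Dict[int, int],
                    idle: Set[int]) -> Optional[int]:
    """Pick an idle CPU following the quoted priority order."""
    cpus = set(core_of)

    # A core is "fully idle" only if none of its SMT siblings is busy.
    busy_cores = {core_of[c] for c in cpus - idle}
    fully_idle = {c for c in idle if core_of[c] not in busy_cores}

    # Per the quote, LLC and NUMA awareness are enabled only when the
    # system has more than one LLC domain or NUMA node.
    use_llc = len(set(llc_of.values())) > 1
    use_numa = len(set(node_of.values())) > 1

    # Pass 1 considers only CPUs on fully idle cores; pass 2 accepts
    # any idle CPU. Each pass applies the same locality order.
    for pool in (fully_idle, idle):
        if not pool:
            continue
        if prev_cpu in pool:
            return prev_cpu
        if use_llc:
            same_llc = {c for c in pool if llc_of[c] == llc_of[prev_cpu]}
            if same_llc:
                return min(same_llc)
        if use_numa:
            same_node = {c for c in pool if node_of[c] == node_of[prev_cpu]}
            if same_node:
                return min(same_node)
        # Simplification: deterministic fallback instead of a random node.
        return min(pool)

    return None  # no idle CPU anywhere
```

With a hypothetical 16-CPU layout (two SMT siblings per core, four CPUs per LLC, eight per NUMA node), a task whose previous CPU 0 is busy and whose only idle CPUs are 6 and 9 would land on CPU 6: neither shares an LLC with CPU 0, but CPU 6 is on the same NUMA node.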

We'll see if that NUMA awareness is ready in time for the upcoming Linux 6.13 merge window to join the LLC awareness support. In any event, there continue to be a lot of interesting developments and adoption around sched_ext now that it's mainlined.