"Nest" Is An Interesting New Take On Linux Kernel Scheduling For Better CPU Performance
There have been a number of efforts recently to enhance the Linux kernel's scheduler so that it better adapts to modern hardware architectures, whether for Intel hybrid CPU designs, new CPU cache configurations, or just better scaling with today's ever-increasing core counts. Another scheduler effort detailed this week is "Nest", which aims to keep tasks on "warm cores" in the hope of lower latency, since those cores are already at higher clock/performance states and ideally operating at an optimal turbo/boost frequency. The Nest developers find that their scheduler "improves performance 10%-2x and can reduce energy usage" with modern hardware.
Julia Lawall of Inria, France's National Institute for Research in Digital Science and Technology, presented Nest at this week's Linux Plumbers Conference (LPC 2022) in Dublin. Nest was developed in cooperation with Oracle Labs and the University of Sydney.
The existing Linux CFS scheduler tends to spread tasks out across the machine's available CPU cores, whereas Nest takes a different approach given the attributes of today's processors. Spreading out the work can be beneficial, but firing up long-idled CPU cores adds latency until those cores ramp up to a higher performance state (higher frequency), and it can negatively impact the turbo frequency / power budget of the currently-busy CPU cores. Nest takes this into account and tries to initially keep tasks on a set of "warm cores" that are already running in their highest performance state before spinning up the idled cores.
Nest also takes into account the parent/previous core in its scheduling decision to try to improve locality on multi-socket systems.
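To make the warm-core idea more concrete, here is a minimal toy sketch in Python. It is not the Nest implementation and is not taken from the presentation: the `Core` class, the `pick_core` function, and the exact preference order (previous core, then idle warm cores on the same socket, then any idle warm core, then idle cold cores) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Core:
    """Hypothetical model of a CPU core (illustration only, not kernel code)."""
    cid: int            # core id
    socket: int         # socket (package) the core belongs to
    idle: bool = True   # True when nothing is currently running on the core
    warm: bool = False  # True when the core is assumed to be at a high frequency


def pick_core(cores, prev):
    """Pick a core for a waking task.

    `prev` is the core the task (or its parent) last ran on. The preference
    order below is an assumed simplification of the warm-core idea.
    """
    preferences = (
        lambda c: c is prev and c.idle and c.warm,                # reuse previous core
        lambda c: c.idle and c.warm and c.socket == prev.socket,  # warm, same socket
        lambda c: c.idle and c.warm,                              # warm, any socket
        lambda c: c.idle and c.socket == prev.socket,             # cold, same socket
        lambda c: c.idle,                                         # cold, any socket
    )
    for wanted in preferences:
        chosen = next((c for c in cores if wanted(c)), None)
        if chosen is not None:
            chosen.idle = False  # the task now occupies this core
            chosen.warm = True   # a core that runs work is treated as warm
            return chosen
    return prev  # every core is busy: fall back to the previous core


if __name__ == "__main__":
    cores = [Core(0, 0, idle=False, warm=True), Core(1, 0), Core(2, 1, warm=True)]
    print(pick_core(cores, prev=cores[0]).cid)
```

In this toy model a cold core is only woken once no idle warm core is left, which mirrors the "warm cores first" behavior described above. A real scheduler also has to decide when a core stops counting as warm, which this sketch leaves out.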
While going through the presentation, I was thinking that many of the benchmarks they opted to run looked familiar from what I usually run...
Sure enough, they were using the open-source Phoronix Test Suite for benchmarking and evaluating the impact of their scheduling decisions across a wide variety of workloads.
Ultimately they found that the Nest task scheduler could yield a 10% to 2x performance improvement on light or moderate workloads across 1/2/4 socket Intel servers as well as AMD servers and desktops. For demanding multi-threaded workloads already using the CPU(s) to their full capacity, there unsurprisingly isn't much difference. It's for the light to moderate workloads that the warm core approach of Nest appears to be very helpful on modern, higher core count systems/servers.
Those wanting to learn more can find the Nest scheduler presentation from LPC 2022 embedded below along with the slide deck (PDF).