Linux Performance Patches Revved To Avoid Too Many Unnecessary Cross-CPU Wake-ups

Written by Michael Larabel in Linux Kernel on 22 February 2023 at 03:45 PM EST.
A patch series started by Intel to improve the Linux kernel's fair scheduler code, which has also seen testing and feedback from AMD engineers and other stakeholders, continues to be refined. The series focuses on avoiding unnecessary cross-CPU wake-ups. In doing so, these patches help improve Linux performance, particularly on high core count systems.

The scheduler enhancement centers on waking short-duration tasks on the current CPU to avoid cross-CPU wake-ups. Intel engineer Chen Yu explains in the patch cover letter:
"Inhibits the cross CPU wake-up by placing the wakee on waking CPU, if both the waker and wakee are short-duration tasks. The short duration task could become a trouble maker on high-load system, because it could bring frequent context switch. This strategy only takes effect when the system is busy. Because it is unreasonable to inhibit the idle CPU scan when there are still idle CPUs.

First, introduce the definition of a short-duration task. Then leverages the first patch to choose a local CPU for wakee."
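
The decision described in the cover letter can be illustrated with a small sketch. This is not the kernel code: the `Task` type, the `avg_run_us` field, the 500 microsecond cutoff, and the use of "no idle CPUs" as a stand-in for "the system is busy" are all assumptions made for illustration, and the actual patches define a short-duration task and a busy system in their own way.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the real patches define "short-duration" differently.
SHORT_TASK_THRESHOLD_US = 500.0


@dataclass
class Task:
    name: str
    avg_run_us: float  # assumed metric: average run time before the task sleeps


def is_short_task(task, threshold_us=SHORT_TASK_THRESHOLD_US):
    """Treat a task as short-duration if its average run time is under the cutoff."""
    return task.avg_run_us <= threshold_us


def select_wake_cpu(waker, wakee, waking_cpu, idle_cpus):
    """Pick a CPU for the wakee, or return None to defer to the normal path.

    idle_cpus is a set of CPU ids that are currently idle. An empty set is
    used here as a simplified stand-in for "the system is busy".
    """
    # Idle CPUs still exist: do not inhibit the idle scan, just use one.
    if idle_cpus:
        return min(idle_cpus)
    # Busy system and both tasks are short: keep the wakee on the waking CPU.
    if is_short_task(waker) and is_short_task(wakee):
        return waking_cpu
    # Otherwise fall back to the regular placement logic (not modeled here).
    return None
```

In this sketch, two short tasks on a fully busy system stay on the waking CPU, while any remaining idle CPU takes priority, which mirrors the cover letter's point that the idle CPU scan should not be inhibited while idle CPUs are available.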

Both AMD and Intel platforms benefit from the in-development patches. The gains vary with the workload and with how busy the system already is. The patches particularly benefit AMD EPYC and Intel Xeon server processors, as well as HEDT systems with high core counts.
"Overall there is performance improvement on some overloaded case. Such as will-it-scale, netperf. And no noticeable impact on schbench, hackbench, tbench and a OLTP workload with a commercial RDBMS, tested on a Intel Xeon 2 x 56C machine.

Per the test on Zen3 from Prateek, most benchmarks result saw small wins or are comparable to sched:tip. SpecJBB Critical-jOps improved while Max-jOPS saw a small hit, but it might be in the expected range. ycsb-mongodb saw small uplift in NPS1 mode.

Throughput improvement of netperf(localhost) was observed on a Rome 2 x 64C machine, when the number of clients equals the CPUs."

Overall the benefits appear to be small but measurable. Given all the relentless kernel optimizations pursued by many different parties and the performance tuning patches I feature on Phoronix near-daily, every little bit helps and is very much welcome. I'll be testing out these latest patches when they are on a trajectory for mainline.

The patches were revised today for the sixth time, with updated behavior around the waker/wakee CPU selection check to avoid a possible Redis performance regression.

It's too late to see this scheduler enhancement land for the v6.3 kernel, but we'll see where this work leads over the coming weeks and months. Hopefully it ends up as a beneficial improvement for today's high core count servers in a future Linux kernel release.