New Linux Scheduler Patches Can Improve AMD Zen Performance For Some Workloads
A pair of patches under review on the kernel mailing list that tweak the kernel scheduler's behavior can provide noticeable performance benefits for AMD EPYC and Ryzen processors across various workloads.
Last year the Linux kernel scheduler was changed to allow a floating imbalance between NUMA nodes until 25% of the CPU cores are occupied; above that, load balancing behaves as normal. Previously, an imbalance between NUMA nodes was only allowed when the destination node was effectively idle.
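To make the difference between the two rules concrete, here is a minimal sketch in Python. It is a hypothetical model of the behavior described above, not the kernel's actual scheduler code: the function names are invented for illustration, "effectively idle" is modelled as zero running tasks, and the 64-CPU node size in the usage example is an assumption.

```python
# Hypothetical model of the two NUMA imbalance rules described in the article.
# These are NOT the Linux kernel's functions; names and thresholds are
# illustrative assumptions only.


def imbalance_allowed_old(dst_running: int) -> bool:
    """Older rule: tolerate an imbalance only when the destination node
    is effectively idle (modelled here as zero running tasks)."""
    return dst_running == 0


def imbalance_allowed_new(dst_running: int, dst_cpus: int) -> bool:
    """Floating-imbalance rule: tolerate an imbalance while fewer than 25%
    of the destination node's CPUs are busy; above that, balance normally."""
    return 4 * dst_running < dst_cpus


if __name__ == "__main__":
    node_cpus = 64  # assumed node size, for illustration only
    for running in (0, 4, 15, 16, 32):
        print(running,
              imbalance_allowed_old(running),
              imbalance_allowed_new(running, node_cpus))
```

With a 64-CPU node, the newer rule keeps tolerating an imbalance up to 15 running tasks and returns to normal balancing from 16 onward, whereas the older rule only tolerated it on an idle node.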
Longtime Linux kernel developer Mel Gorman, who wrote that floating-imbalance change for the kernel last year, has now revisited it. Where there isn't a 1:1 relationship between the last-level cache (LLC) and the NUMA node, as is the case with AMD Zen processors, the imbalance handling can be sub-optimal because a node spans multiple LLCs.
Long story short, the revised NUMA imbalance code, which takes multiple last-level caches into account, can deliver a performance win. In benchmarks carried out by Gorman on an AMD Zen 3 system, the OpenMP-based Stream memory benchmark improved by between 180% and 268%. For the Coremark synthetic CPU benchmark, the harmonic mean and maximum scores rose by 15%, while the minimum score improved by nearly 10%. SPECjbb Java workloads also generally performed better.
For those interested, the patches can be found on the kernel mailing list. Hopefully this work will continue to prove a win and line up for landing in Linux 5.17.