Linux 5.18 Scheduler Change To Further Boost AMD EPYC Performance For Some Workloads
A patch entitled "sched/fair: Adjust the allowed NUMA imbalance when SD_NUMA spans multiple LLCs" may not sound exciting, and in fact I almost overlooked it during my usual monitoring of mailing lists and Git repositories. But this kernel scheduler change is rather significant for AMD EPYC performance on Linux. It is the then-tentative code I wrote about last year (and then forgot about amid the wide range of patches I cover on Phoronix), now revised and ready for the mainline Linux kernel.
Linux continues squeezing even more performance out of AMD EPYC/Zen platforms.
Longtime kernel developer Mel Gorman, who authored the change, explained, "[A kernel scheduler change from 2020] allowed an imbalance between NUMA nodes such that communicating tasks would not be pulled apart by the load balancer. This works fine when there is a 1:1 relationship between LLC and node but can be suboptimal for multiple LLCs if independent tasks prematurely use CPUs sharing cache. Zen* has multiple LLCs per node with local memory channels and due to the allowed imbalance, it's far harder to tune some workloads to run optimally than it is on hardware that has 1 LLC per node. This patch allows an imbalance to exist up to the point where LLCs should be balanced between nodes."
This Linux scheduler change for balancing between NUMA nodes is being improved for cases of the CPU having multiple LLCs per node.
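To make the idea in that explanation more concrete, here is a rough Python sketch of the reasoning, not the kernel code itself. It counts how many last-level caches (LLCs) sit inside one NUMA node and compares two illustrative thresholds for how many tasks may pile onto a node before the load balancer spreads them out. All function names are hypothetical, and the "quarter of the node's CPUs" figure for the older behavior is an assumption chosen for illustration rather than something stated in the patch description.

```python
def parse_cpulist(text):
    """Expand a Linux-style cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def count_llcs_in_node(node_cpus, llc_groups):
    """Count the distinct LLC groups that have at least one CPU in the node.

    node_cpus:  set of CPU ids belonging to one NUMA node
    llc_groups: iterable of sets, each holding the CPUs that share one LLC
    """
    unique_groups = {frozenset(group) for group in llc_groups}
    return sum(1 for group in unique_groups if group & node_cpus)


def allowed_imbalance_before(node_cpu_count):
    """Illustrative stand-in for the older rule: tolerate an imbalance up to
    a fixed share of the node's CPUs (25% is an assumption for this sketch)."""
    return node_cpu_count // 4


def allowed_imbalance_after(node_cpu_count, nr_llcs):
    """Illustrative stand-in for the new rule: with several LLCs per node,
    tolerate an imbalance only up to the number of LLCs, so independent tasks
    are spread out before they start sharing a cache."""
    if nr_llcs <= 1:
        # With one LLC per node, this simplified model keeps the old rule.
        return allowed_imbalance_before(node_cpu_count)
    return nr_llcs
```

For a hypothetical node with 64 CPUs split into eight LLC groups of eight CPUs each, this model would tolerate 16 tasks on one node under the older rule but only 8 under the new one. On a real system, the inputs could be read from sysfs files such as `/sys/devices/system/node/node0/cpulist` and `/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list`, assuming `index3` is the last-level cache on that machine.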
What's exciting, though, is the end result: on an AMD Zen 3 platform he has been testing, the OpenMP-parallelized Stream memory benchmark was 173~272% faster depending upon the memory operation tested. It's a huge win for the Stream memory benchmark but also for other workloads, depending upon their behavior.
There can be huge improvements to performance and lower variation between runs depending upon the particular workload...
For the common Coremark CPU benchmark, the harmonic mean performance was up by 10% with this patch, while the maximum result was 17% faster. For the SPECjbb Java benchmark, the performance was up by as much as 18%. The NPB EP benchmark saw a ~17% improvement in performance too, with less deviation between runs. Even for workloads where the overall benchmark result didn't see much change, the deviation between runs was lower with this scheduler patch.
The sched/fair patch was pulled into sched/core, which means that barring any issues turning up in the next few weeks, it should be sent in for the Linux 5.18 merge window next month. Linux 5.18 is looking more and more exciting as this spring's kernel release, with a ton of great improvements. I will, of course, have my own benchmarks of this patch and other changes in due course.