Linux 5.4 Kernel To Bring Improved Load Balancing On AMD EPYC Servers

A scheduler topology improvement by SUSE's Matt Fleming changes this behavior: on EPYC hardware, the kernel has so far failed to load balance properly across NUMA nodes on different sockets.
AMD EPYC/Zen processors now override the node reclaim distance to better account for the CPU's architecture. From one of the code comments: "AMD EPYC machines use this because even though the 2-hop distance is 32 (3.2x slower than a local memory access) performance actually *improves* if allowed to reclaim memory and load balance tasks between NUMA nodes 2-hops apart."
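To illustrate the idea of a distance threshold, here is a minimal sketch in C. It is not the kernel's code: the function and constant names are hypothetical, and the default limit of 30 is an assumption made for the example. Only the 2-hop distance of 32 comes from the quoted comment, and the local distance of 10 is implied by its "3.2x" figure.

```c
#include <stdbool.h>

/* Illustrative values only; these names are hypothetical, not kernel symbols. */
enum {
    LOCAL_DISTANCE        = 10, /* distance of a node to itself (32 / 10 = 3.2x) */
    ASSUMED_DEFAULT_LIMIT = 30, /* assumed generic threshold, for illustration */
    EPYC_LIMIT            = 32  /* raised limit matching the 2-hop distance */
};

/*
 * Reclaim and load balancing between two nodes is permitted only while
 * their NUMA distance does not exceed the configured limit.
 */
bool may_balance(int distance, int limit)
{
    return distance <= limit;
}
```

Under these assumptions, `may_balance(32, ASSUMED_DEFAULT_LIMIT)` is false, so nodes two hops apart are excluded, while `may_balance(32, EPYC_LIMIT)` is true, which is the effect the override is meant to achieve.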
The patch goes into more detail and is part of the core scheduler changes queued ahead of the Linux 5.4 merge window, which opens in two weeks.