AMD Cooking Up A "PAN" Feature That Can Help Boost Linux Performance
AMD open-source engineers sent out a request for comments on a new kernel feature called "PAN", or Process Adaptive autoNUMA. Early numbers shown by AMD indicate that PAN can help with performance in some workloads on their latest server hardware by a measurable amount.
PAN is an adaptive algorithm for calculating the AutoNUMA scan period. AMD's Bharata B Rao further explained in the request for comments (RFC) Linux kernel patch series, "In this new approach (Process Adaptive autoNUMA or PAN), we gather NUMA fault stats at per-process level which allows for capturing the application behaviour better. In addition, the algorithm learns and adjusts the scan rate based on remote fault rate. By not sticking to a static threshold, the algorithm can respond better to different workload behaviours. Since the threads of a processes are already considered as a group, we add a bunch of metrics to the task's [memory management] to track the various types of faults and derive the scan rate from them. The new per-process fault stats contribute only to the per-process scan period calculation, while the existing per-thread stats continue to contribute towards the numa_group stats which eventually determine the thresholds for migrating memory and threads across nodes."
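To make the idea in that quote more concrete, here is a rough Python sketch of a per-process scan period that adapts to the remote fault rate rather than to a fixed threshold. It is only an illustration and not the code from AMD's patches: the class name, field names, the halve/double rule, and the millisecond bounds are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical bounds in milliseconds; not taken from the RFC patch series.
SCAN_PERIOD_MIN_MS = 1000
SCAN_PERIOD_MAX_MS = 60000


@dataclass
class ProcessNumaStats:
    """Per-process NUMA fault counters (illustrative names only)."""
    local_faults: int = 0
    remote_faults: int = 0
    prev_remote_ratio: float = 0.0
    scan_period_ms: int = SCAN_PERIOD_MIN_MS

    def record_fault(self, remote: bool) -> None:
        # Every thread of the process accounts its faults here.
        if remote:
            self.remote_faults += 1
        else:
            self.local_faults += 1

    def update_scan_period(self) -> int:
        """Close the current window and derive the next scan period."""
        total = self.local_faults + self.remote_faults
        if total == 0:
            # No faults observed in this window: scan less often.
            new_period = self.scan_period_ms * 2
        else:
            ratio = self.remote_faults / total
            if ratio > self.prev_remote_ratio:
                # Remote share is rising: scan more often.
                new_period = self.scan_period_ms // 2
            else:
                # Remote share is flat or falling: back off.
                new_period = self.scan_period_ms * 2
            self.prev_remote_ratio = ratio
        # Clamp to the assumed bounds and start a fresh window.
        self.scan_period_ms = max(SCAN_PERIOD_MIN_MS,
                                  min(SCAN_PERIOD_MAX_MS, new_period))
        self.local_faults = 0
        self.remote_faults = 0
        return self.scan_period_ms


if __name__ == "__main__":
    stats = ProcessNumaStats(scan_period_ms=4000)
    for _ in range(8):
        stats.record_fault(remote=True)
    for _ in range(2):
        stats.record_fault(remote=False)
    print(stats.update_scan_period())
```

The point of the sketch is that the decision compares the current remote fault share against the previous window for the same process, so the behavior follows the workload instead of a static cutoff. The real patches may well use different metrics and a different adjustment rule.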
The important part for end-users / AMD EPYC customers is how PAN can benefit Linux performance. With a PAN-enabled Linux kernel build, AMD found the Graph500 interconnect HPC benchmark to benefit by as much as 14.93% compared to a default Linux kernel build, NAS benchmarks to be up to 8% faster, PageRank only about 0.37% faster, and other results ranging from less than 1% up to the more significant numbers noted. That's just with the limited selection of tests evaluated so far by AMD. It will certainly be fun to benchmark this patch series if it moves past the RFC stage, gets the backing of other kernel maintainers, and is ultimately upstreamed into the kernel.
So far no other kernel developers have commented on the Process Adaptive autoNUMA proposal, but those interested can see the RFC series to learn more about this feature or to test it out. In its current form it is fewer than 400 lines of new code to improve the Linux NUMA behavior.