Intel Fixing Up Sub-NUMA Clustering For Linux So That It Behaves With RDT
Sub-NUMA Clustering (SNC) on Intel Xeon processors splits the CPU cores, cache, and memory into multiple NUMA domains to improve the performance of NUMA-aware applications. SNC can help in many cases, especially HPC and server workloads, but it currently isn't properly supported when using Resource Director Technology (RDT) on modern Intel CPUs. That is now changing, with new Linux kernel patches being worked on by Intel.
Intel Resource Director Technology provides monitoring and control over shared resources, such as memory bandwidth and last-level cache usage, for applications and VMs. But these cache and memory allocation insights and controls clash with Sub-NUMA Clustering. The Linux kernel patches from Intel engineers aim to let RDT and SNC better co-exist, although RDT may still not report fully accurate information on an SNC-enabled server. Linux engineers have been working on this known problem for months.
The latest patches by Intel engineer Tony Luck explain:
"The Sub-NUMA cluster feature on some Intel processors partitions the CPUs that share an L3 cache into two or more sets. This plays havoc with the Resource Director Technology (RDT) monitoring features. Prior to this patch Intel has advised that SNC and RDT are incompatible.
Some of these CPUs support an MSR that can partition the RMID counters in the same way. This allows for monitoring features to be used (with the caveat that memory accesses between different SNC NUMA nodes may still not be counted accurately).
Note that this patch series improves resctrl reporting considerably on systems with SNC enabled, but there will still be some anomalies for processes accessing memory from other sub-NUMA nodes."
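To make the idea of partitioning the RMID counters "in the same way" a little more concrete, here is a toy Python model. It is a sketch under stated assumptions, not the kernel's code: it assumes each L3 cache's hardware RMID counters are split evenly among the SNC nodes sharing that cache, and that the RMID software uses on a CPU is offset into the share belonging to that CPU's SNC node. The function names, the even split, and the assumption that `node % snc_ways` gives a node's position within its L3 are all hypothetical illustrations, not taken from Tony Luck's patches or the resctrl implementation.

```python
"""Toy model of splitting one L3 cache's RMID counters across SNC nodes.

Hypothetical illustration only: the names and the exact mapping are
assumptions, not the Linux resctrl implementation.
"""


def rmids_per_snc_node(total_rmids: int, snc_ways: int) -> int:
    """Assume each SNC node on an L3 gets an equal share of the counters."""
    if snc_ways < 1 or total_rmids % snc_ways != 0:
        raise ValueError("total_rmids must divide evenly among SNC nodes")
    return total_rmids // snc_ways


def physical_rmid(logical_rmid: int, numa_node: int,
                  total_rmids: int, snc_ways: int) -> int:
    """Map a software-visible RMID to the hardware counter for a CPU's node.

    Assumes NUMA node IDs are numbered so that numa_node % snc_ways is the
    node's position within the L3 cache it shares with its sibling nodes.
    """
    per_node = rmids_per_snc_node(total_rmids, snc_ways)
    if not 0 <= logical_rmid < per_node:
        raise ValueError("logical RMID out of range for one SNC node")
    # Offset the logical RMID into this SNC node's slice of the counters.
    return logical_rmid + (numa_node % snc_ways) * per_node


if __name__ == "__main__":
    # Hypothetical example: 256 counters on one L3, split two ways (SNC-2).
    for node in (0, 1):
        print(node, physical_rmid(5, node, total_rmids=256, snc_ways=2))
```

In this model the same logical RMID on two SNC nodes lands in different hardware counters (5 and 133 in the example), which is why software sees fewer RMIDs per node once partitioning is enabled. It also hints at the caveat in the quote: a counter tied to one SNC node has no obvious place to attribute traffic that crosses into another sub-NUMA node.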
These patches are still being reviewed but perhaps they'll be ready for mainline by the v6.7 kernel cycle later this year.