Improved Memory Bandwidth Throttling Behavior For Linux 6.9
The x86 cache updates for Linux 6.9 bring an improved memory bandwidth throttling heuristic to the resctrl code, which is used by Intel Resource Director Technology (RDT) and also by AMD EPYC CPUs.
The improved heuristic is intended to better handle workloads with irregular load levels, which on existing kernel versions can end up being unnecessarily penalized.
Longtime Intel Linux engineer Tony Luck spearheaded this improvement and explained in the patch:
The MBA_mbps feedback loop increases throttling when a group is using more bandwidth than the target set by the user in the schemata file, and decreases throttling when below target.
To avoid possibly stepping throttling up and down on every poll a flag "delta_comp" is set whenever throttling is changed to indicate that the actual change in bandwidth should be recorded on the next poll in "delta_bw". Throttling is only reduced if the current bandwidth plus delta_bw is below the user target.
This algorithm works well if the workload has steady bandwidth needs. But it can go badly wrong if the workload moves to a different phase just as the throttling level changed. E.g. if the workload becomes essentially idle right as throttling level is increased, the value calculated for delta_bw will be more or less the old bandwidth level. If the workload then resumes, Linux may never reduce throttling because current bandwidth plus delta_bw is above the target set by the user.
Implement a simpler heuristic by assuming that in the worst case the currently measured bandwidth is being controlled by the current level of throttling. Compute how much it may increase if throttling is relaxed to the next higher level. If that is still below the user target, then it is ok to reduce the amount of throttling.
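To make the old feedback loop described in the patch concrete, here is a minimal Python sketch of its behaviour. It assumes bandwidth is sampled in MB/s once per poll and that throttling is expressed as a hypothetical percentage `level` adjusted in fixed steps. `delta_comp` and `delta_bw` follow the patch text; the class name, `prev_bw`, and the step and limit values are illustrative assumptions, not the kernel's resctrl code.

```python
from dataclasses import dataclass


@dataclass
class OldMbaController:
    """Hypothetical model of the pre-6.9 MBA_mbps feedback loop (not kernel code).

    `level` is the share of memory bandwidth the group may use, in percent:
    lowering it increases throttling, raising it reduces throttling.
    """
    user_bw: int              # target from the schemata file, in MB/s
    level: int = 100          # hypothetical starting throttle level
    step: int = 10            # hypothetical throttle granularity
    min_level: int = 10
    max_level: int = 100
    prev_bw: int = 0          # bandwidth measured on the previous poll
    delta_bw: int = 0         # recorded change in bandwidth after a throttle change
    delta_comp: bool = False  # set whenever throttling changes

    def poll(self, cur_bw: int) -> int:
        """Process one bandwidth sample and return the new throttle level."""
        # If throttling changed on the last poll, record how much bandwidth actually moved.
        if self.delta_comp:
            self.delta_bw = abs(cur_bw - self.prev_bw)
            self.delta_comp = False
        self.prev_bw = cur_bw

        if self.level > self.min_level and cur_bw > self.user_bw:
            # Over target: throttle harder.
            self.level -= self.step
            self.delta_comp = True
        elif self.level < self.max_level and cur_bw + self.delta_bw < self.user_bw:
            # Only relax if bandwidth plus the last recorded change stays under target.
            self.level += self.step
            self.delta_comp = True
        return self.level
```

Replaying the failure case from the patch with a 100 MB/s target: `poll(120)` tightens throttling to 90, and the next sample arrives while the workload is idle (`poll(0)`), so `delta_bw` becomes 120. From then on `cur_bw + delta_bw` can never fall below 100, and because throttling no longer changes, `delta_comp` is never set again to refresh `delta_bw`. The group stays throttled until it exceeds its target again.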
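A comparable sketch of the simplified rule follows, under the assumption (mirroring the patch's worst case) that the measured bandwidth is capped by the current throttle level and would grow in proportion to it. With a hypothetical level L in percent and step s, relaxing from L to L + s could then raise bandwidth to at most cur_bw × (L + s) / L. This illustrates the idea described in the patch rather than reproducing the kernel's code; the function name, defaults and units are hypothetical.

```python
def next_level(level: int, cur_bw: float, user_bw: float,
               step: int = 10, min_level: int = 10, max_level: int = 100) -> int:
    """Hypothetical sketch of the simplified MBA_mbps heuristic (not kernel code).

    `level` is the allowed share of memory bandwidth in percent; no history
    such as delta_bw is kept between polls.
    """
    if level > min_level and cur_bw > user_bw:
        # Over target: throttle harder.
        return level - step
    # Worst case: the measured bandwidth is capped by the current level and
    # would grow in proportion to it if throttling were relaxed by one step.
    projected_bw = cur_bw * (level + step) / level
    if level < max_level and projected_bw < user_bw:
        # Even the worst-case increase stays under target: relax throttling.
        return level + step
    return level
```

With the same 100 MB/s target, `next_level(90, 0, 100)` returns 100, so an idle sample leads to less throttling instead of skewing later decisions, while `next_level(80, 90, 100)` holds at 80 because 90 × 90 / 80 = 101.25 MB/s would overshoot the target.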
This fixes the memory bandwidth throttling heuristic that had been in place since 2018.
This was merged last week as part of the x86/cache pull request for the Linux 6.9 kernel.