Highly Threaded Linux Software Running Under CFS Quotas See Big Performance Fix
Thanks to a Linux kernel fix that is likely to be back-ported to the various stable series, highly threaded software running under CFS quotas that enforce CPU limits is about to get much faster. In at least one synthetic test case, the kernel fix yields an almost 30x improvement in performance.
The problem was spotted by the Kubernetes community but affects anyone running highly threaded workloads under a CFS quota to restrict shared CPU resources. It turns out that highly threaded applications are routinely not getting "their fair share" of the CPU, leading to lower-than-expected performance and higher latency.
This has been a known bug for more than a year, with a kernel bug report on unexpected CFS throttling open since late 2017. The issue is believed to be fixed as of mainline Linux 5.4, with back-ports pending, after the patch was volleyed around the kernel mailing list for a few months.
The fix, a few dozen lines of code, removes the expiration of CPU-local slices:
It has been observed, that highly-threaded, non-cpu-bound applications running under cpu.cfs_quota_us constraints can hit a high percentage of periods throttled while simultaneously not consuming the allocated amount of quota. This use case is typical of user-interactive non-cpu bound applications, such as those running in kubernetes or mesos when run on multiple cpu cores.
...
This greatly improves performance of high-thread-count, non-cpu bound applications with low cfs_quota_us allocation on high-core-count machines. In the case of an artificial testcase (10ms/100ms of quota on 80 CPU machine), this commit resulted in almost 30x performance improvement, while still maintaining correct cpu quota restrictions.
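For readers who want to check whether their own workloads show the symptom described in the commit message, here is a minimal Python sketch. It assumes the cgroup's cpu.stat file exposes the nr_periods and nr_throttled counters, and it works on the file's text rather than a fixed path, since the mount point varies between systems. The function names and the sample numbers are hypothetical and purely illustrative, not measurements.

```python
def parse_cpu_stat(text):
    """Parse the "key value" lines of a cgroup cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_fraction(stats):
    """Fraction of enforcement periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


def quota_in_cpus(quota_us, period_us):
    """CPU allowance implied by a quota/period pair, in units of whole CPUs."""
    return quota_us / period_us


# Made-up sample contents of a cpu.stat file (illustrative only).
sample = """nr_periods 1000
nr_throttled 250
throttled_time 5000000000
"""

stats = parse_cpu_stat(sample)
print(f"throttled in {throttled_fraction(stats):.0%} of periods")
# The 10ms/100ms quota from the test case works out to a tenth of one CPU.
print(f"quota allows {quota_in_cpus(10_000, 100_000):.1f} CPUs")
```

A high throttled fraction combined with CPU usage well below the quota would match the pattern the commit message describes, though other causes of throttling are possible.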
Thanks to Phoronix reader Mark for pointing out this recent kernel change.