IBM Proposing A CPU Namespace For The Linux Kernel
IBM engineer Pratik Sampat has published an early prototype of a CPU namespace interface for the Linux kernel. The CPU namespace is meant to address coherency issues in how available CPU resources are currently viewed, as well as possible security issues that stem from tasks being able to learn how resources are accessed and positioned on the system.
One driver behind this Linux CPU namespace proposal is the disjointed way CPU resources are viewed and managed at the moment: "The control and the display interface is fairly disjoint with each other. Restrictions can be set through control interfaces like cgroups, while many applications legacy or otherwise get the view of the system through sysfs/procfs and allocate resources like number of threads/processes, memory allocation based on that information. This can lead to unexpected running behaviors as well as have a high impact on performance."
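The mismatch described in that quote can be observed from userspace. The following is a minimal sketch, not code from the RFC: it compares the CPUs listed in the sysfs `online` file (a display interface) with the CPUs the current task is actually allowed to run on. The helper names `parse_cpu_list`, `displayed_cpus`, and `allowed_cpus` are hypothetical, and the sketch assumes a Linux system where `/sys/devices/system/cpu/online` is readable.

```python
import os


def parse_cpu_list(text):
    """Parse a kernel CPU list string such as "0-3,8" into sorted CPU IDs."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def displayed_cpus(path="/sys/devices/system/cpu/online"):
    """CPUs reported by the sysfs display interface, or None if unreadable."""
    try:
        with open(path) as handle:
            return parse_cpu_list(handle.read())
    except OSError:
        return None


def allowed_cpus():
    """CPUs this task may run on (reflects cpuset limits), or None if unsupported."""
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return None


if __name__ == "__main__":
    shown = displayed_cpus()
    usable = allowed_cpus()
    print("displayed via sysfs:", shown)
    print("allowed for this task:", usable)
    if shown is not None and usable is not None and len(shown) > len(usable):
        # An application sizing its worker pool from the sysfs view would
        # start more workers than it has CPUs to run them on.
        print("display and control views disagree")
```

Inside a container restricted with a cgroup cpuset, the two lists would typically differ, which is the incoherence the proposal targets.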
Alternative methods, meanwhile, are described as less than ideal in the RFC letter: "Existing solutions to the problem include userspace tools like LXCFS which can fake the sysfs information by mounting onto the sysfs online file to be in coherence with the limits set through cgroup cpuset. However, LXCFS is an external solution and needs to be explicitly setup for applications that require it. Another concern is also that tools like LXCFS don't handle all the other display mechanism like procfs load stats."
The security implications described include "a case where an actor can be in cognizance of the CPU node topology can schedule workloads and select CPUs such that the bus is flooded causing a Denial Of Service attack" or "a case wherein identifying the CPU system topology can help identify cores that are close to buses and peripherals such as GPUs to get an undue latency advantage from the rest of the workloads."
The IBM-led CPU Namespace proposal thus pursues the following design:
This prototype patchset introduces a new kernel namespace mechanism -- CPU namespace.
The CPU namespace isolates CPU information by virtualizing logical CPU IDs and creating a scrambled virtual CPU map of the same. It latches onto the task_struct, and the CPU translations are designed to be in a flat hierarchy: every virtual namespace CPU maps to a physical CPU at the creation of the namespace. The advantage of a flat hierarchy is that translations are O(1) and children do not need to traverse up the tree to retrieve a translation.
This namespace then allows both control and display interfaces to be CPU namespace context aware, such that a task within a namespace only gets the view, and therefore control, of the CPU resources available to it via a virtual CPU map.
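The flat, scrambled mapping described above can be illustrated with a small userspace model. This is only a sketch of the idea under stated assumptions, not the kernel implementation: the class `CpuNamespaceSketch`, its methods, and the use of a seeded shuffle for the scrambling are all hypothetical choices made for illustration.

```python
import random


class CpuNamespaceSketch:
    """Toy model of a flat virtual-to-physical CPU map (hypothetical names)."""

    def __init__(self, physical_cpus, seed=None):
        # Scramble the physical IDs once, at namespace creation.
        shuffled = list(physical_cpus)
        random.Random(seed).shuffle(shuffled)
        # Virtual IDs are 0..n-1; each maps directly to a physical CPU.
        self.virt_to_phys = dict(enumerate(shuffled))
        self.phys_to_virt = {p: v for v, p in self.virt_to_phys.items()}

    def to_physical(self, vcpu):
        """O(1) lookup: virtual CPU ID to physical CPU ID."""
        return self.virt_to_phys[vcpu]

    def to_virtual(self, pcpu):
        """O(1) lookup: physical CPU ID to virtual CPU ID."""
        return self.phys_to_virt[pcpu]

    def child(self, allowed_vcpus, seed=None):
        """Create a child namespace restricted to some of this namespace's CPUs.

        The parent's translation is resolved here, once, so the child stores
        physical IDs directly and never walks up to its parent afterwards.
        """
        physical = [self.to_physical(v) for v in allowed_vcpus]
        return CpuNamespaceSketch(physical, seed=seed)


if __name__ == "__main__":
    parent = CpuNamespaceSketch([4, 5, 6, 7], seed=1)
    nested = parent.child([0, 1], seed=2)
    for vcpu in sorted(nested.virt_to_phys):
        print("child vCPU", vcpu, "-> physical CPU", nested.to_physical(vcpu))
```

Because each namespace holds its own complete map to physical CPUs, a lookup in a nested namespace costs the same as one in its parent, which is the advantage the flat hierarchy is described as providing.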
In an experiment testing the CPU namespace with the Nginx web server, "With the CPU namespace we see the correct number of PIDs spawning corresponding to the cpuset limits set. The memory utilization drops over 92-95%, the latency reduces by 64% and the throughput like requests and transfer per second is unchanged."
There are still a number of known shortcomings in the current design, but the initial performance numbers are exciting. More details on this "RFC" patch series for the Linux CPU namespace interface can be found via this mailing list thread. There are also more details on the effort via this web page.