The current Linux kernel CPU hot-plugging support has been described as "an increasing nightmare full of races and undocumented behaviour", but fortunately it's in the process of being re-developed.
Thomas Gleixner has been one of the kernel developers looking to rework the Linux CPU hot-plug support and published a new patch-set today. Hot-plugging support in general has been a focus lately, with work on a common system device hot-plug framework for the kernel, true CPU hot-plug support, ACPI hot-plug improvements, and other efforts in recent months.
Gleixner's CPU hot-plug re-work that was published today consists of 40 patches that amount to over one thousand lines of changed code within the kernel. Below is his description of the massive CPU hot-plug changes for the Linux kernel.
The current CPU hotplug implementation has become an increasing nightmare full of races and undocumented behaviour. The main issue of the current hotplug scheme is the completely asymmetric startup/teardown process. The hotplug notifiers are mostly undocumented and the CPU_* actions in lots of implementations seem to be randomly chosen.
We had a long discussion in San Diego last year about reworking the hotplug core into a fully symmetric state machine. After a few doomed attempts to convert the existing code into a state machine, I finally found a workable solution.
The following patch series implements a trivial array based state machine, which replaces the existing steps in cpu_up/down and also the notifiers which must run on the hotplugged cpu are converted to a callback array. This documents clearly the ordering of the callbacks and also makes the asymmetric behaviour very obvious.
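To make the idea of an array-based state machine concrete, here is a minimal user-space sketch. It is not code from the patch series: every name in it (toy_state, toy_step, toy_set_state and so on) is hypothetical, and the three states are invented for illustration. It only shows how an ordered array of startup/teardown pairs fixes the callback order and makes a missing teardown visible at a glance.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states, ordered from fully offline to fully online. */
enum toy_state { TOY_OFFLINE, TOY_PREPARE, TOY_STARTING, TOY_ONLINE, TOY_NR_STATES };

/* One array slot per state: startup runs when entering, teardown when leaving. */
struct toy_step {
    const char *name;
    int (*startup)(unsigned int cpu);
    int (*teardown)(unsigned int cpu);
};

static int toy_calls; /* counts callback invocations, for demonstration only */

static int toy_up(unsigned int cpu)   { (void)cpu; toy_calls++; return 0; }
static int toy_down(unsigned int cpu) { (void)cpu; toy_calls++; return 0; }

/* The array index is the state, so the ordering is documented by the table. */
static const struct toy_step toy_steps[TOY_NR_STATES] = {
    [TOY_PREPARE]  = { "prepare",  toy_up, toy_down },
    [TOY_STARTING] = { "starting", toy_up, NULL },   /* asymmetric: no teardown */
    [TOY_ONLINE]   = { "online",   toy_up, toy_down },
};

static enum toy_state toy_cpu_state[4]; /* all CPUs start at TOY_OFFLINE */

/* Walk the array up or down until the CPU reaches the target state. */
static int toy_set_state(unsigned int cpu, enum toy_state target)
{
    assert(cpu < 4 && target < TOY_NR_STATES);

    while (toy_cpu_state[cpu] < target) {
        const struct toy_step *step = &toy_steps[toy_cpu_state[cpu] + 1];
        int ret = step->startup ? step->startup(cpu) : 0;
        if (ret)
            return ret; /* a real implementation would roll back here */
        toy_cpu_state[cpu] = (enum toy_state)(toy_cpu_state[cpu] + 1);
    }
    while (toy_cpu_state[cpu] > target) {
        const struct toy_step *step = &toy_steps[toy_cpu_state[cpu]];
        int ret = step->teardown ? step->teardown(cpu) : 0;
        if (ret)
            return ret;
        toy_cpu_state[cpu] = (enum toy_state)(toy_cpu_state[cpu] - 1);
    }
    return 0;
}
```

Calling toy_set_state(0, TOY_ONLINE) runs the three startup callbacks in array order, and toy_set_state(0, TOY_OFFLINE) walks back down through the same table, skipping the slot that has no teardown. The sketch omits rollback on failure and any locking, both of which a real hotplug core would need.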
This series converts the stop_machine thread to the smpboot infrastructure, implements the core state machine and converts all notifiers which have ordering constraints plus a randomly chosen bunch of other notifiers to the state machine.
The runtime installed callbacks are immediately executed by the core code on or on behalf of all cpus which have already reached the corresponding state. A non-executing installer function is there as well to allow simple migration of the existing notifier maze.
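The difference between the executing and non-executing installers can be sketched roughly as follows. This is again a hypothetical user-space model, not the interface from the patches: demo_install, its invoke flag, and the fixed per-CPU state table are all assumptions made for illustration, and the callback is simply called from the installer rather than on the target CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_CPUS   4
#define DEMO_NR_STATES 4

typedef int (*demo_cb_t)(unsigned int cpu);

/* One startup callback slot per state (hypothetical layout). */
static demo_cb_t demo_startup[DEMO_NR_STATES];

/* Pretend CPUs 0 and 3 are already far along, CPUs 1 and 2 are not. */
static int demo_cpu_state[DEMO_NR_CPUS] = { 3, 1, 0, 2 };

static unsigned int demo_ran_mask; /* bit n set once the callback ran for CPU n */

static int demo_cb(unsigned int cpu)
{
    demo_ran_mask |= 1u << cpu;
    return 0;
}

/*
 * Install a callback for a state at runtime. With invoke == true the
 * callback is run for every CPU that has already reached that state;
 * with invoke == false it is only stored, which mimics the
 * non-executing installer meant to ease migration of old notifiers.
 */
static int demo_install(int state, demo_cb_t cb, bool invoke)
{
    assert(state > 0 && state < DEMO_NR_STATES);
    demo_startup[state] = cb;

    if (!invoke || !cb)
        return 0;

    for (unsigned int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
        if (demo_cpu_state[cpu] >= state) {
            int ret = cb(cpu); /* run on behalf of that CPU */
            if (ret)
                return ret;
        }
    }
    return 0;
}
```

With the state table above, demo_install(2, demo_cb, false) stores the callback without running anything, while demo_install(2, demo_cb, true) also runs it for CPUs 0 and 3, the only ones at state 2 or beyond.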
The diffstat of the complete series is appended below.
36 files changed, 1300 insertions(+), 1179 deletions(-)
We add slightly more code at this stage (225 lines alone in a header file), but most of the conversions are removing code and we have only tackled about 30 of 130+ instances. Even with the current conversion state, the resulting text size shrinks already.
The current patch-set can be found in CPU hotplug rework - episode I