SUSE announced kGraft back in February as a new way of live-patching the Linux kernel to reduce downtime. In late March they then released the kGraft source code and reiterated their intention to mainline kGraft in the Linux kernel.
Meanwhile, in early March, Red Hat announced Kpatch, its own approach to live kernel patching, which had been under development before Red Hat knew about kGraft.
Yesterday, Jiri Slaby of SUSE posted the kGraft kernel patches as a "request for comments" to seek feedback from Linux kernel developers on the work. The kGraft work is spread across 16 patches.
This morning, Red Hat's Josh Poimboeuf countered with the two kernel patches needed to implement Kpatch. Those two patches can be found on the Linux kernel mailing list (LKML). Kpatch ships as a self-contained GPL kernel module that doesn't need any core kernel code changes. Red Hat wants to see it merged, or some combination of it with kGraft. We should see soon enough on the kernel mailing list whether kGraft or Kpatch will prevail, or otherwise how well the two will work together.
Josh Poimboeuf's comparison of Kpatch with kGraft, from the Red Hat perspective, came down to:
I think the biggest difference between kpatch and kGraft is how they ensure that the patch is applied atomically and safely.
kpatch checks the backtraces of all tasks in stop_machine() to ensure that no instances of the old function are running when the new function is applied. I think the biggest downside of this approach is that stop_machine() has to idle all other CPUs during the patching process, so it inserts a small amount of latency (a few ms on an idle system).
Instead, kGraft uses per-task consistency: each task either sees the old version or the new version of the function. This gives a consistent view with respect to functions, but _not_ data, because the old and new functions are allowed to run simultaneously and share data. This could be dangerous if a patch changes how a function uses a data structure. The new function could make a data change that the old function wasn't expecting.
With kpatch, that's not an issue because all the functions are patched at the same time. So kpatch is safer with respect to data interactions.
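To make the two consistency models concrete, here is a toy Python sketch. It is an illustration only, not kpatch or kGraft code and not part of Poimboeuf's message. Every name in it (`Task`, `safe_to_patch`, `stop_machine_patch`, `call`, and the bytes-versus-KiB "patch") is hypothetical, and it runs sequentially, so it only models which version of a function each task would see.

```python
from dataclasses import dataclass, field

# Two versions of a tiny API over shared data. The hypothetical patch changes
# the unit stored in shared["size"] from bytes (old) to KiB (new).
def old_write(shared, nbytes):
    shared["size"] = nbytes

def old_read(shared):
    return shared["size"]

def new_write(shared, nbytes):
    shared["size"] = nbytes // 1024

def new_read(shared):
    return shared["size"] * 1024

OLD = {"write": old_write, "read": old_read}
NEW = {"write": new_write, "read": new_read}

@dataclass
class Task:
    name: str
    stack: list = field(default_factory=list)  # function names on the call stack
    migrated: bool = False                     # True once the task sees new code

def safe_to_patch(tasks, patched_names):
    """Stack check: no task may be inside a function that is being replaced."""
    return not any(name in patched_names for t in tasks for name in t.stack)

def stop_machine_patch(tasks, patched_names):
    """Switch every task at once, or refuse if any task is in an old function."""
    if not safe_to_patch(tasks, patched_names):
        return False
    for t in tasks:
        t.migrated = True
    return True

def call(task, func, shared, *args):
    """Dispatch to the old or new version depending on the calling task."""
    table = NEW if task.migrated else OLD
    return table[func](shared, *args)

def demo():
    patched = {"write", "read"}

    # Per-task consistency: task a has migrated, task b has not.
    shared = {"size": 0}
    a, b = Task("a", migrated=True), Task("b")
    call(a, "write", shared, 4096)    # new code stores 4 (KiB)
    mixed = call(b, "read", shared)   # old code reads 4 and treats it as bytes

    # All-at-once switch: both tasks see the new code.
    shared2 = {"size": 0}
    c, d = Task("c"), Task("d")
    ok = stop_machine_patch([c, d], patched)
    call(c, "write", shared2, 4096)
    atomic = call(d, "read", shared2)

    # A task that is currently inside an old function blocks the patch.
    busy = Task("busy", stack=["syscall_entry", "read"])
    refused = stop_machine_patch([busy], patched)
    return mixed, ok, atomic, refused

if __name__ == "__main__":
    print(demo())  # (4, True, 4096, False)
```

In the mixed case the unmigrated task reads 4 where it expected 4096, which is the kind of data mismatch the quote warns about. The all-at-once switch avoids it, at the cost of refusing to patch while any task is inside an affected function.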
Other advantages of the kpatch stop_machine() approach:
- IMO, the kpatch code is much simpler than kGraft. The implementation is very straightforward and is completely self-contained. It requires zero changes to the kernel.
(However a new TAINT_KPATCH flag would be a good idea, and we do anticipate some minor changes to kprobes and ftrace for better compatibility.)
- The use of stop_machine() will enable an important not-yet-implemented feature to call a user-supplied callback function at loading time which can be used to atomically update data structures when applying a patch. I don't see how such a feature would be possible with the kGraft approach.
- kpatch applies patches immediately without having to send signals to sleeping processes, and without having to hope that those processes handle the signal appropriately.
- kpatch's patching behavior is more deterministic because stop_machine() ensures that all tasks are sleeping and interrupts are disabled when the patching occurs.
- kpatch already supports other cool features like:
  - removing patches and rolling back to the original functions
  - atomically replacing existing patches
  - incremental patching
  - loading multiple patch modules
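The last few features in that list (rolling back, incremental patching, multiple patch modules) can be pictured as a stack of replacement tables. The following Python sketch is a simplified model under that assumption, not a description of how kpatch is implemented, and `PatchRegistry` and its methods are hypothetical names.

```python
class PatchRegistry:
    """Toy model: the most recently loaded patch for a function wins."""

    def __init__(self, originals):
        self.originals = dict(originals)  # function name -> original callable
        self.modules = []                 # (module name, replacements) in load order

    def load(self, module_name, replacements):
        """Load a patch module; it may only replace functions that exist."""
        unknown = set(replacements) - set(self.originals)
        if unknown:
            raise KeyError(f"cannot patch unknown functions: {sorted(unknown)}")
        self.modules.append((module_name, dict(replacements)))

    def unload(self, module_name):
        """Remove a patch module; earlier patches or the originals take over."""
        self.modules = [(n, r) for n, r in self.modules if n != module_name]

    def resolve(self, func_name):
        """Return the callable currently in effect for func_name."""
        for _, replacements in reversed(self.modules):
            if func_name in replacements:
                return replacements[func_name]
        return self.originals[func_name]


if __name__ == "__main__":
    reg = PatchRegistry({"version": lambda: "original"})
    reg.load("patch1", {"version": lambda: "patch1"})
    reg.load("patch2", {"version": lambda: "patch2"})   # incremental patch on top
    print(reg.resolve("version")())  # patch2
    reg.unload("patch2")
    print(reg.resolve("version")())  # patch1
    reg.unload("patch1")
    print(reg.resolve("version")())  # original
```

Unloading the newest module falls back to the previous patch, and unloading everything restores the original function, which is the rollback behaviour the list describes.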