Linux 6.3 MM Changes Bring New MEMFD & MGLRU Enhancements
Andrew Morton on Monday submitted his memory management ("MM") updates for the Linux 6.3 merge window.
One of the notable memory management additions for Linux 6.3 is a new MEMFD patch series that allows setting the execute bit at MEMFD creation time, with the option of sealing the state of that execute bit. Google engineers worked on this patch series as part of their ChromeOS work. The earlier patch series explained:
Since Linux introduced the memfd feature, memfd have always had their execute bit set, and the memfd_create() syscall doesn't allow setting it differently.
However, in a secure by default system, such as ChromeOS, (where all executables should come from the rootfs, which is protected by Verified boot), this executable nature of memfd opens a door for NoExec bypass and enables “confused deputy attack”. E.g, in VRP bug: cros_vm process created a memfd to share the content with an external process, however the memfd is overwritten and used for executing arbitrary code and root escalation. lists more VRP in this kind.
On the other hand, executable memfd has its legit use, runc uses memfd’s seal and executable feature to copy the contents of the binary then execute them, for such system, we need a solution to differentiate runc's use of executable memfds and an attacker's.
To address those above, this set of patches add following:
1: Let memfd_create() set X bit at creation time.
2: Let memfd to be sealed for modifying X bit.
3: A new pid namespace sysctl: vm.memfd_noexec to control behavior of X bit. For example, if a container has vm.memfd_noexec=2, then memfd_create() without MFD_NOEXEC_SEAL will be rejected.
4: A new security hook in memfd_create(). This make it possible to a new LSM, which rejects or allows executable memfd based on its security policy.
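For those wanting to see what points 1 and 2 look like from user space, here is a minimal C sketch. The flag values are the ones from the Linux 6.3 UAPI headers and are only defined here in case the installed headers predate them. The helper names create_noexec_memfd() and memfd_exec_is_sealed() are hypothetical and not part of the patch series, and the fallback for pre-6.3 kernels assumes the older kernel rejects the unknown flag with EINVAL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Values from the Linux 6.3 UAPI headers; only defined here when the
 * installed headers are older and do not provide them. */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif
#ifndef MFD_EXEC
#define MFD_EXEC 0x0010U
#endif
#ifndef F_SEAL_EXEC
#define F_SEAL_EXEC 0x0020
#endif

/* Hypothetical helper: create a memfd with the execute bit cleared and
 * sealed. On a kernel that does not know MFD_NOEXEC_SEAL (assumed to
 * fail with EINVAL), fall back to a plain memfd. Returns the fd or -1. */
int create_noexec_memfd(const char *name)
{
    assert(name != NULL);

    int fd = memfd_create(name, MFD_CLOEXEC | MFD_NOEXEC_SEAL);
    if (fd < 0 && errno == EINVAL) {
        /* Pre-6.3 kernel: no way to request a non-executable memfd. */
        fd = memfd_create(name, MFD_CLOEXEC);
    }
    return fd;
}

/* Hypothetical helper: report whether the execute bit of a memfd is
 * sealed. Returns 1 if F_SEAL_EXEC is set, 0 if not, -1 on error. */
int memfd_exec_is_sealed(int fd)
{
    int seals = fcntl(fd, F_GET_SEALS);
    if (seals < 0)
        return -1;
    return (seals & F_SEAL_EXEC) ? 1 : 0;
}
```

On a 6.3+ kernel, a descriptor from create_noexec_memfd() should report its execute bit as sealed, while a caller with a legitimate need for an executable memfd (the runc case above) would pass MFD_EXEC instead. With vm.memfd_noexec=2 as described in point 3, a request without MFD_NOEXEC_SEAL would be rejected.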
Linux 6.3 is also bringing further refinements to MGLRU, the Multi-Gen LRU for improving the Linux kernel's page reclamation code. MGLRU has a performance regression fix for a workload pointed out in prior Phoronix testing as well as other performance optimization work. Again, thanks to Google.
There is also DAMOS filtering support for the DAMON framework to allow finer-grained control over DAMOS actions. DAMON is the Amazon-started Data Access Monitoring framework in the Linux kernel, and DAMOS is short for DAMON-based Operation Schemes.
The big list of MM patches for Linux 6.3 can be found via the pull request.