Amazon Exploring MM-Local Memory Allocations To Help With Current/Future Speculation Attacks
Back in 2019, after various speculation-based CPU vulnerabilities began coming to light, Amazon engineers proposed process-local memory allocations for hiding KVM secrets. They were seeking an alternative mitigation for vulnerabilities like L1TF by placing certain kernel allocations in memory regions that are out of view of, and inaccessible to, other kernel code. This week, after five years of ongoing Linux kernel improvements, Amazon engineers laid out a new proposal for MM-local memory allocations to deal with current and future speculation-based cross-process attacks.
Roman Kagan of Amazon sent out a "request for comments" (RFC) on Friday for introducing MM-local memory allocations within the Linux kernel. Unlike the 2019 effort, the new approach builds on memfd_secret() and other newer kernel features to implement this extra memory security functionality more easily.
Roman Kagan explained with the RFC patch series:
"In a series posted a few years ago, a proposal was put forward to allow the kernel to allocate memory local to a mm and thus push it out of reach for current and future speculation-based cross-process attacks. We still believe this is a nice thing to have.
However, in the time passed since that post Linux mm has grown quite a few new goodies, so we'd like to explore possibilities to implement this functionality with less effort and churn leveraging the now available facilities.
Specifically, this is a proof-of-concept attempt to implement mm-local allocations piggy-backing on memfd_secret(), using regular user addresses but pinning the pages and flipping the user/supervisor flag on the respective PTEs to make them directly accessible from kernel, and sealing the VMA to prevent userland from taking over the address range. The approach allowed to delegate all the heavy lifting -- address management, interactions with the direct map, cleanup on mm teardown -- to the existing infrastructure, and required zero architecture-specific code.
Compared to the approach used in the original series, where a dedicated kernel address range and thus a dedicated PGD was used for mm-local allocations, the one proposed here may have certain drawbacks, in particular
- using user addresses for kernel memory may violate assumptions in various parts of kernel code which we may not have identified with smoke tests we did
- the allocated addresses are guessable by the userland (ATM they are even visible in /proc/PID/maps but that's fixable) which may weaken the security posture
Also included is a simple test driver and selftest to smoke test and showcase the feature.
The code is PoC RFC and lacks a lot of checks and special case handling, but demonstrates the idea. We'd appreciate any feedback on whether it's a viable approach or it should better be abandoned in favor of the one with dedicated PGD / kernel address range or yet something else."
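For readers unfamiliar with the facility the proof of concept piggy-backs on, here is a minimal userspace sketch of memfd_secret() as it exists today. It is not code from the RFC series and does not exercise the proposed in-kernel MM-local allocations; it only shows the existing system call that creates memory removed from the kernel direct map. The helper name secret_roundtrip, the result enum, and the fallback syscall number 447 (used only when the libc headers predate Linux 5.14) are assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: 447 is the memfd_secret syscall number on x86-64 and arm64.
 * Only used when the installed headers do not define it. */
#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447
#endif

enum secret_result { SECRET_OK, SECRET_UNAVAILABLE, SECRET_MISMATCH };

/* Hypothetical helper: store msg in a one-page secret mapping, read it back,
 * then scrub and unmap it. Any setup failure (old kernel, secretmem disabled,
 * seccomp filter, memlock limit) is reported as SECRET_UNAVAILABLE. */
static enum secret_result secret_roundtrip(const char *msg)
{
    assert(msg != NULL);

    size_t len = strlen(msg) + 1;
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len > (size_t)page)
        return SECRET_UNAVAILABLE;

    /* glibc may not ship a wrapper, so invoke the syscall directly. */
    int fd = (int)syscall(SYS_memfd_secret, 0);
    if (fd < 0)
        return SECRET_UNAVAILABLE;

    /* The secret area starts out empty and must be sized before mapping. */
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return SECRET_UNAVAILABLE;
    }

    /* Secret memory has to be mapped MAP_SHARED. */
    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return SECRET_UNAVAILABLE;

    memcpy(p, msg, len);
    enum secret_result r =
        memcmp(p, msg, len) == 0 ? SECRET_OK : SECRET_MISMATCH;

    memset(p, 0, len); /* scrub before releasing the page */
    munmap(p, (size_t)page);
    return r;
}
```

Calling secret_roundtrip("example") from a main() should return SECRET_OK on a kernel with secret memory enabled and SECRET_UNAVAILABLE otherwise. The RFC's additional steps (pinning the pages, flipping the user/supervisor bit on the PTEs, and sealing the VMA) happen inside the kernel and have no userspace equivalent shown here.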
It will be interesting to see whether this MM-local memory allocations work moves ahead and makes the kernel more resilient against cross-process speculation-based attacks.