IBM Developers Looking At Adding System Call Isolation To Enhance Linux Security
The concept was announced overnight, along with some preliminary patches from the IBM developers. Developer Mike Rapoport summed up the work elegantly:
The idea here is to allow an untrusted user access to a potentially vulnerable kernel in such a way that any kernel vulnerability they find to exploit is either prevented or the consequences confined to their isolated address space such that the compromise attempt has minimal impact on other tenants or the protected structures of the monolithic kernel. Although we hope to prevent many classes of attack, the first target we're looking at is ROP gadget protection.
These patches implement a "system call isolation (SCI)" mechanism that allows running system calls in an isolated address space with reduced page tables to prevent ROP attacks.
ROP attacks involve corrupting the stack return address to repoint it to a segment of code you know exists in the kernel that can be used to perform the action you need to exploit the system.
The idea behind the prevention is that if we fault in pages in the execution path, we can compare the target address against the kernel symbol table. So if we're in a function, we allow local jumps (and simply falling off the end of a page) but if we're jumping to a new function it must be to an external label in the symbol table. Since ROP attacks are all about jumping to gadget code which is effectively in the middle of real functions, the jumps they induce are to code that doesn't have an external symbol, so it should mostly detect when they happen.
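To make the symbol-table check described above more concrete, here is a minimal user-space sketch in Python. It is not taken from the patch series: the symbol names, the addresses, and the helper functions `containing_function`, `is_symbol_entry`, and `jump_allowed` are all hypothetical, and the real kernel code works on page faults rather than on explicit (source, destination) pairs. It only illustrates the rule that a jump is accepted if it stays inside the current function or lands exactly on a known symbol entry.

```python
import bisect

# Hypothetical symbol table: (start address, name), sorted by address.
# These names and addresses are made up for illustration only.
SYMBOLS = [
    (0x1000, "sys_dummy"),
    (0x1080, "helper_copy"),
    (0x1200, "helper_log"),
]
TEXT_END = 0x1300  # hypothetical end of the code region

_STARTS = [addr for addr, _ in SYMBOLS]


def containing_function(addr):
    """Return the name of the function whose range covers addr, or None."""
    if addr < _STARTS[0] or addr >= TEXT_END:
        return None
    # Last symbol whose start address is <= addr.
    idx = bisect.bisect_right(_STARTS, addr) - 1
    return SYMBOLS[idx][1]


def is_symbol_entry(addr):
    """True if addr is exactly the start address of a known symbol."""
    idx = bisect.bisect_left(_STARTS, addr)
    return idx < len(_STARTS) and _STARTS[idx] == addr


def jump_allowed(src, dst):
    """Apply the rule from the text to a jump from src to dst."""
    dst_fn = containing_function(dst)
    if dst_fn is None:
        return False  # target is outside the known code region
    if containing_function(src) == dst_fn:
        return True   # local jump within the same function
    # Crossing into another function is only allowed at its entry point.
    return is_symbol_entry(dst)


if __name__ == "__main__":
    print(jump_allowed(0x1010, 0x1040))  # local jump inside sys_dummy
    print(jump_allowed(0x1010, 0x1080))  # call to the entry of helper_copy
    print(jump_allowed(0x1010, 0x10A4))  # gadget-style jump into mid-function
```

In this toy model the first two jumps are accepted and the third is rejected, because `0x10A4` falls in the middle of `helper_copy` and has no symbol of its own, which is the situation a ROP gadget address would typically be in.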
This is a very early POC. It's able to run the simple dummy system calls and a little bit beyond that, but it's not yet stable and robust enough to boot a system with system call isolation enabled for all system calls. Still, we wanted to get some feedback about the concept in general as early as possible.
While system call isolation (SCI) should help Linux security, the code is not yet mature, and the developers have not yet looked closely at its performance impact. They do acknowledge, however, that there would likely be a measurable hit to system performance.
Those wanting to learn more about this proof-of-concept system call isolation feature can see this kernel patch series.