pidfd_getfd Lands In Linux 5.6 With Use-Cases From LXD To Web Browsers
Alongside the new openat2() system call, Linux 5.6 has also gained pidfd_getfd(), a syscall that is already drawing interest from several parties and looks set to see growing use.
The pidfd_getfd() system call provides a straightforward means of accessing file descriptors from other processes via a pidfd. Existing Linux kernels already make it possible to access another process's file descriptors, but only through messy workarounds that add unnecessary complexity.
This syscall gets a copy of a file descriptor from another process, identified by a pidfd and a file descriptor number. It requires that the calling process be allowed to ptrace the process represented by the pidfd. The process whose file descriptor is copied is otherwise unaffected.
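As a rough illustration of the call described above, here is a minimal Python sketch that invokes the raw syscalls through ctypes. The helper names `_syscall` and `fetch_fd` are made up for this example, and the syscall numbers 434 (pidfd_open) and 438 (pidfd_getfd) are an assumption that holds for x86_64, arm64 and most other architectures but not all of them.

```python
import ctypes
import errno
import os

# Assumption: syscall numbers from the unified table (x86_64, arm64, ...).
SYS_PIDFD_OPEN = 434
SYS_PIDFD_GETFD = 438

_libc = ctypes.CDLL(None, use_errno=True)
_libc.syscall.restype = ctypes.c_long


def _syscall(number, *args):
    """Hypothetical helper: issue a raw syscall, raising OSError on failure."""
    result = _libc.syscall(ctypes.c_long(number),
                           *[ctypes.c_long(a) for a in args])
    if result < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


def fetch_fd(pid, target_fd):
    """Return a local copy of descriptor `target_fd` owned by process `pid`.

    The caller needs ptrace access to `pid`; otherwise the kernel
    refuses the request and an OSError is raised.
    """
    pidfd = _syscall(SYS_PIDFD_OPEN, pid, 0)
    try:
        # The third argument is a flags word, which must currently be 0.
        return _syscall(SYS_PIDFD_GETFD, pidfd, target_fd, 0)
    finally:
        os.close(pidfd)
```

For example, `fetch_fd(os.getpid(), 1)` should hand back a second descriptor for the caller's own standard output on a 5.6 or newer kernel. On older kernels, or where a sandbox filters the call, expect an OSError such as ENOSYS or EPERM.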
As for the possible use-cases of pidfd_getfd(), some of them were laid out in the already-merged pull request, which landed in Linux 5.6 Git overnight.
There are currently two major users that wait on pidfd_getfd() and one future user:
- Netflix, Sargun said, is working on a service mesh where users should be able to connect to a DNS-based VIP. When a user connects to e.g. 1.2.3.4:80, which runs e.g. service "foo", they will be redirected to an Envoy process. This service mesh uses seccomp user notifications and pidfd to intercept all connect() calls and, instead of connecting them to 1.2.3.4:80, connects them to e.g. 127.0.0.1:8080.
- LXD uses the seccomp notifier heavily to intercept and emulate mknod() and mount() syscalls for unprivileged containers/processes. With pidfd_getfd(), more use-cases, e.g. bridging socket connections, will be possible.
- The patchset has also seen some interest from the browser corner. Right now, Firefox is using a SECCOMP_RET_TRAP sandbox managed by a broker process. In the future, glibc will start blocking all signals during dlopen(), rendering this type of sandbox impossible. Hence, Firefox will switch to a seccomp-user-notification-based sandbox, which also makes use of file descriptor retrieval.
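The connect-redirection idea from the first use-case can be sketched roughly as follows. This is not Netflix's or LXD's implementation: the seccomp user-notification step is left out entirely, a forked child stands in for the supervised process, and the names `connect_on_behalf` and `demo` as well as the syscall numbers 434 and 438 are assumptions made for illustration.

```python
import ctypes
import errno
import os
import socket

# Assumption: syscall numbers from the unified table (x86_64, arm64, ...).
SYS_PIDFD_OPEN = 434
SYS_PIDFD_GETFD = 438

_libc = ctypes.CDLL(None, use_errno=True)
_libc.syscall.restype = ctypes.c_long


def _syscall(number, *args):
    """Hypothetical helper: issue a raw syscall, raising OSError on failure."""
    result = _libc.syscall(ctypes.c_long(number),
                           *[ctypes.c_long(a) for a in args])
    if result < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


def connect_on_behalf(pid, target_fd, real_addr):
    """Copy socket `target_fd` out of process `pid` and connect it to `real_addr`."""
    pidfd = _syscall(SYS_PIDFD_OPEN, pid, 0)
    try:
        local_fd = _syscall(SYS_PIDFD_GETFD, pidfd, target_fd, 0)
    finally:
        os.close(pidfd)
    # Both descriptors refer to the same socket, so connecting our copy
    # also connects the one still held by the target process.
    with socket.socket(fileno=local_fd) as sock:
        sock.connect(real_addr)


def demo():
    """Parent plays the supervisor; the forked child plays the workload."""
    listener = socket.create_server(("127.0.0.1", 0))
    listener.settimeout(5)
    fd_r, fd_w = os.pipe()   # child -> parent: the child's socket fd number
    go_r, go_w = os.pipe()   # parent -> child: "carry on" signal
    pid = os.fork()
    if pid == 0:
        try:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            os.write(fd_w, str(sock.fileno()).encode())
            os.read(go_r, 1)        # wait for the supervisor
            sock.sendall(b"hello")  # the child never called connect() itself
        finally:
            os._exit(0)
    os.close(fd_w)
    os.close(go_r)
    try:
        try:
            child_fd = int(os.read(fd_r, 16))
            connect_on_behalf(pid, child_fd, listener.getsockname())
        finally:
            os.write(go_w, b"x")    # always release the child
        conn, _ = listener.accept()
        with conn:
            conn.settimeout(5)
            return conn.recv(16)
    finally:
        os.waitpid(pid, 0)
        listener.close()
        os.close(fd_r)
        os.close(go_w)
```

In a real supervisor, the descriptor number would come from the arguments of the intercepted connect() call delivered by the seccomp notifier rather than from a pipe, and the supervisor would then report success back to the kernel. The sketch only shows why retrieving the descriptor is the missing piece: once the supervisor holds a copy, it can connect the socket wherever it chooses.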
Work on this new pidfd_getfd() system call was led by Netflix's Sargun Dhillon.
Linux 5.6 sure is stacking up to be a spectacular feature release and we're not even through the first week of the merge window.