Announcement

**k1e0x** · 11 March 2021, 01:48 PM

Why do they want to do this with chroot? That seems weird. Just use a container. As mentioned before, BSD showed us the right way to sandbox and virtualize root with jails 20 years ago.

Oh wait, that is hard... OK, carry on, MS.

(Ironically, BSD did it to challenge Microsoft's criticisms of Unix user permission controls. Kind of funny to see their solution and now MS's.)

**jacob** · 11 March 2021, 06:02 PM

Originally posted by kylew77

Indeed had to google that it does look like a good jail substitute! Uses a single Linux kernel and is not a full OS virtualization. Very cool.

LXC is a full-featured jails system. The implementation is different: as I understand it, on FreeBSD a jail is a kernel object in its own right, while in LXC a container (= jail, not to be confused with a Docker container) is a user-space application based on multiple kernel objects (a number of namespaces, a private network stack instance, seccomp, etc.). From a user's point of view, it does everything jails do, plus some features that I don't know are available in jails, e.g. clustering, live migration, the ability to run GUI apps, etc.

Note that it actually also supports full OS virtualisation via KVM even if that's not its primary mode of operation and intended use case.

**k1e0x** · 12 March 2021, 01:42 PM

Linux containers and BSD containers (and Solaris Zones) are kind of different things in that they were designed for different reasons.

BSD Jails were designed for security, whereas Linux containers were designed by third parties for rapid development.

The implementation is different too.
BSD Jails (and Zones) are an OS primitive: they were designed into the OS as a whole (not just the kernel) for the purpose of doing their job (running untrusted software). So they have tight integration with other parts of the OS (kernel, file system, process control, scheduling, network, system utilities, etc.). The job originally was security, but BSD Jails have come a long way with the container revolution, and due to their age a lot of the small edge cases with them have already been worked out. Take a look at the Bastille Project for a modern container infrastructure on FreeBSD. https://bastillebsd.org

Linux containers are not OS primitives. They are many different projects with different design goals, all cobbled together into a "container" (namespaces, cgroups, SELinux and more), but the idea of a container only exists as a downstream concept; there is no such thing in the kernel. This becomes more apparent when you run into problems such as a user with Docker being able to mount the host's root filesystem into a guest to do whatever they want. There are air gaps in places where one project thinks it's another project's job to implement something.

Linux does this a lot, taking whatever already exists in the OS and bending it slightly to do what you want with minimal effort... and that seems to be the route MS is taking.

**jacob** · 12 March 2021, 06:35 PM

You are mistaking Docker for LXC. In terms of features and use cases, LXC "containers" are jails; they have little to do with Docker. The implementation differs from BSD in that BSD has, roughly speaking, an OS primitive "create_jail()", whereas on Linux the supervisor uses lower-level primitives (it creates a cgroup, then a number of namespaces, applies a seccomp policy, remaps in-container UIDs to a non-privileged range and applies an AppArmor profile to sandbox the whole environment). These things are not "cobbled" together; they are lower-level concepts precisely intended to be used together. That's just the Linux way of doing things: the kernel likes to expose lower-level primitives and push not just policy, but actual semantics into userland. It's the same with io_uring, udev, clone(), seccomp, DRM, etc. Someone once said that Linux secretly wants to be a microkernel (albeit not in the traditional message-passing sense, and don't tell that to Linus).
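To make the "lower-level primitives" point concrete, here is a rough Python sketch of what a supervisor has to assemble. It only computes values and lists the steps in the order given above; it does not create anything. The namespace flag values come from `<linux/sched.h>`, but `HOST_UID_BASE`, `UID_RANGE` and all function names are hypothetical and chosen for illustration, not taken from LXC.

```python
# Namespace flag values as defined in <linux/sched.h>.
NAMESPACE_FLAGS = {
    "mount": 0x00020000,   # CLONE_NEWNS
    "cgroup": 0x02000000,  # CLONE_NEWCGROUP
    "uts": 0x04000000,     # CLONE_NEWUTS
    "ipc": 0x08000000,     # CLONE_NEWIPC
    "user": 0x10000000,    # CLONE_NEWUSER
    "pid": 0x20000000,     # CLONE_NEWPID
    "net": 0x40000000,     # CLONE_NEWNET
}

# Assumption: an unprivileged host UID range picked for this example only.
HOST_UID_BASE = 100000
UID_RANGE = 65536


def combined_flags(names):
    """OR together the clone/unshare flags for the requested namespaces."""
    flags = 0
    for name in names:
        flags |= NAMESPACE_FLAGS[name]
    return flags


def uid_map_line(host_base=HOST_UID_BASE, count=UID_RANGE):
    """Build a /proc/<pid>/uid_map entry: inside-ID outside-ID length."""
    return f"0 {host_base} {count}"


def host_uid(container_uid, host_base=HOST_UID_BASE, count=UID_RANGE):
    """Translate an in-container UID to the host UID it is remapped to."""
    if not 0 <= container_uid < count:
        raise ValueError("UID is outside the mapped range")
    return host_base + container_uid


def plan_container(name):
    """Dry run: describe the steps a supervisor would perform, in order."""
    flags = combined_flags(NAMESPACE_FLAGS)  # iterating a dict yields its keys
    return [
        f"create cgroup for {name}",
        f"create namespaces with flags {flags:#010x}",
        "apply seccomp policy",
        f"write uid_map: {uid_map_line()}",
        "apply AppArmor profile",
    ]


if __name__ == "__main__":
    for step in plan_container("demo"):
        print(step)
    print("container root maps to host UID", host_uid(0))
```

With this mapping, root inside the container (UID 0) is an ordinary unprivileged UID on the host, which is the point of the remapping step.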

There were some "gaps" initially that were due to the LXD implementation; they have long been fixed. AFAIK there are no known security issues with LXC containers now. It's widely used by large users and hosting providers alike; if there were some endemic security problems with it, the world would know about them. That's just FUD.

**macemoneta** · 17 March 2021, 03:46 PM

LXD is used in Chromebooks as well, for the Crostini VM, and soon ARCVM (for Android).

Microsoft Security Researcher Proposes Unprivileged Chroot For Linux