Announcement

**NobodyXu** · 24 May 2023, 05:49 AM

Originally posted by kloczek View Post

It is hard for it to be buggy. procfs is not a filesystem that can have multiple instances depending on namespace settings.

That's wrong: procfs is actually multi-instance.

A container creates a PID namespace and a mount namespace.
Inside the PID namespace, mounting procfs only shows you the processes inside that namespace.
Inside the mount namespace, you can create mounts without affecting other namespaces.
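
If you want to check this yourself, here is a minimal sketch in Python. It relies on the fact that `/proc/<pid>/ns/pid` and `/proc/<pid>/ns/mnt` are symlinks whose target looks like `pid:[4026531836]`; two processes are in the same namespace exactly when those targets match. The helper names (`parse_ns_link`, `ns_id`, `same_namespace`) are made up for illustration.

```python
import os
import re

# Namespace symlink targets look like "pid:[4026531836]" or "mnt:[4026531840]".
_NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(link):
    """Split a namespace symlink target into (kind, inode number)."""
    match = _NS_LINK.match(link)
    if match is None:
        raise ValueError(f"not a namespace link: {link!r}")
    return match.group("kind"), int(match.group("inode"))


def ns_id(pid="self", kind="pid"):
    """Read the namespace identity of a process (Linux only)."""
    return parse_ns_link(os.readlink(f"/proc/{pid}/ns/{kind}"))


def same_namespace(link_a, link_b):
    """True when both symlink targets name the same namespace."""
    return parse_ns_link(link_a) == parse_ns_link(link_b)
```

Running `ns_id()` on the host and again inside a container should give different inode numbers for both `pid` and `mnt` if the runtime really created new namespaces.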

Originally posted by kloczek View Post

I want to be able to limit what is visible in devfs depending on namespace settings.
Please have a look at the namespace code in the kernel: is there anything that can limit such visibility depending on namespace settings?

`/dev` in a container is not a pseudo-filesystem, it is simply a temporary directory (typically a tmpfs).
/dev/full, /dev/null, /dev/zero, /dev/random and /dev/urandom are all created using mknod.

All other devices have to be manually bind-mounted into the container.

full(4) - Linux manual page

https://man7.org/linux/man-pages/man4/full.4.html

null(4) - Linux manual page

https://man7.org/linux/man-pages/man4/null.4.html

https://man7.org/linux/man-pages/man4/random.4.html, see section Configuration

Originally posted by kloczek View Post

Thank you for the confirmation that such filtering does not exist and needs to be implemented.

It already exists: it's called seccomp, as they told you.
It lets you whitelist or blacklist syscalls, and also filter syscalls based on the arguments passed.

You can also have a seccomp filter hand a syscall over to a ptrace tracer (SECCOMP_RET_TRACE).
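
A quick way to see whether a filter is actually installed on a process is the `Seccomp:` field in `/proc/<pid>/status`: 0 means disabled, 1 means strict mode, 2 means a filter is attached. Below is a rough Python sketch that reads it; the function names are my own, and it only reports the mode, it says nothing about which syscalls the filter blocks.

```python
# Meaning of the "Seccomp:" field in /proc/<pid>/status.
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text):
    """Return the seccomp mode named in the text of a /proc/<pid>/status file.

    Returns None when the field is absent (kernel built without seccomp).
    """
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        # Exact match, so the separate "Seccomp_filters" field is ignored.
        if key == "Seccomp":
            return SECCOMP_MODES.get(int(value.strip()), "unknown")
    return None


def current_seccomp_mode():
    """Seccomp mode of the calling process (Linux only)."""
    with open("/proc/self/status") as status_file:
        return seccomp_mode(status_file.read())
```

Calling `current_seccomp_mode()` inside a container and on the host is an easy way to compare what the runtime applied.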

**kloczek** · 24 May 2023, 06:09 AM

Originally posted by NobodyXu View Post

That's wrong: procfs is actually multi-instance.

A container creates a PID namespace and a mount namespace.
Inside the PID namespace, mounting procfs only shows you the processes inside that namespace.
Inside the mount namespace, you can create mounts without affecting other namespaces.

Please check the content of /proc/uptime.

`/dev` in a container is not a pseudo-filesystem, it is simply a temporary directory (typically a tmpfs).
/dev/full, /dev/null, /dev/zero, /dev/random and /dev/urandom are all created using mknod.

And that is why dmesg executed in a namespace shows /dev/kmsg from the bare-metal system.
On Solaris, the content of /dev is pseudo-filesystem content.

All other devices have to be manually bind-mounted into the container.

full(4) - Linux manual page

https://man7.org/linux/man-pages/man4/full.4.html

null(4) - Linux manual page

https://man7.org/linux/man-pages/man4/null.4.html

https://man7.org/linux/man-pages/man4/random.4.html, see section Configuration

It already exists: it's called seccomp, as they told you.
It lets you whitelist or blacklist syscalls, and also filter syscalls based on the arguments passed.

You can also have a seccomp filter hand a syscall over to a ptrace tracer (SECCOMP_RET_TRACE).

Thank you for the confirmation that the whole filtering is missing and needs to be implemented.

**NobodyXu** · 24 May 2023, 06:38 AM

Originally posted by kloczek View Post

Please check the content of /proc/uptime.

The point is that it will only show processes inside the PID namespace.
That is good enough. I don't care about uptime, since it cannot be used to break out of the container.

Originally posted by kloczek View Post

And that is why dmesg executed in a namespace shows /dev/kmsg from the bare-metal system.
On Solaris, the content of /dev is pseudo-filesystem content.

Sorry, but you missed my point: the container does not create /dev/kmsg at all, so you have no access to it.

Originally posted by kloczek View Post

Thank you for the confirmation that the whole filtering is missing and needs to be implemented.

What?
seccomp is the mechanism for syscall filtering, why do you think it is missing?

**Classical** · 24 May 2023, 07:29 AM

It would be desirable for Podman and HashiCorp Nomad to get better interoperability plugins.

**kloczek** · 24 May 2023, 07:43 AM

Originally posted by NobodyXu View Post

The point is that it will only show processes inside the PID namespace.
That is good enough. I don't care about uptime, since it cannot be used to break out of the container.

This issue is about leaking information from outside the container.

What?
seccomp is the mechanism for syscall filtering, why do you think it is missing?

So where is the filtering that is necessary here actually implemented using this mechanism?

**NobodyXu** · 24 May 2023, 08:43 AM

Originally posted by kloczek View Post

This issue is about leaking information from outside the container.

IMHO uptime is not critical information that is worth the time to hide.

Originally posted by kloczek View Post

So where is the filtering that is necessary here actually implemented using this mechanism?

Check out libseccomp: https://libseccomp.readthedocs.io/en/latest/

It is a library that wraps the underlying syscalls and BPF filter programs to provide a ready-to-use, easy-to-use API.

**kloczek** · 24 May 2023, 12:47 PM

Originally posted by NobodyXu View Post

IMHO uptime is not critical information that is worth the time to hide.

The whole of procfs is only isolated with respect to the PID space. Everything else is COMPLETELY untouched, so not only is leaking uptime possible, it is also possible to alter many things on the bare-metal system from a container via /proc/sys.
It is the same story with sysfs, debugfs, tracefs and a few others.
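
Whether /proc/sys is actually writable in a given container depends on how the runtime mounted it, so it is worth checking rather than assuming. A rough sketch in Python, assuming the `/proc/self/mountinfo` layout from proc(5) (field 5 is the mount point, field 6 the per-mount options); `mount_options` and `is_read_only` are names invented for this example.

```python
def mount_options(mountinfo_text):
    """Map each mount point to its set of per-mount options.

    If a mount point appears more than once, the last line wins,
    which is the mount stacked on top.
    """
    result = {}
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        result[fields[4]] = set(fields[5].split(","))
    return result


def is_read_only(mountinfo_text, path):
    """True/False if `path` is a mount point, None if it is not one."""
    options = mount_options(mountinfo_text).get(path)
    if options is None:
        return None
    return "ro" in options
```

Feeding it the contents of `/proc/self/mountinfo` from inside the container and asking about `/proc/sys` or `/sys` shows whether the runtime remounted them read-only or left them as they are on the host.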

Check out libseccomp: https://libseccomp.readthedocs.io/en/latest/

It is a library that wraps the underlying syscalls and BPF filter programs to provide a ready-to-use, easy-to-use API.

Thank you again for the confirmation that nothing here is implemented and that none of the existing namespace management applications can use out of the box what is provided on top of libseccomp.

**Quackdoc** · 24 May 2023, 01:19 PM

Originally posted by Jabberwocky View Post

IMO go was a good choice from a portability and readability perspective. A bonus was to use the same ecosystem as dockerd / cri-o.

What language(s) would you have preferred and why?

Personally, I would be fine with most other languages, though I myself am partial towards Rust. The main reasons for my distaste for Go are that I find it incredibly painful and irritating to work with, and the shenanigans of the Go team wanting to put on-by-default telemetry in the toolchain (I realize this does not affect the apps themselves). I have been seeking out and replacing as many Go applications as I can, so that when I do encounter something I want to contribute back to, it's in a language I would be fine working with.

**Nocifer** · 24 May 2023, 03:48 PM

Originally posted by Nuc!eoN View Post

IMHO they should first attempt to make podman-compose feature equivalent to docker-compose. So far podman tooling is a joke.

Before that they should first attempt to make podman-compose an officially supported and/or in-house developed project. Right now it's basically a one-man job by a third-party developer, and as a result there was a 1+ year gap between the current latest release and the previous one when that developer went off the radar for a while, with podman-compose being broken for major use cases in the meantime. That's not what I'd call sane development of a core utility.

But then again Red Hat's container orchestration technology is K8s, which can utilize podman directly and has kind of an "instead of" relationship with podman-compose, so I suppose the latter would be an afterthought for them, if even that.

**darkonix** · 24 May 2023, 05:29 PM

Originally posted by Ironmask View Post

VMs should not be a stop-gap for the world's most insecure language.)

But C doesn't have a VM... ;-)

Podman Desktop 1.0 Released As An Alternative To Docker Desktop
