`init.krun` does not reap zombie processes #189

teohhanhui · 2024-05-29T14:57:46Z

(Moved from https://github.com/slp/krun/issues/16)

Zombie processes inside the VM never get reaped, I guess because init.krun doesn't do this. Although we obviously don't need a full-blown init system, it should at least do this.

Is there some documentation of what PID 1 is expected to do?

Relevant writeups:

slp · 2024-05-29T15:00:33Z

This is a regression I also noticed last week. I have a fix here and will create a PR later.

Tell the kernel that we want to ignore SIGCHLD so it'll reap our children for us to avoid leaving zombie objects. Fixes containers#189 Signed-off-by: Sergio Lopez <slp@redhat.com>
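
The approach described in the commit message can be sketched as follows. This is a minimal illustration and not the actual patch from #190: the helper names `ignore_sigchld` and `child_is_auto_reaped` are hypothetical, and the sketch assumes Linux/POSIX semantics, where setting the `SIGCHLD` disposition to `SIG_IGN` makes terminated children skip the zombie state and makes `waitpid` fail with `ECHILD` once the child is gone.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: tell the kernel we ignore SIGCHLD, so that it
 * reaps terminated children for us. Returns 0 on success, -1 on error. */
int ignore_sigchld(void)
{
    struct sigaction sa = {0};

    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Hypothetical check: fork a child that exits immediately and verify
 * that it left no zombie behind. With SIGCHLD ignored, waitpid() blocks
 * until the child has terminated and then fails with ECHILD, because
 * there is no exit status left to collect.
 * Returns 1 if the child was auto-reaped, 0 if not, -1 if fork() failed. */
int child_is_auto_reaped(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0); /* child: exit right away */

    pid_t r = waitpid(pid, &status, 0);
    return (r == -1 && errno == ECHILD) ? 1 : 0;
}
```

Calling `ignore_sigchld()` once early in PID 1 is enough under these assumptions. The trade-off is that exit statuses of children are discarded; an init that needs them would typically keep the default disposition and call `waitpid(-1, &status, WNOHANG)` in a loop instead.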


ericcurtin · 2024-05-30T21:27:03Z

This issue came up in my feed. I never realized krun has its own init system. FWIW, systemd is as fast as written-from-scratch init systems if it's trimmed down like the one in this dracut module-setup.sh file:

https://gitlab.com/CentOS/automotive/rpms/dracut-automotive/-/blob/main/module-setup.sh

I've written an init system more than once, but when I compared them to these mini systemd setups, systemd was typically just as fast, and it's less maintenance to use systemd with some custom binaries for specific tasks run as systemd .service files.

Now the fat complicated systemd configuration from Fedora isn't so fast, but that's just because by default there's like a million features and .service units configured, when only around 5 units are actually needed.

ericcurtin · 2024-05-30T21:27:55Z

The above systemd is configured for initramfs, but just giving an example of how lean systemd can be.

I am glad I wrote an init system at least once for the learning experience though...

ericcurtin · 2024-05-30T21:51:23Z

I'm guessing it's because we want a statically compiled init system for all OSes 🤔

asahilina · 2024-06-02T08:11:40Z

Since krun shares the filesystem with the host OS, I don't think systemd is a good idea... I don't think anyone tests multiple instances of systemd that both think they are PID 1 but share a filesystem, that sounds like a recipe for trouble...

ericcurtin · 2024-06-02T11:21:57Z

> Since krun shares the filesystem with the host OS, I don't think systemd is a good idea... I don't think anyone tests multiple instances of systemd that both think they are PID 1 but share a filesystem, that sounds like a recipe for trouble...

This is one of the reasons podman was created: multiple instances of systemd, sharing filesystems, etc. This use case is regularly tested and deployed.

I'm not saying this is why we should or shouldn't use systemd, but saying nobody tests this is just not true.

systemd can be as simple as a binary that acts as a process manager that forks other processes. Almost every feature in systemd is optional (and configurable at run-time: just change around the systemd unit files to do what is desired).

ericcurtin · 2024-06-02T11:28:08Z

systemd-nspawn does this kinda thing also but I'm more familiar with podman.

asahilina · 2024-06-02T14:02:40Z

I thought the whole point of containers was that they run with a different filesystem (root)? We run with the same filesystem root.

$ sudo systemd-nspawn -D /
Spawning container on root directory is not supported. Consider using --ephemeral, --volatile=yes or --volatile=state.

Evidently the systemd people don't think this is supposed to work.

ericcurtin · 2024-06-02T14:31:57Z

Yup and the podman equivalent is:

sudo podman run -ti --rootfs /:O /bin/bash

sudo podman run -ti --systemd=true --rootfs /:O /usr/lib/systemd/systemd

I do question the approach of sharing the whole root with both OSes. In the vast majority of VM/container solutions, it's pick and share what you need rather than share everything, even if the "share what you need" ends up being 80% of the host's contents. I also think we'd re-implement less this way.

But if we want to try something unique, why not I guess :)

This is very loosely related to other conversations going on at the moment, again the angle is more towards ephemeral containers though:

https://gitlab.com/fedora/bootc/tracker/-/issues/4

ericcurtin · 2024-06-02T14:34:57Z

For example, if we were designing systemd to be run in a microVM, it's basically just a matter of ensuring you don't include certain directories and populating those with alternate configs, that kind of thing.

ericcurtin · 2024-06-02T14:36:34Z

A bootc image for inside the microVM could eventually be very well suited to this use case also, even on a non-bootc OS.

ericcurtin · 2024-06-02T14:37:18Z

It also wouldn't be a bad idea to reduce the pressure on virtiofs tbh.

ericcurtin · 2024-06-02T14:39:01Z

But don't get me wrong, I'd be happy to see this approach continue with virtiofs by default. It's novel and it could be interesting to see a project like this one try something completely different.

ericcurtin · 2024-06-02T14:58:20Z

This way of using PID 1, for example, doesn't initialize SELinux in the guest kernel. For this use case maybe some people don't care, but some people care deeply about having SELinux on in all running kernel instances.

If we had a systemd binary in there, we could pick whether to initialize SELinux or not.

asahilina · 2024-06-02T16:00:56Z

> sudo podman run -ti --systemd=true --rootfs /:O /usr/lib/systemd/systemd

That doesn't actually run systemd, it just sets up the environment to run systemd.

It also doesn't actually use the host filesystem, instead it sets up an overlayfs. If you try to actually use the host FS:

$ sudo podman run -ti --rootfs / /bin/bash
Error: OCI runtime error: crun: pivot_root: Device or resource busy

So it doesn't work. So nobody can be testing this, by definition. It really is a very, very different use case from all the container stuff people do.

ericcurtin · 2024-06-02T16:33:57Z

Yeah, I understand; the wording of this was open to interpretation:

> I don't think anyone tests multiple instances of systemd that both think they are PID 1 but share a filesystem

I read this and was instantly surprised, as people do this all the time, but then you clarified that you meant the filesystem root. Just a misunderstanding.

It wouldn't make sense to have exactly the same root filesystem for two systemd instances anyway; the guest one would ideally have a modified /usr, /etc, etc. to trim it down so it's minimised.

I assumed we were gonna do things more like the ChromeOS Linux environment approach, but I guess not, interesting to see where this goes :)

ericcurtin · 2024-06-02T21:17:38Z

> sudo podman run -ti --systemd=true --rootfs /:O /usr/lib/systemd/systemd
>
> That doesn't actually run systemd, it just sets up the environment to run systemd.

This does run systemd: it sets up systemd and runs it. The /usr/lib/systemd/systemd part execs systemd.

> It also doesn't actually use the host filesystem, instead it sets up an overlayfs. If you try to actually use the host FS:
>
> $ sudo podman run -ti --rootfs / /bin/bash
> Error: OCI runtime error: crun: pivot_root: Device or resource busy

This is interesting; this is supposed to work. pivot_rooting to yourself is weird, though. This should probably be logged as a bug, if anyone cares about this feature.

> So it doesn't work. So nobody can be testing this, by definition. It really is a very, very different usecase to all the container stuff people do.

teohhanhui mentioned this issue May 29, 2024

krun does not reap zombie processes AsahiLinux/muvm#16

Closed

slp mentioned this issue May 30, 2024

init: tell the kernel to reap our children for us #190

Merged

slp closed this as completed in #190 May 30, 2024

slp added a commit that referenced this issue May 30, 2024

init: tell the kernel to reap our children for us

4f317ec
