Rootless podman via systemd with User directive does not work #6582

Closed
jdoss opened this issue Jun 11, 2020 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@jdoss
Contributor

jdoss commented Jun 11, 2020

/kind bug

In anticipation of #6415 getting merged, I pulled down @vrothberg's branch to start playing around with the new --infra-conmon-pidfile flag, and I found that systemd units using the User directive don't work. I want to have a systemd unit that creates and starts a pod, and then more systemd units that create services in that pod that run as a rootless user.

I then built https://koji.fedoraproject.org/koji/buildinfo?buildID=1522974 for Fedora 32 and installed that to get the most recent build.

$ podman version
Version:      2.0.0-dev
API Version:  1
Go Version:   go1.14.3
Git Commit:   2c532a92cd9993d93f010c736dd61102d269eece
Built:        Wed Jun 10 19:00:00 2020
OS/Arch:      linux/amd64

Here is my systemd unit file for the pod:

[Unit]
Description=12345 pod service
Wants=network.target
After=network-online.target

[Service]
User=jdoss
Group=jdoss
Environment=PODMAN_SYSTEMD_UNIT=%n
Restart=on-failure
ExecStartPre=-/usr/bin/podman pod create --infra-conmon-pidfile /tmp/12345-pod-conmon.pid --name 12345-pod -p 443:443 -p 80:80
ExecStart=/usr/bin/podman pod start 12345-pod
ExecStop=/usr/bin/podman pod stop -t 10 12345-pod
PIDFile=/tmp/12345-pod-conmon.pid
KillMode=none
Type=forking
SyslogIdentifier=12345-pod

[Install]
WantedBy=multi-user.target default.target

This fails to start the pod correctly:

$ systemctl status 12345-pod.service 
● 12345-pod.service - 12345 pod service
     Loaded: loaded (/etc/systemd/system/12345-pod.service; disabled; vendor preset: disabled)
     Active: failed (Result: protocol) since Thu 2020-06-11 10:11:36 CDT; 2min 57s ago
    Process: 272513 ExecStartPre=/usr/bin/podman pod create --infra-conmon-pidfile /tmp/12345-pod-conmon.pid --name 12345-pod -p 443:443 -p 80:80 (code=exited, status=0/SUCCESS)
    Process: 272540 ExecStart=/usr/bin/podman pod start 12345-pod (code=exited, status=0/SUCCESS)
        CPU: 299ms

Jun 11 10:11:35 sts7 systemd[1]: Starting 12345 pod service...
Jun 11 10:11:36 sts7 systemd[1]: 12345-pod.service: New main PID 272597 does not belong to service, and PID file is not owned by root. Refusing.
Jun 11 10:11:36 sts7 systemd[1]: 12345-pod.service: New main PID 272597 does not belong to service, and PID file is not owned by root. Refusing.
Jun 11 10:11:36 sts7 systemd[1]: 12345-pod.service: Failed with result 'protocol'.
Jun 11 10:11:36 sts7 systemd[1]: Failed to start 12345 pod service.

Jun 11 10:11:35 sts7 systemd[1]: Starting 12345 pod service...
Jun 11 10:11:35 sts7 systemd[6314]: Started podman-272525.scope.
Jun 11 10:11:35 sts7 systemd[6314]: Created slice cgroup user-libpod_pod_40db3b8c043fc1cd8d423e83d234c98d2c25d2b37f23211c68dcb54209436b1e.slice.
Jun 11 10:11:35 sts7 12345-pod[272525]: 40db3b8c043fc1cd8d423e83d234c98d2c25d2b37f23211c68dcb54209436b1e
Jun 11 10:11:35 sts7 systemd[6314]: podman-272525.scope: Succeeded.
Jun 11 10:11:35 sts7 systemd[6314]: Started podman-272552.scope.
Jun 11 10:11:35 sts7 systemd[6314]: Started libcrun container.
Jun 11 10:11:36 sts7 12345-pod[272552]: 40db3b8c043fc1cd8d423e83d234c98d2c25d2b37f23211c68dcb54209436b1e
Jun 11 10:11:36 sts7 systemd[1]: 12345-pod.service: New main PID 272597 does not belong to service, and PID file is not owned by root. Refusing.
Jun 11 10:11:36 sts7 systemd[1]: 12345-pod.service: New main PID 272597 does not belong to service, and PID file is not owned by root. Refusing.
Jun 11 10:11:36 sts7 systemd[1]: 12345-pod.service: Failed with result 'protocol'.
Jun 11 10:11:36 sts7 systemd[1]: Failed to start 12345 pod service.

The PID in question is conmon:

$ ps aux |grep 272597
jdoss     272597  0.0  0.0  80492  1996 ?        Ssl  10:11   0:00 /usr/bin/conmon --api-version 1 -c cc06c7b2d28fc5219a8e31609f78b76e19275f48a8334fb1bf669de18afe5240 -u cc06c7b2d28fc5219a8e31609f78b76e19275f48a8334fb1bf669de18afe5240 -r /usr/bin/crun -b /home/jdoss/.local/share/containers/storage/overlay-containers/cc06c7b2d28fc5219a8e31609f78b76e19275f48a8334fb1bf669de18afe5240/userdata -p /run/user/1000/containers/overlay-containers/cc06c7b2d28fc5219a8e31609f78b76e19275f48a8334fb1bf669de18afe5240/userdata/pidfile -n 40db3b8c043f-infra --exit-dir /run/user/1000/libpod/tmp/exits --socket-dir-path /run/user/1000/libpod/tmp/socket -s -l k8s-file:/home/jdoss/.local/share/containers/storage/overlay-containers/cc06c7b2d28fc5219a8e31609f78b76e19275f48a8334fb1bf669de18afe5240/userdata/ctr.log --log-level error --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/run/user/1000/containers/overlay-containers/cc06c7b2d28fc5219a8e31609f78b76e19275f48a8334fb1bf669de18afe5240/userdata/oci-log --conmon-pidfile /tmp/12345-pod-conmon.pid

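After some digging, it turns out this is by design in systemd (https://access.redhat.com/solutions/4420581): a main PID that is outside the unit's control group is only accepted if the PID file is owned by root.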
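The refusal in the log follows from a two-part check on the main PID that systemd reads from PIDFile=. The function below is a rough approximation of that rule, not systemd's actual code: it accepts the PID if the process sits in the unit's control group, otherwise accepts it only if root owns the PID file, and refuses it in every other case. The name `pidfile_verdict` and its messages are hypothetical; it assumes Linux with GNU `stat`, reads `/proc/PID/cgroup`, and uses a simple path-prefix match in place of systemd's internal PID-to-unit lookup. In the log above, the podman scopes were started by the user manager (`systemd[6314]`), which suggests conmon ended up under the user's slice rather than the unit's cgroup, and a PID file written by conmon running as jdoss is presumably owned by jdoss, so neither condition holds.

```shell
# pidfile_verdict PIDFILE UNIT_CGROUP
# Hypothetical helper: rough approximation (not systemd's code) of how systemd
# judges a main PID read from PIDFile= for a Type=forking unit.
pidfile_verdict() {
    pidfile=$1
    unit_cgroup=$2
    if [ -z "$unit_cgroup" ]; then
        echo "usage: pidfile_verdict PIDFILE UNIT_CGROUP" >&2
        return 2
    fi

    pid=$(cat "$pidfile") || return 2
    if [ ! -r "/proc/$pid/cgroup" ]; then
        echo "refuse: PID $pid is not running"
        return 1
    fi

    # Each line is "hierarchy-ID:controllers:path"; cgroup v2 has one "0::/path" line.
    while IFS=: read -r _id _controllers path; do
        case $path in
            "$unit_cgroup" | "$unit_cgroup"/*)
                echo "accept: PID in service cgroup"
                return 0 ;;
        esac
    done < "/proc/$pid/cgroup"

    # Outside the unit's cgroup, the PID is only trusted if root owns the PID file.
    if [ "$(stat -c %u "$pidfile")" -eq 0 ]; then
        echo "accept: PID file owned by root"
        return 0
    fi
    echo "refuse: PID outside service cgroup and PID file not owned by root"
    return 1
}
```

For this setup one would call it as `pidfile_verdict /tmp/12345-pod-conmon.pid /system.slice/12345-pod.service` (system units land in system.slice by default; `systemctl show -p ControlGroup --value 12345-pod.service` prints the real path while the unit is active).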

Running this as a user service (a systemd --user unit, without the User and Group directives) does not have this issue:

[Unit]
Description=12345 pod service
Wants=network.target
After=network-online.target

[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
Restart=on-failure
ExecStartPre=-/usr/bin/podman pod create --infra-conmon-pidfile /tmp/12345-pod-conmon.pid --name 12345-pod -p 443:443 -p 80:80
ExecStart=/usr/bin/podman pod start 12345-pod
ExecStop=/usr/bin/podman pod stop -t 10 12345-pod
PIDFile=/tmp/12345-pod-conmon.pid
KillMode=none
Type=forking
SyslogIdentifier=12345-pod

[Install]
WantedBy=multi-user.target default.target

$ podman ps -a
CONTAINER ID  IMAGE                 COMMAND  CREATED         STATUS             PORTS               NAMES
7fa0a255ade1  k8s.gcr.io/pause:3.2           31 minutes ago  Up 31 minutes ago  0.0.0.0:80->80/tcp  ee25849f051e-infra
$ podman pod list
POD ID        NAME       STATUS   CREATED         # OF CONTAINERS  INFRA ID
ee25849f051e  12345-pod  Running  31 minutes ago  1                7fa0a255ade1
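The working setup above is a per-user unit run by the user's own systemd instance instead of a system unit with User=. A minimal sketch of installing one, assuming the unit file sits in the current directory; `install_user_unit` is a hypothetical helper, and the follow-up `systemctl --user` and `loginctl` commands are shown as comments because they need a running user manager. The user manager has no multi-user.target, so `WantedBy=default.target` is the entry that matters for a user unit.

```shell
# install_user_unit UNIT_FILE
# Hypothetical helper: copy a unit file into the per-user unit directory
# ($XDG_CONFIG_HOME/systemd/user, defaulting to ~/.config/systemd/user)
# and print the installed path.
install_user_unit() {
    unit_file=$1
    dir="${XDG_CONFIG_HOME:-$HOME/.config}/systemd/user"
    mkdir -p "$dir" || return 1
    cp "$unit_file" "$dir/" || return 1
    echo "$dir/$(basename "$unit_file")"
}

# Then, as the unprivileged user (e.g. jdoss):
#   install_user_unit ./12345-pod.service
#   systemctl --user daemon-reload
#   systemctl --user enable --now 12345-pod.service
# To start the user manager at boot without a login session (may need root):
#   loginctl enable-linger jdoss
```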

If I try to create a PostgreSQL container the same way, without a pod but with a PIDFile, systemd just times out and reports that the unit failed to start. Here is the unit for this test case:

[Unit]
Description=12345 Postgresql Service
Wants=network.target
After=network-online.target

[Service]
User=jdoss
Group=jdoss
Environment=PODMAN_SYSTEMD_UNIT=%n
#Restart=on-failure
ExecStartPre=-/usr/bin/podman kill 12345-postgresql
ExecStartPre=-/usr/bin/podman rm 12345-postgresql
ExecStartPre=-/usr/bin/podman pull postgres:9.6.17
ExecStartPre=-/usr/bin/podman volume create 12345-postgresql
ExecStart=/usr/bin/podman run --rm --name 12345-postgresql -e POSTGRES_PASSWORD=supersecurepassword -e POSTGRES_USER=mycool_user -e POSTGRES_DB=mycooldb --volume 12345-postgresql:/var/lib/postgresql/data:Z --conmon-pidfile=/tmp/postgresql-conmon.pid postgres:9.6.17
ExecStop=/usr/bin/podman stop -t 10 12345-postgresql
PIDFile=/tmp/postgresql-conmon.pid
KillMode=none
Type=forking
SyslogIdentifier=12345-postgresql

[Install]
WantedBy=multi-user.target default.target

This is odd, because when the unit above starts it does write out the PID file:

$ ll /tmp/postgresql-conmon.pid 
-rw-r--r--. 1 jdoss jdoss 6 Jun 11 14:03 /tmp/postgresql-conmon.pid

and it shows the container as started:

$ podman ps -a
CONTAINER ID  IMAGE                              COMMAND   CREATED         STATUS             PORTS   NAMES
75f45e50668f  docker.io/library/postgres:9.6.17  postgres  44 seconds ago  Up 44 seconds ago          12345-postgresql

but systemd eventually fails it with a timeout:

Jun 11 13:43:57 sts7 systemd[1]: Failed to start 12345 Postgresql Service.
Jun 11 14:03:56 sts7 systemd[1]: Starting 12345 Postgresql Service...
Jun 11 14:05:29 sts7 systemd[1]: 12345-postgresql.service: start operation timed out. Terminating.
Jun 11 14:05:29 sts7 systemd[1]: 12345-postgresql.service: Failed with result 'timeout'.
Jun 11 14:05:29 sts7 systemd[1]: Failed to start 12345 Postgresql Service.

After downgrading to podman 1.9.3, 12345-postgresql.service still fails to start as far as systemd is concerned, but the container shows as running in podman ps.

With all that said, is there a way to run a system (root) systemd unit with the User directive set, so that it runs as a non-root user, and still use a PIDFile?

My end goal is to use Podman and systemd on Fedora CoreOS to deploy, via Ignition, services that run in a pod under rootless Podman.

@openshift-ci-robot openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 11, 2020
@goochjj
Contributor

goochjj commented Jun 11, 2020

Add the -d (detach) option to your ExecStart line and try again.

(Otherwise you're not forking.)
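With Type=forking, systemd considers start-up complete only when the ExecStart= process exits, and it then reads the main PID from PIDFile=. Without -d, `podman run` stays attached to the container in the foreground, so it never exits and the start job runs until TimeoutStartSec= expires (90 seconds by default, roughly matching the gap between "Starting" and "start operation timed out" in the postgresql log above). The hypothetical helper below sketches a process that satisfies this contract: it starts the long-running command in the background, writes that command's PID to the PID file, and returns so the calling script can exit. `podman run -d --conmon-pidfile=...` plays the same role, with conmon as the main process.

```shell
# start_forking PIDFILE COMMAND [ARGS...]
# Hypothetical helper illustrating the Type=forking contract: launch the
# long-running process in the background, record its PID where PIDFile=
# points, and return immediately so the ExecStart= script can exit.
start_forking() {
    pidfile=$1
    shift
    "$@" </dev/null >/dev/null 2>&1 &
    # $! is the PID of the background command, i.e. the future main process.
    echo "$!" > "$pidfile"
}
```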

@jdoss
Contributor Author

jdoss commented Jun 11, 2020

@goochjj D'oh, you are right. I added it to my 12345-postgresql.service and now it fails with the same error as 12345-pod.service:

● 12345-postgresql.service - 12345 Postgresql Service
     Loaded: loaded (/etc/systemd/system/12345-postgresql.service; disabled; vendor preset: disabled)
     Active: failed (Result: protocol) since Thu 2020-06-11 16:17:07 CDT; 12s ago
    Process: 343048 ExecStartPre=/usr/bin/podman kill 12345-postgresql (code=exited, status=0/SUCCESS)
    Process: 343077 ExecStartPre=/usr/bin/podman rm 12345-postgresql (code=exited, status=1/FAILURE)
    Process: 343118 ExecStartPre=/usr/bin/podman pull postgres:9.6.17 (code=exited, status=0/SUCCESS)
    Process: 343151 ExecStartPre=/usr/bin/podman volume create 12345-postgresql (code=exited, status=125)
    Process: 343178 ExecStart=/usr/bin/podman run -d --rm --name 12345-postgresql -e POSTGRES_PASSWORD=supersecurepassword -e POSTGRES_USER=mycool_user -e POSTGRES_DB=mycooldb --volume 12345-postgresql:/var/lib/postgresql/data:Z --conmon-pidfile=/tmp/postgresql-conmon.pid postgres:9.6.17 (code=exited, status=0/SUCCESS)
        CPU: 800ms

Jun 11 16:17:03 sts7 systemd[1]: Starting 12345 Postgresql Service...
Jun 11 16:17:07 sts7 systemd[1]: 12345-postgresql.service: New main PID 343211 does not belong to service, and PID file is not owned by root. Refusing.
Jun 11 16:17:07 sts7 systemd[1]: 12345-postgresql.service: New main PID 343211 does not belong to service, and PID file is not owned by root. Refusing.
Jun 11 16:17:07 sts7 systemd[1]: 12345-postgresql.service: Failed with result 'protocol'.
Jun 11 16:17:07 sts7 systemd[1]: Failed to start 12345 Postgresql Service.

@goochjj
Contributor

goochjj commented Jun 11, 2020

That's... interesting, because I ran your service file (plus the -d) on my Photon machine and it worked.
Does --cgroups no-conmon help?

@jdoss
Contributor Author

jdoss commented Jun 11, 2020

I am on Fedora 32 with cgroups v2 if that makes a difference.

$ rpm -qa |grep conmon
conmon-2.0.17-1.fc32.x86_64
$ cat /etc/redhat-release 
Fedora release 32 (Thirty Two)

Funny, the missing -d flag was the source of my main problem on my Fedora CoreOS setup. Thanks for this, @goochjj! It is now working as expected on FCOS.

I started debugging on my main workstation to better understand what was going on, but it looks like I hit a different issue there with cgroups v2, since FCOS still has cgroups v1 enabled.

@vrothberg
Member

Thanks for creating the issue! We currently do not support the User and Group directives in a systemd unit. However, the systemd team is investigating how to improve that.

Given that we're already discussing this in #5572, I am closing this issue; please comment over there.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023
4 participants