Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kube sdnotify: run proxies for the lifespan of the service #16709

Merged
merged 2 commits into from
Dec 7, 2022

Conversation

vrothberg
Copy link
Member

@vrothberg vrothberg commented Dec 2, 2022

As outlined in #16076, a subsequent BARRIER may follow the READY
message sent by a container. To correctly imitate the behavior of
systemd's NOTIFY_SOCKET, the notify proxies span up by kube play must
hence process messages for the entirety of the workload.

We know that the workload is done and that all containers and pods have
exited when the service container exits. Hence, all proxies are closed
at that time.

The above changes imply that Podman runs for the entirety of the
workload and will henceforth act as the MAINPID when running inside of
systemd. Prior to this change, the service container acted as the
MAINPID which is now not possible anymore; Podman would be killed
immediately on exit of the service container and could not clean up.

The kube template now correctly transitions to in-active instead of
failed in systemd.

Fixes: #16076
Fixes: #16515
Signed-off-by: Valentin Rothberg vrothberg@redhat.com

Does this PR introduce a user-facing change?

Fix a bug where barrier sd-notify messages were ignored when using notify policies in kube-play.

@openshift-ci openshift-ci bot added release-note do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Dec 2, 2022
@vrothberg
Copy link
Member Author

@alexlarsson PTAL

@alexlarsson
Copy link
Contributor

Generally lgtm though

@vrothberg vrothberg marked this pull request as ready for review December 2, 2022 13:08
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 2, 2022
@vrothberg
Copy link
Member Author

@containers/podman-maintainers PTAL

@vrothberg
Copy link
Member Author

/hold

This needs more work. The barrier is send in a following message:

podman (fix-16515) $ socat -v unix-recvfrom:/tmp/test.sock,fork -
> 2022/12/02 15:06:50.309288  length=7 from=0 to=6
READY=1READY=1> 2022/12/02 15:06:50.309576  length=9 from=0 to=8
BARRIER=1BARRIER=1> 2022/12/02 15:06:55.314936  length=9 from=0 to=8

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 2, 2022
The flake in containers#16076 is likely related to the notify message not being
delivered/read correctly.  Move sending the message into an exec session
such that flakes will reveal an error message.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
@vrothberg vrothberg force-pushed the fix-16515 branch 2 times, most recently from 992f6b4 to 236b58f Compare December 6, 2022 10:41
@vrothberg vrothberg changed the title notifyproxy: handle barrier messages kube sdnotify: run proxies for the lifespan of the service Dec 6, 2022
@vrothberg
Copy link
Member Author

@alexlarsson @umohnani8 @edsantiago PTAL

@umohnani8 using --service-container will now implicitly wait for the container to exit. You could use that in the --wait PR and have a

select {
 case err := <- errorChannelFromKubePlay:
    return err
 case <- signalHandlerChannel:
   return teardown(...)
}

@vrothberg
Copy link
Member Author

@rhatdan PTAL

@vrothberg
Copy link
Member Author

vrothberg commented Dec 6, 2022

Note: I would love to create a new container image with systemd-notify installed for CI. The one we currently use is based on Fedora 31 which is pretty old and does NOT send a BARRIER message.

Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Way over my head, just a few surgical comments. Thanks for addressing this.

test/system/260-sdnotify.bats Show resolved Hide resolved
pkg/systemd/notifyproxy/notifyproxy.go Outdated Show resolved Hide resolved
pkg/systemd/notifyproxy/notifyproxy.go Show resolved Hide resolved
test/system/260-sdnotify.bats Show resolved Hide resolved
test/system/260-sdnotify.bats Outdated Show resolved Hide resolved
As outlined in containers#16076, a subsequent BARRIER *may* follow the READY
message sent by a container.  To correctly imitate the behavior of
systemd's NOTIFY_SOCKET, the notify proxies span up by `kube play` must
hence process messages for the entirety of the workload.

We know that the workload is done and that all containers and pods have
exited when the service container exits.  Hence, all proxies are closed
at that time.

The above changes imply that Podman runs for the entirety of the
workload and will henceforth act as the MAINPID when running inside of
systemd.  Prior to this change, the service container acted as the
MAINPID which is now not possible anymore; Podman would be killed
immediately on exit of the service container and could not clean up.

The kube template now correctly transitions to in-active instead of
failed in systemd.

Fixes: containers#16076
Fixes: containers#16515
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests LGTM

Comment on lines +357 to +358
main_pid=$(awk -F= '{print $2}' <<< ${lines[0]})
is "$(</proc/$main_pid/comm)" "podman" "podman is the service mainPID"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice. thank you.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 6, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, vrothberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [edsantiago,vrothberg]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@edsantiago
Copy link
Member

Note: I would love to create a new container image with systemd-notify installed for CI. The one we currently use is based on Fedora 31 which is pretty old and does NOT send a BARRIER message.

Done. I don't want to submit it as a PR, because this is a bug week, but I'll submit it over the weekend. (Unless you need the magic BARRIER functionality right now; if you do, LMK and we'll coordinate).

@vrothberg
Copy link
Member Author

Note: I would love to create a new container image with systemd-notify installed for CI. The one we currently use is based on Fedora 31 which is pretty old and does NOT send a BARRIER message.

Done. I don't want to submit it as a PR, because this is a bug week, but I'll submit it over the weekend. (Unless you need the magic BARRIER functionality right now; if you do, LMK and we'll coordinate).

Thank you, Ed! It's not that urgent. I can prepare a follow-up PR next week.

@vrothberg
Copy link
Member Author

@containers/podman-maintainers PTAL

Comment on lines +22 to +26
_notifyRcvbufSize = 8 * 1024 * 1024
_notifyBufferMax = 4096
_notifyFdMax = 768
_notifyBarrierMsg = "BARRIER=1"
_notifyRdyMsg = daemon.SdNotifyReady
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a drive by comment, why do the the vars start with an underscore? I feel like this makes reading/writing them harder for no reason?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are consts and I refrained from capitalizing them (to avoid exporting them) but wanted to somehow discriminate them from ordinary variables.

@alexlarsson
Copy link
Contributor

I have some worries with this making the "podman play kube" process stay around for the lifetime of the pod. I think it would be better if this could be handled by conmon or the service container. On the other hand, this is better than the current state, so lets get thisin first and work from here.

So, LGTM.

@rhatdan
Copy link
Member

rhatdan commented Dec 7, 2022

I agree, this should be moved to conmon.
/lgtm
/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 7, 2022
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 7, 2022
@openshift-merge-robot openshift-merge-robot merged commit 6e2e9ab into containers:main Dec 7, 2022
@vrothberg vrothberg deleted the fix-16515 branch December 8, 2022 07:50
edsantiago added a commit to edsantiago/libpod that referenced this pull request Mar 27, 2023
Race introduced in containers#16709, which changed 'top' to 'true', so
there was only a narrow window in which '.State.ConmonPod'
would be valid. Remove the race.

Fixes: containers#17882

Signed-off-by: Ed Santiago <santiago@redhat.com>
edsantiago added a commit to edsantiago/libpod that referenced this pull request Mar 27, 2023
Race introduced in containers#16709, which changed 'top' to 'true', so
there was only a narrow window in which '.State.ConmonPod'
would be valid. Remove the race.

Fixes: containers#17882

Signed-off-by: Ed Santiago <santiago@redhat.com>
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 18, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 18, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Various issues in systemd.NotifyProxy sdnotify play kube policies: podman container wait, hangs
6 participants