Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Container Timeout #6412

Closed
npmccallum opened this issue May 27, 2020 · 32 comments · Fixed by #10119
Closed

Container Timeout #6412

npmccallum opened this issue May 27, 2020 · 32 comments · Fixed by #10119
Labels
Good First Issue This issue would be a good issue for a first time contributor to undertake. kind/feature Categorizes issue or PR as related to a new feature. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@npmccallum
Copy link

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind feature

Description

I want to use podman run --rm ... to run a container that is removed on exit. I also want the container to be forcibly killed if it is still running after n seconds.

Yes, I can write a podman state manager to call podman stop. But that requires me to do a lot of work. It is also racy and bug prone.

I could use the timeout command (from coreutils) to send a signal. But podman has only two modes for signal handling. The default mode forwards the signal to pid 1 in the container. However, pid 1 could just ignore the signal to bypass the timeout. If I use --sig-proxy=false then podman doesn't forward the signal to the container. But it also doesn't stop the container. Therefore, the container bypasses the timeout.

I tried looking at the --timeout option for conmon, but that doesn't do what we need.

I see two ways forward.

  1. Add support for --sig-proxy=stop. This mode would not proxy the signal to the container and would instead terminate the container (and, implicitly, do the --rm). Then timeout support could be implemented using the timeout utility from coreutils.

  2. Add a new option for --timeout=n which would cause podman run --rm --timeout=30 to forcibly shut down the container and remove it after 30 seconds. I think I would prefer this option since it doesn't require another process.

@openshift-ci-robot openshift-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label May 27, 2020
@rhatdan
Copy link
Member

rhatdan commented May 28, 2020

Would the container get the SIGSTOP (Actually --stop-signal value) signal or SIGKILL?
Most likely I just answered my own question. Since if you want SIGKILL you can just add --stop-signal=kill

If I restart the container, does it run for another 30 seconds?

@rhatdan
Copy link
Member

rhatdan commented May 28, 2020

@mheon @vrothberg @baude WDYT?

@rhatdan
Copy link
Member

rhatdan commented May 28, 2020

We would need to wire this into conmon. since it is the only thing left running when you run with --detatch.
@haircommander FYI

@vrothberg
Copy link
Member

This sounds completely reasonable to me.
Naming nit: to prevent confusion, I suggest to name it --rm-timeout.

@mheon
Copy link
Member

mheon commented May 28, 2020

My concern would be the complexity of stopping the container exclusively from within Conmon. I don't really want to implement the full logic of podman stop again in C (SIGTERM, timeout, SIGKILL, timeout, plus handling for containers without a PID namespace through runtime kill --all). If we can keep it simple (single SIGKILL, or a SIGTERM + simple fixed timeout + SIGKILL, and no containers without a PID namespace) it sounds a lot more reasonable.

@haircommander
Copy link
Collaborator

theoretically, we could also pass conmon a list of args for a podman call, like we do for exit command

@haircommander
Copy link
Collaborator

another thing to note is that conmon doesn't technically know when a container starts. It knows when the container starts logging things, and starts behaving like it's started, but this is not precise. We'd have to have podman send data down a pipe to tell conmon "hey, I started the container", and then we'd start the timeout.

@rhatdan
Copy link
Member

rhatdan commented May 28, 2020

I don't see this as an --rm-timeout. I don't think --rm is required.
If I want to run a container for one hour then I could do
podman create --timeout=360 ...
podman start ...
podman start ...

@rhatdan
Copy link
Member

rhatdan commented May 28, 2020

Doesn't conmon get the contents of the --stop-signal?

@npmccallum
Copy link
Author

Would the container get the SIGSTOP (Actually --stop-signal value) signal or SIGKILL?
Most likely I just answered my own question. Since if you want SIGKILL you can just add --stop-signal=kill

$ podman run --stop-signal=kill --rm -it fedora
# trap '' TERM
#
kill -TERM $PODMAN_PID

Podman forwards the SIGTERM to PID 1. PID 1 swallows the SIGTERM. If I send SIGKILL to podman, the --rm is never performed and the container is still running.

It is therefore my understanding that --stop-signal=kill only changes the signal sent to PID 1 during podman stop.

If I restart the container, does it run for another 30 seconds?

This is a singleton container running untrusted code. We never want to restart it. That would allow the untrusted code to persist across the timeout.

@rhatdan
Copy link
Member

rhatdan commented Jun 9, 2020

@vrothberg Could you take this one on?

@vrothberg
Copy link
Member

@vrothberg Could you take this one on?

This looks like larger chunk of work as it spans across libpod and conmon. I think that I should rather work on the parallel-copy detection over in c/image. WDYT?

@rhatdan
Copy link
Member

rhatdan commented Jun 10, 2020

Sounds good, we can give this to @ashley-cui @QiWang19 @ParkerVR or @ryanchpowell Or anyone else who wants to grab it.

@vrothberg vrothberg removed their assignment Jun 22, 2020
@github-actions
Copy link

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Jul 23, 2020

@QiWang19 PTAL

@github-actions
Copy link

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Aug 24, 2020

@QiWang19 Did you get a chance to look at this?

@QiWang19
Copy link
Contributor

@QiWang19 Did you get a chance to look at this?

I haven't started working on this now but can add it to my list

@github-actions
Copy link

A friendly reminder that this issue had no activity for 30 days.

@kblin
Copy link

kblin commented Jan 12, 2021

Hi folks,

I'd also love to have this feature for a system running long-running number-crunchy jobs.
For my use case, the number-crunchy jobs report status via stdout, and I have a process that forks and executes podman run --detach=false ... and then consumes the output from the container to forward the status to a database. I can of course set a timer and have that trigger podman kill after the timeout expires, but having a timeout built in would be so much nicer.

@rhatdan
Copy link
Member

rhatdan commented Jan 13, 2021

Interested in opening a PR for this feature?

@rhatdan rhatdan added the Good First Issue This issue would be a good issue for a first time contributor to undertake. label Jan 13, 2021
@kblin
Copy link

kblin commented Jan 13, 2021

I'm not sure I understand the architecture good enough to know where to start. Does this go into podman? Conmon?

@rhatdan
Copy link
Member

rhatdan commented Jan 13, 2021

Both. You would need a way to trigger the command within podman. Basically add a --timeout flag (and maybe --timeout-signal), that conmon would know to kill the container.
Then you would need an option in podman to activate it. podman run --timeout 20m ...

This would cause conmon to send run with --timeout 20m And after 20 minutes, conmon would send a stop signal to pid1, and 10 seconds later send the kill signal.

@kblin
Copy link

kblin commented Jan 14, 2021

After staring at the code for a couple of hours, I'm still not quite sure where I'd pass a new --timeout flag added to podman run, not to mention that I can't figure out why my

flags.UintVar(&runOpts.Timeout, "timeout", 0, "Stop the container after [timeout] seconds, or 0 to not time out the container")

ends up generating a --time and not a --timeout parameter after running make.

I also still have no idea where in conmon I'd send the stop signal to the container's pid 1.

That's about all the time I had to spend on something I can script with a timed callback to run podman kill on my side. If this is a "good first issue", I don't think I want to see the other ones. 😉

@rhatdan
Copy link
Member

rhatdan commented Jan 14, 2021

Some first issues are meatier then others, thanks for trying.

@kblin
Copy link

kblin commented Jan 14, 2021

If nobody picked it up by the next time I have a day or two to spare I might give it another shot. 🙂

@rhatdan
Copy link
Member

rhatdan commented Jan 14, 2021

Sounds good.

@github-actions
Copy link

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Feb 15, 2021

@kblin Did you ever get a chance to look at this?

@kblin
Copy link

kblin commented Feb 15, 2021

I didn't have the time to spare yet.

@github-actions
Copy link

A friendly reminder that this issue had no activity for 30 days.

@github-actions
Copy link

A friendly reminder that this issue had no activity for 30 days.

rhatdan added a commit to rhatdan/podman that referenced this issue Apr 23, 2021
This option allows users to specify the maximum amount of time to run
before conmon sends the kill signal to the container.

Fixes: containers#6412

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Good First Issue This issue would be a good issue for a first time contributor to undertake. kind/feature Categorizes issue or PR as related to a new feature. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants