figure out podutils and init / zombies #8779
I am of the opinion that init systems in containers are a huge code smell and I'm not sure how much effort we should go to in order to support them. Why not launch a …
Smell or not, they're immensely useful, and lots of third party tools happily fork to their heart's content. We don't necessarily need a full blown init to avoid zombie processes in particular.
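For concreteness, here is a minimal sketch (in Go, the entrypoint binary's language) of reaping zombies without a full init; this is the general technique, not the actual podutils code, and all names are illustrative. A PID 1 process inherits orphaned descendants and just needs to wait() on them when SIGCHLD arrives:

```go
package main

import (
	"os"
	"os/signal"
	"syscall"
)

// reapZombies reaps exited children whenever SIGCHLD arrives.
// When this process is PID 1, orphaned descendants are re-parented
// to it; without this loop they would linger as zombies.
func reapZombies() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGCHLD)
	for range sigs {
		for {
			var status syscall.WaitStatus
			// pid -1 waits for any child; WNOHANG makes the call
			// non-blocking when nothing has exited yet.
			pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
			if pid <= 0 || err != nil {
				break
			}
		}
	}
}

func main() {
	go reapZombies()
	// A real entrypoint would launch and wait for the test
	// process here; this sketch just blocks.
	select {}
}
```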
Don't we only allow one test container? Also the pod lifecycle...? Mounting through volumes? There are many reasons to do this in the same container.
Similarly, due to the entrypoint binary, we already do NOT have "one process per container", we have at minimum 2, which sorta illustrates how useful running multiple binaries is...
Since the lifecycles of the two processes are linked right now, I'm not sure there is a simple way for zombies to happen. In any case, though, I think we might be OK since tests are not long-running processes, so zombie buildup should not be an issue? Worst case would be a poorly-behaved job hitting PID exhaustion and not being able to fork anymore, right?
Only because we haven't thought through other applications today.
Just seems like it needs some thought, not an intractable problem.
Communication through the filesystem using UNIX sockets is a totally valid thing to be doing to communicate between containers.
These two processes are linked, but the command we run generally forks at some point, and it still illustrates how it is useful to fork using someone else's binary within a container. "One process per container" rarely actually happens here. We're not going to be able to change all of that. Tools run other tools to, e.g., spin up clusters.
No, many of the test suites we run can take, say, 15 hours, and we'd really like to not lose those.
No, PID exhaustion causes the kubelet to restart all containers. See #5877, #5887, #5700 (comment)
OK, but that's more complex than running a shell as the top level process, at minimum, and it's not quite what we need; see the next comment.
I don't believe you can do volume mounts from within the test container with a sidecar, which makes this a no-go.
At some point we should update the pod-utilities to not require hijacking the entrypoint, which will solve this problem.
Ah, true, forgot about this. FYI k8s natively handles this, alpha in 1.10 kubernetes/kubernetes#57973
@fejta was there a design for letting the sidecar know when uploads should begin / when the test process was exiting?
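One marker-file approach, sketched here under assumptions (the file location and helper names are hypothetical, not the real podutils API): the entrypoint records the test process's exit code in a well-known file when it exits, and the sidecar polls for that file to know when to begin uploads.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// Hypothetical marker location shared between the two containers,
// e.g. via an emptyDir volume mounted into both.
const markerFile = "marker-file"

// writeMarker is what an entrypoint might do once the test exits:
// persist the exit code where the sidecar can see it.
func writeMarker(exitCode int) error {
	return os.WriteFile(markerFile, []byte(strconv.Itoa(exitCode)), 0644)
}

// waitForMarker is what a sidecar might do before uploading:
// poll until the marker appears, then return the recorded code.
func waitForMarker() int {
	for {
		if data, err := os.ReadFile(markerFile); err == nil {
			code, _ := strconv.Atoi(string(data))
			return code
		}
		time.Sleep(time.Second)
	}
}

func main() {
	// Demo only: pretend the test exited with code 0, then read it back.
	if err := writeMarker(0); err != nil {
		panic(err)
	}
	fmt.Println("sidecar sees exit code:", waitForMarker())
}
```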
Yeah, I've been following that, but that handles setting limits at a pod level (and is alpha gated, which k/k prow doesn't typically use), which is distinct from avoiding hitting a limit to begin with. We'd really like to not hit a limit. The pause container work looks promising though.
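For reference, the pod-level knob discussed here, roughly as it looked in that alpha (field names and values illustrative; see kubernetes/kubernetes#57973 for the authoritative version), caps the PIDs a pod may use rather than preventing fork pressure in the first place:

```yaml
# Illustrative KubeletConfiguration snippet: podPidsLimit caps total
# PIDs per pod, behind the alpha SupportPodPidsLimit feature gate.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  SupportPodPidsLimit: true
podPidsLimit: 1024
```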
What was the limit here? I can't remember what disallowed volume mounts.
We have files inside container CA's filesystem on runtime RA that we want to mount into container CB on runtime RB. I don't think that works right. Entirely beside the point though, as we're not getting zombies with DinD... we get zombies from other utils. DinD just reminded me that this sort of multiprocess work doesn't gel too well with the …
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Do we have a different tracking issue for this?
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
With the podutils we run our entrypoint binary as PID 1 in the containers; for jobs that fork, this may be problematic. We should look at options for handling this.
Specifically for zombie processes we can probably get away with something like `/bin/sh entrypoint ...`, but other things may actually make use of an init...

/cc @stevekuznetsov @cjwagner
/area prow
/priority important-soon
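To illustrate the `/bin/sh` suggestion above, a hedged sketch with hypothetical image and argument values: wrapping the entrypoint in a shell leaves the shell as PID 1, and common shells reap children re-parented to them. One caveat worth noting is that a plain shell does not forward signals such as SIGTERM to its child, which matters for graceful termination.

```yaml
# Hypothetical container spec, not actual podutils output: the shell
# stays PID 1 and reaps orphans; entrypoint runs as an ordinary child.
containers:
- name: test
  image: example-test-image            # hypothetical image
  command: ["/bin/sh", "-c", "entrypoint ..."]  # args elided, as in the suggestion above
```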