
figure out podutils and init / zombies #8779

Closed
BenTheElder opened this issue Jul 23, 2018 · 17 comments
Labels
area/prow: Issues or PRs related to prow
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@BenTheElder (Member)

With the podutils we run our entrypoint binary as PID 1 in the containers; for jobs that fork, this may be problematic. We should look at options for handling this.

Specifically for zombie processes we can probably get away with something like `/bin/sh entrypoint ...`, but other things may actually make use of an init...
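To make the failure mode concrete, here's a minimal sketch (illustrative only; it assumes a Linux container with `ps` in the image, with this program standing in for our entrypoint as PID 1):

```go
// zombies.go: demonstrates the problem, not Prow code. Run as PID 1 in a
// Linux container; assumes /bin/sh and ps exist in the image.
package main

import (
	"fmt"
	"os/exec"
	"time"
)

func main() {
	// The "test" forks a background child and exits, the way many real jobs do.
	// The orphaned sleep is then reparented to us, PID 1.
	test := exec.Command("/bin/sh", "-c", "sleep 1 & exit 0")
	if err := test.Run(); err != nil { // we do reap our direct child here...
		panic(err)
	}

	// ...but when the orphaned sleep exits, nothing calls wait(2) on it,
	// so the kernel keeps it around as a zombie (<defunct> in ps output).
	time.Sleep(3 * time.Second)
	out, _ := exec.Command("ps", "-ef").CombinedOutput()
	fmt.Printf("%s", out)
}
```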

/cc @stevekuznetsov @cjwagner

/area prow
/priority important-soon

@k8s-ci-robot added the area/prow and priority/important-soon labels Jul 23, 2018
@stevekuznetsov (Contributor)

I am of the opinion that init systems in containers are a huge code smell and I'm not sure how much effort we should go to in order to support them. Why not launch a dockerd container and have it share the socket? One process per container.

@BenTheElder (Member, Author)

> I am of the opinion that init systems in containers are a huge code smell and I'm not sure how much effort we should go to in order to support them. Why not launch a dockerd container and have it share the socket? One process per container.

Smell or not, they're immensely useful, and lots of third-party tools happily fork to their heart's content. We don't necessarily need a full-blown init to avoid zombie processes in particular.
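For instance, the entrypoint itself could reap on SIGCHLD without dragging in a real init; a rough sketch (illustrative only, not the actual entrypoint code):

```go
// reaper.go: a "just enough init" sketch. It runs one wrapped command and
// reaps any children that get reparented to it as PID 1.
package main

import (
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

func main() {
	// Run the wrapped test command, forwarding stdio.
	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Start(); err != nil {
		os.Exit(127)
	}

	// On every SIGCHLD, reap all exited children (ours and reparented ones).
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGCHLD)
	for range sigs {
		for {
			var status syscall.WaitStatus
			pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
			if pid <= 0 || err != nil {
				break // nothing left to reap right now
			}
			if pid == cmd.Process.Pid {
				// The wrapped command itself exited; propagate its code.
				os.Exit(status.ExitStatus())
			}
		}
	}
}
```

This is roughly the core of what minimal inits like tini do, minus signal forwarding and various edge cases.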

> Why not launch a dockerd container and have it share the socket? One process per container.

Don't we only allow one test container? Also the pod lifecycle...? Mounting through volumes? There are many reasons to do this in the same container.

@BenTheElder (Member, Author)

Similarly, due to the entrypoint binary, we already do NOT have "one process per container"; we have at minimum two, which sorta illustrates how useful running multiple binaries is...

@stevekuznetsov (Contributor)

Since the lifecycles of the two processes are linked right now, I'm not sure there is a simple way for zombies to happen. In any case, though, I think we might be OK since tests are not long-running processes, so zombie buildup should not be an issue? Worst case would be a poorly-behaved job hitting PID exhaustion and not being able to fork anymore, right?

> Don't we only allow one test container?

Only because we haven't thought through other applications today.

> Also the pod lifecycle...?

Just seems like it needs some thought, not an intractable problem.

> Mounting through volumes? There are many reasons to do this in the same container.

Communicating through the filesystem using UNIX sockets is a totally valid way to communicate between containers.
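E.g., a sketch of that pattern, assuming both containers mount the same emptyDir at /shared (the path and message are made up):

```go
// socknotify.go: two containers talk over a UNIX socket on a shared volume.
// Run with the argument "serve" in one container, with no args in the other.
package main

import (
	"fmt"
	"io"
	"net"
	"os"
)

// sockPath is an assumption: both containers mount the same emptyDir here.
const sockPath = "/shared/notify.sock"

func main() {
	if len(os.Args) > 1 && os.Args[1] == "serve" {
		os.Remove(sockPath) // drop a stale socket from a previous run, if any
		l, err := net.Listen("unix", sockPath)
		if err != nil {
			panic(err)
		}
		conn, err := l.Accept()
		if err != nil {
			panic(err)
		}
		io.Copy(os.Stdout, conn) // print whatever the peer container sends
		return
	}
	// Client side, run from the other container sharing the volume.
	conn, err := net.Dial("unix", sockPath)
	if err != nil {
		panic(err)
	}
	fmt.Fprintln(conn, "tests finished")
	conn.Close()
}
```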

@BenTheElder (Member, Author)

> Since the lifecycles of the two processes are linked right now, I'm not sure there is a simple way for zombies to happen.

These two processes are linked, but the command we run generally forks at some point, and it still illustrates how useful it is to fork and run someone else's binary within a container. "One process per container" rarely actually happens here. We're not going to be able to change all of that. Tools run other tools to, e.g., spin up clusters.

> I think we might be OK since tests are not long-running processes, so zombie buildup should not be an issue?

No, many of the test suites we run can even take, say, 15 hours, and we'd really like not to lose those.

> Worst case would be a poorly-behaved job hitting PID exhaustion and not being able to fork anymore, right?

No, PID exhaustion causes the kubelet to restart all containers. See #5877, #5887, #5700 (comment)

> Only because we haven't thought through other applications today.

> Just seems like it needs some thought, not an intractable problem.

OK, but that's more complex than running a shell as the top-level process, at minimum, and it's not quite what we need; see the next comment.

> Communicating through the filesystem using UNIX sockets is a totally valid way to communicate between containers.

I don't believe you can do volume mounts from within the test container with a sidecar, which makes this a no-go.

@fejta (Contributor) commented Jul 23, 2018

At some point we should update the pod-utilities to not require hijacking the entrypoint, which will solve this problem.

@stevekuznetsov (Contributor) commented Jul 23, 2018

> No, PID exhaustion causes the kubelet to restart all containers.

Ah, true, forgot about this. FYI k8s natively handles this, alpha in 3.10 kubernetes/kubernetes#57973

@stevekuznetsov (Contributor)

> At some point we should update the pod-utilities to not require hijacking the entrypoint, which will solve this problem.

@fejta was there a design for letting the sidecar know when uploads should begin / when the test process was exiting?
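One shape this could take (purely a sketch; the path and file format here are assumptions, not a settled design): the wrapper writes its exit code to a marker file on a shared volume, and the sidecar polls for that file before starting uploads:

```go
// waitmarker.go: sidecar-side sketch. The test wrapper is assumed to write
// its exit code to markerPath on a volume shared with this container.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// markerPath is an assumption, not an agreed-upon location.
const markerPath = "/logs/marker"

func main() {
	for {
		b, err := os.ReadFile(markerPath)
		if err != nil {
			time.Sleep(time.Second) // not written yet; keep polling
			continue
		}
		code, err := strconv.Atoi(strings.TrimSpace(string(b)))
		if err != nil {
			time.Sleep(time.Second) // partially written; retry
			continue
		}
		fmt.Printf("test exited with code %d; starting artifact upload\n", code)
		return // the real sidecar would begin uploading here
	}
}
```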

@BenTheElder (Member, Author) commented Jul 23, 2018 via email

@stevekuznetsov (Contributor)

> I don't believe you can do volume mounts from within the test container with a sidecar, which makes this a no-go.

What was the limit here? I can't remember what disallowed volume mounts.

@BenTheElder (Member, Author)

We have files inside container CA's filesystem on runtime RA that we want to mount into container CB on runtime RB. I don't think that works right.

Entirely beside the point though, as we're not getting zombies with DinD... we get zombies from other utils. DinD just reminded me that this sort of multiprocess work doesn't gel too well with the ENTRYPOINT hijacking we currently do; everything we run does fork.

@BenTheElder mentioned this issue Jul 24, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Oct 21, 2018
@BenTheElder (Member, Author)

> At some point we should update the pod-utilities to not require hijacking the entrypoint, which will solve this problem.

Do we have a different tracking issue for this?
/remove-lifecycle stale
For now, feel free to close if we have a better issue.

@k8s-ci-robot removed the lifecycle/stale label Oct 23, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jan 21, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Feb 20, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot (Contributor)

@fejta-bot: Closing this issue.

In response to this:

> Rotten issues close after 30d of inactivity.
> Reopen the issue with /reopen.
> Mark the issue as fresh with /remove-lifecycle rotten.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
