Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP: New pod restartPolicy to restart the whole pod instead of just a container #2342

Closed
wants to merge 9 commits into from

Conversation

amshuman-kr
Copy link

A KEP for the PR that addresses the kubernetes issue #52345.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 4, 2018
@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Jul 4, 2018
@amshuman-kr
Copy link
Author

/sig node
/sig apps

@k8s-ci-robot k8s-ci-robot added the sig/apps Categorizes an issue or PR as relevant to SIG Apps. label Jul 4, 2018
@idealhack
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 4, 2018
@idealhack
Copy link
Member

We have a wrong suggestion in the verify test.

client9/misspell#136

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really want to restart all containers when any container exits?

We've discussed in the past the idea of a "keystone" or "primary" container which would govern the whole pod's lifecycle. For restart-all pods, this would be the one that would trigger the teardown. For jobs, this would be the one that exits and everything else can be assume to be complete.

@yujuhong @kow3ns @erictune

@smarterclayton another case of desire for sub-pod orchestration

- "@smarterclayton"
approvers:
- "@liggitt"
- "@derekwaynecarr"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amshuman-kr
Copy link
Author

@thockin No. I agree, it may not be desirable to restart the pod if any container exits. I like the keystone or primary container approach. It would be better than my proposal here. Is there any consensus on this approach?

@erictune
Copy link
Member

erictune commented Jul 9, 2018 via email

@amshuman-kr
Copy link
Author

amshuman-kr commented Jul 10, 2018

@erictune Agreed. Is #25908 the right place to track this?

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: dchen1107

If they are not already assigned, you can assign the PR to them by writing /assign @dchen1107 in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@amshuman-kr
Copy link
Author

This KEP seems to be the relevant one for primary/sidecar container feature.

@amshuman-kr
Copy link
Author

@thockin @erictune I raised this requirement of re-executing init-containers when the primary or any non-sidecar container terminates on KEP 0008. But from the response I understand that such a requirement is currently not considered there.

What would you recommend? Is there any chance this requirement can be added to KEP 0008? Or should I update this KEP as an additional requirement on KEP 0008?

@justaugustus
Copy link
Member

/kind kep

@thockin thockin changed the title New pod restartPolicy to restart the whole pod instead of just a container. KEP: New pod restartPolicy to restart the whole pod instead of just a container Oct 18, 2018
Copy link
Contributor

@mattfarina mattfarina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/cc @kow3ns

@@ -0,0 +1,125 @@
---
kep-number: 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please update the KEP to the latest and increment that file?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattfarina Thanks for the review!

Actually, I am in favour of closing this KEP if KEP 0008 can be expanded to include the case of RestartPolicyAlways.

That is, if there are some sidecar containers and one or more primary containers and the restartPolicy is Always then if any of the primary containers terminate then the whole pod is restarted; not just the terminated container.

If the above requirement can be included in the scope of KEP 0008 then this KEP would be redundant.

@mattfarina
Copy link
Contributor

@derekwaynecarr @dchen1107 out of curiosity, where does this stand in sig-node right now?

@mattfarina
Copy link
Contributor

FYI, We've added this to the agenda for Mondays SIG Apps to discuss.


But both the `OnFailure` as well as the `Always` restart policies restart the individual containers in question and not the whole pod. This is, for the most part, desirable, even optimal.

However, there are scenarios (some documented in [this issue][issue]) where the many containers in the pod (including init containers) might be interlinked or inter-dependent in such a way as to require closer co-ordination when any one of its containers are restarted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amshuman-kr We discussed this in SIG Apps this week. The motivation for adding this was not completely understood. Could you please expand on user stories (see the template) that outline situations that this would be used for. This will help others understand the need and now this can help with it.

@justaugustus
Copy link
Member

REMINDER: KEPs are moving to k/enhancements on November 30. Please attempt to merge this KEP before then to signal consensus.
For more details on this change, review this thread.

Any questions regarding this move should be directed to that thread and not asked on GitHub.


### Risks and Mitigations

The `restartPolicy` or `AlwaysPod` would be a new value for an existing field in the pod specefication. So, the question of backward compatibility may not apply.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would not be backwards compatible for old kubelets, so we can’t change restartPolicy in v1 of the api. Something like “restartAffects: Pod” or a lifecycle hook would be necessary. It’s really orthogonal to restart policy and could be used with OnFailure or Always.

@justaugustus
Copy link
Member

KEPs have moved to k/enhancements.
This PR will be closed and any additional changes to this KEP should be submitted to k/enhancements.
For more details on this change, review this thread.

Any questions regarding this move should be directed to that thread and not asked on GitHub.
/close

@k8s-ci-robot
Copy link
Contributor

@justaugustus: Closed this PR.

In response to this:

KEPs have moved to k/enhancements.
This PR will be closed and any additional changes to this KEP should be submitted to k/enhancements.
For more details on this change, review this thread.

Any questions regarding this move should be directed to that thread and not asked on GitHub.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@RiRa12621
Copy link

RiRa12621 commented Apr 4, 2019

Just to clarify: this PR was simply closed and not moved anywhere?
Because to me this seems still relevant and I would like to follow up.

@adamzr
Copy link

adamzr commented Mar 12, 2020

I have a use case where the initContainer creates a one time use token that the application in the regular container consumes at startup. If the regular container is restarted it needs the initContainer to restart as well in order to generate a new token. This could solve my problem.

@mattfarina How I can I get this issue re-visited?

@thockin
Copy link
Member

thockin commented Mar 12, 2020

This happens to line up with the larger topic of pod and container lifecycle. If we are to pursue this idea we need a shepherd from sig-node who can make sure the overall lifecycle is considered.

@derekwaynecarr

@dimitriosstander
Copy link

I have a use case of a statefulset that when one containers fails it takes hours to become healthy again but if I restart the whole pod on failure it recovers within minutes.

@tulsluper
Copy link

Restart pod if a container fails would be helpful in case if something wrong on the Kubernetes level, not an application.

For example - kubernetes/kubernetes#105933

  • kubelet/csi_driver cannot check SELinux support;
  • pod starts without selinuxLabel;
  • init container cannot perform chmod command;
  • the container continuously restarts;
    Pod really cannot work until it is terminated, and a new one starts with selinuxLabel.

But, despite the nature of the case, users are more likely to restart the pod than look for how to solve the problem in the configuration or application, that is not good.

@szh
Copy link

szh commented Dec 2, 2022

@amshuman-kr do you want to open an issue for this in k/enhancements?

@amshuman-kr
Copy link
Author

@szh The discussions on container life-cycle enhancement were too fragmented and unfortunately, I didn't have the bandwidth to follow up across various semi-related discussions. Hence, I have abandoned this enhancement.

@szh
Copy link

szh commented Dec 6, 2022

OK. I just added kubernetes/enhancements#3676 to track it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.