
Add toleration with toleration seconds condition #24041

Closed


zhouya0
Contributor

@zhouya0 zhouya0 commented Sep 22, 2020

This PR documents the case of tolerations that set `tolerationSeconds`.
Ref: kubernetes/kubernetes#92170 (comment)

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 22, 2020
@k8s-ci-robot k8s-ci-robot added language/en Issues or PRs related to English language sig/docs Categorizes an issue or PR as relevant to SIG Docs. labels Sep 22, 2020
@zhouya0 zhouya0 force-pushed the add_toleration_seconds_condition branch 2 times, most recently from 0012f6b to ad4d75f Compare September 22, 2020 07:10
@zhouya0
Contributor Author

zhouya0 commented Sep 22, 2020

/assign @alculquicondor

@netlify

netlify bot commented Sep 22, 2020

Deploy preview for kubernetes-io-master-staging ready!

Built with commit 8bfb8dd

https://deploy-preview-24041--kubernetes-io-master-staging.netlify.app

@netlify

netlify bot commented Sep 22, 2020

Deploy preview for kubernetes-io-master-staging ready!

Built with commit 3409144

https://deploy-preview-24041--kubernetes-io-master-staging.netlify.app

@@ -101,6 +101,9 @@ effect `PreferNoSchedule` then Kubernetes will *try* to not schedule the pod ont
* if there is at least one un-ignored taint with effect `NoExecute` then the pod will be evicted from
the node (if it is already running on the node), and will not be
scheduled onto the node (if it is not yet running on the node).
* if there is only one tolerated taint with effect `NoExecute` then the pod will still be scheduled onto the node even if
Member


This makes very little sense, sorry.

I think this should be pointed out as a note to the following paragraphs, which describe the `tolerationSeconds` functionality. There is a toleration anyway, so the third bullet applies; thus the pod can be scheduled.
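The reviewer's argument can be sketched in Go. This is a simplified model, not the scheduler's code: the `Taint` and `Toleration` types and the `tolerated` and `canSchedule` helpers are hypothetical, matching is by key and effect only (no `operator` or `value`), and the taint key `key1` is illustrative. It shows that a `NoExecute` toleration with `tolerationSeconds` still lets the pod be placed on the node, while an additional `NoSchedule` taint with the same key blocks placement.

```go
package main

import "fmt"

// Effect is a taint effect, as described on the taints and tolerations page.
type Effect string

const (
	NoSchedule Effect = "NoSchedule"
	NoExecute  Effect = "NoExecute"
)

// Taint and Toleration are hypothetical, simplified stand-ins for the
// Kubernetes API types: matching is by key and effect only.
type Taint struct {
	Key    string
	Effect Effect
}

type Toleration struct {
	Key               string
	Effect            Effect // empty matches any effect
	TolerationSeconds *int64 // only relevant to eviction, never to placement
}

// tolerated reports whether any toleration matches the taint.
func tolerated(tols []Toleration, t Taint) bool {
	for _, tol := range tols {
		if tol.Key == t.Key && (tol.Effect == "" || tol.Effect == t.Effect) {
			return true
		}
	}
	return false
}

// canSchedule reports whether a pod may be placed on the node: every
// NoSchedule and NoExecute taint must be tolerated. TolerationSeconds is
// ignored, because a time-limited toleration is still a toleration.
func canSchedule(tols []Toleration, taints []Taint) bool {
	for _, t := range taints {
		if (t.Effect == NoSchedule || t.Effect == NoExecute) && !tolerated(tols, t) {
			return false
		}
	}
	return true
}

func main() {
	secs := int64(3600)
	tols := []Toleration{{Key: "key1", Effect: NoExecute, TolerationSeconds: &secs}}

	onlyNoExecute := []Taint{{Key: "key1", Effect: NoExecute}}
	both := []Taint{{Key: "key1", Effect: NoExecute}, {Key: "key1", Effect: NoSchedule}}

	fmt.Println(canSchedule(tols, onlyNoExecute)) // true: the evicted pod can land here again
	fmt.Println(canSchedule(tols, both))          // false: the NoSchedule taint is not tolerated
}
```

Adding the `NoSchedule` taint corresponds to the fix the reviewers suggest; the sketch models only the placement decision, not eviction.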

Contributor


@zhouya0 I'm afraid I agree with @alculquicondor

Was there a part of the page you found confusing? Maybe knowing that will help someone propose a different change that still reduces the confusion you noticed.

Contributor Author


@alculquicondor @sftim
I followed your suggestions. Please review this again :)

@zhouya0 zhouya0 force-pushed the add_toleration_seconds_condition branch from ad4d75f to 3409144 Compare September 28, 2020 03:13
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign sftim
You can assign the PR to them by writing /assign @sftim in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -148,6 +148,14 @@ means that if this pod is running and a matching taint is added to the node, the
the pod will stay bound to the node for 3600 seconds, and then be evicted. If the
taint is removed before that time, the pod will not be evicted.

{{< note >}}
The evicted pod with `tolerationSeconds` field has the possibility to be scheduled to the prior node again. To avoid such behavior:
Member


I think we are lacking some subsections for the whole "concepts" section. @sftim, could you advise?

In particular, I feel like there should be a subtitle before line 132.

Other than that, we have to stress the fact that, even with "tolerationSeconds", this is still a toleration, and for that reason the Pod can get scheduled in the same node again.

Instead of "to avoid such behavior", I would say "If this is not what you want", then "you can add a NoSchedule taint additionally to the NoExecute taint".

Then, I don't think you have to distinguish system-level taints (what are those, anyway?). Only the second point is relevant.

Contributor


Sure, feel free to add a heading before line 132 - no objection from me for doing that.

Member


The problem is that the heading would then seem isolated. We don't have a heading for "taints" or "tolerations". But maybe we can fix that in a follow-up.

Contributor Author


I think system-level taints are ones like `node.kubernetes.io/unreachable`, added when a node is down. The controller adds both `NoExecute` and `NoSchedule` taints to the node.

Member


That's already covered in https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/#taint-nodes-by-condition

The fact that they are added automatically doesn't change how you write tolerations for them :)

Contributor


BTW, PR #25998 is improving some of the documentation in this area.

@alculquicondor
Member

ping for the last review comments

@zhouya0
Contributor Author

zhouya0 commented Oct 23, 2020

ping for the last review comments

Sorry to say, but with my limited English I can't turn this point into properly worded documentation: "Other than that, we have to stress the fact that, even with "tolerationSeconds", this is still a toleration, and for that reason the Pod can get scheduled in the same node again."

I'll leave this to other contributors :)

@alculquicondor
Member

Thanks for the progress @zhouya0

Contributor

@sftim sftim left a comment


@zhouya0 this proposed wording is not quite idiomatic English. I have guessed at what you meant to say and proposed another way to write that. How do these suggestions look?

Comment on lines +155 to +156
- If the node is tainted by system admin, the best practice is to apply both `NoExecute` and `NoSchedule` taints; otherwise, the pod may be
scheduled and evicted back and forth.
Contributor


Suggested change
- If the node is tainted by system admin, the best practice is to apply both `NoExecute` and `NoSchedule` taints; otherwise, the pod may be
scheduled and evicted back and forth.
- If you are setting a `NoExecute` taint manually on a node, you should normally also
set `NoSchedule`. Otherwise, any new Pod that could run on this node may be scheduled
onto it, but then immediately face eviction because of the `NoExecute` taint.

Comment on lines +153 to +154
- If the node is added system-level taints, Kubernetes will be responsible for applying `NoExecute` taint, as well as `Noschedule` taint.
So the pod won't be scheduled to the prior node.
Contributor


@zhouya0 is this another way of writing what you meant?

Suggested change
- If the node is added system-level taints, Kubernetes will be responsible for applying `NoExecute` taint, as well as `Noschedule` taint.
So the pod won't be scheduled to the prior node.
- If the node is tainted based on kubelet or node status, Kubernetes is responsible for applying the `NoExecute` and `NoSchedule` taints. These taints prevent the scheduler from placing the replacement Pod onto the same node.

@kbhawkey
Contributor

Hi @zhouya0.
Thanks for contributing!
What do you think about accepting the suggested changes?

@tengqm
Contributor

tengqm commented Mar 29, 2021

/close
No response from author for a long ... time.

@k8s-ci-robot
Contributor

@tengqm: Closed this PR.

In response to this:

/close
No response from author for a long ... time.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

6 participants