
Add ClusterOperator Scheduling #650

Merged

Conversation

michaelgugino
Contributor

No description provided.

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign russellb after the PR has been reviewed.
You can assign the PR to them by writing /assign @russellb in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

## Summary

This enhancement ensures that system components, especially ClusterOperators, have
the correct set of tolerations.
Member

Possibly a PR with adjustments to this existing enhancement's content, instead of creating a new, parallel enhancement?

Contributor Author

I should have used grep! I looked in the scheduling folder but didn't find this, obviously.

draining a node can complete. This results in hot-looping between draining
and the scheduler.

Any other NoSchedule toleration is fine.
Contributor

In the spirit of Chesterton's fence, I'd like to see a review of the components that thought they should set this and why (a representative sample).

That ensures we correct misconceptions about when the toleration is appropriate.

If it was accidental, what would have prevented the accident?

Contributor Author

The one I know of for sure is the CVO. I don't have data on whether anything else is doing it or not. I will take a look.

Contributor Author

Looks like the CVO is the only non-DaemonSet pod that has this toleration, at least in a typical CI run. DaemonSets get that toleration dynamically, so everything else looks correct.
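For context (this is upstream Kubernetes behavior, not something defined in this PR): the DaemonSet controller adds a set of tolerations to every DaemonSet pod at creation time, which is why DaemonSets land on cordoned nodes without declaring anything in their manifests. A representative subset:

```yaml
# Tolerations added automatically by the DaemonSet controller
# (representative subset; see the upstream taints-and-tolerations docs).
tolerations:
- key: node.kubernetes.io/unschedulable
  operator: Exists
  effect: NoSchedule
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
```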

Contributor

There was a discussion some time ago on whether the CVO should tolerate the NoSchedule taint, and the answer was yes. Has something changed now? I was always in favour of specific tolerations instead of blanket tolerations. How about adding a taint called drain-in-progress to the node with the NoSchedule effect? That won't be tolerated by the CVO, which should help draining proceed smoothly.
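A sketch of the suggestion above. The taint key is hypothetical (nothing in OpenShift or Kubernetes defines it today); the idea is that drain tooling would apply it to the node before evicting pods:

```yaml
# Hypothetical drain-in-progress taint on a node; the key name is
# illustrative only and is not defined anywhere today.
apiVersion: v1
kind: Node
metadata:
  name: worker-0
spec:
  taints:
  - key: example.openshift.io/drain-in-progress
    effect: NoSchedule
```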

Contributor Author

Thanks for sharing that discussion. There is more than one 'NoSchedule' taint. Based on the discussion you linked, it seems node.kubernetes.io/unschedulable was included by mistake. This was not discussed directly; the discussion revolved around unreachable and unready nodes. node.kubernetes.io/unschedulable does not affect those scenarios and is only used for administrative cordons (e.g., node.spec.unschedulable=true).
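Concretely: cordoning a node sets node.spec.unschedulable=true, and the node controller mirrors that flag as a well-known taint, which is what the scheduler and tolerations actually see. A sketch of what a cordoned node's spec looks like:

```yaml
# Excerpt of a cordoned node's spec (e.g. after `kubectl cordon <node>`);
# the taint is added by the node controller to mirror the flag.
spec:
  unschedulable: true
  taints:
  - key: node.kubernetes.io/unschedulable
    effect: NoSchedule
```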

Contributor

I am not entirely sure of that. IIUC, I got a firm yes on unschedulable; however, if you think this was not needed, I can change the tolerations of the CVO.

Contributor Author

There are many taints that have the effect 'NoSchedule', but there is only one 'node.kubernetes.io/unschedulable' taint with the effect 'NoSchedule'. It appears that in the previous discussion only the NoSchedule effect was considered, rather than the specific taints (at the least, node.kubernetes.io/unschedulable was not itself considered directly).

Contributor

any NoSchedule taint != the unschedulable condition. The unschedulable condition == the node.kubernetes.io/unschedulable taint. Unschedulable is a node condition which is represented via a special well-known taint called node.kubernetes.io/unschedulable.

Contributor Author

I'm trying to use precise k8s language. There are taints, and those taints have effects. The effect we're discussing is NoSchedule. That effect is caused by one or more taints.

I did not see the taint node.kubernetes.io/unschedulable discussed in the link you provided. There were only discussions around the NoSchedule effect.
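In manifest terms, the distinction looks like this: the first toleration matches the NoSchedule effect regardless of which taint carries it, while the second matches only the specific administrative-cordon taint being discussed:

```yaml
# Blanket: tolerates any taint whose effect is NoSchedule.
tolerations:
- operator: Exists
  effect: NoSchedule
---
# Specific: tolerates only the administrative-cordon taint.
tolerations:
- key: node.kubernetes.io/unschedulable
  operator: Exists
  effect: NoSchedule
```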



### Test Plan

Contributor

I assumed we would enforce this via a test? Or did I miss that above?

Contributor

I think we do

Member

Looks like the "ensure control plane operators do not make themselves unevictable" test does not cover node.kubernetes.io/unschedulable today? Just unreachable and not-ready.

@ravisantoshgudimetla (Contributor) left a comment

Thank you for initiating this discussion @michaelgugino. This looks like a step in the right direction.


@@ -222,6 +225,10 @@ spec:
- operator: Exists
```

Tolerating all taints should be reserved for DaemonSets and static pods.
Contributor

Which static pods are you referring to?

Contributor Author

Is there more than one type?

Contributor

I meant which specific static pods in OpenShift. IIUC, static pods should be tolerating all NoExecute taints, not NoSchedule, as their manifests are directly present on the node; there is no need for them to go through the scheduling cycle.

Contributor Author

There isn't an option to tolerate all taints of a certain effect, just "All" generally.
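For reference, the catch-all form is an empty-key Exists toleration. Worth noting: per the upstream toleration-matching rules (as I read them; worth verifying), an empty-key Exists toleration can also carry an effect, which then matches every taint with that effect, so there may be a middle ground between a specific list and "all":

```yaml
# Tolerates every taint.
tolerations:
- operator: Exists
---
# Empty key + Exists scoped to one effect: matches every NoSchedule
# taint (per upstream matching rules; verify before relying on this).
tolerations:
- operator: Exists
  effect: NoSchedule
```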

@@ -189,6 +189,9 @@ following taints if doing so is necessary to form a functional Kubernetes node:
Operators should not specify tolerations in their manifests for any of the taints in
the above list without an explicit and credible justification.

Operators should never specify the following toleration:
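The toleration itself is elided from this excerpt; based on the surrounding discussion, it is presumably the toleration for the administrative-cordon taint, along the lines of:

```yaml
# Reconstructed from the discussion, not copied from the enhancement
# text: the toleration operators should never specify.
tolerations:
- key: node.kubernetes.io/unschedulable
  operator: Exists
  effect: NoSchedule
```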
Contributor

The other guidelines in this section include justification, so that someone coming to the doc without all of the background can learn why we have each restriction. This addition would benefit from a sentence or two explaining why this value is so special that it should never be used.

Contributor

Thanks, that looks good.

@smarterclayton
Contributor

/lgtm
/approve

In the future we may come up with exceptions, which are reasonable to propose as new enhancements. We may also wish to take this feedback back to sig-scheduling and gather other perspectives on whether tolerations are missing something in between "list" and "all".

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 18, 2021
@openshift-ci
Contributor

openshift-ci bot commented May 18, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 18, 2021
@openshift-merge-robot openshift-merge-robot merged commit fe57853 into openshift:master May 18, 2021