-
Notifications
You must be signed in to change notification settings - Fork 266
modules/bootkube: remove critical pod annotation #1702
Conversation
@@ -73,8 +71,6 @@ spec: | |||
pod-anti-affinity: kube-controller-manager-${kubernetes_version} | |||
tier: control-plane | |||
tolerations: | |||
- key: "CriticalAddonsOnly" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure if we want to remove this toleration though. Might need to look into the upstream behaviors this ties into.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we leave the toleration then it is possible that a controller-manager or scheduler pod could get scheduled into a slot that was designated for a different critical pod. Maybe this is not a big issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right. Well we probably do want the controller-manager to have this toleration. it is critical it's just unsafe with the behavior of the rescheduler. But I'm not immediately sure what the interaction would be like. The rescheduler might just keep kicking it off, and then it keeps getting rescheduled.
It's probably just not safe to run the rescheduler on the master nodes at all -- maybe that's the end goal we get to (and then we also remove the annotation / taint). And priority / preemption will deprecate rescheduler anyway).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be less of a big deal now that I'm thinking of it. We should just say "do not run the rescheduler on master nodes, your master nodes should be provisioned such that they have a minimum of X resources and nothing else is scheduled there".
@robszumski @sym3tri If this is done without a the same being done in an update payload, please track the skew that this introduces and the version it is introduced in. To update this currently it would require custom migration code (annotations are not handled for scheduler/controller-manager until the patch support is finalized). |
ok to test |
ping @squat, is this ready to merge? |
@s-urbaniak, this fell off my radar. I need to check back in with KVO team to see if patch support was finalized; otherwise I need to write a migration to fix the skew. |
@s-urbaniak @squat Patch support is merged in so we can trivially replicate this in the KVO for this release. |
Should we also remove it from the (looks like we also have it in |
ref (priority/preemption): kubernetes/enhancements#268 |
@diegs I'll make the adjustments in apiserver and kube-proxy as well |
@squat is this in or out for |
91f82f1
to
23e0456
Compare
@diegs PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
23e0456
to
2e91a95
Compare
@diegs sorry, I just rebased on master to fix a test failure. PTAL once more 😅 |
Remove the critical pod annotation and toleration from the controller-manager and scheduler, since this can result in dangerous behavior as discussed in kubernetes-retired/bootkube#519.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Sorry to be late to the review - but are we sure we want to remove the toleration? |
Hmm. Maybe ignore me. The behavior with just the toleration would be weird too (pod would be deleted, then re-created). I guess my concern was if the taint would be used elsewhere (not just re-scheduler), and we actually don't want these pods messed with. I skimmed the priority / preemption discussion and I don't think this taint will be re-used (we can always revisit if we need to later). |
@aaronlevy that taint is added by the rescheduler, which this is removing support for (and apparently the rescheduler is deprecated anyway, right)? https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/ |
ok to test |
Yeah, I retract my concern |
One AWS flake. Merging |
Remove the critical pod annotation and toleration from the
controller-manager and scheduler, since this can result in dangerous
behavior as discussed in
kubernetes-retired/bootkube#519.
fixes: #734
cc @aaronlevy