What would you like to be added:
When running the upgrade-cluster.yml playbook, I would like nodes that were cordoned but then failed to drain to be uncordoned again.
Why is this needed:
During a cluster upgrade, Kubespray cordons and then drains each node before actually updating it.
However, if a node fails to drain (for example, because a PodDisruptionBudget does not allow any of its pods to be evicted), Kubespray leaves it cordoned.
This means that repeated failing runs (e.g. from CI/CD) leave more and more nodes cordoned, which reduces the schedulable capacity of the running cluster.
The enhanced behavior I propose is the following: if a node fails to drain, no upgrade task has actually been performed on it yet, so it is safe to uncordon it.
However, the upgrade should STILL fail, since it did not complete.
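The proposed flow (cordon, try to drain, and on failure uncordon but still fail) can be sketched as a standalone Python function that shells out to `kubectl`. This is only an illustration under assumptions, not Kubespray's implementation: the function name `prepare_node_for_upgrade`, the `DrainFailed` exception, and the 300s drain timeout are hypothetical, and the command runner is injectable so the logic can be exercised without a cluster.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical runner type: takes kubectl arguments, returns the exit code.
Runner = Callable[[Sequence[str]], int]


def run_kubectl(args: Sequence[str]) -> int:
    """Run `kubectl <args>` and return its exit code."""
    return subprocess.run(["kubectl", *args]).returncode


class DrainFailed(RuntimeError):
    """Hypothetical error: the node could not be drained, so the upgrade must fail."""


def prepare_node_for_upgrade(node: str, run: Runner = run_kubectl) -> None:
    """Cordon and drain `node`; if draining fails, uncordon it and still fail."""
    if run(["cordon", node]) != 0:
        raise RuntimeError(f"failed to cordon {node}")

    # Assumed flags: skip DaemonSet pods and give up after a bounded time.
    if run(["drain", node, "--ignore-daemonsets", "--timeout=300s"]) != 0:
        # No upgrade task has touched the node yet, so making it
        # schedulable again is safe. The uncordon exit code is ignored here.
        run(["uncordon", node])
        # The upgrade did not happen, so the run must still be reported as failed.
        raise DrainFailed(f"failed to drain {node}; node uncordoned, upgrade aborted")

    # Drain succeeded: the node is empty and ready for the upgrade tasks.
```

In an Ansible playbook, the same shape maps naturally onto a `block`/`rescue` pair, where the `rescue` section uncordons the node and then uses `fail` so the play still ends in error.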