Remove exit when cordon failed #552

Merged
merged 1 commit into from
Jan 3, 2022


liorfranko
Contributor

@liorfranko liorfranko commented Dec 19, 2021

Issue #, if available: #545

Description of changes:
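Don't exit when cordoning or draining a node fails, for example when a PodDisruptionBudget (PDB) blocks pod eviction. The failure is logged and the handler keeps running instead of crashing.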

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
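
For readers who want the shape of the change, here is a minimal Go sketch of "log the drain failure and carry on" as opposed to exiting the process. It is not the code from this PR: `processEvents`, `drainFunc`, and `errPDB` are hypothetical names invented for illustration, and the real handler's types and logging differ.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errPDB is a hypothetical error standing in for an eviction blocked by a PDB.
var errPDB = errors.New("cannot evict pod: it would violate the pod's disruption budget")

// drainFunc is a hypothetical stand-in for the handler's cordon-and-drain call.
type drainFunc func(nodeName string) error

// processEvents tries to drain each node and returns how many drains failed.
// A failure is logged and the loop moves on; nothing here exits the process.
func processEvents(nodeNames []string, drain drainFunc) (failed int) {
	for _, name := range nodeNames {
		if err := drain(name); err != nil {
			log.Printf("ERR problem while trying to cordon and drain node %s: %v", name, err)
			failed++
			continue
		}
		log.Printf("INF node %s drained", name)
	}
	return failed
}

func main() {
	// Simulated drain: the first node is blocked by a PDB, the second succeeds.
	drain := func(name string) error {
		if name == "node-a" {
			return errPDB
		}
		return nil
	}
	failed := processEvents([]string{"node-a", "node-b"}, drain)
	fmt.Printf("failed drains: %d\n", failed)
}
```

The point of the sketch is only the control flow: the error path increments a counter and continues, so a later event for the same node can trigger another drain attempt.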

@liorfranko liorfranko requested a review from a team as a code owner December 19, 2021 05:58
@liorfranko
Contributor Author

liorfranko commented Dec 19, 2021

After the fix:

2021/12/19 05:49:57 ??? evicting pod delivery-apps/events-restore-consumer-5b44d99677-wqlk6
2021/12/19 05:49:57 ??? evicting pod delivery-apps/audience-consumer-6fb884bf75-bsblx
2021/12/19 05:49:57 ??? evicting pod delivery-apps/audience-consumer-6fb884bf75-tq72m
2021/12/19 05:49:57 ??? evicting pod delivery-apps/audience-consumer-6fb884bf75-67qn6
2021/12/19 05:49:57 ??? evicting pod delivery-apps/audience-consumer-6fb884bf75-kv47k
2021/12/19 05:49:57 ??? evicting pod delivery-apps/events-restore-consumer-5b44d99677-8mpw7
2021/12/19 05:49:57 ??? evicting pod delivery-apps/audience-consumer-6fb884bf75-mkpqk
2021/12/19 05:49:57 ??? evicting pod delivery-apps/events-restore-consumer-5b44d99677-5g267
2021/12/19 05:49:57 ??? error when evicting pods/"audience-consumer-6fb884bf75-tq72m" -n "delivery-apps" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
2021/12/19 05:49:57 ??? error when evicting pods/"audience-consumer-6fb884bf75-bsblx" -n "delivery-apps" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
2021/12/19 05:49:57 ??? error when evicting pods/"audience-consumer-6fb884bf75-kv47k" -n "delivery-apps" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
2021/12/19 05:49:57 ??? error when evicting pods/"events-restore-consumer-5b44d99677-5g267" -n "delivery-apps" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
2021/12/19 05:49:57 ??? error when evicting pods/"events-restore-consumer-5b44d99677-8mpw7" -n "delivery-apps" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
2021/12/19 05:49:57 ??? error when evicting pods/"events-restore-consumer-5b44d99677-wqlk6" -n "delivery-apps" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
2021/12/19 05:49:57 ??? error when evicting pods/"audience-consumer-6fb884bf75-mkpqk" -n "delivery-apps" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
2021/12/19 05:49:57 ??? error when evicting pods/"audience-consumer-6fb884bf75-67qn6" -n "delivery-apps" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
.
.
.
2021/12/19 05:51:57 ERR There was a problem while trying to cordon and drain the node error="[error when evicting pods/\"audience-consumer-6fb884bf75-tq72m\" -n \"delivery-apps\": global timeout reached: 2m0s, error when evicting pods/\"events-restore-consumer-5b44d99677-5g267\" -n \"delivery-apps\": global timeout reached: 2m0s, error when evicting pods/\"audience-consumer-6fb884bf75-bsblx\" -n \"delivery-apps\": global timeout reached: 2m0s, error when evicting pods/\"audience-consumer-6fb884bf75-67qn6\" -n \"delivery-apps\": global timeout reached: 2m0s, error when evicting pods/\"events-restore-consumer-5b44d99677-wqlk6\" -n \"delivery-apps\": global timeout reached: 2m0s, error when evicting pods/\"audience-consumer-6fb884bf75-kv47k\" -n \"delivery-apps\": global timeout reached: 2m0s, error when evicting pods/\"audience-consumer-6fb884bf75-mkpqk\" -n \"delivery-apps\": global timeout reached: 2m0s, error when evicting pods/\"events-restore-consumer-5b44d99677-8mpw7\" -n \"delivery-apps\": global timeout reached: 2m0s]"
2021/12/19 05:51:57 INF Adding new event to the event store event={"AutoScalingGroupName":"","Description":"Rebalance recommendation received. Instance will be cordoned at 2021-12-19T01:43:42Z \n","EndTime":"0001-01-01T00:00:00Z","EventID":"rebalance-recommendation-f190afb013e23ea83edb9e971b51fd53d6a731c0efe06498960513ab1f0a932e","InProgress":false,"InstanceID":"","IsManaged":false,"Kind":"REBALANCE_RECOMMENDATION","NodeLabels":null,"NodeName":"ip-10-206-149-185.ec2.internal","NodeProcessed":false,"Pods":null,"StartTime":"2021-12-19T01:43:42Z","State":""}
2021/12/19 05:51:58 INF Requesting instance drain event-id=rebalance-recommendation-f190afb013e23ea83edb9e971b51fd53d6a731c0efe06498960513ab1f0a932e instance-id= kind=REBALANCE_RECOMMENDATION node-name=ip-10-206-149-185.ec2.internal
2021/12/19 05:51:59 INF Pods on node node_name=ip-10-206-149-185.ec2.internal pod_names=["audience-consumer-6fb884bf75-67qn6","audience-consumer-6fb884bf75-bsblx","audience-consumer-6fb884bf75-kv47k","audience-consumer-6fb884bf75-mkpqk","audience-consumer-6fb884bf75-tq72m","events-restore-consumer-5b44d99677-5g267","events-restore-consumer-5b44d99677-8mpw7","events-restore-consumer-5b44d99677-wqlk6","kafka-backup-pm9n7","istio-cni-node-cnqsj","aws-node-termination-handler-xg46k","aws-node-zdtq9","ebs-csi-node-xj2h9","ip-masq-agent-rh5pb","kube-proxy-6glcm","filebeat-pvvcw","kube-prometheus-stack-prometheus-node-exporter-rqlf6"]
2021/12/19 05:51:59 INF Draining the node
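
The log above shows evictions being retried ("will retry after 5s") until a global timeout of 2m0s is reached, after which the error is reported and the handler picks up the next event. A rough Go sketch of that retry-until-timeout pattern follows. It is an assumption about the general technique, not the drain library's actual implementation; `evictWithRetry`, `errBudget`, `demoRetry`, and `demoTimeout` are hypothetical names, and the durations are shortened so the example finishes quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Hypothetical values for the demo; the log above shows 5s and 2m0s.
const (
	demoRetry   = 10 * time.Millisecond
	demoTimeout = 50 * time.Millisecond
)

// errBudget is a hypothetical error standing in for a PDB-blocked eviction.
var errBudget = errors.New("cannot evict pod: it would violate the pod's disruption budget")

// evictWithRetry calls evict until it succeeds or globalTimeout elapses.
// On timeout it returns an error that wraps the last eviction error.
func evictWithRetry(evict func() error, retryEvery, globalTimeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), globalTimeout)
	defer cancel()
	for {
		err := evict()
		if err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("global timeout reached: %v (last error: %w)", globalTimeout, err)
		case <-time.After(retryEvery):
			// Wait is over; loop around and try the eviction again.
		}
	}
}

func main() {
	attempts := 0
	err := evictWithRetry(func() error {
		attempts++
		return errBudget // the budget never frees up in this demo
	}, demoRetry, demoTimeout)
	fmt.Printf("attempts=%d err=%v\n", attempts, err)
}
```

Because the timeout surfaces as an ordinary returned error, the caller can log it and keep processing events, which matches the behavior shown after the fix.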

@github-actions

github-actions bot commented Jan 3, 2022

This PR has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want this PR to never become stale, please ask a maintainer to apply the "stalebot-ignore" label.

@github-actions github-actions bot added the stale Issues / PRs with no activity label Jan 3, 2022
Development

Successfully merging this pull request may close these issues.

Failing to drain/cordon causes CrashLoopBackOff