
DrainAndValidate rolling-update hangs if pods won't evict #2537

Closed
blakebarnett opened this issue May 9, 2017 · 33 comments
Labels
area/rolling-update lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@blakebarnett

I noticed a few instances where, if a pod is hung in ContainerCreating (or some other state) and won't go into the Evicted state, kops hangs forever waiting for it during a rolling-update.

@chrislovecnm
Contributor

What CLI switches did you use?

@blakebarnett
Author

None, just the usual kops rolling-update cluster --yes.

@chrislovecnm
Contributor

So it should have timed out, which is interesting. Did you use the feature flag to turn on drain?

@blakebarnett
Author

Yes. In another instance where it hung, I manually deleted a pod that was stuck in ContainerCreating and it moved on.

@chrislovecnm
Contributor

@foxish any way to have a pod not evict? How do I reproduce this?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 31, 2017
@foxish

foxish commented Jan 2, 2018

Sorry, I never saw this @chrislovecnm.
I'm not sure this is the wrong behavior - waiting for container creation to complete before trying to evict seems like a safe thing to do.
We don't expect the ContainerCreating state to last that long.

@foxish

foxish commented Jan 2, 2018

Related: kubernetes/kubernetes#48307 (comment)

@chrislovecnm
Contributor

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jan 2, 2018
@sidhartanoleto

sidhartanoleto commented Apr 17, 2018

I am having exactly the same behavior on AWS (edited: a similar issue, I'd say).

I have ~10 nodes, and some of them cannot be drained and get stuck in the SchedulingDisabled state. All the pods on those nodes are evicted, but the drain never completes. When I terminate the instance manually, the rolling update continues.

I noticed that the only pods left on those nodes are managed by DaemonSets. Maybe that is somehow related.

Any way to investigate this further?

@Globegitter
Contributor

Globegitter commented Apr 23, 2018

I had a similar issue just now: a normal pod (nginx ingress controller) that was still in the Running state and had been deployed for over 5 days somehow kept the rolling update stuck (no timeout, etc.). It had, however, been restarted 676 times. Unfortunately I did not look at the logs before I manually terminated the pod, so I cannot verify that the restarts were related to the rolling update, but it has now been fixed and the rolling update could move on. If it happens again I will make sure to check the logs for anything suspicious.

Edit: Strange, it just happened again; the nginx controllers could not be evicted. I could not see anything in the logs (some of the pods were not even serving any traffic), and nothing in describe; it seemed they never received the shutdown signal. But again, manually deleting the pods fixed the issue.

Even if the issue is not necessarily fixable, I wonder if it is possible to show more logs, so we don't have to guess that something is up.

@Globegitter
Contributor

Globegitter commented Apr 24, 2018

So strange, this keeps happening now, and it is a subset of pods related to the nginx ingress controller (the internal ingress controller and the default backend for both the internal and public ingress controllers). The interesting thing is that they are all Deployments with 1 pod (whereas the public ingress controller has 2), but the rolling update strategy is set to:

rollingUpdate:
  maxSurge: 1
  maxUnavailable: 1

so I don't think this is related, but I thought it was worth posting.

@michalschott
Contributor

I've noticed similar behaviour; I'm using nginx-ingress deployed with an HPA.

I had to manually kill all nginx-ingress related pods, and additionally I also had to kill the kube-flannel pod on the drained node.

@SharpEdgeMarshall

SharpEdgeMarshall commented Jun 13, 2018

Same issue here; this is the second time kops rolling-update --yes has waited on the drain until I manually killed the nginx-ingress default-backend pod.

@mf-lit

mf-lit commented Jun 22, 2018

I had the same here, also with nginx-ingress, but the issue was revealed by adding verbosity to the rolling update:
kops rolling-update cluster --yes -v 10

I then saw:

I0622 10:20:46.881051   15660 request.go:873] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Cannot evict pod as it would violate the pod's disruption budget.","reason":"TooManyRequests","details":{"causes":[{"reason":"DisruptionBudget","message":"The disruption budget qa-nginx-ingress-controller needs 1 healthy pods and has 1 currently"}]},"code":429}

So kops was really doing the right thing, just not being very chatty about it. I just had to run more than 1 replica and decrease minAvailable in the disruption budget, and the rolling-update carried on as soon as the additional pod was healthy.
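For anyone hitting the same thing, here is a minimal sketch of the kind of change involved (names and labels are illustrative, not the actual qa-nginx-ingress-controller objects; the PDB API group was policy/v1beta1 on clusters of this era):

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: nginx-ingress-controller
spec:
  minAvailable: 1            # with a single replica this can never be satisfied during a drain
  selector:
    matchLabels:
      app: nginx-ingress

With the Deployment scaled to 2 or more replicas, minAvailable: 1 leaves room to evict one pod at a time, so the drain can proceed.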

@olemarkus
Member

It would be very helpful if kops could log when it is waiting for pods with disruption budgets.

@montyz

montyz commented Aug 6, 2018

I agree this is important; I spent some time wondering why my node replacement was stuck. Someone in Slack said it was likely waiting for pods with disruption budgets, and that was the case.

@inodb

inodb commented Aug 15, 2018

@mf-lit how did you change the minAvailable parameter? Is that in the spec of qa-nginx-ingress-controller? I'm experiencing the same issue with this chart: https://github.com/helm/charts/tree/master/stable/nginx-ingress. Maybe I can just send a PR to update the chart.

@mf-lit

mf-lit commented Aug 16, 2018

@inodb That helm chart has what you need:

Either set the replica count with this value:
https://github.com/helm/charts/blob/master/stable/nginx-ingress/templates/controller-deployment.yaml#L13
Or if you want to use HPA, set it with these values:
https://github.com/helm/charts/blob/master/stable/nginx-ingress/templates/controller-hpa.yaml#L18-L19

And then make sure your PDB is set to a value appropriately lower than the Replica count:
https://github.com/helm/charts/blob/master/stable/nginx-ingress/templates/controller-poddisruptionbudget.yaml#L17
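In values.yaml terms that works out to roughly the following (key names as in the chart linked above; double-check them against the chart version you're actually using):

controller:
  replicaCount: 2    # or set the HPA min/max replica values instead
  minAvailable: 1    # rendered into the PDB; must stay below the replica count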

@mf-lit

mf-lit commented Aug 16, 2018

Having thought about this a little more, one of the gotchas of that chart (and many helm charts) is that the ReplicaCount defaults to 1 and the PDB MinAvailable also defaults to 1. This is perfectly reasonable, but it means that it is impossible to evict the pod.

I think (but haven't tested) that it would be better to set minAvailable to 0 when ReplicaCount is 1. I believe that is equivalent to not having a PDB at all, which makes more sense with a single replica.

EDIT: Ah, I see someone has already brought this up:
helm/charts#7127

odavid added a commit to odavid/ansible-role-k8s-aws-cluster that referenced this issue Aug 23, 2018
@thedarkfalcon

thedarkfalcon commented Sep 11, 2018

I'm having a similar issue: running a rolling update always hangs on "instancegroups.go:332] Waiting for 1m30s for pods to stabilize after draining."
This never times out; I have left it for over an hour several times. It seems to happen no matter what the change is: the most recent time was when upgrading the Kubernetes version, but before that it was when I was just adding some kubeAPIServer settings (PodSecurityPolicy). My quick/dirty solution was just to delete the master and nodes in AWS and have the availability set recreate them - with the updated settings.

Edit: I haven't actually seen it in the documentation, but do I have to disable pod availability scaling first?

@FrederikNJS
Contributor

FrederikNJS commented Jan 24, 2019

In our clusters we are running some jobs we don't want interrupted. Some of these jobs can take a full day.

Because we didn't want the jobs interrupted, we resorted to setting up a PodDisruptionBudget with maxUnavailable set to 0. This works quite nicely and keeps the jobs around until they complete.
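For context, a rough sketch of the kind of PDB we use (names and labels here are illustrative; on clusters of this era the API group was policy/v1beta1):

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: uninterruptible-jobs
spec:
  maxUnavailable: 0          # never allow voluntary eviction of these pods
  selector:
    matchLabels:
      workload: long-running-job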

The problem arises when I perform a rolling update on the cluster. Whenever kops gets to a node running one of these uninterruptible pods, it just hangs until the pod completes. Sometimes the blocked node is the first node in the rolling update, and in the meantime more uninterruptible pods can easily have been scheduled to additional nodes that need to be rolled.

It would be much nicer if kops would wait for a while (maybe 5 minutes) and, if the drain had not completed by then, skip the node and continue with a different one. Then, once all other nodes had been rolled, kops could come back to the blocked nodes and wait for the pods to complete.

@neolit123
Member

neolit123 commented Aug 4, 2019

Hi, I found this ticket by searching on GitHub.
I think I'm seeing PodDisruptionBudget bugs after enabling one on the coredns Deployment in kubeadm (as an experiment only):

  • the Deployment has 2 replicas.
  • adding a PDB to the Deployment (maxUnavailable: 1) and draining the nodes that host the Deployment pods causes unexpected behavior that is difficult to recover from!

kubernetes/kubeadm#1672 (comment)

@johngmyers
Member

/area rolling-update

@sstarcher
Contributor

It would be useful to have a flag that ignores PDBs after a certain amount of time. This will trip up anyone who has a PDB minAvailable of 1 with a replica count of 1: it will just sit for a long time waiting for something that will never come.

@johngmyers
Member

@sstarcher I believe that would be a separate feature request. I happen to think such a thing would be quite dangerous, but it would be useful when the cluster operators and the workload developers have an adversarial relationship.

I believe this particular ticket should be closed as "that's the intended behavior".

@blakebarnett
Author

@johngmyers originally this ticket had nothing to do with PDBs. Containers can get stuck in ContainerCreating and prevent evictions, breaking a node drain (just one of many possible failure scenarios). PDBs preventing a drain are valid, and I agree that kops shouldn't ignore them. Nodes getting into a bad state where they can't be drained is a more general Kubernetes operational problem, and maybe kops shouldn't do anything about that either.

But it might be good to document and/or add output that explains why the timeout occurred. Kops does validation elsewhere and explains why it fails; this would be just another flavor.

@johngmyers
Member

It might be good to change the title of this issue to limit its scope to ContainerCreating.

I believe kops's logging of hung drains is better now.

@johngmyers
Member

@blakebarnett Would you have a procedure for getting a pod hung in ContainerCreating?

I tend to agree that a stuck ContainerCreating pod blocking eviction seems like a problem with the Kubernetes eviction implementation.

@blakebarnett
Author

We've seen it usually when there has been resource contention on a node and something puts the node into an unrecoverable state. It happens for quite a few different reasons, but I believe it's usually because of the contention and the bad behavior of most apps in that scenario. It's been hard to reproduce intentionally, though. When the oom-killer kicks in at the system level and happens to pick dockerd as the process to snipe, that definitely seems to be problematic.

We've seen it with the unregister_netdevice kernel issue, high CPU contention, file-locking contention (currently trying to nail this down), and NIC driver resets (ENAs on AWS c5/m5 instances with older kernels are very problematic).

@johngmyers
Member

johngmyers commented Nov 15, 2019

Could you file a Kubernetes issue? I believe pods in ContainerCreating state should not block voluntary eviction, be they stuck or not. It's not as if they have state that needs grace to terminate.

Or is it the "wait for pods to terminate" phase they're blocking, not voluntary eviction?

@olemarkus
Member

kops now has --drain-timeout, which should prevent rolls from hanging. There is also more logging of why kops hangs on a drain.
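For example (the timeout value here is illustrative; see kops rolling-update cluster --help for the exact usage):

kops rolling-update cluster --yes --drain-timeout 15m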

/close

@k8s-ci-robot
Contributor

@olemarkus: Closing this issue.

In response to this:

kops now has --drain-timeout, which should prevent rolls from hanging. There is also more logging of why kops hangs on a drain.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
