uwsgi master graceful shutdown #1974

Closed
itielshwartz opened this issue Jan 31, 2019 · 12 comments
Comments


itielshwartz commented Jan 31, 2019

I'm running a uwsgi + Flask application. The app runs as a k8s pod.

When I deploy a new pod (a new version), the existing pod gets SIGTERM.

This causes the master to stop accepting new connections at that very moment, which causes issues because the LB still passes requests to the pod for a few more seconds.

I would like the master to wait 30 seconds BEFORE it stops accepting new connections (when it gets SIGTERM), but I couldn't find a way to do that. Is it possible?

My uwsgi.ini file:

;https://uwsgi-docs.readthedocs.io/en/latest/HTTP.html
http = :8080
wsgi-file = main.py
callable = wsgi_application
processes = 2
enable-threads = true
master = true
reload-mercy = 30
worker-reload-mercy = 30
log-5xx = true
log-4xx = true
disable-logging = true
stats = 127.0.0.1:1717
stats-http = true
single-interpreter = true
;https://github.com/containous/traefik/issues/615
http-keepalive = true
add-header = Connection: Keep-Alive

I also asked on Stack Overflow: https://stackoverflow.com/questions/54459949/uwsgi-master-graceful-shutdown, but no luck so far.

xrmx (Collaborator) commented Jan 31, 2019

Shouldn't you kill the old version only after you deploy the new pod?

itielshwartz (Author) commented Jan 31, 2019

It's part of the k8s rolling update, plus the time it takes to remove the Pod from the endpoint list; I can't control it.

xrmx (Collaborator) commented Jan 31, 2019

@itielshwartz a quick Google search revealed there's readinessProbe to avoid your issue.

itielshwartz (Author) commented Jan 31, 2019

Hi @xrmx, sadly that doesn't cover this case: the readiness probe doesn't run every second, so between the moment k8s sends SIGTERM and the moment the probe notices, there is a window in which uwsgi isn't serving.

More than that, after SIGTERM the pod shouldn't get traffic, but it still does, because the iptables rules take time to update: https://hackernoon.com/graceful-shutdown-in-kubernetes-435b98794461.

So like I said, this feature is indeed needed...

xrmx (Collaborator) commented Jan 31, 2019

The readiness probe on the new pod should prevent the old one from being killed before the new one is up; on the old pod it should make k8s stop routing traffic to it. If that does not work, you should probably fix your cluster configuration, no? Adding an arbitrary delay to process teardown seems like a hack.
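
For reference, a readiness probe of the kind suggested here is declared on the container in the Deployment's pod template. A minimal sketch, assuming the Flask app exposes a /health endpoint on the same port 8080 that uwsgi binds; the Deployment name, image, path, and timing values are placeholders:

# Sketch of a Deployment with a readinessProbe on the uwsgi container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-uwsgi
spec:
  replicas: 2
  selector:
    matchLabels:
      app: flask-uwsgi
  template:
    metadata:
      labels:
        app: flask-uwsgi
    spec:
      containers:
        - name: app
          image: flask-uwsgi:latest        # placeholder image
          ports:
            - containerPort: 8080          # matches "http = :8080" in uwsgi.ini
          readinessProbe:
            httpGet:
              path: /health                # assumed health endpoint
              port: 8080
            periodSeconds: 5
            failureThreshold: 3

During a rolling update this keeps the old pods serving until a new pod reports ready; it does not, on its own, close the post-SIGTERM window discussed below.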

itielshwartz (Author) commented Jan 31, 2019

Hi @xrmx, did you read the link I sent, https://hackernoon.com/graceful-shutdown-in-kubernetes-435b98794461?
After the old pod gets SIGTERM (because it is being replaced and a new pod is already up), the readiness probe no longer matters: the old pod shouldn't get any traffic, yet it still might (and does) for a few seconds. It is a known issue, not my cluster and not my configuration.

It is (maybe) a hack, but not one that is specific to my setup; it matters to anyone who wants 100% uptime when rolling out a new version with k8s.

xrmx (Collaborator) commented Jan 31, 2019

@itielshwartz That's a post from 2017; it does not feel that authoritative, sorry. Again, sending SIGTERM while still sending traffic does not look right to me. But hey, I'm not here to convince you.

itielshwartz (Author) commented Jan 31, 2019

@xrmx I totally agree with:

sending SIGTERM while still sending traffic does not look right to me

But as this is the k8s default, I don't think there's any choice.

I just wanted to know whether uwsgi can provide a solution; one of the reasons I opened this issue is that (I guess) other people will face the same problem...

If uwsgi has no solution, I will use the k8s preStop hook (see the sketch after this comment). It feels more hacky to me (as I can't test it locally, and it's very k8s-specific), but it will solve the issue.

About the post: it is indeed from 2017, but as the first commenter stated:

The behavior you’re seeing is caused by lag in kube-proxy, and it’s apparently by design.

Kubernetes cannot, by definition, maintain perfect consistency in a distributed scenario where network partitions are common. Kubernetes chooses to be eventually consistent here, in order to be available and partition-tolerant.

The usual solution is to add a “preStop” hook to the pod that sleeps for, say, 3 seconds. This allows the pod to wait until it’s properly deregistered. It feels hacky, sure, but it works. Of course, this doesn’t solve what to do with persistent connections.
Long discussion in this Github issue.

Also, here's the ongoing issue he mentioned:

kubernetes-retired/contrib#1140 (comment)

Just ran into this issue - at the very least this should be better documented, I'm sure the vast majority of people using rolling updates believe that kubernetes waits until a pod is out of rotation before sending SIGTERM, and this is not the case!

I also think that even if not implemented right away, sequencing SIGTERM to happen after a pod has been taken out of rotation should be the eventual goal: I don't think saying "gunicorn is doing the wrong thing" is a useful response, that's just kubernetes redefining what it thinks these signals mean, gunicorn and other servers behave in a fairly standard way wrt signals, and kubernetes should aim to be compatible with them rather than offloading the problem onto users.

Anyway, thanks for trying to help (and for uwsgitop!)
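
For concreteness, the "preStop hook that sleeps" from the quote above would look roughly like this when added to the pod template of the Deployment sketched earlier. A minimal sketch: the 30-second sleep mirrors the delay asked for in the original report, the container name and image are placeholders, and terminationGracePeriodSeconds has to cover the sleep plus uwsgi's own shutdown:

# kubelet runs the preStop hook to completion before it sends SIGTERM,
# so uwsgi keeps accepting requests while kube-proxy and the LB drop
# the pod's endpoint.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60      # must exceed sleep + worker drain
      containers:
        - name: app                          # placeholder container name
          image: flask-uwsgi:latest          # placeholder image
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 30"]

The sleep needs /bin/sh (or a sleep binary) inside the container image.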

xrmx (Collaborator) commented Jan 31, 2019

BTW, if you are on uwsgi 2.0 you should add die-on-term so that uwsgi exits, rather than reloads, on SIGTERM.

itielshwartz (Author) commented:

Thanks, I'm using it :)
I tried removing it while experimenting.

Anyway, the issue is resolved (I will use hooks), thanks.

amramtamir commented:

Hi @itielshwartz, can you share your solution with the hooks?
Thanks!
