Optimize gunicorn settings running with docker #30

Open
TimMcCauley opened this issue Apr 14, 2018 · 19 comments

@TimMcCauley
Contributor

Sporadically the gunicorn workers time out - this may be due to the worker class settings: http://docs.gunicorn.org/en/stable/settings.html

@TimMcCauley TimMcCauley self-assigned this Apr 14, 2018
TimMcCauley added a commit that referenced this issue Aug 7, 2018
Replacing sync with gevent, closes #30
@TimMcCauley
Contributor Author

https://pythonspeed.com/articles/gunicorn-in-docker/

@TimMcCauley TimMcCauley reopened this May 22, 2019
@TimMcCauley TimMcCauley changed the title Understand worker_class of gunicorn Optimize gunicorn settings running with docker May 22, 2019
@TimMcCauley
Contributor Author

@zephylac do you have any experience with gunicorn settings? Sometimes requests are timing out on our live servers using the following settings:

workers = 2
worker_class = 'gevent'
worker_connections = 1000
timeout = 30
keepalive = 2

I am now trying the following settings instead, which are recommended in the post above.

worker_class = 'gthread'
threads = 4
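
For reference, a rough sketch of the full config file I have in mind - the file name, worker count formula and bind address are placeholders, not necessarily what runs on our servers:

```python
# gunicorn.conf.py -- sketch only; values are placeholders
import multiprocessing

# gthread workers: a handful of processes, a few threads each
workers = multiprocessing.cpu_count() * 2 + 1  # common rule of thumb
worker_class = 'gthread'
threads = 4

# keep the request timeout explicit so slow requests surface in the logs
timeout = 30
keepalive = 2

bind = '0.0.0.0:5000'
```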

@zephylac
Contributor

zephylac commented May 22, 2019

I don't have any experience with gunicorn, but I can try to look into it and find some info.

I'm currently spamming my instance with requests, but I haven't experienced any timeouts (for now).

@zephylac
Contributor

I've looked into it a little bit.
In the article you mentioned they were also talking about --worker-tmp-dir, which might cause problems for the workers.
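
If I read the article correctly, their suggested workaround is to move the worker heartbeat directory onto a tmpfs so a slow or blocking disk can't stall the workers - in config-file form that would be roughly (sketch only):

```python
# gunicorn.conf.py fragment -- sketch; the pythonspeed article suggests
# pointing the worker heartbeat directory at a tmpfs such as /dev/shm
worker_tmp_dir = '/dev/shm'
```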

I've already seen some info about the threads option. Opinions seemed to converge on threads = workers.
It seems that the [solution](https://www.brianstorti.com/the-role-of-a-reverse-proxy-to-protect-your-application-against-slow-clients/) some found was to put NGINX in front of gunicorn as a reverse proxy.

On my side, I've tried to make my workers time out (without changing the current gunicorn parameters). Under both extreme load and at rest, my workers don't seem to time out.

@TimMcCauley
Contributor Author

Thanks for looking this up @zephylac - if you are running your batch requests, could you also run them against api.openrouteservice.org at the same time? I can send you a token allowing a higher quota - if you agree, which email address could I send the token to?
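
Something as simple as the sketch below would already be helpful - the endpoint path, request body and auth header are placeholders, adjust them to whatever your batch script already does:

```python
# load_test.py -- sketch only; URL, body and auth header are placeholders
import concurrent.futures
import requests

URL = "https://api.openrouteservice.org/pois"  # placeholder endpoint
HEADERS = {"Authorization": "<token from the email>"}
BODY = {"request": "pois"}                     # placeholder request body

def fire(_):
    try:
        r = requests.post(URL, json=BODY, headers=HEADERS, timeout=35)
        return r.status_code, r.elapsed.total_seconds()
    except requests.Timeout:
        return "timeout", None

# fire 500 requests with 20 concurrent workers and print status + latency
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool:
    for status, seconds in pool.map(fire, range(500)):
        print(status, seconds)
```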

@zephylac
Contributor

I've sent you an email!

@zephylac
Contributor

Under which architecture are you running your service? Are you using Docker? Are you running on a VM or on dedicated hardware?

@TimMcCauley
Contributor Author

We are running this on a VM in our OpenStack environment with 32 GB RAM and 8 cores. The PostGIS database is running on a different and smaller VM, unfortunately with very slow disks (which will soon be upgraded to SSDs). The containers running on this VM are:

ubuntu@ors-microservices:~|⇒  sudo docker ps
CONTAINER ID        IMAGE                                      COMMAND                  CREATED             STATUS              PORTS                      NAMES
68404976f9d6        openelevationservice_gunicorn_flask_2      "/oes_venv/bin/gun..."   8 weeks ago         Up 2 days           0.0.0.0:5021->5000/tcp     openelevationservice_gunicorn_flask_2_1
6959766a7ee9        openelevationservice_gunicorn_flask        "/oes_venv/bin/gun..."   8 weeks ago         Up 2 days           0.0.0.0:5020->5000/tcp     openelevationservice_gunicorn_flask_1
ec736d4cd30c        openpoiservice_gunicorn_flask_05122018_2   "/ops_venv/bin/gun..."   5 months ago        Up 24 hours         0.0.0.0:5006->5000/tcp     openpoiservice_gunicorn_flask_05122018_2_1
c62417a4f60e        openpoiservice_gunicorn_flask_05122018     "/ops_venv/bin/gun..."   5 months ago        Up 24 hours         0.0.0.0:5005->5000/tcp     openpoiservice_gunicorn_flask_05122018_1

@zephylac
Contributor

zephylac commented May 23, 2019

Are the workers timing out even when idle? Or just under load?

I've looked at my logs; none of my workers have timed out during one week of intense load.

@TimMcCauley
Contributor Author

Some requests will simply time out, but I haven't found a pattern for this yet.

@zephylac
Contributor

zephylac commented Jun 6, 2019

Maybe PostgreSQL 12 & PostGIS 3 will fix part of this issue by properly supporting parallelization.

@TimMcCauley
Contributor Author

Agreed. Did you test the live API with the token I sent you by any chance @zephylac ?

@zephylac
Contributor

zephylac commented Jun 9, 2019

Yup, I tried, but it seems it has expired.

@TimMcCauley
Contributor Author

Ah shit, sorry - it's now extended forever ;-) and won't expire anymore (same token as in the email).

@boind12

boind12 commented Nov 10, 2020

Hi @TimMcCauley,
obviously it has been a while, but as I am facing the same issue you described (random timeouts with larger batches of POI requests using Docker), I am wondering if you have found a solution?

@lingster

Maybe this article helps? https://pythonspeed.com/articles/gunicorn-in-docker/

@boind12

boind12 commented Nov 15, 2020

Hi @lingster,
this link was mentioned earlier by Tim. I was unable to solve the problem using it.

@TimMcCauley
Contributor Author

Sorry for joining the party so late.

@boind12 could you run ANALYZE in the ops schema once and check again? What kind of requests are you running, and are you able to run the same query directly in SQL to see how it behaves (you can print the SQL query and fill in the placeholders manually)? How much memory are you giving Docker, and have you played around with pgtune settings? In a nutshell: it's most likely a Postgres issue.
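
For the ANALYZE part, something along these lines should do - a sketch assuming psycopg2; take the connection parameters from your openpoiservice configuration:

```python
# analyze_ops.py -- sketch only; connection parameters are placeholders
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="gis",
                        user="gis_admin", password="...")
cur = conn.cursor()

# refresh planner statistics for every table in the ops schema
cur.execute("SELECT tablename FROM pg_tables WHERE schemaname = 'ops'")
for (table,) in cur.fetchall():
    cur.execute(f'ANALYZE ops."{table}";')
    print(f"analyzed ops.{table}")

conn.commit()
cur.close()
conn.close()
```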

@boind12

boind12 commented Nov 17, 2020

Hi @TimMcCauley,
thanks for your support!
I am using the following setup:

  • Host: 16 GB RAM, 2 vCPU with 50 GB SSD (Google Cloud e2-highmem)
  • The host is running:
    • 1x Openrouteservice: https://github.com/GIScience/openrouteservice
    • 1x Openpoiservice
    • 1x postgis: https://hub.docker.com/r/kartoza/postgis/

I am running large batch requests for POIs with >50 km² areas, hence I assume some of them take longer than the 30s timeout of the gunicorn server from openpoiservice. By increasing the gunicorn timeout to 60s (see the sketch below) I was able to solve the issue.
However, I have now migrated the PostGIS database from the VM to a dedicated Google PostgreSQL instance. Maybe this helps further.
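
In case someone else hits this, the change itself is tiny - roughly this in the gunicorn config (sketch; the exact file depends on how you run openpoiservice):

```python
# gunicorn config fragment -- sketch of the change
timeout = 60  # was 30; large >50 km² POI batch requests need longer
```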
