Explain why example docker-compose.yml file is not suited for production #16041

Closed
david-woelfle opened this issue May 25, 2021 · 7 comments
Labels
kind:feature Feature Requests

Comments

@david-woelfle

Hi everyone,

thank you so much for your outstanding work on this project! While going through the docs I found the section on Running Airflow in Docker, where a docker-compose.yml file is provided. The header of this file states:

WARNING: This configuration is for local development. Do not use it in a production deployment.

Could you explain why this is the case? That is, why should this configuration not be used in production, and what would have to change for someone to run a production Airflow environment based on this example?

@david-woelfle david-woelfle added the kind:feature Feature Requests label May 25, 2021
@boring-cyborg

boring-cyborg bot commented May 25, 2021

Thanks for opening your first issue here! Be sure to follow the issue template!

@JavierLopezT
Contributor

+1 to this explanation.

For the record, we are using docker-compose in production for almost 100 DAGs, and our deployment is very stable and performs well.

@potiuk
Member

potiuk commented May 25, 2021

Publishing something that is labelled as production ready is a lot of effort, and the effort required from the community to maintain it is much bigger. That's why we have to be very careful about labelling something as "production ready" when we officially publish it as a community. We have to be prepared to support all kinds of users, respond to their issues, fix them, and possibly add new features continuously.

Just look how much time it took to graduate the Helm chart (3 people worked full time for the last ~2 months, I believe, to get it to a state where we could label it as "officially ready"). It took me a few months to release the first version of the Production Docker Images, and then over a year to iterate on it, update it, respond to issues, and handle many cases that were initially unforeseen but that users raised. Only now do I think we are close to releasing the image as the "Official Docker image" (#10107). The project to get it there alone, https://github.com/apache/airflow/projects/3, has 35 issues in the "Done" state, and two more are needed to complete it.

Different users have different expectations, configurations, databases, executors, deployments, scalability requirements, etc. What works for you, @JavierLopezT, might not work for 100 other users, and they might have different expectations. Don't forget how opinionated you are in the way you run YOUR deployment, and how those opinions may differ from those of many other people.

When we label something as "production-ready" we should be ready to respond to such issues. We need automated tests covering regressions in case anything changes, a formal release process, and the ability to analyse, diagnose, and fix problems when they arise.

The current Docker-compose is far from production ready. It is more of a "quick-start" if you want to try Airflow. No more, no less. A number of issues such as "The docker compose does not work with LocalExecutor" or "The docker compose does not work with MySQL" have already been raised. And yes, it does not, and this is by design. It is not supposed to. It's not production-ready. This is not its purpose.

There are other issues that are already created around that:

I think what @mik-laj proposed, a wizard that generates a docker-compose file based on the user's expectations, is a good start in the "production-ready docker-compose" direction. But we have a loooong way to go to get there.

@potiuk potiuk closed this as completed May 25, 2021
@david-woelfle
Author

Thank you @potiuk for the detailed answer! Now it makes much more sense to me.

@JavierLopezT
Contributor

Crystal clear, @potiuk. Is it OK with you if I open an MR with a summary of your words, to include it in the docker-compose file?

@potiuk
Member

potiuk commented May 25, 2021

Hmm, I think it would have to be a general description rather than one specific to docker-compose. We have a couple of those 'not production ready' things in the Airflow code, and I am not even sure where to put it.

But maybe we could indeed say a few words about it in the README or something. I am not sure what others think about it. Does anyone from @apache/airflow-committers have an opinion?

@mik-laj
Member

mik-laj commented May 25, 2021

This file is also missing a few things that make this docker-compose unsafe for production:

  • CPU/memory resource limits - each container has access to all system resources.
  • SSL - the connection to the container should be encrypted by Traefik or another proxy, or we should configure SSL in the webserver ([webserver] web_server_ssl_*).
  • Volumes - containers use the local file system, but we should use volumes in a production environment.
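
To make the three points above concrete, here is a minimal sketch of what such settings could look like in a compose file. It is not taken from the official file and is not a hardened setup: the service name, image tag, certificate paths, volume name, and limit values are all illustrative assumptions. Whether `deploy.resources.limits` is applied outside Swarm depends on the Compose version in use (older docker-compose v1 releases needed the `--compatibility` flag), so check the behaviour of your own tooling.

```yaml
# Hypothetical fragment for illustration only; not the official Airflow file.
services:
  airflow-webserver:
    image: apache/airflow:2.1.0          # assumed tag; pin the version you actually run
    command: webserver
    environment:
      # These map to [webserver] web_server_ssl_cert / web_server_ssl_key.
      # The certificate paths are assumptions.
      AIRFLOW__WEBSERVER__WEB_SERVER_SSL_CERT: /opt/airflow/certs/cert.pem
      AIRFLOW__WEBSERVER__WEB_SERVER_SSL_KEY: /opt/airflow/certs/key.pem
    volumes:
      - ./certs:/opt/airflow/certs:ro    # read-only mount for the certificate files
      - airflow-logs:/opt/airflow/logs   # named volume instead of the container file system
    deploy:
      resources:
        limits:
          cpus: "1.0"                    # illustrative values; size them for your workload
          memory: 2G

volumes:
  airflow-logs:                          # hypothetical named volume
```

Terminating TLS in a reverse proxy such as Traefik, as mentioned above, is an alternative to the two `AIRFLOW__WEBSERVER__WEB_SERVER_SSL_*` variables; the sketch shows only the webserver option.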

We should also mention the possible ways of deploying DAGs, e.g. Git Sync.

I also recommend the latest discussion on Slack, where I explained the assumptions of this guide:
https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1621801810231000?thread_ts=1621711385.211600&cid=CCQ7EGB1P

This example docker-compose file has been prepared for the most popular configuration. I've only tested it with CeleryExecutor. As for other executor configurations, I think they are beyond the scope of this article. The purpose of this article was to make it easy for someone who is unfamiliar with Airflow to launch it for the first time, so that they can test and check how Airflow works.
One way to do this is to limit all the actions that the user takes. Now, to start Airflow they just need to run two simple commands:

curl...
docker-compose up

You don't need to select a database engine or an executor, or set other configuration options. You don't even need to know what they are to run Airflow.
We should prepare separate guides on how to set up Docker-compose for other configurations. I even started working on a tool that would generate several Docker-compose filesets based on user-supplied options, but I stopped working on it when Polidea was acquired by Snowflake.


4 participants