
Launch Multiple Worker Instances using CeleryExecutor #4542

Closed
btylerburton opened this issue Nov 29, 2023 · 2 comments
Comments

btylerburton (Contributor) commented Nov 29, 2023

User Story

In order to achieve true horizontal scaling, datagovteam wants to be able to launch multiple instances of the airflow-worker application and see them pick up queued work.

Acceptance Criteria


  • GIVEN I run cf push --vars-file my_vars_file
    AND I have configured the datagov-harvester manifest to launch multiple instances of the worker application
    WHEN I look at the "Cluster Activity" tab in Airflow UI
    THEN I will see Airflow queuing new work and the queued work being picked up and run by each worker node

Background

Currently, launching more than one instance of the airflow-worker application causes the worker instances not to pick up work, whereas a single instance has no issues.

Considering the Celery documentation, this may be mitigated by launching the worker instances with a hostname:

You can start multiple workers on the same machine, but be sure to name each individual worker by specifying a node name with the --hostname argument:

Determine whether we can launch the worker instances via .profile and supply each with a unique start command, e.g. airflow celery worker -n worker-{CF_INSTANCE_INDEX}, using CF environment variables.
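A minimal sketch of that idea, assuming a POSIX .profile and the CF-provided CF_INSTANCE_INDEX variable; the echo stands in for the real start command:

```shell
#!/bin/sh
# Hypothetical .profile sketch: derive a unique Celery node name from
# Cloud Foundry's per-instance index. CF sets CF_INSTANCE_INDEX
# (0, 1, 2, ...) in each app instance's environment.
: "${CF_INSTANCE_INDEX:=0}"                 # default for local testing
export WORKER_NODE_NAME="worker-${CF_INSTANCE_INDEX}"

# The manifest's start command could then reference the exported
# variable; echo stands in here for the actual worker invocation.
echo "airflow celery worker -n ${WORKER_NODE_NAME}"
```

With CF_INSTANCE_INDEX=2 in the environment, this would produce the command `airflow celery worker -n worker-2`, giving each instance a distinct node name.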

Security Considerations (required)

[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]

Sketch

  • Determine if we can launch the worker instances using the .profile and supply them with a unique start command
  • Deploy and see if that fixes the issue
  • Continue to iterate until multiple worker instances are supported
btylerburton (Contributor, Author) commented:

Some learnings...

  • .profile runs before the manifest, and can be used to prep custom ENV VARS for use in the manifest
  • CF env vars are always available in the manifest with no manipulation in the profile
  • thus, it is possible to run airflow celery worker -n worker-{CF_INSTANCE_INDEX} with ease
  • however, this command does not map one-to-one with the celery command for adding a hostname, and multiple workers attempting to bind the same port for serving logs causes a conflict
  • this may be mitigated when/if we decide to enable remote logging to S3
  • and, it may be recommended to allow celery workers to auto-scale on a single machine vs scaling the instances based on this comment from an airflow core maintainer
  • so, our best path may be to distribute DAG execution over the entire day to keep load steady, and to tune the instance's memory as best we can using external monitoring.
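If autoscaling a single instance turns out to be the better path, Celery's standard --autoscale=MAX,MIN flag (also exposed by airflow celery worker) caps and floors the worker's process pool. A hedged sketch, with illustrative values only and an echo standing in for the real invocation:

```shell
#!/bin/sh
# Hedged sketch: one worker instance whose Celery process pool
# autoscales between MIN and MAX processes, instead of scaling
# CF app instances.
MAX_PROCS=8   # illustrative ceiling
MIN_PROCS=2   # illustrative floor

# echo stands in for actually starting the worker.
echo "airflow celery worker --autoscale=${MAX_PROCS},${MIN_PROCS}"
```

This keeps a single log port and a single node name, sidestepping the hostname and log-binding issues noted above.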

btylerburton (Contributor, Author) commented:

Given what we've learned above, I'm going to close this ticket until we're focused on optimization.

@btylerburton btylerburton removed their assignment Jan 10, 2024