
Launch Multiple Worker Instances using CeleryExecutor #4542

Closed
btylerburton opened this issue Nov 29, 2023 · 2 comments
Comments

btylerburton (Contributor) commented Nov 29, 2023

User Story

In order to achieve true horizontal scaling, datagovteam wants to be able to launch multiple instances of the airflow-worker application and see them pick up queued work.

Acceptance Criteria


  • GIVEN I run cf push --vars-file my_vars_file
    AND I have configured the datagov-harvester manifest to launch multiple instances of the worker application
    WHEN I look at the "Cluster Activity" tab in Airflow UI
    THEN I will see Airflow queuing new work and the queued work being picked up and run by each worker node

Background

Currently, launching more than one instance of the airflow-worker application causes the worker instances not to pick up work, whereas a single instance has no issues.

Considering the Celery documentation, this may be mitigated by launching the worker instances with a hostname:

You can start multiple workers on the same machine, but be sure to name each individual worker by specifying a node name with the --hostname argument:

Determine whether we can launch the worker instances via .profile and supply each with a unique start command, e.g. airflow celery worker -n worker-{CF_INSTANCE_INDEX}, using CF environment variables.
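A minimal sketch of that idea, assuming a POSIX .profile and the CF-provided CF_INSTANCE_INDEX variable; the echo stands in for the real start command:

```shell
#!/bin/sh
# Hypothetical .profile sketch: derive a unique Celery node name from
# Cloud Foundry's per-instance index. CF sets CF_INSTANCE_INDEX
# (0, 1, 2, ...) in each app instance's environment.
: "${CF_INSTANCE_INDEX:=0}"                 # default for local testing
export WORKER_NODE_NAME="worker-${CF_INSTANCE_INDEX}"

# The manifest's start command could then reference the exported
# variable; echo stands in here for the actual worker invocation.
echo "airflow celery worker -n ${WORKER_NODE_NAME}"
```

With CF_INSTANCE_INDEX=2 in the environment, this would produce the command `airflow celery worker -n worker-2`, giving each instance a distinct node name.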

Security Considerations (required)

[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]

Sketch

  • Determine if we can launch the worker instances using the .profile and supply them with a unique start command
  • Deploy and see if that fixes the issue
  • Continue to iterate until multiple worker instances are supported
btylerburton (Contributor, Author) commented:

Some learnings...

  • .profile runs before the manifest, and can be used to prep custom ENV VARS for use in the manifest
  • CF env vars are always available in the manifest with no manipulation in the profile
  • thus, it is possible to run airflow celery worker -n worker-{CF_INSTANCE_INDEX} with ease
  • however, this command does not map one-to-one with the celery command for adding a hostname, and multiple workers attempting to bind the same port for serving logs causes a conflict
  • this may be mitigated when/if we decide to enable remote logging to S3
  • and, it may be recommended to allow celery workers to auto-scale on a single machine vs scaling the instances based on this comment from an airflow core maintainer
  • so, our best path may be to distribute DAG execution over the entire day to keep load steady, and to tune the instance's memory as best we can using external monitoring.
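If autoscaling a single instance turns out to be the better path, Celery's standard --autoscale=MAX,MIN flag (also exposed by airflow celery worker) caps and floors the worker's process pool. A hedged sketch, with illustrative values only and an echo standing in for the real invocation:

```shell
#!/bin/sh
# Hedged sketch: one worker instance whose Celery process pool
# autoscales between MIN and MAX processes, instead of scaling
# CF app instances.
MAX_PROCS=8   # illustrative ceiling
MIN_PROCS=2   # illustrative floor

# echo stands in for actually starting the worker.
echo "airflow celery worker --autoscale=${MAX_PROCS},${MIN_PROCS}"
```

This keeps a single log port and a single node name, sidestepping the hostname and log-binding issues noted above.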

btylerburton (Contributor, Author) commented:

Given what we've learned above, I'm going to close this ticket until we're focused on optimization.

@btylerburton btylerburton removed their assignment Jan 10, 2024