Setting capacityProviderStrategy not working in Push Work Pool #13030

Open
kaaloo opened this issue Sep 11, 2023 · 2 comments

kaaloo commented Sep 11, 2023

Expectation / Proposal

I would like to be able to use a Push Work Pool on AWS that can scale to zero because of the cost of idle GPU instances. I have created the following template in the Advanced Tab of the Work Pool.

https://gist.github.com/kaaloo/de723421fb6fda6965fda3d3af5b6dc2

It attempts to replace the launchType section in the task run request with a capacityProviderStrategy section by setting up a capacity_provider variable which is then used as follows. The launch_type variable and launchType section in the template have been removed.

      "capacityProviderStrategy": [
        {
          "base": 0,
          "weight": 1,
          "capacityProvider": "{{ capacity_provider }}"
        }
      ]

Unfortunately, the current code base considers the launchType section to be mandatory and provides a default value of FARGATE which is added back in. This behavior is not compatible with what I'm trying to achieve.
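For illustration, here is a minimal sketch of the request shape I am aiming for, assuming the work pool ultimately passes these keyword arguments to boto3's `ecs.run_task`. The ECS RunTask API requires `launchType` to be omitted when `capacityProviderStrategy` is specified, which is why the injected FARGATE default gets in the way. The function name, cluster name, and capacity provider name below are hypothetical.

```python
def build_run_task_request(task_definition, cluster, capacity_provider):
    """Build RunTask kwargs that use a capacity provider and omit launchType.

    Hypothetical helper for illustration; not part of prefect-aws.
    """
    return {
        "taskDefinition": task_definition,
        "cluster": cluster,
        # Same strategy as in the work pool template above.
        "capacityProviderStrategy": [
            {"capacityProvider": capacity_provider, "base": 0, "weight": 1}
        ],
        # Deliberately no "launchType" key.
    }


# Placeholder names, assumed for the example only.
request = build_run_task_request("my-flow-task", "gpu-cluster", "gpu-asg-provider")
# With real credentials this could be sent as:
#   boto3.client("ecs").run_task(**request)
```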

https://repost.aws/questions/QUnKdakxvQROuUEjq2UWpa9g/should-ecs-ec2-asgprovider-capacity-provider-be-able-to-scale-up-from-zero-0-1

Traceback / Example

It seems to me that the issue stems somewhere around these lines of code:

https://github.com/PrefectHQ/prefect-aws/blob/main/prefect_aws/workers/ecs_worker.py#L854C1-L854C1

I'm not familiar enough with the workings of the new Push Work Pools to get much further in resolving the issue myself, but with some guidance I could give it a try, especially if that helps get the fix out faster.

jakekaplan (Contributor) commented Sep 11, 2023

Hi @kaaloo, thanks for filing the issue. You're right that ECS work pool templates don't currently support capacityProviderStrategy. I'm happy to work with you if you'd like to contribute the feature!

Right now we ensure that there is a default launch type no matter what: https://github.com/PrefectHQ/prefect-aws/blob/main/prefect_aws/workers/ecs_worker.py#L1233

We also validate other fields based on the launch type: https://github.com/PrefectHQ/prefect-aws/blob/main/prefect_aws/workers/ecs_worker.py#L854C1-L854C1

It seems like we'll want to make sure launchType is not specified if capacityProviderStrategy is set, and maybe add some other relevant validation.
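A rough sketch of what that check could look like, operating on a plain task run request dict. This is not the actual ECSWorker code: the helper name `apply_launch_defaults` and the choice of `ValueError` are assumptions made for illustration.

```python
def apply_launch_defaults(task_run_request, default_launch_type="FARGATE"):
    """Return a copy of the request with a default launchType only when
    no capacityProviderStrategy is present.

    Hypothetical helper; the real worker logic may be organized differently.
    """
    request = dict(task_run_request)
    has_strategy = bool(request.get("capacityProviderStrategy"))

    if has_strategy and request.get("launchType"):
        # ECS expects launchType to be omitted when a strategy is given.
        raise ValueError(
            "launchType must not be set when capacityProviderStrategy is used"
        )

    if not has_strategy and not request.get("launchType"):
        # Preserve the existing behavior of falling back to a default.
        request["launchType"] = default_launch_type

    return request
```

With this shape, a template that removes launchType and adds capacityProviderStrategy passes through unchanged, while templates that set neither keep the current FARGATE default.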

Push work pools are a closed-source implementation, although they mimic a lot of the logic from the ECSWorker in this repo. If you're willing to contribute a fix, I'm happy to port the implementation to the push work pool version.

JamiePlace commented

Any update on this? I see that @kaaloo had a PR in the prefect-aws repo, but then the repo was set to read-only and the PR was closed. Obviously this is quite old at this point, but the costs are starting to ramp up!

Is there any way to configure a work pool to simply not use the "launchType" parameter?
