[Autoscaler] starts more workers than necessary #36926
Comments
We seem to be seeing this behaviour too, maybe in a slightly different form, but essentially the autoscaler is starting more nodes than necessary. Initial request for 24x:
A few loops later it starts queuing unnecessary nodes.
And then some more.
We don't understand why. The issue for us is that these nodes pend forever due to constraints around the pods, and they add to resource quota counts which aren't valid. Attached logs: autoscaler.txt
Seeing the same as well. No jobs running...
Then 5x 4-CPU jobs started with 24 CPUs already available, and 4 more instances were launched unnecessarily.
I'm also seeing this in a cluster I started on AWS EC2 instances. The config (I've put the whole thing at the end of this comment) includes multiple node types. Autoscaler log:
Full cluster config:
Just wondering if anyone has figured out what's happening here? It's still happening to us. As we constrain nodes, it just leaves pending pods littered throughout the different clusters.
As I said in my original post, I believe there is a race condition on the reading of the pending tasks and the usage.
Do we know when this is likely to come in as a fix?
cc @rickyyx can you try to repro and see if this is fixed since the Ray 2.7 release in the v2 OSS autoscaler?
2.7 should have patches that mitigate this, but this is essentially #38189 as well. The current plan is to fix it in 2.8.
@vitsai > chasing the breadcrumbs > which is the most promising GH issue/PR that we think will resolve this issue?
The fix for autoscaler v2 is in 2.8; the linked PR is for autoscaler v1.
I will downgrade the priority since the v1 fix is less prioritized. Does that sound okay?
Reviewed with @rkooo567 @rynewang @vitsai > let's decide whether we should fix this in autoscaler v1. This is fixed in v2, but we have an interim state in Ray 2.9 where the default may still be autoscaler v1, in which case this issue will still be present. Next step: let's decide whether we just skip/push to autoscaler v2 in Ray 2.10 or fix this regression.
@vitsai @anyscalesam @rickyyx Could you please share more about how to use autoscaler v2 in Ray 2.9? I didn't find any related docs.
Is it just a matter of enabling it?
Hey @llidev, we are still working on the fix for autoscaler v1; @vitsai has a PR here: #40488. While we try to do so, we are also working on the v2 autoscaler. This has been delayed due to other priorities, so it's not available in Ray 2.9 yet.
Thank you @rickyyx. Do you think autoscaler_v2 is something the end user can try right now, or is it still recommended to wait?
It's still under active development, so it's not ready yet.
We will close it after autoscaler v2 is enabled.
Any progress on this one?
@DmitriGekhtman - not yet; v2 is still optional and not the default autoscaler for now.
Hmm, it looks like we're in a state where autoscaler v1 functionality is gradually degrading but autoscaler v2 development is suspended (the last feature commit was in March).
@dcarrion87
This is consistently reproducible on our infrastructure in the following way:
One would expect exactly 100 nodes to come up. The first time I tried this, I got 150 nodes -- that's quite severe over-provisioning. One way I can get around this, sort of:
Given the prioritization history for this issue, it looks like it's unlikely to be resolved for autoscaler v1. On the other hand, there's been a bit of a pick-up in activity for autoscaler v2, so maybe there's some hope that stable Ray autoscaling will be available in the OSS in the not-too-distant future.
I refined this by rejecting the autoscaler's upscaling attempts only up to a certain number of consecutive times; this way you can eventually upscale even with a stuck node. Summary: a potential workaround is to modify your node provider to upscale more slowly when nodes are pending.
Or maybe the right heuristic here is to back off upscaling if any node has recently transitioned from pending to running, to provide time for the stray resource bundles to be removed from the pending list.
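To illustrate the workaround sketched in the last two comments, here is a rough, hypothetical wrapper around a node provider that defers upscaling while nodes are still pending, but only for a bounded number of consecutive refusals so a permanently stuck node cannot block scaling forever. The `PendingAwareProvider` class and its `pending_node_count` helper are invented for this sketch; the `create_node(node_config, tags, count)` signature mirrors the OSS autoscaler's `NodeProvider` interface, but adapt it to whatever provider you actually wrap.

```python
# Sketch only: throttle upscaling while nodes are pending, with a cap on
# consecutive refusals so a stuck node cannot block scaling forever.
class PendingAwareProvider:
    def __init__(self, inner_provider, max_consecutive_refusals=5):
        self.inner = inner_provider
        self.max_refusals = max_consecutive_refusals
        self.refusals = 0

    def _pending_node_count(self):
        # Hypothetical helper: count nodes that were requested but are not
        # yet running, e.g. by querying your cloud or Kubernetes API.
        return self.inner.pending_node_count()

    def create_node(self, node_config, tags, count):
        if self._pending_node_count() > 0 and self.refusals < self.max_refusals:
            # Defer this upscale request: the pending nodes may already cover
            # it once the autoscaler reconciles usage with the pending tasks.
            self.refusals += 1
            return
        self.refusals = 0
        return self.inner.create_node(node_config, tags, count)
```

Every other provider method would simply delegate to the wrapped provider; that plumbing is omitted here for brevity.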
#46588 has minimal reproducible steps for the over-provisioning.
What happened + What you expected to happen
The autoscaler started extra workers while they were not needed.
From the following log, I believe there may be a race condition on reading the pending tasks and the usage.
Our tasks use 1 CPU and 30 GB of memory; each worker has 120 GB of memory.
At the beginning, the autoscaler correctly started 12 workers (4 tasks per worker, 48 tasks in total).
Then, the moment the workers started, there seems to be a race condition between the usage and the pending tasks.
e.g.
Usage:
10.0/176.0 CPU # 10 tasks are running
So there should be 38 (48 - 10) tasks pending.
However, the autoscaler thinks there are still 48 tasks pending and starts 3 extra workers for them.
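To make the arithmetic concrete, here is a back-of-the-envelope sketch (illustrative only, not the autoscaler's actual code) of how a stale pending count of 48, read alongside a usage snapshot that already shows 10 running tasks, produces exactly 3 unnecessary workers. The numbers come from the report above; the variable names are hypothetical.

```python
import math

total_tasks = 48
tasks_per_worker = 120 // 30      # 4 tasks per worker (30 GB tasks on 120 GB workers)
workers_up = 12                   # workers started correctly at the beginning
running_tasks = 10                # matches the "10.0/176.0 CPU" usage snapshot

free_slots = workers_up * tasks_per_worker - running_tasks   # 48 - 10 = 38

# Consistent view: 38 tasks are still pending and 38 slots are free -> 0 new workers.
true_pending = total_tasks - running_tasks
extra_correct = max(0, math.ceil((true_pending - free_slots) / tasks_per_worker))

# Stale view: pending still reads 48 while usage already shows 10 running,
# so demand appears to exceed capacity by 10 tasks -> 3 extra workers requested.
stale_pending = total_tasks
extra_stale = max(0, math.ceil((stale_pending - free_slots) / tasks_per_worker))

print(extra_correct, extra_stale)  # 0 3
```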
Versions / Dependencies
ray 2.5.1
Reproduction script
normal setup
Issue Severity
Low: It annoys or frustrates me.