[core] v1 autoscaler consistency fix #40488
Conversation
Signed-off-by: vitsai <vitsai@cs.stanford.edu>
@@ -188,15 +190,15 @@ void GcsResourceManager::HandleGetAllResourceUsage(
     rpc::GetAllResourceUsageRequest request,
     rpc::GetAllResourceUsageReply *reply,
     rpc::SendReplyCallback send_reply_callback) {
-  if (!node_resource_usages_.empty()) {
+  if (!gcs_autoscaler_state_manager_.GetNodeResourceInfo().empty()) {
I wonder if we should just pull the GcsPlacementGroupManager for the current pg load, instead of relying on the periodic runner?
That's a bigger refactor, beyond the scope of this change.
What if GcsAutoscalerStateManager exposes a GetPlacementGroupLoad function (which just gets the load from the PG manager) for the below:

auto placement_group_load = gcs_placement_group_manager_.GetPlacementGroupLoad();
Then we could remove the dependency between GcsResourceManager and GcsPlacementGroupManager? So we don't need changes here: https://github.com/ray-project/ray/pull/40488/files#diff-f33835748a6d386dd44a3449d9299117799bd9b552655054f901e42c5fbb59a1R253-R262 ?
This also unifies v1/v2 for how PG load is populated with a potentially smaller change?
The periodic function also schedules pending placement groups; it doesn't just update the info.
Also, we already removed the dependency in the current change.
src/ray/gcs/gcs_server/gcs_server.cc (outdated)

gcs_resource_manager_->UpdatePlacementGroupLoad(
    gcs_placement_group_manager_->GetPlacementGroupLoad());
I think we can remove placement_group_load from gcs_resource_manager, since it will now get all loads from the GCS autoscaler manager?
We can also do that this way: #40254
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Is this a viable approach to resolving the issue?
A less invasive way to do it.
Why are these changes needed?

Related issue number

Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.