Estimators scale down deployments #6107
Comments
Hello @LavredisG, apart from adding the new score plugin, have you made any other changes to the scheduler?
For simplicity, I have disabled every other in-tree plugin in the scheduler's deployment.
That is to say, only the filter plugin and score plugin of the scheduler have been modified, right? Could you show me the …
Yep, that's right (to be exact, only the score plugin has been implemented, no filter plugin, so all clusters pass the filter). The propagation policy has a staticWeight policy between the 2 clusters, each of them with weight: 1. However, after the score plugin runs, the weights are updated to 100 and 25 respectively, so the 10 replicas of the deployment are distributed as member1:8 and member2:2 before being scaled down to 1.
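For context, the initial policy looks roughly like this (a minimal sketch: the policy and Deployment names are placeholders, only the two member clusters and the `weight: 1` entries come from my setup):

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: demo-propagation          # placeholder name
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: demo                  # placeholder workload name
  placement:
    clusterAffinity:
      clusterNames:
        - member1
        - member2
    replicaScheduling:
      replicaSchedulingType: Divided
      replicaDivisionPreference: Weighted
      weightPreference:
        staticWeightList:
          - targetCluster:
              clusterNames: [member1]
            weight: 1
          - targetCluster:
              clusterNames: [member2]
            weight: 1
```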
Does your custom scoring plugin not only score the clusters but also change their weights?
Exactly, I forgot to mention it. Essentially we care about the scores for the replica distribution, so the scores are fed back to the propagationPolicy as staticWeights for the replicas.
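So after scoring, the policy's `weightPreference` ends up looking roughly like this (a sketch based on the scores mentioned above):

```yaml
# staticWeightList after the scores (100 and 25) are written back into the policy
weightPreference:
  staticWeightList:
    - targetCluster:
        clusterNames: [member1]
      weight: 100   # score of member1
    - targetCluster:
        clusterNames: [member2]
      weight: 25    # score of member2
# 10 replicas split proportionally: 10 * 100/125 = 8 and 10 * 25/125 = 2
```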
This approach is a bit of a hack, and there may be other potential impacts. My idea is to investigate what happened between the …
It is indeed a workaround; I will look into it further.
I have a setup of 2 kind clusters and I am working on a score plugin for Karmada's scheduler framework. My workload is a Deployment that specifies CPU and memory requests and needs 10 replicas. After scoring, we get scores of `member1:100` and `member2:25`, so the replica distribution is expected to be `member1:8` and `member2:2`, which is indeed what happens at first. However, after a while the distribution automatically changes to `member1:1` replica.
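For reference, the workload is of this shape (a minimal sketch; the name, image and request values are placeholders, only "CPU and memory requests, 10 replicas" comes from my setup):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo                 # placeholder name
spec:
  replicas: 10
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: demo
          image: nginx       # placeholder image
          resources:
            requests:
              cpu: 100m      # assumed values, not the actual requests
              memory: 128Mi
```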
From the logs I suspected that it's caused by the estimators, so I made sure to set `--enable-scheduler-estimator=false` on the karmada-scheduler deployment to disable the accurate estimator, and also added the flag `--enable-cluster-resource-modeling=false` on the karmada-controller-manager deployment (only on the controller and not on the agent as specified here, since I have joined the clusters using the Push method, so I think the agent doesn't apply here). However, the logs shown were captured after setting both flags to false, so it somehow still happens. Do you have any idea what could cause this? Even if it's the estimator's fault and the clusters can fit 4 replicas each, why do I only get 1 replica out of 10?

Note that the problem doesn't arise when the deployment doesn't specify resource requests, since in that case only the cluster pod capacity is taken into consideration.
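For completeness, this is roughly where the two flags were added (a sketch of the relevant container args only; the command paths follow the default install manifests, and all other fields of the actual deployments are omitted):

```yaml
# karmada-scheduler: disable the accurate scheduler estimator
containers:
  - name: karmada-scheduler
    command:
      - /bin/karmada-scheduler
      - --enable-scheduler-estimator=false
      # ...other existing flags unchanged
---
# karmada-controller-manager: disable cluster resource modeling
containers:
  - name: karmada-controller-manager
    command:
      - /bin/karmada-controller-manager
      - --enable-cluster-resource-modeling=false
      # ...other existing flags unchanged
```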