Estimators scales down deployments #6107

Open
LavredisG opened this issue Feb 10, 2025 · 8 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@LavredisG (Contributor) commented Feb 10, 2025

I have a setup of 2 kind clusters and I am working on a score plugin for Karmada's scheduler framework. My workload is a Deployment that specifies CPU and memory requests and asks for 10 replicas. After scoring, the clusters get scores of member1: 100 and member2: 25, so the expected replica distribution is member1: 8 and member2: 2, which is indeed what happens at first, as seen below:

[Screenshot: initial replica distribution, member1: 8 / member2: 2]
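(For context, assuming the standard weighted division over the static weights, the 8/2 split follows directly from the scores: member1 gets 10 × 100 / (100 + 25) = 8 replicas and member2 gets 10 × 25 / (100 + 25) = 2.)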

However, after a while the distribution automatically changes to a single replica on member1, as seen here:

[Screenshot: distribution after rescheduling, member1: 1 replica]

From the logs I suspected that this is caused by the estimators, so I set --enable-scheduler-estimator=false on the karmada-scheduler deployment to disable the accurate estimator, and also added --enable-cluster-resource-modeling=false on the karmada-controller-manager deployment (only on the controller manager and not on the agent, as specified here, since I joined the clusters using the Push method, so I think the agent doesn't apply in my case). The logs above were captured after setting these flags to false, so the scale-down still happens somehow. Do you have any idea what could cause this? Even if it is the estimator's fault and each cluster can only fit 4 replicas, why do I get only 1 replica out of 10?
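For reference, this is roughly where those two flags go, assuming the default install layout (the container names, binary paths, and namespace are assumptions based on a standard installation, not taken from the logs above):

```yaml
# karmada-scheduler Deployment (assumed namespace: karmada-system)
spec:
  template:
    spec:
      containers:
        - name: karmada-scheduler
          command:
            - /bin/karmada-scheduler
            - --enable-scheduler-estimator=false   # turn off the accurate (per-cluster) estimator
            # ... other existing flags left unchanged
---
# karmada-controller-manager Deployment
spec:
  template:
    spec:
      containers:
        - name: karmada-controller-manager
          command:
            - /bin/karmada-controller-manager
            - --enable-cluster-resource-modeling=false   # turn off customized cluster resource modeling
            # ... other existing flags left unchanged
```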

Note that the problem doesn't arise when the deployment doesn't specify resource requests, since in that case only the cluster pod capacity is taken into consideration.

@LavredisG LavredisG added the kind/bug Categorizes issue or PR as related to a bug. label Feb 10, 2025
@LavredisG LavredisG changed the title General-estimator scales down deployments Estimators scales down deployments Feb 10, 2025
@zhzhuang-zju (Contributor)

Hello @LavredisG, apart from adding the new score plugin, have you made any other changes to the scheduler?

@LavredisG (Contributor, Author)

For simplicity, I have disabled every other in-tree plugin in the scheduler's deployment.

@zhzhuang-zju (Contributor)

> For simplicity, I have disabled every other in-tree plugin in the scheduler's deployment.

That is to say, only the filter plugin and score plugin of the scheduler have been modified, right?

Could you show me the propagationpolicy you're using?

@LavredisG (Contributor, Author) commented Feb 12, 2025

Yep, that's right (to be exact, only a score plugin has been implemented, no filter plugin, so all clusters pass the filter).

The PropagationPolicy uses static weights across the 2 clusters, each of them with weight: 1:

[Screenshot: PropagationPolicy with a staticWeightList giving weight 1 to member1 and member2]
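For reference, a minimal sketch of such a policy, assuming the standard staticWeightList fields (metadata and resource names are placeholders, not the actual ones used here):

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: example-propagation          # placeholder name
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: example-deployment       # placeholder workload name
  placement:
    clusterAffinity:
      clusterNames:
        - member1
        - member2
    replicaScheduling:
      replicaSchedulingType: Divided
      replicaDivisionPreference: Weighted
      weightPreference:
        staticWeightList:
          - targetCluster:
              clusterNames:
                - member1
            weight: 1
          - targetCluster:
              clusterNames:
                - member2
            weight: 1
```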

However, after the score plugin runs, the weights are updated to 100 and 25 respectively, so the 10 replicas of the deployment are distributed as member1: 8 and member2: 2 before being scaled down to 1.

@zhzhuang-zju (Contributor)

> However, after the score plugin runs, the weights are updated to 100 and 25 respectively

Does your custom scoring plugin not only score the clusters but also change their weights?

@LavredisG (Contributor, Author)

Exactly, I forgot to mention it. Essentially, we only care about the scores for the replica distribution, so the scores are fed back into the PropagationPolicy as static weights for the replicas.

@zhzhuang-zju (Contributor)

> Exactly, I forgot to mention it. Essentially, we only care about the scores for the replica distribution, so the scores are fed back into the PropagationPolicy as static weights for the replicas.

This approach is a bit of a hack, and there may be other potential impacts. My idea is to investigate what happened between the selectClusters and assignReplicas steps during the second scheduling process.

@LavredisG (Contributor, Author)

> Exactly, I forgot to mention it. Essentially, we only care about the scores for the replica distribution, so the scores are fed back into the PropagationPolicy as static weights for the replicas.

> This approach is a bit of a hack, and there may be other potential impacts. My idea is to investigate what happened between the selectClusters and assignReplicas steps during the second scheduling process.

It is indeed a workaround; I will look into it further.
