How to define Resource Models for a Cluster with Multiple Taints in Karmada #3869

Open
kangteng525 opened this issue Jul 31, 2023 · 6 comments
Labels
kind/question: Indicates an issue that is a support question.

Comments

@kangteng525

In a typical cluster, multiple taints can exist across different nodes. Here's an example scenario:

A cluster consisting of 5000 nodes where:

  • 1000 nodes have no taints
  • 1000 nodes have "taint1"
  • 1000 nodes have "taint2"
  • 1000 nodes have "taint3"
  • 1000 nodes have "taint4"

In this scenario, for any given taint (for example, "taint1"), the effective node count is 1000, not 5000, meaning the available resources are effectively 1/5 of the total.

Does Karmada support the definition of resource models in such scenarios where a cluster has multiple taints? If so, could you provide guidance on how we can define the spec for this scenario?
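To make the numbers concrete, below is a minimal, self-contained Go sketch (illustrative types only, not Karmada or Kubernetes APIs) that computes the capacity usable by a workload pinned to one taint via a toleration plus a matching node selector; in this scenario it comes out to 1/5 of the cluster total:

```go
package main

import "fmt"

// Illustrative types only; these are not Karmada or Kubernetes API types.
type Node struct {
	TaintKeys      []string
	AllocatableCPU int64 // millicores
}

// capacityForTaint sums allocatable CPU over the nodes carrying the given
// taint key, i.e. the nodes a workload pinned to that taint can actually use.
func capacityForTaint(nodes []Node, taintKey string) int64 {
	var total int64
	for _, n := range nodes {
		for _, k := range n.TaintKeys {
			if k == taintKey {
				total += n.AllocatableCPU
				break
			}
		}
	}
	return total
}

func main() {
	// 5000 nodes with 4 CPU (4000m) each: 1000 untainted, 1000 per taint1..taint4.
	var nodes []Node
	for i := 0; i < 1000; i++ {
		nodes = append(nodes, Node{AllocatableCPU: 4000})
	}
	for _, key := range []string{"taint1", "taint2", "taint3", "taint4"} {
		for i := 0; i < 1000; i++ {
			nodes = append(nodes, Node{TaintKeys: []string{key}, AllocatableCPU: 4000})
		}
	}

	var total int64
	for _, n := range nodes {
		total += n.AllocatableCPU
	}
	// Prints 20000000m total vs 4000000m usable for taint1, i.e. 1/5.
	fmt.Printf("total: %dm, usable for taint1: %dm\n", total, capacityForTaint(nodes, "taint1"))
}
```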

@kangteng525 added the kind/question label Jul 31, 2023
@zishen (Member) commented Aug 1, 2023

Do you mean that each cluster has several distinct labels, and that pod distribution depends on these labels?

@jwcesign (Member) commented Aug 1, 2023

Hi @kangteng525,
Let's consider an example to illustrate the point. We have two clusters, cluster1 and cluster2. Cluster1 has 100 CPU available under a taint toleration (200 CPU in reality), while cluster2 has 200 CPU available under that toleration (also 200 CPU in reality). Therefore, the replica division ratio between these two clusters should be 1:2 instead of 1:1.

Do I understand correctly?

@kangteng525 (Author)

Hi @jwcesign ,

Yes, you are correct. Cluster1 has 100 CPU under taint 1 and 100 CPU under taint 2, while cluster2 has 200 CPU under taint 1 only.

So when scheduling workloads with a toleration for taint 1, the ratio between these two clusters should be 1:2, and when scheduling workloads with a toleration for taint 2, it should be 1:0.
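A tiny sketch of that arithmetic (simplified and assumed; this is not Karmada's actual replica divider) which splits replicas in proportion to the CPU each cluster can offer the workload under its tolerations:

```go
package main

import "fmt"

// splitReplicas divides replicas across clusters in proportion to the CPU
// each cluster can offer to this workload, handing any remainder out one by
// one. Simplified illustration; not Karmada's actual division strategy.
func splitReplicas(replicas int, capacity map[string]int64) map[string]int {
	var total int64
	for _, c := range capacity {
		total += c
	}
	result := map[string]int{}
	if total == 0 {
		return result
	}
	assigned := 0
	for name, c := range capacity {
		n := int(int64(replicas) * c / total)
		result[name] = n
		assigned += n
	}
	// Remainder distribution order is unspecified here; a real scheduler
	// would use a deterministic rule.
	for name := range capacity {
		if assigned >= replicas {
			break
		}
		if capacity[name] > 0 {
			result[name]++
			assigned++
		}
	}
	return result
}

func main() {
	// Toleration for taint 1: cluster1 offers 100 CPU, cluster2 offers 200 CPU -> 1:2 split.
	fmt.Println(splitReplicas(9, map[string]int64{"cluster1": 100, "cluster2": 200}))
	// Toleration for taint 2: cluster1 offers 100 CPU, cluster2 offers 0 -> 1:0 split.
	fmt.Println(splitReplicas(9, map[string]int64{"cluster1": 100, "cluster2": 0}))
}
```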

@jwcesign (Member) commented Aug 4, 2023

Hi @kangteng525,
When scheduling a workload that involves taints, you can install karmada-scheduler-estimator to ensure that karmada-scheduler takes the taints into consideration. The estimator calculates the available resources of each cluster while honoring the workload's tolerations, which enables exactly what you described: workloads with a toleration for taint 1 are split 1:2 between the two clusters, and workloads with a toleration for taint 2 are split 1:0.

The related code is here:

func (es *AccurateSchedulerEstimatorServer) estimateReplicas(
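For intuition only, here is a greatly simplified sketch of the idea behind estimateReplicas (illustrative types and logic, not the real Karmada code, which handles more resources and scheduling constraints): for each node whose taints the workload tolerates, count how many replicas fit into its allocatable CPU, and sum that over the cluster.

```go
package main

import "fmt"

// Illustrative types only; not the actual Karmada estimator types.
type Node struct {
	TaintKeys      []string
	AllocatableCPU int64 // millicores
}

// maxReplicas sums, over every node whose taints are all tolerated by the
// workload, how many replicas fit into that node's allocatable CPU.
func maxReplicas(nodes []Node, tolerated map[string]bool, cpuPerReplica int64) int64 {
	var replicas int64
	for _, n := range nodes {
		ok := true
		for _, k := range n.TaintKeys {
			if !tolerated[k] {
				ok = false
				break
			}
		}
		if ok && cpuPerReplica > 0 {
			replicas += n.AllocatableCPU / cpuPerReplica
		}
	}
	return replicas
}

func main() {
	cluster1 := []Node{
		{TaintKeys: []string{"taint1"}, AllocatableCPU: 100_000}, // 100 CPU behind taint 1
		{TaintKeys: []string{"taint2"}, AllocatableCPU: 100_000}, // 100 CPU behind taint 2
	}
	cluster2 := []Node{
		{TaintKeys: []string{"taint1"}, AllocatableCPU: 200_000}, // 200 CPU behind taint 1
	}
	tolerateTaint1 := map[string]bool{"taint1": true}
	// With 1 CPU (1000m) per replica: cluster1 fits 100, cluster2 fits 200 -> a 1:2 ratio.
	fmt.Println(maxReplicas(cluster1, tolerateTaint1, 1000), maxReplicas(cluster2, tolerateTaint1, 1000))
}
```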

@kangteng525 (Author)

Hi @jwcesign ,

Thanks a lot!
So if karmada-scheduler-estimator is installed, will karmada-scheduler always call the estimator before binding the workload to the target cluster? And if the estimator reports that there are not enough replicas available, will the cluster not be chosen?

And one more question: if multiple propagation policies (for example A, B, and C) are being scheduled at once, it seems karmada-scheduler-estimator takes a snapshot before estimating. What if the capacity changes after A and B bind to this cluster, so that C can no longer fit even though its estimation passed?

Thanks,
Kevin

@jwcesign (Member) commented Aug 16, 2023

Hi, @kangteng525

Will karmada-scheduler always call the estimator before binding the workload to the target cluster?

Yes

And if the estimator reports that there are not enough replicas available, will the cluster not be chosen?

Yes

If multiple propagation policies (for example A, B, and C) are being scheduled at once, it seems karmada-scheduler-estimator takes a snapshot before estimating. What if the capacity changes after A and B bind to this cluster, so that C can no longer fit even though its estimation passed?

Yes, that's possible. When a ResourceBinding (RB) has been scheduled but the corresponding Work objects have not yet been synced to the member clusters, the scheduler may pick the same cluster again, and the resources may turn out to be insufficient once the Works are synced.

But we have application failover, which can reschedule the pending workload to other clusters.
