KEP-1714: Fair Sharing #1773
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
/cc @KunWuLuan @tenzen-y
@alculquicondor: GitHub didn't allow me to request PR reviews from the following users: KunWuLuan. Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
cc @kerthcet
Yeah, I should review this KEP.
keps/1714-fair-sharing/README.md
Outdated
TeamE can submit as many workloads and consume as many resources as they can while
TeamW is not a work and doesn’t need resources. However, once they arrive, some of
the already submitted workloads from TeamE may be preempted(preferably the least
important) to ensure equal extra space (irregardless of their given quota) for both teams.
Why does it say "to ensure equal extra space" here? I didn't understand the intention.
space from the company-wide pool, as described in the first paragraph.
First pass mostly with questions to better understand. For now I skipped the preemption part.
flavor, that are above the nominal quota. The value for a resource is the ratio of T_r and the
total nominal quotas in the hierarchy of the parent of C.

The value for the CQ or cohort is the maximum among the values for each resource, divided by the weight, if defined.
"The value for the CQ or cohort" - is this the fair share value? This paragraph feels quite abstract, I think it would be helpful to back it up with a small example so that we can have some intuition.
+1
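A small example of the kind requested above could look like the following sketch. It assumes that T_r is the usage of resource r above the nominal quota, summed over flavors (the quoted text is truncated, so this reading is an assumption), and that a missing weight means "no division". The `node` type, its fields, and `shareValue` are hypothetical names used only for illustration; they are not part of the Kueue API.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a ClusterQueue or cohort.
// It is NOT a Kueue type; it only carries what the quoted formula needs.
type node struct {
	// borrowed[r] plays the role of T_r: usage of resource r above the
	// nominal quota (assumed to be already summed over flavors).
	borrowed map[string]float64
	// weight is the optional fair sharing weight; nil means "not defined".
	weight *float64
}

// shareValue computes max over r of (T_r / parentNominal[r]) and divides
// the result by the weight when one is defined and positive.
// parentNominal[r] is the total nominal quota for r in the hierarchy of
// the parent of the node.
func shareValue(n node, parentNominal map[string]float64) float64 {
	maxRatio := 0.0
	for r, t := range n.borrowed {
		total := parentNominal[r]
		if total <= 0 {
			// No nominal quota for this resource in the parent's
			// hierarchy; skipping it is a choice made for this sketch.
			continue
		}
		if ratio := t / total; ratio > maxRatio {
			maxRatio = ratio
		}
	}
	if n.weight != nil && *n.weight > 0 {
		return maxRatio / *n.weight
	}
	return maxRatio
}

func main() {
	w := 2.0
	parent := map[string]float64{"cpu": 100, "memory": 400}

	// a borrows 10 cpu and 80 memory, with no weight defined.
	a := node{borrowed: map[string]float64{"cpu": 10, "memory": 80}}
	// b borrows 30 cpu and 20 memory, with weight 2.
	b := node{borrowed: map[string]float64{"cpu": 30, "memory": 20}, weight: &w}

	fmt.Println("a:", shareValue(a, parent))
	fmt.Println("b:", shareValue(b, parent))
}
```

With these made-up numbers, `a` is dominated by memory (80/400 = 0.2 versus 10/100 = 0.1), so its value is about 0.2. `b` is dominated by cpu (30/100 = 0.3), and its weight of 2 halves that to about 0.15, so under this reading `b` is treated as using less of its share than `a` even though it borrows more cpu.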
cad9a9b to 6c1efbf
The value for the CQ or cohort is the maximum among the values for each resource, divided by the weight, if defined.

Weights will be added to ClusterQueueSpec and CohortSpec in the following optional struct:
When we set FairSharing configuration, what happens?
That's mentioned in the previous section and detailed in the sections below
My bad.
I meant to ask: "When we set the FairSharing configuration on a ClusterQueue without a Cohort, what happens?"
Nothing. Fair sharing only applies above CQs.
Do you mean that we will add some validation?
Weights will be added to ClusterQueueSpec and CohortSpec in the following optional struct:
As far as I remember, we planned to have DRF as another queueingStrategy, like this: https://docs.google.com/document/d/1VQ0qxWA-jwgvLq_WYG46OkXWW00O6q7b1BsR_Uv-acs/edit?usp=sharing
So, did you compare the pros and cons of the following options?
- Extend CohortSpec and ClusterQueueSpec (your current approach)
- Introduce a new queueing strategy, "DRF", and have a new CRD, "FairSharing".
DRF within a CQ wouldn't easily extrapolate to more complex hierarchies.
That makes sense.
Recording this discussion as an Alternative approach might be worth it, right?
Just a few notes to grasp the high level idea of the algorithms.
/lgtm
LGTM label has been added. Git tree hash: 7e88dcc1f51499d039498371bab9716694e2c9f7
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor, mwielgus The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind feature
/kind api-change
What this PR does / why we need it:
Defines how fair sharing of unused resources will work in Kueue.
Which issue(s) this PR fixes:
Part of #1714
Special notes for your reviewer:
Does this PR introduce a user-facing change?