Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalidate resource requirements on extended resources with only request set #57170

Merged
merged 1 commit into from
Jan 6, 2018

Conversation

jiayingz
Copy link
Contributor

@jiayingz jiayingz commented Dec 14, 2017

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #57276

Special notes for your reviewer:

Release note:

Returns an error for non overcommitable resources if they don't have limit field set in container spec.

@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 14, 2017
@jiayingz
Copy link
Contributor Author

/sig-scheduling

@jiayingz
Copy link
Contributor Author

/release-note-non

@jiayingz
Copy link
Contributor Author

/assign @vishh @ConnorDoyle

@@ -167,6 +167,7 @@ var overcommitBlacklist = sets.NewString(string(core.ResourceNvidiaGPU))
func IsOvercommitAllowed(name core.ResourceName) bool {
return IsDefaultNamespaceResource(name) &&
!IsHugePageResourceName(name) &&
!IsExtendedResourceName(name) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This predicate is technically correct but redundant since extended resources are by definition disjoint from default namespace resources. Probably we only need one of the two, but it would be helpful to add a comment saying why this was correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, comment makes more sense over redundant line of code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Removed.

if quantity.Cmp(limitQuantity) != 0 && !helper.IsOvercommitAllowed(resourceName) {
allErrs = append(allErrs, field.Invalid(reqPath, quantity.String(), fmt.Sprintf("must be equal to %s limit", resourceName)))
} else if quantity.Cmp(limitQuantity) > 0 {
allErrs = append(allErrs, field.Invalid(reqPath, quantity.String(), fmt.Sprintf("must be less than or equal to %s limit", resourceName)))
}
} else if resourceName == core.ResourceNvidiaGPU {
} else !helper.IsOvercommitAllowed(resourceName) {
Copy link
Contributor

@ConnorDoyle ConnorDoyle Dec 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! 👍 This was missed in the first extended resources pass.

However, this is a breaking API change right? I don’t think we can do this and maintain API compatibility with 1.8.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ConnorDoyle do you have any suggestions on how we should roll out this API change? Can we first update the extended resource doc to document this side effect for 1.8 and 1.9 and mention our plan on invalidating this kind of resource spec in 1.10, and then have the validation logic take effect in 1.10?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should consider not making this change.

The semantics of resource requests are: a minimum amount provided to a container. There's no indication implied that bursting is allowed for arbitrary resources, although we allow it for cpu and memory and that is documented. Also, we do already disallow limit ≠ request in validation.

Could you outline the specific user benefit of the validation change? Then we can weigh it against the cost.

@jiayingz jiayingz force-pushed the validation branch 2 times, most recently from 2da34de to d62b12e Compare December 15, 2017 18:48
@jiayingz
Copy link
Contributor Author

/assign @davidopp @bsalamat

Would like to get some opinions from the scheduling folks on this PR. In short, for extended resources introduced in 1.8, we clearly document that "Extended resources are only supported as integer resources and cannot be overcommitted" in its doc https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#extended-resources, but the current ValidateResourceRequirements check actually allows that only request field is set for extended resources and the current extended resource doc also implies that this kind of resource spec is allowed: "Extended resources cannot be overcommitted, so request and limit must be equal if both are present in a container spec." In this PR, I am thinking to change ValidateResourceRequirements logic to return an error for such resource spec to make it clear that extended resource doesn't allow overcommit, but this may break some existing pod specs that are formatted this way. Would like to get some opinions on whether we should leave the current validation logic as is and rely on the ER doc to communicate the fact that overcommit is not supported on ER, or we should consider to change the validation logic in a future release but document that plan now.

@ConnorDoyle
Copy link
Contributor

Copying hidden reply from above:

I think we should consider not making this change.

The semantics of resource requests are: a minimum amount provided to a container. There's no indication implied that bursting is allowed for arbitrary resources, although we allow it for cpu and memory and that is documented. Also, we do already disallow limit ≠ request in validation.

Could you outline the specific user benefit of the validation change? Then we can weigh it against the cost.

@bsalamat
Copy link
Member

@jiayingz Scheduler uses "Requests" for scheduling and does not control overcommitment. It is up to the node/kubelet to allow a pod to use more resources, including ER, than requested. If kubelet does not allow overcommitment of ER, that will probably be enough for preventing pods from using more resources than requested.
I think the change in this PR generally makes sense, but as you mentioned, is not backward compatible and could break existing pods. So, we may want to rely on existing mechanisms that do not allow pods to consume more of ERs than requested.

@vikaschoudhary16
Copy link
Contributor

I am unable to visualize a issue that we may hit if we skip this validation check. In CPU and memory case, it is required to put upper cap, before hand, using Limits because if we dont, any container may consume more than desired/disproportionate amount of these resource. On the contrary, with ER, Is it possible for a pod to consume more than allocated resources? I guess, NO.
Therefore, IMO, it should be safe to skip this change if i have not missed some use case.

@ConnorDoyle
Copy link
Contributor

Related issue: #57276

@jiayingz
Copy link
Contributor Author

Thanks a lot for the comments! With #57276 fixed, we can probably leave the validation part as is. Will leave this PR open for a few days and close it if no objection is raised.

if quantity.Cmp(limitQuantity) != 0 && !helper.IsOvercommitAllowed(resourceName) {
allErrs = append(allErrs, field.Invalid(reqPath, quantity.String(), fmt.Sprintf("must be equal to %s limit", resourceName)))
} else if quantity.Cmp(limitQuantity) > 0 {
allErrs = append(allErrs, field.Invalid(reqPath, quantity.String(), fmt.Sprintf("must be less than or equal to %s limit", resourceName)))
}
} else if resourceName == core.ResourceNvidiaGPU {
} else if !helper.IsOvercommitAllowed(resourceName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we some unit tests for this?

@vishh
Copy link
Contributor

vishh commented Dec 18, 2017

As @jiayingz mentioned, extended resources are not expected to allow for overcommit to begin with until we have better Resource APIs that allow for expressing overcommit capabilities offered at the node level. I can't think of a use case for overcommit at the cluster level.
We made an error in documentation and we should fix both documentation and code.
I suspect the user base for extended resources is relatively small atm and so I hope such a breaking change won't impact a lot of users. We can offer an opt-out flag if needed to ease with the transition (I doubt it will be necessary though).

@vishh
Copy link
Contributor

vishh commented Dec 18, 2017

And as far as device plugin goes, since it does not support overcommit, it should require request to equal limits if it considers requests for allocations.

@vikaschoudhary16
Copy link
Contributor

@vishh We tried to keep the API same and since in current behavior. Requests without Limits , though, is allowed, but IIUC, this is not the condition for overcommit. For overcommit, Limits must be greater than Requests and that is not allowed in current behavior. Therefore, in current code, overcommit is not allowed.
What you want is, please correct me if i am wrong:

  1. To keep this check from Jiyang's PR. This will ensure that limits are present.
  2. revert my change is manager.go to use Limits only, as it was before.
  3. Create a PR to fix documentation, which today says that limits are optional.

Thats fine too. Its just that we were not sure initially about changing the API behavior. Your point about a very small user base of ER, makes sense though. Please confirm if above three points is the way forward.

@vishh
Copy link
Contributor

vishh commented Dec 19, 2017

absence of limit typically means give me everything available on the node.
so today's behavior is enabling overcommit according to my understanding.
+1 on the plan proposed.

@ConnorDoyle
Copy link
Contributor

Apologies for jumping the gun on the other PR. I assumed that we wouldn’t tolerate an incompatible API change, but after all should have waited for discussion to settle before proceeding.

Given that @vishh prefers to make the API change, the new plan makes sense. The compelling case for me is a future scenario where we have a way to specify scheduling properties of each resource. This would make the overcommit-ibility of a resource distinguishable from the use site in container specs and less implicit.

@bsalamat
Copy link
Member

After reading @vishh's comments and given that backward compatibility is not a major concern (given the small number of users), this change LGTM.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 19, 2017
k8s-github-robot pushed a commit that referenced this pull request Dec 25, 2017
Automatic merge from submit-queue (batch tested with PRs 57591, 57369). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Revert back #57278

**What this PR does / why we need it**:
This PR reverts back to behavior of scanning Limits.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Related #
#57276
#57170
**Special notes for your reviewer**:

**Release note**:

```release-note
None
```
/sig node

/cc @vishh @ConnorDoyle @jiayingz
@jiayingz
Copy link
Contributor Author

/assign @liggitt

@ConnorDoyle
Copy link
Contributor

Would love to get this landed. I ran into #57276 myself again (!!!) since the other patch was reverted.

@jiayingz
Copy link
Contributor Author

jiayingz commented Jan 3, 2018

/assign @thockin

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 3, 2018
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 3, 2018
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jan 3, 2018
@liggitt
Copy link
Member

liggitt commented Jan 3, 2018

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 3, 2018
@bsalamat
Copy link
Member

bsalamat commented Jan 5, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 5, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, jiayingz, liggitt

Associated issue: #57276

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 57037, 57170). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 4bdf282 into kubernetes:master Jan 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Device manager only considers extended resource limits when allocating devices.