Add support for hugepages in downward API #86102

Merged: 1 commit, Nov 11, 2020

@derekwaynecarr derekwaynecarr commented Dec 10, 2019

What type of PR is this?
/kind feature

What this PR does / why we need it:
Add the ability to project hugepages-<pagesize> container resource requirements via downward API

Fixes #85148

Special notes for your reviewer:
Adds requests.hugepages-<pagesize> and limits.hugepages-<pagesize> to downward API consistent with cpu, memory, and ephemeral storage.

Does this PR introduce a user-facing change?:

Add support for hugepages to downward API

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress, release-note, kind/feature, size/L, cncf-cla: yes, needs-sig, and needs-priority labels Dec 10, 2019
@derekwaynecarr
Member Author

/sig node

This works in local testing, but I need to add more tests before merge.

@k8s-ci-robot k8s-ci-robot added the sig/node, area/kubectl, area/kubelet, area/test, sig/api-machinery, sig/apps, sig/cli, and sig/testing labels and removed the needs-sig label Dec 10, 2019
@fedebongio
Contributor

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot removed the sig/api-machinery label Dec 10, 2019
@liggitt
Member

liggitt commented Dec 10, 2019

fyi - since this is loosening validation, we need to consider how to roll this out safely

@derekwaynecarr
Member Author

@liggitt agree - assume we could handle similar to ephemeral storage?

Contributor

@mattjmcnaughton mattjmcnaughton left a comment

To save you clicking on the link, verify failure is just gofmt :)

Overall, this diff looks to me to be heading in a good direction. I defer to others on the proper level of testing needed (and also on how to address the issue with loosening validation).

@@ -193,7 +194,28 @@ func ExtractContainerResourceValue(fs *v1.ResourceFieldSelector, container *v1.C
case "requests.ephemeral-storage":
return convertResourceEphemeralStorageToString(container.Resources.Requests.StorageEphemeral(), divisor)
}

// handle extended standard resources with dynamic names

Slightly tangential question from me - it appears there are ~3 separate ExtractContainerResourceValue functions that we are updating here. Is there the opportunity to consolidate those into a single function which is reused? It feels less than optimal that we have to add the same 20 lines of code to three different functions.

Very possible I'm missing an important difference between them though :)
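
For illustration, a minimal sketch of what one shared helper for the dynamically named hugepages resources could look like. It is not the code from this PR: `resourceList` and `extractHugePagesValue` are hypothetical names, quantities are modeled as plain byte counts instead of `resource.Quantity`, and rounding up after dividing by the divisor is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resourceList is a hypothetical stand-in for v1.ResourceList: resource name -> bytes.
type resourceList map[string]int64

// extractHugePagesValue resolves selectors such as "requests.hugepages-2Mi" or
// "limits.hugepages-1Gi" and returns the quantity divided by divisor, rounded up.
func extractHugePagesValue(selector string, requests, limits resourceList, divisor int64) (string, error) {
	if divisor <= 0 {
		return "", fmt.Errorf("divisor must be positive, got %d", divisor)
	}
	var list resourceList
	var name string
	switch {
	case strings.HasPrefix(selector, "requests.hugepages-"):
		list, name = requests, strings.TrimPrefix(selector, "requests.")
	case strings.HasPrefix(selector, "limits.hugepages-"):
		list, name = limits, strings.TrimPrefix(selector, "limits.")
	default:
		return "", fmt.Errorf("unsupported container resource: %s", selector)
	}
	value := list[name] // reading a missing key (or a nil map) yields 0
	return strconv.FormatInt((value+divisor-1)/divisor, 10), nil
}

func main() {
	requests := resourceList{"hugepages-2Mi": 4 << 20}
	limits := resourceList{"hugepages-2Mi": 8 << 20}
	for _, selector := range []string{"requests.hugepages-2Mi", "limits.hugepages-2Mi"} {
		value, err := extractHugePagesValue(selector, requests, limits, 1<<20)
		fmt.Println(selector, value, err)
	}
}
```

Each of the existing extraction functions could then delegate its hugepages branch to a helper of this shape, provided the three call sites really do share the same semantics.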

@k8s-ci-robot k8s-ci-robot added the kind/api-change and sig/api-machinery labels Dec 11, 2019
@odinuge
Member

odinuge commented Feb 20, 2020

/cc

@k8s-ci-robot k8s-ci-robot requested a review from odinuge February 20, 2020 10:12
@k8s-ci-robot k8s-ci-robot added the size/XL label Nov 6, 2020
@derekwaynecarr derekwaynecarr force-pushed the downward-api branch 4 times, most recently from 0575a4e to 724ed4a Compare November 6, 2020 20:07
@derekwaynecarr
Member Author

@liggitt ptal, let me know if this looks good to you.

Member

@liggitt liggitt left a comment

plumbing looks correct. primary comment is about computing the validation options consistently in one place

allErrs = append(allErrs, field.NotSupported(fldPath.Child("resource"), fs.Resource, expressions.List()))
// check if the prefix is present
foundPrefix := false
for _, prefix := range prefixes.List() {

can prefixes be nil here, causing a npe? if not, suggest not passing a pointer
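
A small sketch of the difference the question points at. `sets.String` is a map type, so ranging over a nil value is safe, while a pointer to it needs a guard before it is dereferenced. `stringSet`, `hasAnyPrefix`, and `hasAnyPrefixPtr` are hypothetical stand-ins, not names from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// stringSet is a hypothetical stand-in for sets.String, which is also map-backed.
type stringSet map[string]struct{}

// hasAnyPrefix takes the set by value; a nil set simply contains no prefixes.
func hasAnyPrefix(resource string, prefixes stringSet) bool {
	for prefix := range prefixes { // ranging over a nil map runs zero iterations
		if strings.HasPrefix(resource, prefix) {
			return true
		}
	}
	return false
}

// hasAnyPrefixPtr takes a pointer, so it must guard against nil before dereferencing.
func hasAnyPrefixPtr(resource string, prefixes *stringSet) bool {
	if prefixes == nil {
		return false
	}
	return hasAnyPrefix(resource, *prefixes)
}

func main() {
	prefixes := stringSet{"requests.hugepages-": {}, "limits.hugepages-": {}}
	fmt.Println(hasAnyPrefix("limits.hugepages-2Mi", prefixes))
	fmt.Println(hasAnyPrefix("limits.hugepages-2Mi", nil))
	fmt.Println(hasAnyPrefixPtr("limits.hugepages-2Mi", nil))
}
```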

validationErrorList := validation.ValidateReplicationController(newRc)
updateErrorList := validation.ValidateReplicationControllerUpdate(newRc, oldRc)
opts := validation.PodValidationOptions{AllowDownwardAPIHugePages: false}
oldFailsDownwardAPIValidation := len(validation.ValidateReplicationController(oldRc, opts)) > 0

this is saying any validation failure from any field in the old object means we failed downwardAPI validation, which isn't accurate...

I expected a helper like this (in the same package as DropDisabledTemplateFields, used above) that could be called consistently everywhere:

podOpts := pod.GetValidationOptionsFromTemplate(newRc.Spec.Template, oldRc.Spec.Template)
validationErrorList := validation.ValidateReplicationController(newRc, podOpts)
updateErrorList := validation.ValidateReplicationControllerUpdate(newRc, oldRc, podOpts)

and used from rcStrategy#Validate like this:

podOpts := pod.GetValidationOptionsFromTemplate(newRc.Spec.Template, nil)
return validation.ValidateReplicationController(controller, podOpts)
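
A rough sketch of the rule such a unified helper would encode: allow the field when the feature is enabled, or when the old object already used it, and treat a nil old object as a create. All types and names below are hypothetical simplifications, and the feature gate is passed in as a plain bool rather than looked up.

```go
package main

import (
	"fmt"
	"strings"
)

// podValidationOptions mirrors only the single option discussed in this review.
type podValidationOptions struct {
	AllowDownwardAPIHugePages bool
}

// container is a simplified stand-in; ResourceFieldRefs lists the resource
// names its env vars reference through resourceFieldRef.
type container struct {
	ResourceFieldRefs []string
}

type podSpec struct {
	InitContainers []container
	Containers     []container
}

func usesHugePages(c container) bool {
	for _, ref := range c.ResourceFieldRefs {
		if strings.HasPrefix(ref, "requests.hugepages-") || strings.HasPrefix(ref, "limits.hugepages-") {
			return true
		}
	}
	return false
}

// validationOptionsFor computes the options in one place. oldSpec is nil on create.
func validationOptionsFor(featureEnabled bool, oldSpec *podSpec) podValidationOptions {
	opts := podValidationOptions{AllowDownwardAPIHugePages: featureEnabled}
	if opts.AllowDownwardAPIHugePages || oldSpec == nil {
		return opts
	}
	// Keep already-persisted objects valid even when the feature is off.
	for _, group := range [][]container{oldSpec.InitContainers, oldSpec.Containers} {
		for _, c := range group {
			if usesHugePages(c) {
				opts.AllowDownwardAPIHugePages = true
				return opts
			}
		}
	}
	return opts
}

func main() {
	old := &podSpec{Containers: []container{{ResourceFieldRefs: []string{"limits.hugepages-2Mi"}}}}
	fmt.Println(validationOptionsFor(false, nil))
	fmt.Println(validationOptionsFor(false, old))
}
```

Because the decision depends only on the feature state and the old spec, unrelated validation failures in the old object no longer influence it.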

@@ -53,7 +56,13 @@ func (podTemplateStrategy) PrepareForCreate(ctx context.Context, obj runtime.Obj
// Validate validates a new pod template.
func (podTemplateStrategy) Validate(ctx context.Context, obj runtime.Object) field.ErrorList {
template := obj.(*api.PodTemplate)
-return corevalidation.ValidatePodTemplate(template)
+opts := validation.PodValidationOptions{

use unified helper to get opts, passing nil for old template (comment applies to all strategy computation of opts)

return corevalidation.ValidatePodTemplateUpdate(template, oldTemplate)

// Allow downward api usage of hugepages on pod update if feature is enabled or if the old pod already had used them.
opts := validation.PodValidationOptions{}

use unified helper to get opts (comment applies to all strategy computation of opts)

@@ -111,9 +113,14 @@ func (podStrategy) AllowCreateOnUpdate() bool {
func (podStrategy) ValidateUpdate(ctx context.Context, obj, old runtime.Object) field.ErrorList {
oldFailsSingleHugepagesValidation := len(validation.ValidatePodSingleHugePageResources(old.(*api.Pod), field.NewPath("spec"))) > 0
opts := validation.PodValidationOptions{

use unified helper to get opts

@@ -94,6 +94,8 @@ func (podStrategy) Validate(ctx context.Context, obj runtime.Object) field.Error
opts := validation.PodValidationOptions{

use unified helper to get opts, passing nil for old pod

@@ -159,7 +159,7 @@ func TestCompatibility_v1_PodSecurityContext(t *testing.T) {
}

validator := func(obj runtime.Object) field.ErrorList {
-return validation.ValidatePodSpec(&(obj.(*api.Pod).Spec), &(obj.(*api.Pod).ObjectMeta), field.NewPath("spec"))
+return validation.ValidatePodSpec(&(obj.(*api.Pod).Spec), &(obj.(*api.Pod).ObjectMeta), field.NewPath("spec"), validation.PodValidationOptions{})

this is very slightly concerning, because we're not necessarily using the same options we do when validating actual user objects... is there a way we can construct the options using the unified helper from here? (the package deps might not line up right)

Comment on lines 1022 to 1025
localValidContainerResourceFieldPathPrefixes := sets.NewString()
if opts.AllowDownwardAPIHugePages {
localValidContainerResourceFieldPathPrefixes = localValidContainerResourceFieldPathPrefixes.Union(validContainerResourceFieldPathPrefixes)
}

nit, precompute the "with hugepages" and "without hugepages" sets as package vars and just select the appropriate one to use here... sets are expensive
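
One way to read this suggestion, sketched with hypothetical names (`stringSet`, `prefixesWithHugePages`, `prefixesWithoutHugePages`, `containerResourcePrefixes`): build both sets once at package initialization and pick one per call, rather than allocating and unioning a set on every validation.

```go
package main

import "fmt"

// stringSet is a hypothetical stand-in for sets.String.
type stringSet map[string]struct{}

// Both variants are built once at package initialization.
var (
	prefixesWithoutHugePages = stringSet{}
	prefixesWithHugePages    = stringSet{
		"requests.hugepages-": {},
		"limits.hugepages-":   {},
	}
)

// containerResourcePrefixes selects a precomputed set instead of building one.
// Callers must treat the returned set as read-only, since it is shared.
func containerResourcePrefixes(allowHugePages bool) stringSet {
	if allowHugePages {
		return prefixesWithHugePages
	}
	return prefixesWithoutHugePages
}

func main() {
	fmt.Println(len(containerResourcePrefixes(true)))
	fmt.Println(len(containerResourcePrefixes(false)))
}
```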

Comment on lines 2193 to 2200
localValidContainerResourceFieldPathPrefixes := sets.NewString()
if opts.AllowDownwardAPIHugePages {
localValidContainerResourceFieldPathPrefixes = localValidContainerResourceFieldPathPrefixes.Union(validContainerResourceFieldPathPrefixes)
}

same nit about selecting a precomputed set rather than constructing one

@@ -2337,6 +2358,11 @@ func validateContainerResourceDivisor(rName string, divisor resource.Quantity, f
allErrs = append(allErrs, field.Invalid(fldPath.Child("divisor"), rName, "only divisor's values 1, 1k, 1M, 1G, 1T, 1P, 1E, 1Ki, 1Mi, 1Gi, 1Ti, 1Pi, 1Ei are supported with the local ephemeral storage resource"))
}
}
if strings.HasPrefix(rName, "requests.hugepages-") || strings.HasPrefix(rName, "limits.hugepages-") {

are these prefixes constants or can this prefix check be made a "is hugepages resource" method
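
A sketch of the second option, with the two prefixes lifted into constants behind a single predicate. The constant and function names here are hypothetical, not the ones used in the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical constant names for the two prefixes used in the check above.
const (
	requestsHugePagesPrefix = "requests.hugepages-"
	limitsHugePagesPrefix   = "limits.hugepages-"
)

// isHugePagesResourceField reports whether a resourceFieldRef resource name
// refers to a hugepages request or limit of any page size.
func isHugePagesResourceField(rName string) bool {
	return strings.HasPrefix(rName, requestsHugePagesPrefix) ||
		strings.HasPrefix(rName, limitsHugePagesPrefix)
}

func main() {
	for _, name := range []string{"requests.hugepages-2Mi", "limits.hugepages-1Gi", "requests.memory"} {
		fmt.Println(name, isHugePagesResourceField(name))
	}
}
```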

@k8s-ci-robot k8s-ci-robot added the sig/network label Nov 10, 2020
@derekwaynecarr derekwaynecarr force-pushed the downward-api branch 3 times, most recently from eef2361 to 2f1881e Compare November 10, 2020 17:12
@derekwaynecarr
Member Author

@liggitt ptal i think i got the structure you wanted now.

Member

@liggitt liggitt left a comment

yeah, that's exactly what I was looking for.

Comment on lines 353 to 358
for _, container := range oldPodSpec.InitContainers {
opts.AllowDownwardAPIHugePages = opts.AllowDownwardAPIHugePages || usesHugePagesInProjectedEnv(container)
}
for _, container := range oldPodSpec.Containers {
opts.AllowDownwardAPIHugePages = opts.AllowDownwardAPIHugePages || usesHugePagesInProjectedEnv(container)
}

use the visitor so we check all containers:

if !opts.AllowDownwardAPIHugePages {
  VisitContainers(oldPodSpec, AllContainers, func(container *api.Container, containerType ContainerType) (shouldContinue bool) {
    opts.AllowDownwardAPIHugePages = opts.AllowDownwardAPIHugePages || usesHugePagesInProjectedEnv(container)
    return !opts.AllowDownwardAPIHugePages
  })
}
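
To show how the early-exit visitor pattern behaves, here is a self-contained sketch with simplified, hypothetical types. `visitContainers` only imitates the shape of the real helper, and covering ephemeral containers as a third group is an assumption about what "all containers" includes.

```go
package main

import "fmt"

// container and podSpec are simplified, hypothetical stand-ins for the API types.
type container struct {
	Name          string
	UsesHugePages bool
}

type podSpec struct {
	InitContainers      []container
	Containers          []container
	EphemeralContainers []container
}

// visitContainers calls visit for each container in every group and stops as
// soon as visit returns false. It reports whether every container was visited.
func visitContainers(spec *podSpec, visit func(c *container) (shouldContinue bool)) bool {
	for _, group := range [][]container{spec.InitContainers, spec.Containers, spec.EphemeralContainers} {
		for i := range group {
			if !visit(&group[i]) {
				return false
			}
		}
	}
	return true
}

func main() {
	spec := &podSpec{
		InitContainers: []container{{Name: "init"}},
		Containers:     []container{{Name: "app", UsesHugePages: true}, {Name: "sidecar"}},
	}
	allow, visited := false, 0
	visitContainers(spec, func(c *container) bool {
		visited++
		allow = allow || c.UsesHugePages
		return !allow // stop at the first container that uses hugepages
	})
	fmt.Println(allow, visited)
}
```

Returning `!allow` from the callback ends the walk at the first match, which is why the outer `if !opts.AllowDownwardAPIHugePages` guard in the suggestion is enough to skip the walk entirely when the option is already set.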

@@ -182,7 +178,9 @@ type podEphemeralContainersStrategy struct {
var EphemeralContainersStrategy = podEphemeralContainersStrategy{Strategy}

func (podEphemeralContainersStrategy) ValidateUpdate(ctx context.Context, obj, old runtime.Object) field.ErrorList {
-return validation.ValidatePodEphemeralContainersUpdate(obj.(*api.Pod), old.(*api.Pod))
+// NOTE: ephemeral containers are not able to consume hugepages, so its safe to make this empty.
+opts := validation.PodValidationOptions{}

this assumes these validation options will only be hugepages-related in the future... is there a downside to calling the opts helper here?

@derekwaynecarr
Member Author

@liggitt had not seen visitor func, thanks for pointer. got requested updates complete. thanks!

@liggitt
Member

liggitt commented Nov 10, 2020

change lgtm, CI is unhappy

@liggitt
Member

liggitt commented Nov 10, 2020

/approve

will tag once CI issues get figured out

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label Nov 10, 2020
@derekwaynecarr
Member Author

/test pull-kubernetes-e2e-kind
/test pull-kubernetes-e2e-gce-ubuntu-containerd
/test pull-kubernetes-e2e-gce-alpha-features

@derekwaynecarr
Member Author

/test pull-kubernetes-e2e-gce-alpha-features

@liggitt
Member

liggitt commented Nov 11, 2020

alpha test failures are due to #93873 (comment) and #92312 (comment)

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Nov 11, 2020
@k8s-ci-robot
Contributor

@derekwaynecarr: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubernetes-e2e-gce-alpha-features 45bd6cb link /test pull-kubernetes-e2e-gce-alpha-features

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Labels
approved, area/kubectl, area/kubelet, area/test, cncf-cla: yes, kind/api-change, kind/feature, lgtm, priority/important-longterm, release-note, sig/api-machinery, sig/apps, sig/cli, sig/network, sig/node, sig/scheduling, sig/testing, size/XL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

downwardAPI volumes should also have hugepages-<size> exposed to the pod through resourceFieldRef.
9 participants