
Support in-place resource update feature for VPA #5046

Closed
wangchen615 opened this issue Jul 25, 2022 · 10 comments
Labels: area/vertical-pod-autoscaler, kind/api-change, kind/feature

Comments

@wangchen615

Which component are you using?: Vertical Pod Autoscaler

Is your feature request designed to solve a problem? If so, describe the problem this feature should solve.:
In Kubernetes 1.25, the in-place resource update feature is expected to become available in alpha. Any pod with this feature enabled will have a field to specify the per-resource restart policy, as shown below. Pods controlled by VPA with a NoRestart policy should not be evicted by the VPA updater; instead, they should be patched directly with the new recommended requests.

apiVersion: v1
kind: Pod
metadata:
  name: 1pod
spec:
  containers:
  - name: busybox
    image: busybox
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        cpu: "1"
        memory: "1Gi"
      requests:
        cpu: "1"
        memory: "1Gi"
    resizePolicy:
    - resourceName: cpu
      policy: NoRestart
    - resourceName: memory
      policy: NoRestart

Describe the solution you'd like.:
In the VPA updater, whenever the existing logic decides to evict a pod, we should add a check on the pod spec to determine whether the NoRestart policy is enabled. If so, the updater should send a patch request directly to the pod instead of evicting it.
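A minimal sketch of such a check, assuming the alpha API exposes the manifest's resizePolicy block on the container spec; the Go field names and the canResizeInPlace helper are assumptions here, since neither the alpha API nor this helper exists in the codebase yet:

import (
	corev1 "k8s.io/api/core/v1"
)

// canResizeInPlace is a hypothetical helper, not an existing VPA function.
// It returns true only if every container in the pod declares the NoRestart
// resize policy for both CPU and memory, i.e. the pod could be patched
// without a restart. Field names mirror the manifest above and may change
// once the alpha API merges.
func canResizeInPlace(pod *corev1.Pod) bool {
	for _, container := range pod.Spec.Containers {
		noRestart := map[corev1.ResourceName]bool{}
		for _, rp := range container.ResizePolicy {
			noRestart[rp.ResourceName] = string(rp.Policy) == "NoRestart"
		}
		if !noRestart[corev1.ResourceCPU] || !noRestart[corev1.ResourceMemory] {
			return false
		}
	}
	return true
}

If the check passes, the updater would send a resource patch (like the example below) instead of calling the eviction API.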

An example patch command looks like this:

patchCommand := fmt.Sprintf(`{
	"spec": {
		"containers": [
			{
				"name": "containerName",
				"resources": {"requests": {"cpu": %s}, "limits": {"cpu": %s}}
			}
		]
	}
}`, resizeCPUValStr, resizeCPUValStr)

testPod, pErr := f.ClientSet.CoreV1().Pods(testPod.Namespace).Patch(context.TODO(),
	testPod.Name, types.StrategicMergePatchType, []byte(patchCommand), metav1.PatchOptions{})

Describe any alternative solutions you've considered.: None

Additional context.: The place where the VPA updater evicts the pod is here:

for vpa, livePods := range controlledPods {

wangchen615 added the kind/feature label on Jul 25, 2022
@wangchen615

/assign @jbartosik

jbartosik added the kind/api-change label on Jul 25, 2022
@voelzmo

voelzmo commented Jul 27, 2022

Just to document a caveat regarding the capping of requests here, so we keep this in mind:

We probably need to move the limit-capping logic to a different place, so we don't end up with different behavior for the eviction and in-place update cases.

EDIT: Sorry, it turns out I didn't look properly for uses of Apply. The Updater already seems to do the right thing during the priority calculations:

processedRecommendation, _, err := calc.recommendationProcessor.Apply(calc.vpa.Status.Recommendation, calc.vpa.Spec.ResourcePolicy, calc.vpa.Status.Conditions, pod)
so I think the podsForUpdate in the above loop do already have capped recommendations and we're fine!

@jbartosik

I'm writing this down and I'll send a PR when it's ready.

I was thinking about something a little different:

  • Add actuation modes so we have Recreate (exists already), InPlace, InPlaceWithFallbackToEvict
  • we might want controls for when to fall back to deleting pods
  • we might want to limit how frequently we update in place

I heard that some time ago scaling down in place could cause problems, so we might want to allow scaling up in place but evicting to scale down.

I didn't think about per-resource configuration (yet).
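
To make the proposal above concrete, here is a hypothetical sketch of how such modes could sit next to the existing UpdateMode values in the VPA API types; the InPlace names are taken from this comment only and are not part of any merged code:

// Existing values of UpdateMode in the VPA API
// (pkg/apis/autoscaling.k8s.io/v1/types.go):
//   UpdateModeOff, UpdateModeInitial, UpdateModeRecreate, UpdateModeAuto
//
// Hypothetical additions for in-place actuation:
const (
	// UpdateModeInPlace would only resize pods in place and never evict them.
	UpdateModeInPlace UpdateMode = "InPlace"
	// UpdateModeInPlaceWithFallbackToEvict would try an in-place resize first
	// and fall back to eviction when the in-place resize is not possible or
	// not allowed (e.g. rate-limited, or scaling down in place is disabled).
	UpdateModeInPlaceWithFallbackToEvict UpdateMode = "InPlaceWithFallbackToEvict"
)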

@voelzmo

voelzmo commented Jul 28, 2022

Add actuation modes so we have Recreate (exists already), InPlace, InPlaceWithFallbackToEvict

Can you elaborate a bit on your thinking about why we need additional actuation modes? In theory, these settings are already configured on the Pod itself with

resizePolicy:
    - resourceName: cpu
      policy: NoRestart
    - resourceName: memory
      policy: NoRestart

@jbartosik

The exception for merging kubernetes/kubernetes#102884 was rejected, so I think we will (unfortunately) have plenty of time to discuss this.

@jbartosik

@voelzmo

The main reason I think we should have a new enum value for the new behavior is that some users may want to configure their VPAs to keep that behavior (even if VPA changes what Auto means in the future). This is similar to how VPA currently has Auto and Recreate modes, which do the same thing today. But we're discussing changing what Auto does (and some workloads might want the Recreate behavior, not the new behavior).

Longer version:

I think there are at least the following kinds of workloads when looking at in-place updates:

  1. Want to be updated only in place or when something bad (for example OOM) is happening to them
  2. Want to be updated. Can be in-place, can be by eviction.
  3. Want to be updated by eviction or deletion, not in-place.
  4. Don't want to be actively updated.

For 4) we have modes Off and Initial.

Having VPA deduce whether it should apply an in-place update works for distinguishing 2) and 3) (and there's also the Recreate mode). But I think we're going to need a new enum value to distinguish between 1) and 2); we can't deduce that from the containers. To start, we can support only 2) and not 1).

So I think we'll need enum values to distinguish those.

And since we're going to add enum values, we might as well add them at the same time we add the new VPA behaviors (so if someone doesn't want to worry that we'll change what Auto means a few years later, they can choose an enum value whose behavior shouldn't change).


I'm not sure if kubernetes/kubernetes#102884 will merge in time.

The last update in the SIG-Node doc is from 2022-09-20 and it says that kubernetes/kubernetes#102884 can merge after:

  • next containerd release,
  • e2e tests are fully enabled and
  • cgroupv2 review issues have been addressed

Judging from the comment, it looks like the containerd release will happen.

@jbartosik

One more reason to have this as a separate enum value is that, since Auto is the default value, it shouldn't be an experimental / alpha feature.

@voelzmo

voelzmo commented Nov 3, 2022

Thanks @jbartosik for clarifying your thoughts on adding new actuation modes! It took me a while to understand the semantics of resizePolicy for a container correctly: the point is that users can mark in-place resizing for their specific workload as safe or unsafe, but clients can still opt for resizing by eviction. Adding InPlace as a new mode makes sense to me.

I'm not sure if we also need InPlaceWithFallbackToEvict – is it necessary to have InPlace with the semantics that it doesn't fall back to eviction? Wouldn't this mean a user creates a VPA object putting a container's resource under VPA control, marks it as unsafe for in-place resizing, and then uses VPA in actuation mode InPlace without the fallback to eviction? In that case, they could have left the resource which doesn't support in-place resizing out of the VPA entirely. Given that currently all users of VPA rely on eviction to update resources, this seems like an unnecessary addition to the list of actuation modes?

@jbartosik

Sadly (source):

This feature missed the extended deadline after code freeze and therefore will not make it into 1.26.

@jbartosik

#4016 was opened earlier for the same thing; let's have one issue for tracking this.
