
Support in-place resource update feature for VPA #5046

Closed
wangchen615 opened this issue Jul 25, 2022 · 10 comments
Labels: area/vertical-pod-autoscaler, kind/api-change, kind/feature

Comments

@wangchen615

Which component are you using?: Vertical Pod Autoscaler

Is your feature request designed to solve a problem? If so, describe the problem this feature should solve.:
In Kubernetes 1.25, the in-place resource update feature is expected to become available in alpha. Any pod with this feature enabled will have a field to specify the per-resource restart policy, as shown below. Pods controlled by VPA with a NoRestart policy should not be evicted by the VPA updater; instead, they should be patched directly with the new recommended requests.

apiVersion: v1
kind: Pod
metadata:
  name: 1pod
spec:
  containers:
  - name: busybox
    image: busybox
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        cpu: "1"
        memory: "1Gi"
      requests:
        cpu: "1"
        memory: "1Gi"
    resizePolicy:
    - resourceName: cpu
      policy: NoRestart
    - resourceName: memory
      policy: NoRestart

Describe the solution you'd like.:
In the VPA updater, whenever the existing logic decides to evict a pod, we should add a check on the pod spec to determine whether the NoRestart policy is enabled. If so, the updater should send a patch request directly to the pod instead of evicting it.
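A minimal sketch of such a check, assuming the alpha API exposes the manifest's resizePolicy block on the container spec; the Go field names and the canResizeInPlace helper are assumptions here, since neither the alpha API nor this helper exists in the codebase yet:

import (
	corev1 "k8s.io/api/core/v1"
)

// canResizeInPlace is a hypothetical helper, not an existing VPA function.
// It returns true only if every container in the pod declares the NoRestart
// resize policy for both CPU and memory, i.e. the pod could be patched
// without a restart. Field names mirror the manifest above and may change
// once the alpha API merges.
func canResizeInPlace(pod *corev1.Pod) bool {
	for _, container := range pod.Spec.Containers {
		noRestart := map[corev1.ResourceName]bool{}
		for _, rp := range container.ResizePolicy {
			noRestart[rp.ResourceName] = string(rp.Policy) == "NoRestart"
		}
		if !noRestart[corev1.ResourceCPU] || !noRestart[corev1.ResourceMemory] {
			return false
		}
	}
	return true
}

If the check passes, the updater would send a resource patch (like the example below) instead of calling the eviction API.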

An example patch command looks like this:

patchCommand := fmt.Sprintf(`{
	"spec": {
		"containers": [
			{
				"name": "containerName",
				"resources": {"requests": {"cpu": %s}, "limits": {"cpu": %s}}
			}
		]
	}
}`, resizeCPUValStr, resizeCPUValStr)

testPod, pErr := f.ClientSet.CoreV1().Pods(testPod.Namespace).Patch(context.TODO(),
	testPod.Name, types.StrategicMergePatchType, []byte(patchCommand), metav1.PatchOptions{})

Describe any alternative solutions you've considered.: None

Additional context.: The place where the VPA updater evicts the pod is here:

for vpa, livePods := range controlledPods {

wangchen615 added the kind/feature label on Jul 25, 2022
@wangchen615

/assign @jbartosik

jbartosik added the kind/api-change label on Jul 25, 2022
@voelzmo

voelzmo commented Jul 27, 2022

Just to document a caveat regarding the capping of requests here, so we keep this in mind:

We probably need to move the limit-capping logic to a different place, so we don't end up with different behavior for the eviction and in-place update cases.

EDIT: Sorry, it turns out I didn't look properly for uses of Apply. The Updater already seems to do the right thing during the priority calculations:

processedRecommendation, _, err := calc.recommendationProcessor.Apply(calc.vpa.Status.Recommendation, calc.vpa.Spec.ResourcePolicy, calc.vpa.Status.Conditions, pod)
so I think the podsForUpdate in the above loop do already have capped recommendations and we're fine!

@jbartosik

I'm writing this down and I'll send a PR when it's ready.

I was thinking about something a little different:

  • Add actuation modes so we have Recreate (exists already), InPlace, InPlaceWithFallbackToEvict
  • we might want controls for when to fall back to deleting pods
  • we might want to limit how frequently we update in place

I heard that some time ago scaling down in place could cause problems, so we might want to allow scaling up in place but evicting to scale down.

I didn't think about per-resource configuration (yet).
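
To make the proposal above concrete, here is a hypothetical sketch of how such modes could sit next to the existing UpdateMode values in the VPA API types; the InPlace names are taken from this comment only and are not part of any merged code:

// Existing values of UpdateMode in the VPA API
// (pkg/apis/autoscaling.k8s.io/v1/types.go):
//   UpdateModeOff, UpdateModeInitial, UpdateModeRecreate, UpdateModeAuto
//
// Hypothetical additions for in-place actuation:
const (
	// UpdateModeInPlace would only resize pods in place and never evict them.
	UpdateModeInPlace UpdateMode = "InPlace"
	// UpdateModeInPlaceWithFallbackToEvict would try an in-place resize first
	// and fall back to eviction when the in-place resize is not possible or
	// not allowed (e.g. rate-limited, or scaling down in place is disabled).
	UpdateModeInPlaceWithFallbackToEvict UpdateMode = "InPlaceWithFallbackToEvict"
)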

@voelzmo

voelzmo commented Jul 28, 2022

Add actuation modes so we have Recreate (exists already), InPlace, InPlaceWithFallbackToEvict

Can you elaborate a bit on your thinking about why we need additional actuation modes? In theory, these settings are already configured on the Pod itself with

resizePolicy:
    - resourceName: cpu
      policy: NoRestart
    - resourceName: memory
      policy: NoRestart

@jbartosik

The exception for merging kubernetes/kubernetes#102884 was rejected, so I think we will (unfortunately) have plenty of time to discuss this.

@jbartosik

@voelzmo

The main reason I think we should have a new enum value for the new behavior is that some users may want to configure their VPAs to keep that behavior (even if VPA changes what Auto means in the future). This is similar to how VPA currently has Auto and Recreate modes, which do the same thing today. But we're discussing changing what Auto does (and some workloads might want the Recreate behavior, not the new behavior).

Longer version:

I think there are at least the following kinds of workloads when looking at in-place updates:

  1. Want to be updated only in place or when something bad (for example OOM) is happening to them
  2. Want to be updated. Can be in-place, can be by eviction.
  3. Want to be updated by eviction or deletion, not in-place.
  4. Don't want to be actively updated.

For 4) we have modes Off and Initial.

Having VPA deduce whether it should apply an in-place update works for distinguishing 2) and 3) (and there's also the Recreate mode). But I think we're going to need a new enum value to distinguish between 1) and 2); we can't deduce that from the containers. To start, we can support only 2) and not 1).

So I think we'll need enum values to distinguish those.

And since we're going to add enum values, we might as well add them at the same time we add the new VPA behaviors (so if someone doesn't want to worry that we'll change what Auto means a few years later, they can choose an enum value whose behavior shouldn't change).


I'm not sure if kubernetes/kubernetes#102884 will merge in time.

The last update in the SIG-Node doc is from 2022-09-20 and it says that kubernetes/kubernetes#102884 can merge after:

  • next containerd release,
  • e2e tests are fully enabled and
  • cgroupv2 review issues have been addressed

Judging from the comment, it looks like the containerd release will happen.

@jbartosik

One more reason to have this as a separate enum value is that, since Auto is the default value, it shouldn't be an experimental / alpha feature.

@voelzmo

voelzmo commented Nov 3, 2022

Thanks @jbartosik for clarifying your thoughts on adding new actuation modes! It took me a while to understand the semantics of resizePolicy for a container correctly: the point is that users can mark in-place resizing for their specific workload as safe or unsafe, but clients can still opt for resizing by eviction. Adding InPlace as a new mode makes sense to me.

I'm not sure if we also need InPlaceWithFallbackToEvict – is it necessary to have InPlace with the semantics that it doesn't fall back to eviction? Wouldn't this mean a user creates a VPA object putting a container's resource under VPA control, marks it as unsafe for in-place resizing, and then uses VPA in actuation mode InPlace without the fallback to eviction? In that case, they could have left the resource which doesn't support in-place resizing out of the VPA entirely. Given that currently all users of VPA rely on eviction to update resources, this seems like an unnecessary addition to the list of actuation modes?

@jbartosik

Sadly (source):

This feature missed the extended deadline after code freeze and therefore will not make it into 1.26.

@jbartosik

#4016 was opened earlier for the same thing; let's have one issue for tracking this.
