vpa: configurable MutatingWebhookConfig failurePolicy
#7180
Comments
/area vertical-pod-autoscaler
I'm fine with this idea. It does allow for a footgun though, where the VPA webhook may fail on a VPA-related pod. But this can be mitigated by using the feature that ignores certain namespaces. If others agree, I'm happy to build this feature.
Ah, hmm, good point 😅 MutatingWebhookConfigs also support an …

FWIW, this specific issue wouldn't affect my particular use case, because I'm deploying VPA components in cluster A, but they're configured with a …
I recently added a way to ignore certain namespaces, see https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.2.0/vertical-pod-autoscaler/pkg/admission-controller/main.go#L80-L81

I was thinking of adding documentation warning anyone that if they used …
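For readers following along, here is a minimal sketch of how that ignore list might be wired into the admission controller's Deployment. The flag name is an assumption taken from the main.go linked above (around VPA 1.2.0), and the image tag and namespace values are only examples; check all of them against the VPA version you actually run.

```yaml
# Sketch only: telling the admission controller which namespaces to ignore,
# per the mitigation discussed above. Flag name assumed from the linked
# main.go (VPA 1.2.0); image tag and namespaces are examples.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vpa-admission-controller
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vpa-admission-controller
  template:
    metadata:
      labels:
        app: vpa-admission-controller
    spec:
      containers:
        - name: admission-controller
          image: registry.k8s.io/autoscaling/vpa-admission-controller:1.2.0
          args:
            # Namespaces the admission controller should leave alone, so the
            # webhook cannot interfere with the VPA components themselves.
            - --ignored-vpa-object-namespaces=kube-system
```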
Ah-ha. I didn't realize this was already added. An unvoiced concern from my previous message was that I didn't want to impose more work by adding such a flag. But it already exists. And so ... Perfect 👍
/assign
There are a few things I'd like to clarify:
Right.
Potentially, no. Since vpa-a-c itself only really needs an API Server connection to read out …
Depends on who's scheduling the Pod. In the case of k8s-c-m ReplicaSet-controlled Pods, there's retry with backoff, I believe? (Plus the failures are noted as Events on the RS.)
By default, absolutely not. So the current behaviour of …
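As context for the "current behaviour" referenced in the answer above, the webhook the admission controller registers today has roughly the shape sketched below. The object, webhook, and service names are assumptions for illustration; the relevant part is the `failurePolicy` field, which defaults to `Ignore` and is what this issue asks to make configurable.

```yaml
# Rough sketch of the webhook as registered today (names are illustrative).
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: vpa-webhook-config
webhooks:
  - name: vpa.k8s.io
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Ignore   # Pods are admitted unmodified if the webhook is unreachable
    clientConfig:
      service:
        name: vpa-webhook
        namespace: kube-system
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
```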
Got it - thanks! Seems reasonable to introduce this as a "power" feature with a bunch of warnings. @voelzmo - curious what you think?
Thanks @adrianmoisey 🍪 🍪 🍪 I'll keep an eye out for the next VPA release and look forward to making use of this.
Seems like I missed this interesting option during my vacation. Just following up on this now. @samcday out of curiosity: what's your approach to the question from @raywainman above?
So Pods couldn't get scheduled because the webhook failed – what happens next in your case when the cluster wakes up from hibernation?
@voelzmo in my specific case, this cannot happen. I have two distinct Kubernetes clusters, let's call them cluster Boop and cluster Meep. The k8s control plane for Meep is running on Boop. The only visible k8s …

That said, this issue can also be worked around in a single cluster by using the recently-introduced …
I can't help but wonder if maybe this is something that could be improved in apiserver. Specifically, it'd be cool if admission webhooks could be configured with a …

If I get some agreement from folks in this thread that such a thing sounds useful, I'll probably be motivated enough to open an issue upstream ;)
**Which component are you using?**: vertical-pod-autoscaler

**Is your feature request designed to solve a problem? If so describe the problem this feature should solve.**:
I am operating an autoscaling cloud cluster where the k8s control plane and VPA components run in a parent cluster. I have configured the kube-scheduler for this cloud cluster with a `MostAllocated` scoring strategy, in order to densely pack my workloads onto as few cloud nodes as possible.

~~Because I'm lazy~~ Because memory is typically the resource under pressure for my use case, and because the memory usage of many workloads can often change depending on time of day and/or over the course of many days, I don't bother setting any memory resource requests. Instead I rely on VPA to determine and set them.

The issue is that, in the event the VPA admission controller was not able to handle a request when a Pod was scheduled, a Pod with no (or incorrect) resource requests is scheduled. In extreme cases, for example if I've just done a cold start of the cloud cluster (from 0 nodes to >0), something of a "thundering herd" of a dozen workloads that each typically need 4GB of RAM gets scheduled onto a single node (`MostAllocated` scheduling strategy, remember) that has 8GB of RAM. Predictably tragic results follow.
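For reference, a scheduler configuration along the lines described above would look roughly like the sketch below. Field names follow the `kubescheduler.config.k8s.io/v1` API; adjust to whatever API version your control plane supports, and tune the resource weights to taste.

```yaml
# Sketch of a bin-packing scheduler profile using the MostAllocated strategy.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated      # prefer nodes that are already heavily requested
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
```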
**Describe the solution you'd like.**:

I would like to be able to run vpa-admission-controller with `--webhook-failure-policy=Fail`, since I know that I have scheduled 2 vpa-a-c Pods with a PDB. So I would prefer that, in the unlikely event no vpa-a-c is available, no Pods are permitted to be scheduled in the dependent cloud cluster.
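The "2 vpa-a-c Pods with a PDB" setup could look something like the sketch below; the namespace and label selector are assumptions, so match them to however the admission controller is actually deployed.

```yaml
# Sketch: keep at least one admission controller replica up during voluntary
# disruptions (drains, rollouts). Namespace and labels are assumptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: vpa-admission-controller
  namespace: kube-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: vpa-admission-controller
```

Note that a PDB only guards against voluntary disruptions; with a `Fail` policy, losing both replicas for any other reason would still block Pod admission, which is exactly the trade-off discussed in this thread.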
**Describe any alternative solutions you've considered.**:

I could start managing the MutatingWebhookConfig myself and configure vpa-admission-controller with `--register-webhook=false`. (I was actually slightly surprised that the cowboysysop VPA chart was not already doing this... but, I digress.)
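A manually managed configuration along those lines might look roughly like this, building on the default shape sketched earlier: `failurePolicy: Fail` combined with a `namespaceSelector` that excludes the namespace running the VPA components, to avoid the chicken-and-egg problem raised earlier in the thread. Names, namespaces, and the CA bundle are placeholders, and the admission controller would be started with `--register-webhook=false` (as above) so it doesn't manage this object itself.

```yaml
# Sketch: manually managed VPA mutating webhook with a hard failure policy.
# All names/namespaces and the CA bundle are placeholders.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: vpa-webhook-config
webhooks:
  - name: vpa.k8s.io
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail              # reject Pod admission if the webhook is unavailable
    namespaceSelector:               # skip the namespace hosting the VPA components
      matchExpressions:
        - key: kubernetes.io/metadata.name   # auto-set namespace label (k8s >= 1.21)
          operator: NotIn
          values: ["vpa"]
    clientConfig:
      caBundle: "<base64-encoded CA bundle>"
      service:
        name: vpa-webhook
        namespace: vpa
        port: 443
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
```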