vpa: configurable MutatingWebhookConfig failurePolicy
#7180
Comments
/area vertical-pod-autoscaler
I'm fine with this idea. It does allow for a footgun though, where the VPA webhook may fail on a VPA-related pod. But this can be mitigated by using the feature that ignores certain namespaces. If others agree, I'm happy to build this feature.
Ah, hmm, good point 😅 MutatingWebhookConfigs also support an …

FWIW, this specific issue wouldn't affect my particular use case, because I'm deploying VPA components in cluster A, but they're configured with a …
I recently added a way to ignore certain namespaces, see https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.2.0/vertical-pod-autoscaler/pkg/admission-controller/main.go#L80-L81

I was thinking of adding documentation warning anyone that if they used …
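For readers following along, here is a minimal sketch of how that ignore list might be wired into the admission controller's Deployment. The flag name is an assumption taken from the main.go linked above (around VPA 1.2.0), and the image tag and namespace values are only examples; check all of them against the VPA version you actually run.

```yaml
# Sketch only: telling the admission controller which namespaces to ignore,
# per the mitigation discussed above. Flag name assumed from the linked
# main.go (VPA 1.2.0); image tag and namespaces are examples.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vpa-admission-controller
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vpa-admission-controller
  template:
    metadata:
      labels:
        app: vpa-admission-controller
    spec:
      containers:
        - name: admission-controller
          image: registry.k8s.io/autoscaling/vpa-admission-controller:1.2.0
          args:
            # Namespaces the admission controller should leave alone, so the
            # webhook cannot interfere with the VPA components themselves.
            - --ignored-vpa-object-namespaces=kube-system
```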
Ah-ha. I didn't realize this was already added. An unvoiced concern from my previous message was that I didn't want to impose more work by adding such a flag. But it already exists. And so ... Perfect 👍
/assign
There are a few things I'd like to clarify:
Right.
Potentially, no. Since vpa-a-c itself only really needs an API Server connection to read out …
Depends on who's scheduling the Pod. In the case of k8s-c-m ReplicaSet-controlled Pods, there's retry with backoff, I believe? (Plus the failures are noted as Events on the RS.)
By default, absolutely not. So the current behaviour of …
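As context for the "current behaviour" referenced in the answer above, the webhook the admission controller registers today has roughly the shape sketched below. The object, webhook, and service names are assumptions for illustration; the relevant part is the `failurePolicy` field, which defaults to `Ignore` and is what this issue asks to make configurable.

```yaml
# Rough sketch of the webhook as registered today (names are illustrative).
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: vpa-webhook-config
webhooks:
  - name: vpa.k8s.io
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Ignore   # Pods are admitted unmodified if the webhook is unreachable
    clientConfig:
      service:
        name: vpa-webhook
        namespace: kube-system
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
```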
Got it - thanks! Seems reasonable to introduce this as a "power" feature with a bunch of warnings. @voelzmo - curious what you think?
Thanks @adrianmoisey 🍪 🍪 🍪 I'll keep an eye out for the next VPA release and look forward to making use of this.
Seems like I missed this interesting option during my vacation. Just following up on this now. @samcday out of curiosity: what's your approach to the question from @raywainman above?
So Pods couldn't get scheduled because the webhook failed – what happens next in your case when the cluster wakes up from hibernation?
@voelzmo in my specific case, this cannot happen. I have two distinct Kubernetes clusters, let's call them cluster Boop and cluster Meep. The k8s control plane for Meep is running on Boop. The only visible k8s …

That said, this issue can also be worked around in a single cluster by using the recently-introduced …
I can't help but wonder if maybe this is something that could be improved in apiserver. Specifically, it'd be cool if admission webhooks could be configured with a …

If I get some agreement from folks in this thread that such a thing sounds useful, I'll probably be motivated enough to open an issue upstream ;)
**Which component are you using?**: vertical-pod-autoscaler

**Is your feature request designed to solve a problem? If so describe the problem this feature should solve.**:
I am operating an autoscaling cloud cluster where the k8s control plane and VPA components run in a parent cluster. I have configured the kube-scheduler for this cloud cluster with a `MostAllocated` scoring strategy, in order to densely pack my workloads onto as few cloud nodes as possible.

~~Because I'm lazy~~ Because memory is typically the resource under pressure for my use case, and because the memory usage of many workloads can often change depending on time of day and/or over the course of many days, I don't bother setting any memory resource requests. Instead I rely on VPA to determine and set them.

The issue is that, in the event the VPA admission controller was not able to handle a request when a Pod was scheduled, a Pod with no (or incorrect) resource requests is scheduled. In extreme cases, for example if I've just done a cold start of the cloud cluster (from 0 nodes to >0), something of a "thundering herd" of a dozen workloads that each typically need 4GB of RAM gets scheduled onto a single node (`MostAllocated` scheduling strategy, remember) that has 8GB of RAM. Predictably tragic results follow.
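For reference, a scheduler configuration along the lines described above would look roughly like the sketch below. Field names follow the `kubescheduler.config.k8s.io/v1` API; adjust to whatever API version your control plane supports, and tune the resource weights to taste.

```yaml
# Sketch of a bin-packing scheduler profile using the MostAllocated strategy.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated      # prefer nodes that are already heavily requested
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
```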
**Describe the solution you'd like.**:

I would like to be able to run vpa-admission-controller with `--webhook-failure-policy=Fail`, since I know that I have scheduled 2 vpa-a-c Pods with a PDB. So I would prefer that, in the unlikely event no vpa-a-c is available, no Pods are permitted to be scheduled in the dependent cloud cluster.
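The "2 vpa-a-c Pods with a PDB" setup could look something like the sketch below; the namespace and label selector are assumptions, so match them to however the admission controller is actually deployed.

```yaml
# Sketch: keep at least one admission controller replica up during voluntary
# disruptions (drains, rollouts). Namespace and labels are assumptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: vpa-admission-controller
  namespace: kube-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: vpa-admission-controller
```

Note that a PDB only guards against voluntary disruptions; with a `Fail` policy, losing both replicas for any other reason would still block Pod admission, which is exactly the trade-off discussed in this thread.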
**Describe any alternative solutions you've considered.**:

I could start managing the MutatingWebhookConfig myself and configure vpa-admission-controller with `--register-webhook=false`. (I was actually slightly surprised that the cowboysysop VPA chart was not already doing this... but, I digress.)
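A manually managed configuration along those lines might look roughly like this, building on the default shape sketched earlier: `failurePolicy: Fail` combined with a `namespaceSelector` that excludes the namespace running the VPA components, to avoid the chicken-and-egg problem raised earlier in the thread. Names, namespaces, and the CA bundle are placeholders, and the admission controller would be started with `--register-webhook=false` (as above) so it doesn't manage this object itself.

```yaml
# Sketch: manually managed VPA mutating webhook with a hard failure policy.
# All names/namespaces and the CA bundle are placeholders.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: vpa-webhook-config
webhooks:
  - name: vpa.k8s.io
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail              # reject Pod admission if the webhook is unavailable
    namespaceSelector:               # skip the namespace hosting the VPA components
      matchExpressions:
        - key: kubernetes.io/metadata.name   # auto-set namespace label (k8s >= 1.21)
          operator: NotIn
          values: ["vpa"]
    clientConfig:
      caBundle: "<base64-encoded CA bundle>"
      service:
        name: vpa-webhook
        namespace: vpa
        port: 443
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
```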