Make prevention rule set configurable #4067
Let's make sure not to use environment variables, as this won't scale over time, imo.
I don't get what you want to achieve. Is this something for the admission controller?
Yes, if we add all these rules then end-users might want to control which ones should be on and which should not.
Are they configurable in real life? I mean, CPU/memory scaling without resource requests won't work, and multiple HPAs scaling the same workload won't work either. But I get the idea; maybe there will be other rules in the future that can be configured.
This indeed! But also, based on your argument above, why are we making it opt-in then? To give people the flexibility to choose how aggressive they want to be. Maybe a ScaledObject is scaling something that still has an HPA on top of it because of a transition period; who knows. It's more about giving control over enforcement than an all-or-nothing toggle.
This doesn't work. The HPA controllers will conflict with each other; this scenario is worse than having no autoscaling at all, because every HPA controller cycle will change the replicas twice if the two compute different values.
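(To picture the conflict described above, a sketch with made-up names and thresholds: two HPAs pointed at the same Deployment, each controller loop free to write its own desired replica count.)

```yaml
# Illustrative only: two HPAs fighting over the same scale target.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa-old
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa-new
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app   # same target: both controllers keep rewriting replicas
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```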
I know it does not work, and it's just an example; the point is that we should allow end-users to mix and match rules. Today the only option is to enable all validation or none, while there can be scenarios where turning off certain validations makes sense.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
This admission hook gives me a lot of headache where we want to transition from HPA to KEDA for existing production workloads, specifically the rule (described in the docs) that prevents deploying a ScaledObject when there is already another HPA on the workload.
It makes it very difficult to switch from a vanilla HPA to a KEDA-managed HPA. Consider this scenario for production. We have this configured in the Helm chart:

```yaml
deployment:
  replicaCount: 3
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 20
```

And the HPA currently controls the replica count. Now imagine we want to transition to KEDA. We'd have Helm configured like this:

```yaml
useKeda: true
```

The rendered templates will remove the HPA and add the KEDA resources. With this hard validation, migration is complex. We'd need to delete our HPA first, then add KEDA. But what happens in this case? Since the HPA is gone, k8s is going to set the replica count back to what the Deployment spec declares, disrupting whatever scale the HPA had reached. The workaround is way too complex: we'd need to set the replica count manually to match the live state at every step. This can be easily avoided if we could temporarily disable this validation and have both the HPA and KEDA (with exactly the same CPU profile) for a brief period of time.
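(For reference, a minimal sketch of the KEDA side of such a migration, assuming a CPU trigger that mirrors the old HPA's profile; the names and the utilization target are illustrative.)

```yaml
# Illustrative ScaledObject mirroring the old HPA's CPU profile.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler
spec:
  scaleTargetRef:
    name: my-app          # same Deployment the old HPA targeted
  minReplicaCount: 3
  maxReplicaCount: 20
  triggers:
    - type: cpu
      metricType: Utilization
      metadata:
        value: "80"       # target average CPU utilization, as in the HPA
```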
@ppanyukov as a workaround, you can disable the webhooks (scale the webhook controller to 0); after the migration you can enable them again.
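(If you prefer doing this through Helm rather than scaling the controller by hand, a hedged sketch of a values override; the exact key name depends on your KEDA chart version, so verify it against the chart's values.yaml.)

```yaml
# Assumed KEDA chart values override; key name may differ per chart version.
webhooks:
  enabled: false   # turn off during migration, then set back to true
```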
And speaking of more trouble with the admission webhook and Helm: I've hit another issue where we cannot uninstall any Helm releases that use KEDA for autoscaling, because the admission webhook errors out, the finalisers don't complete, and we are left with "hung", broken deployments. From the logs, the webhook rejects the ScaledObject during finalisation. And the reason this happens is that Helm removes the target Deployment before the ScaledObject, so the webhook's check that the target deployment exists fails during teardown.
Well, maybe. It's a good suggestion, to be fair. But it's yet another "mysterious dependency" in a million-piece puzzle that we'd be grateful not to have to think about.

Any suggestions for the "Helm uninstall" workaround? Scaling the admission hook to zero in this case is very easy to forget to do, and once a Helm release gets into a botched state, it's not so easy to fix things. Although actually, I wonder if the finalisers will complete if I disable the admission webhook post-uninstall? Need to test this.

EDIT: it does look like the reconciler eventually runs a cycle and cleans things up when the admission webhook is disabled after the Helm uninstall. It's still not great. Should I report this "uninstall" problem as a separate issue?

I would say this validation that the deployment should exist is wrong, as it depends on ordering. We should be able to deploy things in any order: first because we cannot always control it, and second because sometimes we may want to configure ScaledObjects before the deployments they target exist.
Be careful, because this maybe won't happen as you expect: 2 HPAs (the old one and the new one generated by KEDA) can have unexpected behaviors. The HPA controller doesn't scale the value instantly in the first cycle AFAIK, so in the first cycle it will potentially scale in your workload, reducing your active instances. You can try this by just adding another HPA and checking whether this happens, but I'd verify it first.
This is unsupported at the moment; the only option for this is totally disabling the webhooks. About the deletion itself: the webhook definitely shouldn't block the deletion, but maybe there is an error or some unexpected behavior. I need to check it in depth, because I have an idea about what could be happening.
I have reviewed the code: you can deploy ScaledObjects before any deployment if you aren't using CPU/memory scalers. If you are using them, you can't, because the workload's resource requests are required for validating them.
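(To make that distinction concrete, a sketch with illustrative names and trigger values: a ScaledObject with a non-resource trigger such as cron validates without the target existing, whereas one with a cpu/memory trigger needs the Deployment's resource requests, so it must be applied after the Deployment.)

```yaml
# Can be applied before the target Deployment exists:
# nothing about the workload is needed for validation.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-cron
spec:
  scaleTargetRef:
    name: my-app
  triggers:
    - type: cron
      metadata:
        timezone: Europe/London
        start: 0 8 * * *
        end: 0 20 * * *
        desiredReplicas: "5"
```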
I think it's good that we have fixed the issue, but the main point remains: end-users should have an easy way to disable one, multiple, or all rules. While scaling in the webhooks component or opting out in Helm is a mitigation, I believe we should make the rules configurable.
Yes @JorTurFer, I saw your PR, thanks for that; it will make our lives much easier :) Hopefully some consensus will be reached on the other validation issues as well.
Any thoughts on this, @kedacore/keda-maintainers? I think having the rules configurable through a YAML file is still valuable and trivial to do.
@JorTurFer @zroubalik After giving it another thought and a discussion with @howang-ms: why not introduce a CRD to have proper resource validation etc., rather than mounting a YAML file? I think it would be a nice addition to have typed rules in a CRD. It also allows us to offer proper tooling to end-users by just doing a kubectl get on it.
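(A purely hypothetical sketch of what such a CRD instance could look like; none of these type or field names exist in KEDA today, they only illustrate the "typed rules" idea.)

```yaml
# Hypothetical resource: typed, per-rule configuration via a CRD.
apiVersion: keda.sh/v1alpha1
kind: ValidationPolicy            # hypothetical kind
metadata:
  name: cluster-policy
spec:
  rules:
    - name: prevent-multiple-hpa  # hypothetical rule names
      enabled: true
    - name: require-resource-requests
      enabled: false
```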
I think that it'd be a nice addition. Maybe it's a bit overkill at this moment because we only have 3-5 checks, but if we want to extend them, it'd be nice. I'm curious whether there is any tool to generate scripts that we can inject there; I'm thinking of passing the check code via the CRD instead of having to code it inside the webhooks. That'd bring total flexibility for platform providers to extend the checks.
This is to ease usage in the long term indeed, but also, we've recently had people who were blocked from migrating to KEDA because an HPA was already around. Although we have the feature now, they could have migrated faster if they had been able to disable that one rule already. So growing the rule set is not the only driver; being able to customize what is there today is another one.
Maybe, but that is out of scope here.
@JorTurFer @zroubalik Any objection if we start making this configurable? |
No objections from my side |
@JorTurFer Do you have any? |
No, go ahead :) |
Proposal
Make the prevention rule set configurable by giving end-users knobs at the per-rule level, so that they can manage which validations should be applied and which should not.
We should do this by mounting a YAML configuration file with the various rules, and provide a reference page documenting the available rules.
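(A minimal sketch of what such a mounted configuration file could look like; the file path and rule identifiers are illustrative, not an agreed schema.)

```yaml
# Hypothetical /etc/keda/prevention-rules.yaml
prevention:
  preventMultipleHPA: true        # illustrative rule names
  requireResourceRequests: true
  preventScaledObjectOverlap: false
```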
Use-Case
Give control to the end-user.
Anything else?
No response