
Cache metric names provided by KEDA Metrics Server #2156

Closed
zroubalik opened this issue Oct 5, 2021 · 13 comments · Fixed by #2279
Labels: feature-request, needs-discussion

Comments


zroubalik commented Oct 5, 2021

Proposal

The Kubernetes API Server asks for all available metrics provided by the KEDA Metrics Server. Internally we do a round trip across all ScaledObjects to compile the list of used metrics:

// ListAllExternalMetrics returns the supported external metrics for this provider
func (p *KedaProvider) ListAllExternalMetrics() []provider.ExternalMetricInfo {

This code has been on my radar for quite some time. We should cache the list of metrics in the Metrics Server and only update it when a ScaledObject is added/updated/removed. But it is a little bit tricky, because the Metrics Server doesn't watch ScaledObjects (the Operator does that in its reconcile loop), so it doesn't have this knowledge.

I have been thinking about this for some time and the best solution is probably one of these:

  1. Add a simple controller to the Metrics Server that watches ScaledObjects and populates the metric-name cache appropriately.
  2. The Operator could maintain a list of all used metrics and update it as ScaledObjects are added/removed/updated. The Operator would then dump this list into a ConfigMap mounted into the Metrics Server, which would populate its cache from the ConfigMap (checking the timestamp of the mounted ConfigMap to keep the cache up to date).

I am inclined toward solution 1.
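For solution 1, the cache the new controller would maintain could look roughly like the following sketch (hypothetical type and method names, not KEDA's actual implementation): the watch/reconcile handler calls Set/Delete as ScaledObjects change, and ListAllExternalMetrics reads from memory instead of listing all ScaledObjects per request.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// metricNameCache is a hypothetical in-memory cache for solution 1.
// A small controller inside the Metrics Server would call Set/Delete
// from its ScaledObject event handlers; List serves reads without
// touching the Kubernetes API server.
type metricNameCache struct {
	mu      sync.RWMutex
	metrics map[string][]string // ScaledObject key -> metric names
}

func newMetricNameCache() *metricNameCache {
	return &metricNameCache{metrics: make(map[string][]string)}
}

// Set records the metric names exposed by one ScaledObject
// (called on add/update events).
func (c *metricNameCache) Set(scaledObjectKey string, names []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.metrics[scaledObjectKey] = names
}

// Delete drops a ScaledObject's metrics (called on delete events).
func (c *metricNameCache) Delete(scaledObjectKey string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.metrics, scaledObjectKey)
}

// List returns the deduplicated, sorted set of all cached metric names;
// this is what ListAllExternalMetrics would return from the cache.
func (c *metricNameCache) List() []string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	seen := make(map[string]struct{})
	for _, names := range c.metrics {
		for _, n := range names {
			seen[n] = struct{}{}
		}
	}
	out := make([]string, 0, len(seen))
	for n := range seen {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

func main() {
	cache := newMetricNameCache()
	cache.Set("default/so-1", []string{"s0-prometheus-http_requests"})
	cache.Set("default/so-2", []string{"s1-rabbitmq-queue_length"})
	cache.Delete("default/so-2")
	fmt.Println(cache.List())
}
```

The RWMutex keeps concurrent List calls from the API server cheap while event-driven writes stay rare.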

Use-Case

It should improve performance, because it will reduce the number of calls to the k8s API Server.

zroubalik added the needs-discussion and feature-request labels Oct 5, 2021

coderanger commented Oct 5, 2021

+1 to the idea, but I think the suggested implementation would be more brittle over time. I would suggest the better fix is to merge the controller and metrics server into one daemon.

@zroubalik (Member Author)

Merging the controller and metrics server into one daemon could bring problems with a potential HA setup of the metrics server (some environments have HA requirements for k8s-related services).

@coderanger (Contributor)

Controller-runtime supports non-leader-elected runnables just fine, so not sure why that would be a problem.


zroubalik commented Oct 5, 2021

Let's say we merge the controller + metrics server into one daemon, the available metrics are populated from the controller, and we have an HA KEDA setup with 2 deployments:

The Kubernetes API server will ask the KEDA Metrics Server for available metrics -> the deployment with the running controller will properly list and return them, but what if the request is load-balanced to the deployment whose controller hasn't been selected as the leader? In that case we would have to manage the cache of metrics somehow.

@coderanger (Contributor)

Yes? But that can use the same underlying cache so we don't duplicate effort.

@zroubalik (Member Author)

We will have 2 Deployments (HA); where do you see the cache that's accessible from all KEDA Deployments?

@coderanger (Contributor)

It would be in-memory? I think we're talking about different things or something else to make these questions confusing on both sides :)


arschles commented Oct 6, 2021

@coderanger could you go into some more detail why the original proposed solution would be brittle? I think it's worth talking about that because figuring out a way to have metrics reliably communicate with the operator would let us keep the two processes separate, and that would likely reduce the amount of work needed to get this done.

@zroubalik (Member Author)

@coderanger yeah probably a small misunderstanding from both sides :)

+1 to @arschles's ask and maybe if you can talk a little more about the in-memory proposal, that will allow HA (multiple K8s Deployments).

@yquansah (Contributor)

@coderanger I think having a unified cache accessible to all deployments will just add more architecture and overall complexity.

@zroubalik how often do you foresee that the operator will dump contents into a ConfigMap? (with your proposed solution)

@zroubalik (Member Author)

@yquansah I have still been thinking about the best option; currently I am more convinced that the best option is to actually add a simple controller to the Metrics Server that watches ScaledObjects and populates the metric-name cache appropriately.

Wrt your question, it should do so on each ScaledObject change.

@arschles (Contributor)

@zroubalik I believe that's a good solution as well, but we should think about failure modes if we do it. The big one on my mind right now is "drift" between the metrics server and the operator:

  • what happens if they don't have the same set of ScaledObjects?
  • can we / how do we ensure they converge on the updated set?
  • can we give operators tools to inspect & measure drift at a given point in time?

zroubalik self-assigned this Nov 3, 2021
@zroubalik (Member Author)

@arschles those are valid concerns, but I don't think the list of metrics is that critical. It is just informative for the k8s API server, so in the worst case it would ask for an unknown metric, which we should handle in the metrics server.

zroubalik changed the title from "Cache metrics provided by KEDA Metrics Server" to "Cache metric names provided by KEDA Metrics Server" Nov 12, 2021