-
Notifications
You must be signed in to change notification settings - Fork 556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
External Metrics contd #146
Conversation
eb8963b
to
0d6ef39
Compare
Just noticed the binary was committed: https://github.com/DirectXMan12/k8s-prometheus-adapter/pull/146/files#diff-2fad0bac2a901c821b32c417b2ee77b8 was this by accident? |
cc @brancz |
Generally I think having full external metrics support would be nice, but I'd like us to first tackle the existing known scalability issues, that from our recent discussions would significantly change how the adapter works, including this functionality. What do you think @DirectXMan12 ? |
@brancz Thanks for your feedback! I've been away from this PR for a while now and, admittedly, have no idea which scalability issues you're referring to (I'm not currently following any of the topics in Slack or other areas very closely). So please take the following questions with that in mind.
I'm not attempting to apply any pressure at all, just curious about approximate timelines. My general rule-of-thumb on these kinds of things is to not "let perfect be the enemy of good." For example, if the scalability fixes are 2 months away, the time-to-external-metrics-adapter-rework is 2 months plus some additional lead time. If it's six months away, you get the picture. In the meantime, it seems like there's this external metrics API out there that not many people can actually use because they'd be required to roll their own adapter to Prometheus (or whatever metrics backend they're using). Now, I've not been involved with the K8S community enough to warrant my having an opinion on the matter, but it seems like (if the code is up-to-quality) it might be worthwhile to go ahead and enable external metrics scenarios. (If for no other reason than the code has already been written, and there's work to be done after the scalability issues are fixed anyway to bring this up-to-speed.) Again, no negative tone intended. Just trying to get an understanding of things so I can make the proper moves on our end. |
Also preventing an unessecary fork, as well as long as configuration doesn't change, or change much. We can perform these changes in the background. |
Working on all of this is next on our list. The biggest problem I would have with merging this is that the configuration aspect is likely to change dramatically with the planned changes. If if everyone is ok with that, and we list it in the release notes, I'm ok with moving forward with this. What do you think @DirectXMan12 @s-urbaniak @metalmatze ? |
First, thanks a lot @john-delivuk for reviving these PRs! I am 👍 on the comments from @brancz. Having said that we could make it clear in the readme that not only big (probably backwards incompatible?) configuration changes are to be expected not only for external (this contribution) but also for the existing custom metrics support. Regarding the further refactoring we'd have to take external metrics into account anyways. Not having existing code would tempt to "we'll do it later". That way we have some natural pressure to implement external metrics for the future factorings as well. I'd suggest to do another review sweep throughout the code before merging but it seems to be a good and long awaited addition. Obviously I am leaving the final word to @DirectXMan12 :-) |
I'm fine with the approach of "merge now, disclaimer that it will change later" (don't forget to follow semver ;-) ) |
Code-changes-wise, I believe I've reviewed a previous version of this, and I trust the other reviewers here (@brancz @s-urbaniak @metalmatze), so I'm fine moving forward with this (don't forget to squash commits into logical chunks, though). |
Sounds and looks good from my side. In that case @john-delivuk could you please do as @DirectXMan12 said and squash these changes into logical chunks, then this is good to go for me 🙂 . |
a4140c2
to
2bf8304
Compare
F*** me. I just rushed the rebase, I'll clean this up tonight when I get home and have time. |
2bf8304
to
0c50949
Compare
@brancz Let me know if you need anything else! |
pkg/external-provider/errors.go
Outdated
// NewOperatorNotSupportedByPrometheusError creates an error that represents the fact that we were requested to service a query that | ||
// Prometheus would be unable to support. | ||
func NewOperatorNotSupportedByPrometheusError() error { | ||
return errors.New("operator not supported by prometheus") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: these should be vars in the style of:
var (
ErrOperatorNotSupportedByPrometheus = errors.New("operator not supported by prometheus")
...
)
|
||
// overridableSeriesRegistry is a basic SeriesRegistry | ||
type externalSeriesRegistry struct { | ||
mu sync.RWMutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you document what fields are protected by this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preferred way to document this? Inline comment ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, a small comment is totall sufficient 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antoinne85 or @DirectXMan12
From what I can tell we lock the metrics series when we read to prevent read / write issues. Can you just confirm so I don't mislead :)
converters := result.converters | ||
|
||
if len(newSeriesSlices) != len(converters) { | ||
glog.Errorf("need one set of series per converter") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will panic a few lines below if len(newSeriesSlices) > len(converters)
. We should return with an error from this function instead.
// concurrently. Returned group-resources are "normalized" as per the | ||
// MetricInfo#Normalized method. Group-resources passed as arguments must | ||
// themselves be normalized. | ||
type MetricNamer interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This duplicates most of the code from custom-provider/metric_namer.go
. Is it possible to merge those two implementations? I diff'ed locally and there seems to be some differences, albeit not many.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So logically I feel like the metric namer should move to naming
. Opinions ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming
sounds like a good choice 👍
Thanks for the quick review. Will get to this today I hope. This PR has been open so long it's fallen behind. x.x |
@john-delivuk thank you so much for the effort! 👍 |
@john-delivuk another thing, it seems you are vendoring
Do you mind to vendor it in a separate commit? |
Edited Makefile to add cross build support for s390x. Adding External Metrics Provider
We're very interested in seeing this merged and released, so let me know if we can do anything to help. (Like testing) |
If anyone is interested in playing with this, I've been pushing builds to my personal dockerhub. https://hub.docker.com/r/jdelivuk/k8s-prometheus-adapter-amd64 Use at your own risk |
I actually have it working. I have yet to create autoscaling to use the rules. Although, I am not using the grouping functionality mentioned above. |
The only thing I am having a hard time to get working is having external metrics for a specific namespace. My data does not include kubernetes-related information, so I need to add the namespace in, somehow. Is there something the code in this PR can do to help in this regard, or will it only be available in the default namespace? |
I ended up using I can now get metrics out using the raw API like this:
But it doesn't seem like HorizontalPodAutoscaler can access it:
Am I missing something obvious? |
Just an update. I found the issue and patched it.... while along the way hating how half of naming was being handled by |
Quick design question. For
i'm preferential to 2. Let me know if I'm also missing something, or you would like a different approach. |
@Multiply I'll look into your mentioned issue after these pending changes. I've refactored a lot of this, and will test soon to see if this is still an issue. |
Ok so this change looks big and scary but here is the gist.
I threw up an image on my docker hub mentioned, feel free to try it out. This build feels like it could be the one :) |
this is looking great @john-delivuk 👍 thanks so much for putting the effort into this! I will test this iteration out today, if goes well this is a big LGTM from my side :-) |
@john-delivuk What is the correct raw path to query for external metrics? |
To view available metrics
|
@john-delivuk PR #179 has just been merged. To make it easier for you, I created the following commit which you can cherry-pick: s-urbaniak@bbac2bc |
This is a merge commit back-porting PR kubernetes-sigs#179
…er into em-isolation
Completed @s-urbaniak |
Is it rebase time? |
@john-delivuk testing now on my cluster, once successful, we're definitely ready to merge 👍 |
@john-delivuk awesome work, i just did a sanity test and am getting external metrics 🎉
|
just for completeness, I also verified both custom metrics and resource metrics:
|
@s-urbaniak I am trying to use the prometheus adapter helm charts (https://github.com/helm/charts/tree/master/stable/prometheus-adapter) but I'm assuming there are adjustments that need to be made for v0.5.0 since the current version set in the values.yaml file is v0.4.1. Do you know if anyone is working on updating these for v0.5.0 or if there are particular things I could do manually to make them work? Sorry if this is the wrong place to ask. |
@zacharysierakowski I am not aware of any work done for helm. All changes in 0.5.0 should be backwards compatible with 0.4.1. What 0.5.0 introduces, is a new configuration section |
Thanks for such a quick reply! Are there docs on the |
@zacharysierakowski external metrics (as well as custom metrics) are something you have to configure yourself. If the config is missing, that api won't be enabled. I made a configuration sample here #146 (comment). As mentioned above, I'll create additional documentation. For the sake of this already long PR, please open a new issue if you have additional questions :-) |
@s-urbaniak Okay great. Thank you very much for you help! |
I was hesitant to start extracting some of the duplicated components. Let me know if you would like this refined further or we can continue to clean up in future work.
Thanks to @dangoodman5425 and @antoinne85 for everything they've contributed.