External Metrics contd #146

Merged: 12 commits into kubernetes-sigs:master on Apr 3, 2019

Conversation

john-delivuk
Contributor

I was hesitant to start extracting some of the duplicated components. Let me know if you'd like this refined further, or if we should continue the cleanup in future work.

Thanks to @dangoodman5425 and @antoinne85 for everything they've contributed.

@john-delivuk force-pushed the em-isolation branch 4 times, most recently from eb8963b to 0d6ef39 on December 10, 2018 04:06
@pvdvreede

Just noticed the binary was committed: https://github.com/DirectXMan12/k8s-prometheus-adapter/pull/146/files#diff-2fad0bac2a901c821b32c417b2ee77b8. Was this by accident?

@DirectXMan12
Contributor

cc @brancz

@brancz
Contributor

brancz commented Jan 7, 2019

Generally I think having full external metrics support would be nice, but I'd like us to first tackle the existing known scalability issues, which, from our recent discussions, would significantly change how the adapter works, including this functionality. What do you think @DirectXMan12?

@antoinne85

@brancz Thanks for your feedback!

I've been away from this PR for a while now and, admittedly, have no idea which scalability issues you're referring to (I'm not currently following any of the topics in Slack or other areas very closely). So please take the following questions with that in mind.

  • Are you referring to scalability issues within the boilerplate or with the way external metrics are implemented in K8S?
  • If you're referring to issues within K8S, specifically, are there any rough estimates floating around about when they might get addressed?

I'm not attempting to apply any pressure at all, just curious about approximate timelines. My general rule-of-thumb on these kinds of things is to not "let perfect be the enemy of good." For example, if the scalability fixes are 2 months away, the time-to-external-metrics-adapter-rework is 2 months plus some additional lead time. If it's six months away, you get the picture.

In the meantime, it seems like there's this external metrics API out there that not many people can actually use because they'd be required to roll their own adapter to Prometheus (or whatever metrics backend they're using).

Now, I've not been involved with the K8S community enough to warrant my having an opinion on the matter, but it seems like (if the code is up-to-quality) it might be worthwhile to go ahead and enable external metrics scenarios. (If for no other reason than the code has already been written, and there's work to be done after the scalability issues are fixed anyway to bring this up-to-speed.)

Again, no negative tone intended. Just trying to get an understanding of things so I can make the proper moves on our end.

@john-delivuk
Contributor Author

It also prevents an unnecessary fork. And as long as the configuration doesn't change, or doesn't change much, we can perform these changes in the background.

@brancz
Contributor

brancz commented Jan 15, 2019

Working on all of this is next on our list. The biggest problem I would have with merging this is that the configuration aspect is likely to change dramatically with the planned changes. If everyone is ok with that, and we list it in the release notes, I'm ok with moving forward with this. What do you think @DirectXMan12 @s-urbaniak @metalmatze?

@s-urbaniak
Contributor

First, thanks a lot @john-delivuk for reviving these PRs!

I am 👍 on the comments from @brancz. Having said that, we could make it clear in the readme that big (probably backwards-incompatible?) configuration changes are to be expected not only for external metrics (this contribution) but also for the existing custom metrics support.

Regarding the further refactoring, we'd have to take external metrics into account anyway. Not having existing code would tempt us toward "we'll do it later". This way we have some natural pressure to account for external metrics in future refactorings as well.

I'd suggest doing another review sweep through the code before merging, but this seems to be a good and long-awaited addition.

Obviously I am leaving the final word to @DirectXMan12 :-)

@DirectXMan12
Contributor

I'm fine with the approach of "merge now, disclaimer that it will change later" (don't forget to follow semver ;-) )

@DirectXMan12
Contributor

Code-changes-wise, I believe I've reviewed a previous version of this, and I trust the other reviewers here (@brancz @s-urbaniak @metalmatze), so I'm fine moving forward with this (don't forget to squash commits into logical chunks, though).

@brancz
Contributor

brancz commented Jan 30, 2019

Sounds and looks good from my side. In that case, @john-delivuk, could you please do as @DirectXMan12 said and squash these changes into logical chunks? Then this is good to go for me 🙂.

@john-delivuk
Contributor Author

F*** me. I just rushed the rebase; I'll clean this up tonight when I get home and have time.

@john-delivuk
Contributor Author

@brancz Let me know if you need anything else!

// NewOperatorNotSupportedByPrometheusError creates an error that represents the fact that we were requested to service a query that
// Prometheus would be unable to support.
func NewOperatorNotSupportedByPrometheusError() error {
	return errors.New("operator not supported by prometheus")
}
Contributor

nit: these should be vars in the style of:

var (
  ErrOperatorNotSupportedByPrometheus = errors.New("operator not supported by prometheus")
...
)
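
For illustration, a fuller sketch of that pattern with a hypothetical caller; the package name and buildQuery are invented for this example, and only the first error message comes from the quoted code:

package externalprovider

import "errors"

// Sentinel error as a package-level var, per the suggestion above.
var ErrOperatorNotSupportedByPrometheus = errors.New("operator not supported by prometheus")

// buildQuery is a hypothetical caller showing why vars are preferable to
// constructor functions: callers and their tests can compare against the
// sentinel value directly instead of matching on the error string.
func buildQuery(operator string) (string, error) {
	if operator != "==" {
		return "", ErrOperatorNotSupportedByPrometheus
	}
	return `some_series{label="value"}`, nil
}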


// overridableSeriesRegistry is a basic SeriesRegistry
type externalSeriesRegistry struct {
	mu sync.RWMutex
Contributor

could you document what fields are protected by this?

Contributor Author

Preferred way to document this? Inline comment?

Contributor

yes, a small comment is totally sufficient 👍

Contributor Author

@antoinne85 or @DirectXMan12
From what I can tell, we lock the metric series when we read to prevent read/write issues. Can you confirm so I don't mislead? :)
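
For reference, a minimal sketch of such an inline comment; mu comes from the quoted code, while the package name and the metricsByName field are hypothetical stand-ins rather than the actual fields in the PR:

package externalprovider

import "sync"

// externalSeriesRegistry is a basic registry for external metric series.
type externalSeriesRegistry struct {
	// mu protects metricsByName below: the periodic lister replaces the
	// mapping on each refresh while API requests read it concurrently.
	mu sync.RWMutex

	// metricsByName maps an external metric name to the query used to
	// serve it (illustrative field, not the actual one from the PR).
	metricsByName map[string]string
}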

converters := result.converters

if len(newSeriesSlices) != len(converters) {
	glog.Errorf("need one set of series per converter")
Contributor

s-urbaniak commented Feb 6, 2019

This will panic a few lines below if len(newSeriesSlices) > len(converters). We should return with an error from this function instead.
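
A minimal sketch of the suggested change, assuming the surrounding function can be altered to return an error; the function and type names below are placeholders for illustration, not the ones in the PR:

package externalprovider

import "fmt"

// seriesConverter stands in for the converter type used in the PR.
type seriesConverter struct{}

// pairSeriesWithConverters illustrates returning an error on a length mismatch
// instead of logging and continuing, which would panic later when the two
// slices are zipped together.
func pairSeriesWithConverters(newSeriesSlices [][]string, converters []seriesConverter) error {
	if len(newSeriesSlices) != len(converters) {
		return fmt.Errorf("need one set of series per converter: got %d series slices for %d converters",
			len(newSeriesSlices), len(converters))
	}
	// Indexing converters by i is now safe, since the lengths match.
	for i := range newSeriesSlices {
		_ = converters[i]
	}
	return nil
}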

// concurrently. Returned group-resources are "normalized" as per the
// MetricInfo#Normalized method. Group-resources passed as arguments must
// themselves be normalized.
type MetricNamer interface {
Contributor

This duplicates most of the code from custom-provider/metric_namer.go. Is it possible to merge those two implementations? I diffed locally and there seem to be some differences, albeit not many.

Contributor Author

So logically I feel like the metric namer should move to pkg/naming. Opinions?

Contributor

naming sounds like a good choice 👍
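
Roughly the direction being discussed: one namer living in pkg/naming that both the custom and external providers import. The interface below is a hypothetical shape for illustration only, not the code from this PR:

package naming

// MetricNamer maps discovered Prometheus series to the metric names exposed
// through the API and back to the queries used to serve them. Implementations
// should be safe for concurrent use.
type MetricNamer interface {
	// MetricNameForSeries returns the exposed metric name for a raw series name.
	MetricNameForSeries(series string) (string, error)

	// QueryForMetric builds the Prometheus query for the given metric and
	// label selector (signature simplified for this sketch).
	QueryForMetric(metricName, labelSelector string) (string, error)
}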

@john-delivuk
Contributor Author

Thanks for the quick review. I hope to get to this today. This PR has been open so long it's fallen behind. x.x

@s-urbaniak
Contributor

@john-delivuk thank you so much for the effort! 👍

@s-urbaniak
Contributor

@john-delivuk another thing: it seems you are using testify without vendoring it, as Travis fails with:


# github.com/directxman12/k8s-prometheus-adapter/pkg/external-provider
pkg/external-provider/periodic_metric_lister_test.go:8:2: cannot find package "github.com/stretchr/testify/require" in any of:
...

Do you mind vendoring it in a separate commit?

DirectXMan12 and others added 2 commits on February 10, 2019 at 15:59:
Edited Makefile to add cross build support for s390x.

Adding External Metrics Provider
@Multiply
Contributor

We're very interested in seeing this merged and released, so let me know if we can do anything to help (like testing).

@john-delivuk
Contributor Author

john-delivuk commented Mar 19, 2019

If anyone is interested in playing with this, I've been pushing builds to my personal dockerhub.

https://hub.docker.com/r/jdelivuk/k8s-prometheus-adapter-amd64

Use at your own risk

@Multiply
Contributor

I actually have it working; I have yet to create an autoscaler that uses the rules, though. Also, I am not using the grouping functionality mentioned above.

@Multiply
Contributor

The only thing I am having a hard time getting to work is external metrics for a specific namespace. My data does not include Kubernetes-related information, so I need to add the namespace in somehow.
I saw #167, which seems to request such a feature.

Is there something the code in this PR can do to help in this regard, or will it only be available in the default namespace?

@Multiply
Contributor

I ended up using metric_relabel_configs to add the correct namespace label in Prometheus for now.

I can now get metrics out using the raw API like this:

$ kubectl get --raw '/apis/external.metrics.k8s.io/v1beta1/namespaces/production/queue_messages?labelSelector=queue=default,vhost=root'
{"kind":"ExternalMetricValueList","apiVersion":"external.metrics.k8s.io/v1beta1","metadata":{"selfLink":"/apis/external.metrics.k8s.io/v1beta1/namespaces/production/queue_messages"},"items":[{"metricName":"","metricLabels":{},"timestamp":"2019-03-20T14:51:25Z","value":"800"}]}

But it doesn't seem like HorizontalPodAutoscaler can access it:

E0320 14:49:08.536433       1 horizontal.go:189] failed to compute desired number of replicas based on listed metrics for Deployment/production/queue-default: failed to get queue_messages external metric: unable to get external metric production/queue_messages/&LabelSelector{MatchLabels:map[string]string{queue: default,vhost: root,},MatchExpressions:[],}: unable to fetch metrics from external metrics API: queue_messages.external.metrics.k8s.io is forbidden: User "system:serviceaccount:kube-system:horizontal-pod-autoscaler" cannot list queue_messages.external.metrics.k8s.io in the namespace "production"
I0320 14:49:08.536486       1 event.go:218] Event(v1.ObjectReference{Kind:"HorizontalPodAutoscaler", Namespace:"production", Name:"queue-default", UID:"081f4772-4b1a-11e9-bd8f-02a321a654d2", APIVersion:"autoscaling/v2beta1", ResourceVersion:"32927459", FieldPath:""}): type: 'Warning' reason: 'FailedGetExternalMetric' unable to get external metric production/queue_messages/&LabelSelector{MatchLabels:map[string]string{queue: default,vhost: root,},MatchExpressions:[],}: unable to fetch metrics from external metrics API: queue_messages.external.metrics.k8s.io is forbidden: User "system:serviceaccount:kube-system:horizontal-pod-autoscaler" cannot list queue_messages.external.metrics.k8s.io in the namespace "production"
I0320 14:49:08.536512       1 event.go:218] Event(v1.ObjectReference{Kind:"HorizontalPodAutoscaler", Namespace:"production", Name:"queue-default", UID:"081f4772-4b1a-11e9-bd8f-02a321a654d2", APIVersion:"autoscaling/v2beta1", ResourceVersion:"32927459", FieldPath:""}): type: 'Warning' reason: 'FailedComputeMetricsReplicas' failed to get queue_messages external metric: unable to get external metric production/queue_messages/&LabelSelector{MatchLabels:map[string]string{queue: default,vhost: root,},MatchExpressions:[],}: unable to fetch metrics from external metrics API: queue_messages.external.metrics.k8s.io is forbidden: User "system:serviceaccount:kube-system:horizontal-pod-autoscaler" cannot list queue_messages.external.metrics.k8s.io in the namespace "production"

Am I missing something obvious?

@john-delivuk
Contributor Author

Just an update: I found the issue and patched it, while along the way hating how half of the naming was handled by pkg/naming and the other half in pkg/external-metrics, so a pretty decent rewrite is coming up. I do want to cycle through some tests tomorrow, as I think there are a few edge cases that aren't being handled correctly.

@john-delivuk
Contributor Author

john-delivuk commented Mar 25, 2019

Quick design question: .DiscoveryRule.resource.overrides is not really needed for external metrics. I see a few options:

  1. Split DiscoveryRule into 2 different types.
  2. Document that it's unused, and have pkg/naming just ignore it if specified.
  3. Have it follow suit.

I'm partial to option 2. Let me know if I'm missing something, or if you would like a different approach.

@john-delivuk
Contributor Author

@Multiply I'll look into the issue you mentioned after these pending changes. I've refactored a lot of this and will test soon to see if it's still an issue.

@john-delivuk
Contributor Author

john-delivuk commented Mar 26, 2019

Ok, so this change looks big and scary, but here is the gist:

  1. Removed the functionality duplicated by pkg/naming.
  2. Moved the query builder into a more appropriately named place for external queries.
  3. Removed the namespace section of the query builder. After looking at provider.ExternalMetricInfo, it is currently ignorant of the namespace, so I feel our queries should be as well. If someone would like to specify a namespace, they should use a label selector, which would translate. As a result, .resources.overrides are ignored for any external metric.

I pushed an image to the Docker Hub repo mentioned above; feel free to try it out. This build feels like it could be the one :)

@s-urbaniak
Contributor

s-urbaniak commented Mar 27, 2019

This is looking great @john-delivuk 👍 thanks so much for putting the effort into this! I will test this iteration out today; if it goes well, this is a big LGTM from my side :-)

@Multiply
Contributor

@john-delivuk What is the correct raw path to query for external metrics?

@john-delivuk
Contributor Author

kubectl get --raw '/apis/external.metrics.k8s.io/v1beta1/namespaces/[hpaNamespace]/[metricName]?labelSelector=[label]=[targetValue]'

To view the available metrics:

kubectl get --raw '/apis/external.metrics.k8s.io/v1beta1'

@s-urbaniak
Contributor

@john-delivuk PR #179 has just been merged. To make it easier for you, I created the following commit which you can cherry-pick: s-urbaniak@bbac2bc

@john-delivuk
Contributor Author

Completed @s-urbaniak

@john-delivuk
Contributor Author

Is it rebase time?

@s-urbaniak
Contributor

@john-delivuk testing now on my cluster, once successful, we're definitely ready to merge 👍

@s-urbaniak
Contributor

@john-delivuk awesome work, I just did a sanity test and am getting external metrics 🎉

{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/http_requests_per_second"
  },
  "items": [
    {
      "metricName": "",
      "metricLabels": {},
      "timestamp": "2019-04-03T14:01:33Z",
      "value": "100099m"
    }
  ]
}

@s-urbaniak s-urbaniak merged commit 680e404 into kubernetes-sigs:master Apr 3, 2019
@s-urbaniak
Contributor

Just for completeness, I also verified both custom metrics and resource metrics:

$ kubectl get --raw '/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/example-app-6ff9ff9d-vltfs/http_requests_per_second' | jq .
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/example-app-6ff9ff9d-vltfs/http_requests_per_second"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "default",
        "name": "example-app-6ff9ff9d-vltfs",
        "apiVersion": "/v1"
      },
      "metricName": "http_requests_per_second",
      "timestamp": "2019-04-03T14:17:38Z",
      "value": "100001m"
    }
  ]
}

$ kubectl top pod --all-namespaces 
NAMESPACE     NAME                                   CPU(cores)   MEMORY(bytes)   
default       example-app-6ff9ff9d-vltfs             43m          8Mi             
default       prometheus-adapter-5bc69ff95c-mcm9h    1m           11Mi            
default       vegeta-84fbb5595f-wxrhh                305m         14Mi            
kube-system   coredns-86c58d9df4-shrvd               75m          8Mi             
kube-system   coredns-86c58d9df4-w7bvd               74m          9Mi             
kube-system   etcd-minikube                          24m          48Mi            
kube-system   kube-addon-manager-minikube            0m           4Mi             
kube-system   kube-apiserver-minikube                28m          404Mi           
kube-system   kube-controller-manager-minikube       41m          50Mi            
kube-system   kube-proxy-zp78b                       2m           11Mi            
kube-system   kube-scheduler-minikube                9m           12Mi            
kube-system   storage-provisioner                    0m           13Mi            
monitoring    alertmanager-main-0                    3m           18Mi            
monitoring    alertmanager-main-1                    3m           17Mi            
monitoring    alertmanager-main-2                    3m           17Mi            
monitoring    grafana-5f5677f45d-vgm9m               2m           27Mi            
monitoring    kube-state-metrics-656c58fc5b-ql7dq    1m           31Mi            
monitoring    node-exporter-fwrp7                    3m           19Mi            
monitoring    prometheus-adapter-646885b85b-5hcb5    1m           12Mi            
monitoring    prometheus-k8s-0                       19m          123Mi           
monitoring    prometheus-k8s-1                       20m          129Mi           
monitoring    prometheus-operator-7cb68545c6-g87hv   9m           22Mi            

$ kubectl top node 
NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
minikube   931m         23%    1506Mi          20%       

@zacharysierakowski

@s-urbaniak I am trying to use the prometheus-adapter Helm chart (https://github.com/helm/charts/tree/master/stable/prometheus-adapter), but I'm assuming there are adjustments that need to be made for v0.5.0, since the current version set in the values.yaml file is v0.4.1. Do you know if anyone is working on updating it for v0.5.0, or if there are particular things I could do manually to make it work?

Sorry if this is the wrong place to ask.

@s-urbaniak
Contributor

@zacharysierakowski I am not aware of any work done on the Helm chart. All changes in 0.5.0 should be backwards compatible with 0.4.1. What 0.5.0 introduces is a new configuration section, externalRules, so effectively the Helm chart should still work.
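
To illustrate where the new section sits, a rough sketch of the adapter's configuration shape; the field and type names here are simplified for the example, so refer to the project's config package and the configuration sample referenced later in this thread for the authoritative definitions:

package config

// MetricsDiscoveryConfig is a simplified view of the adapter's configuration
// after this PR: the custom metrics rules are untouched and externalRules is
// added alongside them.
type MetricsDiscoveryConfig struct {
	// Rules drives the custom metrics API, unchanged from v0.4.1.
	Rules []DiscoveryRule `yaml:"rules"`
	// ExternalRules is the new section in v0.5.0; if it is omitted, the
	// external metrics API is simply not enabled.
	ExternalRules []DiscoveryRule `yaml:"externalRules"`
}

// DiscoveryRule describes how a set of Prometheus series is discovered and
// exposed (fields trimmed for this sketch).
type DiscoveryRule struct {
	SeriesQuery  string `yaml:"seriesQuery"`
	MetricsQuery string `yaml:"metricsQuery,omitempty"`
}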

@zacharysierakowski

Thanks for such a quick reply! Are there docs on the externalRules configuration section I can refer to? When I switched from 0.4.1 to 0.5.0 and deployed it, everything seemed green and good to go, but when running kubectl get --raw '/apis/external.metrics.k8s.io/v1beta1', I didn't get anything. I'm assuming that's because I missed that new configuration section.

@s-urbaniak
Contributor

@zacharysierakowski external metrics (as well as custom metrics) are something you have to configure yourself. If the config is missing, that API won't be enabled. I made a configuration sample here: #146 (comment).

As mentioned above, I'll create additional documentation. For the sake of this already long PR, please open a new issue if you have additional questions :-)

@zacharysierakowski

@s-urbaniak Okay, great. Thank you very much for your help!
