External Metrics contd #146

Merged: 12 commits into kubernetes-sigs:master on Apr 3, 2019

Conversation

john-delivuk
Contributor

I was hesitant to start extracting some of the duplicated components. Let me know if you'd like this refined further, or if we should continue the cleanup in future work.

Thanks to @dangoodman5425 and @antoinne85 for everything they've contributed.

@john-delivuk force-pushed the em-isolation branch 4 times, most recently from eb8963b to 0d6ef39 on December 10, 2018 04:06
@pvdvreede

Just noticed the binary was committed: https://github.com/DirectXMan12/k8s-prometheus-adapter/pull/146/files#diff-2fad0bac2a901c821b32c417b2ee77b8. Was this by accident?

@DirectXMan12
Contributor

cc @brancz

@brancz
Contributor

brancz commented Jan 7, 2019

Generally I think having full external metrics support would be nice, but I'd like us to first tackle the existing known scalability issues, which, from our recent discussions, would significantly change how the adapter works, including this functionality. What do you think @DirectXMan12?

@antoinne85

@brancz Thanks for your feedback!

I've been away from this PR for a while now and, admittedly, have no idea which scalability issues you're referring to (I'm not currently following any of the topics in Slack or other areas very closely). So please take the following questions with that in mind.

  • Are you referring to scalability issues within the boilerplate or with the way external metrics are implemented in K8S?
  • If you're referring to issues within K8S, specifically, are there any rough estimates floating around about when they might get addressed?

I'm not attempting to apply any pressure at all, just curious about approximate timelines. My general rule-of-thumb on these kinds of things is to not "let perfect be the enemy of good." For example, if the scalability fixes are 2 months away, the time-to-external-metrics-adapter-rework is 2 months plus some additional lead time. If it's six months away, you get the picture.

In the meantime, it seems like there's this external metrics API out there that not many people can actually use because they'd be required to roll their own adapter to Prometheus (or whatever metrics backend they're using).

Now, I've not been involved with the K8S community enough to warrant my having an opinion on the matter, but it seems like (if the code is up-to-quality) it might be worthwhile to go ahead and enable external metrics scenarios. (If for no other reason than the code has already been written, and there's work to be done after the scalability issues are fixed anyway to bring this up-to-speed.)

Again, no negative tone intended. Just trying to get an understanding of things so I can make the proper moves on our end.

@john-delivuk
Contributor Author

It also prevents an unnecessary fork. And as long as the configuration doesn't change, or doesn't change much, we can perform these changes in the background.

@brancz
Contributor

brancz commented Jan 15, 2019

Working on all of this is next on our list. The biggest problem I would have with merging this is that the configuration aspect is likely to change dramatically with the planned changes. If everyone is ok with that, and we list it in the release notes, I'm ok with moving forward with this. What do you think @DirectXMan12 @s-urbaniak @metalmatze?

@s-urbaniak
Contributor

First, thanks a lot @john-delivuk for reviving these PRs!

I am 👍 on the comments from @brancz. Having said that, we could make it clear in the readme that big (probably backwards-incompatible?) configuration changes are to be expected not only for external metrics (this contribution) but also for the existing custom metrics support.

Regarding the further refactoring, we'd have to take external metrics into account anyway. Not having existing code would tempt us toward "we'll do it later". This way we have some natural pressure to account for external metrics in future refactorings as well.

I'd suggest doing another review sweep through the code before merging, but this seems to be a good and long-awaited addition.

Obviously I am leaving the final word to @DirectXMan12 :-)

@DirectXMan12
Contributor

I'm fine with the approach of "merge now, disclaimer that it will change later" (don't forget to follow semver ;-) )

@DirectXMan12
Contributor

Code-changes-wise, I believe I've reviewed a previous version of this, and I trust the other reviewers here (@brancz @s-urbaniak @metalmatze), so I'm fine moving forward with this (don't forget to squash commits into logical chunks, though).

@brancz
Contributor

brancz commented Jan 30, 2019

Sounds and looks good from my side. In that case, @john-delivuk, could you please do as @DirectXMan12 said and squash these changes into logical chunks? Then this is good to go for me 🙂.

@john-delivuk
Contributor Author

F*** me. I just rushed the rebase; I'll clean this up tonight when I get home and have time.

@john-delivuk
Contributor Author

@brancz Let me know if you need anything else!

// NewOperatorNotSupportedByPrometheusError creates an error that represents the fact that we were requested to service a query that
// Prometheus would be unable to support.
func NewOperatorNotSupportedByPrometheusError() error {
	return errors.New("operator not supported by prometheus")
}
Contributor

nit: these should be vars in the style of:

var (
  ErrOperatorNotSupportedByPrometheus = errors.New("operator not supported by prometheus")
...
)
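
For illustration, a fuller sketch of that pattern with a hypothetical caller; the package name and buildQuery are invented for this example, and only the first error message comes from the quoted code:

package externalprovider

import "errors"

// Sentinel error as a package-level var, per the suggestion above.
var ErrOperatorNotSupportedByPrometheus = errors.New("operator not supported by prometheus")

// buildQuery is a hypothetical caller showing why vars are preferable to
// constructor functions: callers and their tests can compare against the
// sentinel value directly instead of matching on the error string.
func buildQuery(operator string) (string, error) {
	if operator != "==" {
		return "", ErrOperatorNotSupportedByPrometheus
	}
	return `some_series{label="value"}`, nil
}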


// overridableSeriesRegistry is a basic SeriesRegistry
type externalSeriesRegistry struct {
	mu sync.RWMutex
Contributor

could you document what fields are protected by this?

Contributor Author

Preferred way to document this? Inline comment?

Contributor

yes, a small comment is totally sufficient 👍

Contributor Author

@antoinne85 or @DirectXMan12
From what I can tell, we lock the metric series when we read to prevent read/write issues. Can you confirm so I don't mislead? :)
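
For reference, a minimal sketch of such an inline comment; mu comes from the quoted code, while the package name and the metricsByName field are hypothetical stand-ins rather than the actual fields in the PR:

package externalprovider

import "sync"

// externalSeriesRegistry is a basic registry for external metric series.
type externalSeriesRegistry struct {
	// mu protects metricsByName below: the periodic lister replaces the
	// mapping on each refresh while API requests read it concurrently.
	mu sync.RWMutex

	// metricsByName maps an external metric name to the query used to
	// serve it (illustrative field, not the actual one from the PR).
	metricsByName map[string]string
}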

converters := result.converters

if len(newSeriesSlices) != len(converters) {
	glog.Errorf("need one set of series per converter")
Contributor

s-urbaniak commented Feb 6, 2019

This will panic a few lines below if len(newSeriesSlices) > len(converters). We should return with an error from this function instead.
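
A minimal sketch of the suggested change, assuming the surrounding function can be altered to return an error; the function and type names below are placeholders for illustration, not the ones in the PR:

package externalprovider

import "fmt"

// seriesConverter stands in for the converter type used in the PR.
type seriesConverter struct{}

// pairSeriesWithConverters illustrates returning an error on a length mismatch
// instead of logging and continuing, which would panic later when the two
// slices are zipped together.
func pairSeriesWithConverters(newSeriesSlices [][]string, converters []seriesConverter) error {
	if len(newSeriesSlices) != len(converters) {
		return fmt.Errorf("need one set of series per converter: got %d series slices for %d converters",
			len(newSeriesSlices), len(converters))
	}
	// Indexing converters by i is now safe, since the lengths match.
	for i := range newSeriesSlices {
		_ = converters[i]
	}
	return nil
}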

// concurrently. Returned group-resources are "normalized" as per the
// MetricInfo#Normalized method. Group-resources passed as arguments must
// themselves be normalized.
type MetricNamer interface {
Contributor

This duplicates most of the code from custom-provider/metric_namer.go. Is it possible to merge those two implementations? I diffed locally and there seem to be some differences, albeit not many.

Contributor Author

So logically I feel like the metric namer should move to pkg/naming. Opinions?

Contributor

naming sounds like a good choice 👍
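
Roughly the direction being discussed: one namer living in pkg/naming that both the custom and external providers import. The interface below is a hypothetical shape for illustration only, not the code from this PR:

package naming

// MetricNamer maps discovered Prometheus series to the metric names exposed
// through the API and back to the queries used to serve them. Implementations
// should be safe for concurrent use.
type MetricNamer interface {
	// MetricNameForSeries returns the exposed metric name for a raw series name.
	MetricNameForSeries(series string) (string, error)

	// QueryForMetric builds the Prometheus query for the given metric and
	// label selector (signature simplified for this sketch).
	QueryForMetric(metricName, labelSelector string) (string, error)
}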

@john-delivuk
Contributor Author

Thanks for the quick review. I hope to get to this today. This PR has been open so long it's fallen behind. x.x

@s-urbaniak
Contributor

@john-delivuk thank you so much for the effort! 👍

@s-urbaniak
Contributor

@john-delivuk another thing: it seems you are using testify without vendoring it, as Travis fails with:


# github.com/directxman12/k8s-prometheus-adapter/pkg/external-provider
pkg/external-provider/periodic_metric_lister_test.go:8:2: cannot find package "github.com/stretchr/testify/require" in any of:
...

Do you mind vendoring it in a separate commit?

DirectXMan12 and others added 2 commits on February 10, 2019 at 15:59:
Edited Makefile to add cross build support for s390x.

Adding External Metrics Provider
@Multiply
Contributor

We're very interested in seeing this merged and released, so let me know if we can do anything to help (like testing).

@john-delivuk
Contributor Author

john-delivuk commented Mar 19, 2019

If anyone is interested in playing with this, I've been pushing builds to my personal dockerhub.

https://hub.docker.com/r/jdelivuk/k8s-prometheus-adapter-amd64

Use at your own risk

@Multiply
Contributor

I actually have it working; I have yet to create an autoscaler that uses the rules, though. Also, I am not using the grouping functionality mentioned above.

@Multiply
Contributor

The only thing I am having a hard time getting to work is external metrics for a specific namespace. My data does not include Kubernetes-related information, so I need to add the namespace in somehow.
I saw #167, which seems to request such a feature.

Is there something the code in this PR can do to help in this regard, or will it only be available in the default namespace?

@Multiply
Contributor

I ended up using metric_relabel_configs to add the correct namespace label in Prometheus for now.

I can now get metrics out using the raw API like this:

$ kubectl get --raw '/apis/external.metrics.k8s.io/v1beta1/namespaces/production/queue_messages?labelSelector=queue=default,vhost=root'
{"kind":"ExternalMetricValueList","apiVersion":"external.metrics.k8s.io/v1beta1","metadata":{"selfLink":"/apis/external.metrics.k8s.io/v1beta1/namespaces/production/queue_messages"},"items":[{"metricName":"","metricLabels":{},"timestamp":"2019-03-20T14:51:25Z","value":"800"}]}

But it doesn't seem like HorizontalPodAutoscaler can access it:

E0320 14:49:08.536433       1 horizontal.go:189] failed to compute desired number of replicas based on listed metrics for Deployment/production/queue-default: failed to get queue_messages external metric: unable to get external metric production/queue_messages/&LabelSelector{MatchLabels:map[string]string{queue: default,vhost: root,},MatchExpressions:[],}: unable to fetch metrics from external metrics API: queue_messages.external.metrics.k8s.io is forbidden: User "system:serviceaccount:kube-system:horizontal-pod-autoscaler" cannot list queue_messages.external.metrics.k8s.io in the namespace "production"
I0320 14:49:08.536486       1 event.go:218] Event(v1.ObjectReference{Kind:"HorizontalPodAutoscaler", Namespace:"production", Name:"queue-default", UID:"081f4772-4b1a-11e9-bd8f-02a321a654d2", APIVersion:"autoscaling/v2beta1", ResourceVersion:"32927459", FieldPath:""}): type: 'Warning' reason: 'FailedGetExternalMetric' unable to get external metric production/queue_messages/&LabelSelector{MatchLabels:map[string]string{queue: default,vhost: root,},MatchExpressions:[],}: unable to fetch metrics from external metrics API: queue_messages.external.metrics.k8s.io is forbidden: User "system:serviceaccount:kube-system:horizontal-pod-autoscaler" cannot list queue_messages.external.metrics.k8s.io in the namespace "production"
I0320 14:49:08.536512       1 event.go:218] Event(v1.ObjectReference{Kind:"HorizontalPodAutoscaler", Namespace:"production", Name:"queue-default", UID:"081f4772-4b1a-11e9-bd8f-02a321a654d2", APIVersion:"autoscaling/v2beta1", ResourceVersion:"32927459", FieldPath:""}): type: 'Warning' reason: 'FailedComputeMetricsReplicas' failed to get queue_messages external metric: unable to get external metric production/queue_messages/&LabelSelector{MatchLabels:map[string]string{queue: default,vhost: root,},MatchExpressions:[],}: unable to fetch metrics from external metrics API: queue_messages.external.metrics.k8s.io is forbidden: User "system:serviceaccount:kube-system:horizontal-pod-autoscaler" cannot list queue_messages.external.metrics.k8s.io in the namespace "production"

Am I missing something obvious?

@john-delivuk
Contributor Author

Just an update: I found the issue and patched it, while along the way hating how half of the naming was handled by pkg/naming and the other half in pkg/external-metrics, so a pretty decent rewrite is coming up. I do want to cycle through some tests tomorrow, as I think there are a few edge cases that aren't being handled correctly.

@john-delivuk
Contributor Author

john-delivuk commented Mar 25, 2019

Quick design question: .DiscoveryRule.resource.overrides is not really needed for external metrics. I see a few options:

  1. Split DiscoveryRule into 2 different types.
  2. Document that it's unused, and have pkg/naming just ignore it if specified.
  3. Have it follow suit.

I'm partial to option 2. Let me know if I'm missing something, or if you would like a different approach.

@john-delivuk
Contributor Author

@Multiply I'll look into the issue you mentioned after these pending changes. I've refactored a lot of this and will test soon to see if it's still an issue.

@john-delivuk
Contributor Author

john-delivuk commented Mar 26, 2019

Ok, so this change looks big and scary, but here is the gist:

  1. Removed the functionality duplicated by pkg/naming.
  2. Moved the query builder into a more appropriately named place for external queries.
  3. Removed the namespace section of the query builder. After looking at provider.ExternalMetricInfo, it is currently ignorant of the namespace, so I feel our queries should be as well. If someone would like to specify a namespace, they should use a label selector, which would translate. As a result, .resources.overrides are ignored for any external metric.

I pushed an image to the Docker Hub repo mentioned above; feel free to try it out. This build feels like it could be the one :)

@s-urbaniak
Contributor

s-urbaniak commented Mar 27, 2019

This is looking great @john-delivuk 👍 thanks so much for putting the effort into this! I will test this iteration out today; if it goes well, this is a big LGTM from my side :-)

@Multiply
Contributor

@john-delivuk What is the correct raw path to query for external metrics?

@john-delivuk
Contributor Author

kubectl get --raw '/apis/external.metrics.k8s.io/v1beta1/namespaces/[hpaNamespace]/[metricName]?labelSelector=[label]=[targetValue]'

To view the available metrics:

kubectl get --raw '/apis/external.metrics.k8s.io/v1beta1'

@s-urbaniak
Contributor

@john-delivuk PR #179 has just been merged. To make it easier for you, I created the following commit which you can cherry-pick: s-urbaniak@bbac2bc

@john-delivuk
Contributor Author

Completed @s-urbaniak

@john-delivuk
Contributor Author

Is it rebase time?

@s-urbaniak
Contributor

@john-delivuk testing now on my cluster, once successful, we're definitely ready to merge 👍

@s-urbaniak
Contributor

@john-delivuk awesome work, I just did a sanity test and am getting external metrics 🎉

{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/http_requests_per_second"
  },
  "items": [
    {
      "metricName": "",
      "metricLabels": {},
      "timestamp": "2019-04-03T14:01:33Z",
      "value": "100099m"
    }
  ]
}

@s-urbaniak s-urbaniak merged commit 680e404 into kubernetes-sigs:master Apr 3, 2019
@s-urbaniak
Contributor

Just for completeness, I also verified both custom metrics and resource metrics:

$ kubectl get --raw '/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/example-app-6ff9ff9d-vltfs/http_requests_per_second' | jq .
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/example-app-6ff9ff9d-vltfs/http_requests_per_second"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "default",
        "name": "example-app-6ff9ff9d-vltfs",
        "apiVersion": "/v1"
      },
      "metricName": "http_requests_per_second",
      "timestamp": "2019-04-03T14:17:38Z",
      "value": "100001m"
    }
  ]
}

$ kubectl top pod --all-namespaces 
NAMESPACE     NAME                                   CPU(cores)   MEMORY(bytes)   
default       example-app-6ff9ff9d-vltfs             43m          8Mi             
default       prometheus-adapter-5bc69ff95c-mcm9h    1m           11Mi            
default       vegeta-84fbb5595f-wxrhh                305m         14Mi            
kube-system   coredns-86c58d9df4-shrvd               75m          8Mi             
kube-system   coredns-86c58d9df4-w7bvd               74m          9Mi             
kube-system   etcd-minikube                          24m          48Mi            
kube-system   kube-addon-manager-minikube            0m           4Mi             
kube-system   kube-apiserver-minikube                28m          404Mi           
kube-system   kube-controller-manager-minikube       41m          50Mi            
kube-system   kube-proxy-zp78b                       2m           11Mi            
kube-system   kube-scheduler-minikube                9m           12Mi            
kube-system   storage-provisioner                    0m           13Mi            
monitoring    alertmanager-main-0                    3m           18Mi            
monitoring    alertmanager-main-1                    3m           17Mi            
monitoring    alertmanager-main-2                    3m           17Mi            
monitoring    grafana-5f5677f45d-vgm9m               2m           27Mi            
monitoring    kube-state-metrics-656c58fc5b-ql7dq    1m           31Mi            
monitoring    node-exporter-fwrp7                    3m           19Mi            
monitoring    prometheus-adapter-646885b85b-5hcb5    1m           12Mi            
monitoring    prometheus-k8s-0                       19m          123Mi           
monitoring    prometheus-k8s-1                       20m          129Mi           
monitoring    prometheus-operator-7cb68545c6-g87hv   9m           22Mi            

$ kubectl top node 
NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
minikube   931m         23%    1506Mi          20%       

@zacharysierakowski

@s-urbaniak I am trying to use the prometheus-adapter Helm chart (https://github.com/helm/charts/tree/master/stable/prometheus-adapter), but I'm assuming there are adjustments that need to be made for v0.5.0, since the current version set in the values.yaml file is v0.4.1. Do you know if anyone is working on updating it for v0.5.0, or if there are particular things I could do manually to make it work?

Sorry if this is the wrong place to ask.

@s-urbaniak
Contributor

@zacharysierakowski I am not aware of any work done on the Helm chart. All changes in 0.5.0 should be backwards compatible with 0.4.1. What 0.5.0 introduces is a new configuration section, externalRules, so effectively the Helm chart should still work.
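
To illustrate where the new section sits, a rough sketch of the adapter's configuration shape; the field and type names here are simplified for the example, so refer to the project's config package and the configuration sample referenced later in this thread for the authoritative definitions:

package config

// MetricsDiscoveryConfig is a simplified view of the adapter's configuration
// after this PR: the custom metrics rules are untouched and externalRules is
// added alongside them.
type MetricsDiscoveryConfig struct {
	// Rules drives the custom metrics API, unchanged from v0.4.1.
	Rules []DiscoveryRule `yaml:"rules"`
	// ExternalRules is the new section in v0.5.0; if it is omitted, the
	// external metrics API is simply not enabled.
	ExternalRules []DiscoveryRule `yaml:"externalRules"`
}

// DiscoveryRule describes how a set of Prometheus series is discovered and
// exposed (fields trimmed for this sketch).
type DiscoveryRule struct {
	SeriesQuery  string `yaml:"seriesQuery"`
	MetricsQuery string `yaml:"metricsQuery,omitempty"`
}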

@zacharysierakowski

Thanks for such a quick reply! Are there docs on the externalRules configuration section I can refer to? When I switched from 0.4.1 to 0.5.0 and deployed it, everything seemed green and good to go, but when running kubectl get --raw '/apis/external.metrics.k8s.io/v1beta1', I didn't get anything. I'm assuming that's because I missed that new configuration section.

@s-urbaniak
Contributor

@zacharysierakowski external metrics (as well as custom metrics) are something you have to configure yourself. If the config is missing, that API won't be enabled. I made a configuration sample here: #146 (comment).

As mentioned above, I'll create additional documentation. For the sake of this already long PR, please open a new issue if you have additional questions :-)

@zacharysierakowski

@s-urbaniak Okay, great. Thank you very much for your help!
