Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add kube-state-metrics based metricsets #4253

Merged
merged 5 commits into from
May 16, 2017

Conversation

exekias
Copy link
Contributor

@exekias exekias commented May 8, 2017

Add k8s metrics extracted from kube-state-metrics services. This adds info specially useful about:

  • Status of deployments, pods and replicasets
  • Resource limits, requests and usage by pods
  • Node available resources

I have updated existing dashboard with this kind of info

image

@exekias exekias added in progress Pull request is currently in progress. Metricbeat Metricbeat review v6.0.0-alpha2 labels May 8, 2017
Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking forward to have this module in. We should have a conversation about the end result of the data model.

}
}
},
"status": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name implies state and here it is status?

Copy link
Contributor Author

@exekias exekias May 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using kube-state-metrics nomenclature here, check https://github.com/elastic/beats/pull/4253/files/0f5f98da5eb5acf11055a8d8cf551f7b4eb1ed0e#diff-c70b508c24452e40e23c98c944027e25R54 to see the metric family names

This is how I would explain it:

State (from kube-state-metrics) means that we get access to the global state of the cluster (think of it as a snapshot in time). While this status metrics are one of the specific metrics of a container.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it. thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we namespace this under state it probably becomes more obvious (see data structure comments).

"resource": {
"requests": {
"cpu": {
"nanocores": 5000000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something that will change over time? If we call it state_container all these fields which are not shared should be under one namespace.

func eventMapping(families []*dto.MetricFamily) ([]common.MapStr, error) {
eventsMap := map[string]common.MapStr{}
for _, family := range families {
//fmt.Println(family)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be removed.

nanocores = 1000000000
)

func eventMapping(families []*dto.MetricFamily) ([]common.MapStr, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would kind of hope instead of adding dependency on prometheus here, that this can "depend" on the prometheus module / metricset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my initial intention, but I have 2 issues with that:

  • Prometheus module mangles original data on it's way, and it's not really aware of the format of these metrics, it does something generic. So I need to go over the resulting MapStr again
  • If we do a change in prometheus module it may break this, having that I'm not using its format

Something I thought about was to move all prometheus code to a common helper (like HTTP), WDYT about that? At the end it looked like overkill but I'm ok with doing it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on the helper. Can also be done in a follow up PR. But I think it's good to have it around as this will not be the last metricset using prometheus.

}

default:
// Ignore unknown metric
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like an error or at least worth logging as debug?

"scheduled": "true"
}
},
"state_pod": {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty namespace

@@ -0,0 +1,59 @@
package state_replicaset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we call this replicaset

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to keep the same format for all kube-state-metrics metricsets for one reason, the hosts setting is incompatible between kubelet and kube-state-metrics, you have to choose, and it's nice to have that clear from the metricset name, to avoid confusion.

Check https://github.com/exekias/beats/blob/d091ce76089c44e5fd872661d529932aa92633df/metricbeat/module/kubernetes/_meta/config.yml for an example of possible configurations

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the point. Leads me to think it should be it's own module but we concluded this discussion :-) Don't have a better solution at the moment :-(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not 100% happy with this either, we can see how it goes and perhaps improve it in the future


return &MetricSet{
BaseMetricSet: base,
counter: 1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove counter

"github.com/prometheus/common/expfmt"
)

func GetFamilies(http *helper.HTTP) ([]*dto.MetricFamily, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a prometheus helper under the helper namespace? Or can you use the prometheus.collector code here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok to the helper, prometheus collector issue explained in a previous comment, but I can update it to share the helper

@@ -0,0 +1,43 @@
{
"comment": "",
"ignore": "test",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to add github.com/elastic/beats here as otherwise when you remove all the beats dependencies will be added to when doing an update.

func eventMapping(families []*dto.MetricFamily) ([]common.MapStr, error) {
eventsMap := map[string]common.MapStr{}
for _, family := range families {
//fmt.Println(family)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be removed.


logp.Beta("The kubernetes state_container metricset is experimental")

if err := base.Module().UnpackConfig(&config); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be dropped since no config is used.

func New(base mb.BaseMetricSet) (mb.MetricSet, error) {
config := struct{}{}

logp.Beta("The kubernetes state_pod metricset is experimental")
Copy link
Member

@andrewkroh andrewkroh May 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a logp.Experimental().

- name: cpu.nanocores
type: long
description: >
Container CPU requested nanocores
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what is a nanocore? Nanoseconds of CPU time per core?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the other way around, in this case it's number of requested (reserved) full CPU cores. So 1 Core would mean the container has 1 full core reserved for itself, as long as it's alive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW this metric is also present as a usage metric: https://github.com/elastic/beats/blob/master/metricbeat/module/kubernetes/container/_meta/fields.yml#L21

In that case it's the accumulated value over time (I use derivative in graphs to extract the used nancores per bucket)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A common scenario is to check requested vs limits vs real usage. Ideally you would put some sane limits based on real usage of a given container

"github.com/prometheus/common/expfmt"
)

func GetFamilies(http *helper.HTTP) ([]*dto.MetricFamily, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[golint] reported by reviewdog 🐶
exported function GetFamilies should have comment or be unexported

return families, nil
}

func GetLabel(m *dto.Metric, label string) string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[golint] reported by reviewdog 🐶
exported function GetLabel should have comment or be unexported

@exekias exekias force-pushed the kube-state-metrics branch 8 times, most recently from faf700b to 5589ec9 Compare May 10, 2017 16:58
@exekias exekias removed the in progress Pull request is currently in progress. label May 10, 2017
@exekias exekias changed the title [WIP] Add kube-state-metrics based metricsets Add kube-state-metrics based metricsets May 10, 2017
"fmt"

"github.com/elastic/beats/metricbeat/mb"
dto "github.com/prometheus/client_model/go"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here a newline should be added.

Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During this review I mainly focused on the data structure. I really like that now the output documents are organised with a focus on the consumer, means the person that builds dashboards and queries for it. It decouples configuration namespaces from the output which I think is a good thing.

Assuming we can keep the data structure I'm tempted to say in the future we could split it up again in 2 modules both not called kubernetes but both putting their data into the kubernetes namespace which is kind of an empty module that only has fields and docs, no code.

As now fields construct became not quite tricky to follow I think it is critical that we compare our data.json output against the fields.yml file to make sure all fields are documented.


// Prometheus helper retrieves prometheus formatted metrics
type Prometheus struct {
HTTP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that, @urso will not like it too much ;-)

- module: kubernetes
enabled: false
metricsets:
- state_node
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on a comment from @andrewkroh in an other PR I'm hoping we can make this state.node etc. in the future (not in this PR).


func eventMapping(families []*dto.MetricFamily) ([]common.MapStr, error) {
eventsMap := map[string]common.MapStr{}
for _, family := range families {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still needed or can it be moved now to the helper?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's pretty custom for this case, so we need to keep it.

I want to see what future metricsets using the helper have in common, perhaps we can still extract some helpers there

@exekias exekias force-pushed the kube-state-metrics branch 3 times, most recently from 1bde788 to d7efc3a Compare May 11, 2017 17:20
@exekias
Copy link
Contributor Author

exekias commented May 11, 2017

jenkins, retest this please

@andrewkroh
Copy link
Member

^ You're so polite to the robots 🤖 . I guess you are hoping for favorable treatment when they take over.

@exekias
Copy link
Contributor Author

exekias commented May 11, 2017

image

@exekias exekias force-pushed the kube-state-metrics branch 3 times, most recently from 385d5b4 to e7c3361 Compare May 11, 2017 21:10
@exekias exekias force-pushed the kube-state-metrics branch 4 times, most recently from 6c773d8 to fb8f972 Compare May 13, 2017 09:17
@exekias exekias force-pushed the kube-state-metrics branch 2 times, most recently from 8567fa9 to 0ec0b1d Compare May 14, 2017 16:15
@exekias exekias added the in progress Pull request is currently in progress. label May 14, 2017
@exekias exekias force-pushed the kube-state-metrics branch 4 times, most recently from 876e39d to 33463ae Compare May 14, 2017 21:45
@exekias
Copy link
Contributor Author

exekias commented May 15, 2017

jenkins, test it

@exekias exekias force-pushed the kube-state-metrics branch 3 times, most recently from c80556f to 0548bb4 Compare May 15, 2017 13:00
@exekias exekias force-pushed the kube-state-metrics branch 3 times, most recently from 4319e79 to 1519037 Compare May 15, 2017 14:25
@exekias exekias removed the in progress Pull request is currently in progress. label May 15, 2017
@@ -426,7 +426,7 @@ def extract_fields(doc_list, name):

# TODO: Make fields_doc path more generic to work with beat-generator
with open(fields_doc, "r") as f:
path = os.path.abspath(os.path.dirname(__file__) + "../../../../_meta/fields.common.yml")
path = os.path.abspath(os.path.dirname(__file__) + "../../../../_meta/fields.generated.yml")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the generated exist in all beats?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be the reason that the appveyor builds fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a fallback to common.yml, let's see if appveyor likes that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it worked, thanks!

@@ -51,6 +52,8 @@ services:
jolokia: { condition: service_healthy }
kafka: { condition: service_healthy }
kibana: { condition: service_healthy }
kubernetes: { condition: service_healthy }
kube-state: { condition: service_started }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting that this one is set to started and not healthy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kube-state-metrics doesn't provide health service, but as soon as it starts it should work so we can live with this it seems

- state_pod
- state_container
period: 10s
hosts: ["kube-state-metrics:8080"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have a "common" ip or dns name in a kubernetes environment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


self.assertItemsEqual(self.de_dot(KUBERNETES_FIELDS), evt.keys(), evt)

self.assert_fields_are_documented(evt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

This file includes fields from processors, as they are used by tests

Fallbacks to `fields.common.yml` when generated is not present
@ruflin ruflin merged commit 5cbd804 into elastic:master May 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants