
[metricbeat] independent events based on le for prometheus histograms #12446

Closed
odacremolbap opened this issue Jun 5, 2019 · 16 comments
Labels: discuss (Issue needs further discussion.), Metricbeat, Team:Integrations (Label for the Integrations team)

@odacremolbap
Contributor

Describe the enhancement:

The Metricbeat Prometheus helper gathers bucket information into a single event, using a structure similar to:

              "my_metric": {
                "sum": 4560.874000000001,
                "count": 1,
                "bucket": {
                  "1000": 0,
                  "2000": 0,
                  "4000": 0,
                  "8000": 1,
                  "16000": 1,
                  "32000": 1,
                  "64000": 1,
                  "128000": 1,
                  "256000": 1,
                  "512000": 1,
                  "+Inf": 1
                }
              }

The values under bucket are hard to work with in visualizations. We mostly rely on count and sum to calculate averages, discarding all the other data.

Describe a specific use case for the enhancement or feature:

Expanding the data above into multiple events, each one containing the le key and value as provided by the Prometheus metric, would make it much more flexible to visualize, at the cost of storage:

              "my_metric": {
                "sum": 4560.874000000001,
                "count": 1
              }

              "my_metric": {
                "le": {
                  "value": "2000",
                  "count": 0
                }
              }

...

              "my_metric": {
                "le": {
                  "value": "+Inf",
                  "count": 1
                }
              }
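The expansion described above can be sketched in Python. This is a minimal illustration of the proposed event shapes, not actual Metricbeat code (the helper itself is Go); the function name is made up for the example:

```python
def expand_histogram(name, metric):
    """Expand one bucketed histogram metric into independent events:
    one event carrying sum/count, plus one event per `le` bucket."""
    events = [{name: {"sum": metric["sum"], "count": metric["count"]}}]
    for le, count in metric["bucket"].items():
        events.append({name: {"le": {"value": le, "count": count}}})
    return events

metric = {
    "sum": 4560.874000000001,
    "count": 1,
    "bucket": {"1000": 0, "2000": 0, "512000": 1, "+Inf": 1},
}
events = expand_histogram("my_metric", metric)
# one sum/count event plus one event per bucket
```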
@odacremolbap odacremolbap added discuss Issue needs further discussion. Metricbeat Metricbeat labels Jun 5, 2019
@odacremolbap odacremolbap self-assigned this Jun 5, 2019
@exekias exekias added the Team:Integrations Label for the Integrations team label Jun 5, 2019
@ruflin
Collaborator

ruflin commented Jun 6, 2019

Wouldn't this explode the number of events we have to store? Basically meaning we have 1 event per entry?

What does le stand for?

@odacremolbap Could you share some of the queries you want to run on the data?

@exekias
Contributor

exekias commented Jun 6, 2019

It would definitely create more events (for the histogram metric type only). We already made this change in the Prometheus collector; this change would align the helper with it.

le stands for less or equal, and it represents a bucket of the histogram; more info can be found here: https://prometheus.io/docs/concepts/metric_types/#histogram. I think the discussion on naming can still happen here, as the helper hides Prometheus logic behind our own format.
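For reference, this is roughly how a histogram looks in the Prometheus text exposition format the helper parses (values illustrative, matching the example above); note that bucket counts are cumulative, which is why le reads as "less or equal":

```
# TYPE my_metric histogram
my_metric_bucket{le="1000"} 0
my_metric_bucket{le="2000"} 0
my_metric_bucket{le="512000"} 1
my_metric_bucket{le="+Inf"} 1
my_metric_sum 4560.874000000001
my_metric_count 1
```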

This change would allow performing terms (group by) aggregations on buckets, to get a separate line per bucket without prior knowledge of the bucket sizes.
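A sketch of the kind of Elasticsearch query this enables, a terms aggregation over the le value with a date histogram per bucket. The field paths here are hypothetical and depend on how the helper ends up naming the fields:

```json
{
  "aggs": {
    "by_le": {
      "terms": { "field": "prometheus.my_metric.le.value" },
      "aggs": {
        "over_time": {
          "date_histogram": { "field": "@timestamp", "fixed_interval": "1m" },
          "aggs": {
            "bucket_count": { "max": { "field": "prometheus.my_metric.le.count" } }
          }
        }
      }
    }
  }
}
```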

@odacremolbap
Contributor Author

Getting each bucket expanded increases storage requirements, and also CPU/memory/time when processing.

Before measuring that and posting the numbers here for consideration, I am trying to come up with an alternative; no luck so far:

    "coredns": {
      "stats": {
        "dns": {
          "request": {
            "size": {
              "bytes": [
                {
                  "value": 5559,
                  "le": "2047"
                },
                {
                  "le": "16000",
                  "value": 5559
                },
                {
                  "value": 5559,
                  "le": "200"
                },
                {
                  "le": "400",
                  "value": 5559
                },
...

That structure above is not searchable as-is, because of how (non-nested) arrays work in Elasticsearch. But I'm wondering whether a visualization internally retrieves the doc by timestamp and then parses the content; in that case a solution would be close at hand: expanding inside one event instead of sending one event per expanded value of the histogram/summary.
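For what it's worth, making such an array searchable would require mapping it as a nested type. A sketch, with a shortened hypothetical field path:

```json
{
  "mappings": {
    "properties": {
      "bytes": {
        "type": "nested",
        "properties": {
          "le":    { "type": "keyword" },
          "value": { "type": "long" }
        }
      }
    }
  }
}
```

The catch: as far as I know, Kibana visualizations do not support aggregations over nested fields, which would likely rule this layout out for the visualization use case.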

I'm ingesting into Elasticsearch using this template:

                "dns" : {
                  "properties" : {
                    "request" : {
                      "properties" : {
                        "duration" : {
                          "properties" : {
                            "ns" : {
                              "properties" : {
                                "count" : {
                                  "type" : "long"
                                },
                                "le" : {
                                  "ignore_above" : 1024,
                                  "type" : "keyword"
                                },
                                "sum" : {
                                  "type" : "long"
                                },
                                "value" : {
                                  "type" : "long"
                                }
                              }
                            }
                          }
                        }
                      }
                    }
                  }
                }

In the visualization, the terms aggregation by le shows the right data in the legend, but values use whatever aggregation I choose across all le values.

[screenshot: Kibana visualization with a terms aggregation by le]

There are also some glitches with a 0 le value and an empty one, which probably come from the +Inf label in Prometheus.

I'll try changing the types for all le keys and values to array.
I think this solution is a lot nicer than the many-events one, but I'm not sure how feasible it is.

@odacremolbap
Contributor Author

At the Kubernetes apiserver metricset:
using the standard layout, the resulting document is 514K
using the expanded layout, the resulting document is 1.6M

@ruflin

@exekias
Contributor

exekias commented Jun 13, 2019

Hey @odacremolbap, could you add some more detail? What do you mean by document? I guess this is the index size?

@odacremolbap
Contributor Author

@ruflin
Collaborator

ruflin commented Jun 13, 2019

We should definitely compare the size in Elasticsearch (index size after refresh, etc.).

@odacremolbap
Contributor Author

If this were merged, is there a way in Kibana to manage histograms?

we would need to:

  • group by le
  • get the max value per time bucket
  • add the derivative of value between buckets
    (at this point the result is probably a mess, a bunch of stacked lines that don't provide an intuitive meaning)
  • now the tricky part: we need to take those resulting buckets, de-group the `le` fields, and use that value plus the derivative of the `value` resulting from above to obtain the percentile.

Is that possible at all?
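For reference, the computation those steps describe is essentially what PromQL's histogram_quantile() does: find the bucket containing the target rank in the cumulative counts and interpolate linearly inside it. A rough Python sketch, with illustrative bucket data:

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative Prometheus-style buckets.

    buckets: list of (le, cumulative_count) pairs sorted by le, ending
    with ('+Inf', total_count). Interpolates linearly inside the bucket
    containing the target rank, like PromQL's histogram_quantile().
    """
    total = buckets[-1][1]            # the +Inf count is the total observations
    rank = q * total                  # target cumulative count
    prev_le, prev_count = 0.0, 0
    for le, count in buckets:
        if count >= rank:
            if le == "+Inf":
                return prev_le        # cannot interpolate into an unbounded bucket
            # linear interpolation inside this bucket
            return prev_le + (float(le) - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = float(le), count
    return prev_le

# Illustrative cumulative buckets:
buckets = [("1000", 0), ("2000", 0), ("4000", 0), ("8000", 1), ("+Inf", 1)]
p50 = histogram_quantile(0.5, buckets)  # interpolates inside the (4000, 8000] bucket
```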

@odacremolbap
Contributor Author

Size-wise, these are the results of 5 minutes of monitoring the apiserver at 10s frequency:

| Format   | Doc Count | Index Size |
|----------|-----------|------------|
| Standard | 14080     | 2mb        |
| Expanded | 50840     | 5.1mb      |

3.6 x number of documents indexed
2.5 x storage size

The metricset contains 2 metrics

  • a counter, which is left unchanged when using the expanded format
  • a histogram, which uses 8 bucketed values and which, in the standard format, is expanded into one event per unique set of labels. When using the expanded format, these events are sub-expanded into 8 events each (one per bucket).
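The measured document counts are consistent with that: assuming counter events are unchanged and each histogram event fans out into 8, the split can be estimated from the two totals (rough arithmetic, rounding aside):

```python
# standard:  counters + histograms     = 14080
# expanded:  counters + 8 * histograms = 50840
# subtracting: 7 * histograms = 36760
histograms = round((50840 - 14080) / 7)  # histogram events per 5 min
counters = 14080 - histograms            # counter events per 5 min
expanded_estimate = counters + 8 * histograms
# expanded_estimate comes out within a few documents of the measured 50840
```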

@ruflin @exekias

@exekias
Contributor

exekias commented Jun 19, 2019

Thank you for running the numbers. Something that caught my attention is the number of documents a single fetch creates: even with the standard layout, it sounds like it creates ~450 docs. I'm wondering what the cause for this is (I seem to remember the API server is quite verbose, as it provides detailed info per client and path). From the other metricsets you are working on, is this amount of data common?

@odacremolbap
Contributor Author

I think the reason is the number of labels and the cardinality of those labels.
In the case of apiserver_request_duration_seconds_bucket, with almost no usage you get more than 6000 different values in Prometheus.

As usage increases in a production environment, the number of metrics might go up.

@exekias
Contributor

exekias commented Jun 19, 2019

Yeah, I'm guessing this is not the general case; what about the other histograms you saw?

@odacremolbap
Contributor Author

kubeproxy and kubescheduler are kept on the low side
kubecontroller has some 400-line histograms on a no-usage test cluster

@exekias
Contributor

exekias commented Jun 19, 2019

I understand by lines you mean documents.

Yeah, I can see how apiserver/kubecontroller can become a problem even with the standard layout; 2MB per 5 minutes sounds like a lot of data. We should decide whether it's worth it, and if so, probably make them optional (disabled by default?)

@odacremolbap
Contributor Author

I meant Prometheus metrics lines.

For histograms we will generate an event for every 8 lines when all labels are considered keylabels. (I think for the apiserver we were missing some labels in keylabels; that's being fixed in #12610.)

@odacremolbap
Contributor Author

Closing; as a workaround, we are selecting a reduced set of histogram buckets in visualizations.
