[exporter/datadog] Histogram metrics are showing wrong values in Datadog #7065

Closed
ransoor2 opened this issue Jan 6, 2022 · 4 comments
Labels: bug (Something isn't working), exporter/datadog (Datadog components)


ransoor2 commented Jan 6, 2022

Describe the bug
When reporting histograms, we see wrong values in the resulting Datadog distributions.
In this example, we created a histogram with boundaries of {1000, 10000, 100000} and repeatedly recorded the values {500, 5000, 50000}.

What did you expect to see?
I expected to see the same bucket values in Datadog as I see in the collector's logging exporter:
Three series of points, with values of 1000, 10000, 100000

What did you see instead?
Two series with a value of approximately 1000, and one series with a value of approximately 10000

Result of the query avg:test.histogram{*} by {test}:

Result
  {
"status": "ok",
"resp_version": 1,
"series": [
  {
    "end": 1641483599000,
    "attributes": {},
    "metric": "test.histogram",
    "interval": 600,
    "tag_set": [
      "test:3"
    ],
    "start": 1641483000000,
    "length": 1,
    "query_index": 0,
    "aggr": "avg",
    "scope": "test:3",
    "pointlist": [
      [
        1641483000000.0,
        9991.4677734375
      ]
    ],
    "expression": "avg:test.histogram{test:3}",
    "unit": null,
    "display_name": "test.histogram"
  },
  {
    "end": 1641483599000,
    "attributes": {},
    "metric": "test.histogram",
    "interval": 600,
    "tag_set": [
      "test:1"
    ],
    "start": 1641483000000,
    "length": 1,
    "query_index": 0,
    "aggr": "avg",
    "scope": "test:1",
    "pointlist": [
      [
        1641483000000.0,
        1007.1372680664062
      ]
    ],
    "expression": "avg:test.histogram{test:1}",
    "unit": null,
    "display_name": "test.histogram"
  },
  {
    "end": 1641483599000,
    "attributes": {},
    "metric": "test.histogram",
    "interval": 600,
    "tag_set": [
      "test:2"
    ],
    "start": 1641483000000,
    "length": 1,
    "query_index": 0,
    "aggr": "avg",
    "scope": "test:2",
    "pointlist": [
      [
        1641483000000.0,
        1007.1372680664062
      ]
    ],
    "expression": "avg:test.histogram{test:2}",
    "unit": null,
    "display_name": "test.histogram"
  }
],
"to_date": 1641483179000,
"query": "avg:test.histogram{*} by {test}",
"message": "",
"res_type": "time_series",
"times": [],
"from_date": 1641382889000,
"group_by": [
  "test"
],
"values": []
}  

What version did you use?
Version: v0.41.0

What config did you use?

Config
receivers:
  otlp:
    protocols:
      http:
        endpoint: "0.0.0.0:55681"

exporters:
  datadog:
    env: "prod"
    use_resource_metadata: false
    send_metadata: false
    api:
      key:
      site: datadoghq.eu
    metrics:
      histograms:
        send_count_sum_metrics: false
  logging:
    loglevel: debug

service:
  telemetry:
    logs:
      level: info

  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [datadog, logging]

Environment
Ubuntu 14

Additional context
Full example:

main.go
package main

import (
  "context"
  "go.opentelemetry.io/otel/attribute"
  "go.opentelemetry.io/otel/sdk/metric/aggregator/histogram"

  "log"
  "time"

  "go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp"
  "go.opentelemetry.io/otel/metric"
  "go.opentelemetry.io/otel/metric/global"
  controller "go.opentelemetry.io/otel/sdk/metric/controller/basic"
  processor "go.opentelemetry.io/otel/sdk/metric/processor/basic"
  "go.opentelemetry.io/otel/sdk/metric/selector/simple"
)

// initProvider initializes an OTLP metrics exporter and configures the
// corresponding meter provider.
func initProvider() func() {
  ctx := context.Background()

  exp, err := otlpmetrichttp.New(
  	ctx,
  	otlpmetrichttp.WithInsecure(),
  	otlpmetrichttp.WithEndpoint("0.0.0.0:55681"),
  )
  handleErr(err, "failed to create exporter")

  cont := controller.New(
  	processor.NewFactory(
  		simple.NewWithHistogramDistribution(histogram.WithExplicitBoundaries([]float64{1000, 10000, 100000})),
  		exp,
  	),
  	controller.WithExporter(exp),
  	controller.WithCollectPeriod(100*time.Millisecond),
  )

  global.SetMeterProvider(cont)
  handleErr(cont.Start(context.Background()), "failed to start controller")

  return func() {
  	// Push any last metric events to the exporter.
  	handleErr(cont.Stop(context.Background()), "failed to stop controller")
  	exp.Shutdown(ctx)
  }
}

func main() {
  log.Printf("Waiting for connection...")
  shutdown := initProvider()
  defer shutdown()

  meter := global.Meter("test-meter")
  histogram := metric.Must(meter).NewInt64Histogram("test.histogram")
  ctx := context.Background()
  start := time.Now()
  for i := 1; i < 5; i++ {
    time.Sleep(time.Second)
    histogram.Record(ctx, 500, attribute.KeyValue{Key: "test", Value: attribute.StringValue("1")})
    time.Sleep(time.Second)
    histogram.Record(ctx, 5000, attribute.KeyValue{Key: "test", Value: attribute.StringValue("2")})
    time.Sleep(time.Second)
    histogram.Record(ctx, 50000, attribute.KeyValue{Key: "test", Value: attribute.StringValue("3")})
  }
  log.Printf("%v", time.Since(start))

  log.Printf("Done!")
}

func handleErr(err error, message string) {
  if err != nil {
  	log.Fatalf("%s: %v", message, err)
  }
}
go.mod
module test

go 1.16

require (
  github.com/davecgh/go-spew v1.1.1 // indirect
  go.opentelemetry.io/otel v1.3.0
  go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp v0.26.0
  go.opentelemetry.io/otel/metric v0.26.0
  go.opentelemetry.io/otel/sdk v1.3.0
  go.opentelemetry.io/otel/sdk/metric v0.26.0
  golang.org/x/net v0.0.0-20210614182718-04defd469f4e // indirect
  golang.org/x/sys v0.0.0-20210616094352-59db8d763f22 // indirect
  google.golang.org/genproto v0.0.0-20210617175327-b9e0b3197ced // indirect
  gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b // indirect
)
ransoor2 added the bug (Something isn't working) label on Jan 6, 2022
bogdandrutu (Member) commented:

@mx-psi can you look into this?

mx-psi added the exporter/datadog (Datadog components) label on Jan 10, 2022
mx-psi (Member) commented Jan 12, 2022

Hi @ransoor2, thanks for the code to reproduce and for your patience while I was out.

The default mode for histograms, distributions, compresses the data in a way that may cause some deviation from the true values sent. This is especially visible in small-scale examples with very few data points. If you have a more realistic example where the values don't make sense, please post it here.
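
To illustrate why this happens: once a histogram is reduced to bucket counts, any downstream reconstruction has to pick a representative value inside each bucket. The following is a minimal, hypothetical sketch (not the exporter's actual algorithm) that estimates an average from explicit bucket counts by placing each point at its bucket's midpoint; any value inside a given bucket produces the same counts, so some deviation from the recorded values is unavoidable.

package main

import "fmt"

// estimateAvg is a hypothetical illustration, not the Datadog exporter's
// actual algorithm: it estimates an average from explicit bucket counts by
// placing every point at the midpoint of its bucket. The overflow bucket
// above the last boundary is omitted for brevity.
func estimateAvg(bounds []float64, counts []uint64) float64 {
	var sum float64
	var total uint64
	lower := 0.0 // assume 0 as the lower edge of the first bucket
	for i, upper := range bounds {
		mid := (lower + upper) / 2
		sum += mid * float64(counts[i])
		total += counts[i]
		lower = upper
	}
	if total == 0 {
		return 0
	}
	return sum / float64(total)
}

func main() {
	// Recording 5000 with boundaries {1000, 10000, 100000} only tells a
	// consumer "one point in (1000, 10000]"; the original value 5000 is
	// not recoverable from the buckets alone.
	bounds := []float64{1000, 10000, 100000}
	counts := []uint64{0, 1, 0}
	fmt.Println(estimateAvg(bounds, counts)) // prints 5500 under the midpoint assumption
}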

It's on our roadmap to

  1. use the exact value for the avg aggregation (which would give the exact values here), and
  2. add support for the ExponentialHistogram OTLP data type (OpenTelemetry language libraries still generally don't support generating this type, but once they do, this will allow us to give error guarantees on all aggregations).

I have noted down internally that you would benefit from (1).

Until we do (1), if you are having trouble with the distributions method for a concrete metric, you can switch to the counters method to get the number of points per histogram bucket, and/or enable the send_count_sum_metrics option to send count and sum metrics, from which you can recover the exact average by dividing one by the other (see the config sketch below).
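
For reference, a sketch of what that might look like in the exporter config from this issue (assuming the histograms mode option is available in this version; check the exporter documentation for the exact option names):

exporters:
  datadog:
    api:
      key:
      site: datadoghq.eu
    metrics:
      histograms:
        mode: counters                # report the number of points per bucket
        send_count_sum_metrics: true  # also export count and sum metrics

With the count and sum metrics available, the exact average can then be recovered in Datadog by dividing the sum series by the count series.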

ransoor2 (Author) commented:

Thank you for your response, Pablo.
We hope this behavior will be more stable in the future.
We're still having a hard time understanding the OTLP histograms -> Datadog distributions conversion.
I am closing this issue.

Thanks,
Ran

mx-psi (Member) commented Feb 11, 2022

After #7830 is merged, Datadog distributions produced from histograms will have exact values for sum, count, and average.
