OTLP exporter not exporting ValueRecorder & ValueObserver to Prometheus backend #875

Closed
zeyadkhaled opened this issue Jun 26, 2020 · 21 comments
Labels: area:metrics (Part of OpenTelemetry Metrics), bug (Something isn't working)

@zeyadkhaled

Describe the bug
I am using an OTLP exporter and metrics pusher with the otel-collector, which is configured with a Prometheus exporter and a Logging exporter.
In the Logging exporter, all collected metrics show up, but in the Prometheus backend only Counter instrument values appear.

I am using the latest otel-collector-contrib image and running it alongside a demo service, some databases, and the latest Prometheus image, all with Docker.

What config did you use?

  • otel-collector-config.yaml
receivers:
  otlp:
    endpoint: 0.0.0.0:55678

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    namespace: versionsvc

  logging:
    loglevel: debug
    
  stackdriver:
    project: digital-waters-276111
    metric_prefix: versionsvc
    number_of_workers: 3
    skip_create_metric_descriptor: true

processors:
  batch:
  queued_retry:

extensions:
  health_check:
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679

service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [logging,stackdriver]
      processors: [batch, queued_retry]
    metrics:
      receivers: [otlp]
      exporters: [logging,prometheus]
  • creation of the exporter and pusher:

func initProviders() (*otlp.Exporter, *push.Controller) {
	// Fall back to the default collector host:port when the env var is unset.
	collectorAddr, ok := os.LookupEnv("OTEL_RECIEVER_ENDPOINT")
	if !ok {
		collectorAddr = fmt.Sprintf("%s:%d", otlp.DefaultCollectorHost, otlp.DefaultCollectorPort)
	}
	exporter, err := otlp.NewExporter(otlp.WithAddress(collectorAddr),
		otlp.WithInsecure(),
		otlp.WithGRPCDialOption(grpc.WithBlock()))

	if err != nil {
		log.Fatal(err)
	}

	// Trace provider that sends spans synchronously through the OTLP exporter.
	tp, err := sdktrace.NewProvider(
		sdktrace.WithConfig(sdktrace.Config{DefaultSampler: sdktrace.AlwaysSample()}),
		sdktrace.WithSyncer(exporter))
	if err != nil {
		log.Fatal(err)
	}

	global.SetTraceProvider(tp)

	// Metrics pusher that exports through the same OTLP exporter every 2 seconds.
	pusher := push.New(
		simple.NewWithExactDistribution(),
		exporter,
		push.WithStateful(true),
		push.WithPeriod(2*time.Second),
	)

	global.SetMeterProvider(pusher.Provider())
	pusher.Start()
	return exporter, pusher
}
  • docker-compose.yaml
version: "3.1"
services:

  redis:
    image: redis:4
    ports:
      - "6379:6379"
    entrypoint: 
      "redis-server"

  db:
    image: postgres:11
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: roottoor
      POSTGRES_DB: backend

  open-telemetry-demo:
    build: ../.
    environment:
      - GO111MODULE=on
      - OTEL_RECIEVER_ENDPOINT=otel-collector:55678
    depends_on:
      - otel-collector
      - db 
      - redis   
    ports: 
      - "8088:8088"

  otel-collector:
    image: ${OTELCOL_IMG}
    command: ["--config=/etc/otel-collector-config.yaml", "${OTELCOL_ARGS}"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "1888:1888"   
      - "8888:8888"   
      - "8889:8889"   
      - "13133:13133" 
      - "55678:55678"       
      - "55680:55679"
  
  prometheus:
    container_name: prometheus
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yaml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
  • prometheus.yaml
scrape_configs:
    - job_name: 'otel-collector'
      scrape_interval: 1s
      static_configs:
        - targets: ['otel-collector:8889']

Environment
OS: macOS Catalina
Compiler (if manually compiled): go 1.14

Additional context

  • The instrumentation code itself works: when I add a Prometheus receiver to the collector and a Prometheus exporter, without changing any of the counter and value recorder code, everything shows up fine (a sketch of that alternative pipeline follows the repo link below). So the issue is most likely that the OTLP exporter is not exporting some metric values in a form the Prometheus backend can identify.

  • I have this code with a demo service in the repo:

https://github.com/zeyadkhaled/OpenTelemetry-Go-Project-with-Collector-and-OTLP-Exporter
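
For reference, here is a minimal sketch of the alternative collector pipeline described in the first bullet, assuming the demo service exposes its metrics on port 8088; the job name and scrape target are assumptions, not taken from the repo:

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'open-telemetry-demo'            # assumed job name
          scrape_interval: 5s
          static_configs:
            - targets: ['open-telemetry-demo:8088']  # assumed host:port of the service's metrics endpoint

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [logging, prometheus]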

@jmacd
Contributor

jmacd commented Jun 29, 2020

I believe this is a known issue addressed by open-telemetry/oteps#118. The ValueRecorder and ValueObserver events are becoming Summary values in Prometheus, and we will change this default behavior to Gauge. Can you look for a _sum metric w/ the same name prefix?
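
If that is the case, the collector's Prometheus endpoint would expose something along these lines for the process.duration recorder (a hypothetical sketch; the label and values are made up):

# TYPE versionsvc_process_duration summary
versionsvc_process_duration_sum{service_method="Get"} 0.42
versionsvc_process_duration_count{service_method="Get"} 3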

@jmacd jmacd added the area:metrics Part of OpenTelemetry Metrics label Jun 29, 2020
@zeyadkhaled
Author

I just updated to the newest OTEL version, v0.7.0. The logging exporter now shows the captured metrics correctly:

InstrumentationLibrary service.meter 
otel-collector_1       | Metric #0
otel-collector_1       | Descriptor:
otel-collector_1       |      -> Name: api.hit.count
otel-collector_1       |      -> Description: 
otel-collector_1       |      -> Unit: 
otel-collector_1       |      -> Type: INT64
otel-collector_1       | Metric #1
otel-collector_1       | Descriptor:
otel-collector_1       |      -> Name: bytes.recieved
otel-collector_1       |      -> Description: 
otel-collector_1       |      -> Unit: 
otel-collector_1       |      -> Type: SUMMARY
otel-collector_1       | Metric #2
otel-collector_1       | Descriptor:
otel-collector_1       |      -> Name: errors.counter
otel-collector_1       |      -> Description: 
otel-collector_1       |      -> Unit: 
otel-collector_1       |      -> Type: INT64
otel-collector_1       | Metric #3
otel-collector_1       | Descriptor:
otel-collector_1       |      -> Name: process.duration
otel-collector_1       |      -> Description: 
otel-collector_1       |      -> Unit: 
otel-collector_1       |      -> Type: SUMMARY

However, in Prometheus I still only see the counter instruments' output:

# HELP versionsvc_api_hit_count 
# TYPE versionsvc_api_hit_count gauge
versionsvc_api_hit_count{endpoint="api.endpoint:/v2/version:GET"} 1
# HELP versionsvc_errors_counter 
# TYPE versionsvc_errors_counter gauge
versionsvc_errors_counter{service_method="Get"} 1

@zeyadkhaled
Author

Also, two issues appeared after updating to 0.7.0:

  • The service that initializes the exporters suddenly starts outputting:
open-telemetry-demo_1  | 2020/06/30 10:39:04 exporter disconnected
open-telemetry-demo_1  | 2020/06/30 10:39:06 exporter disconnected
open-telemetry-demo_1  | 2020/06/30 10:39:08 exporter disconnected
open-telemetry-demo_1  | 2020/06/30 10:39:10 exporter disconnected
open-telemetry-demo_1  | 2020/06/30 10:39:12 exporter disconnected
  • I think this is related to the first issue: for counter values, only the latest value is captured in Prometheus and the old ones are erased, i.e. for the same instrument versionsvc_api_hit_count I only ever get the value 1.

@jmacd
Contributor

jmacd commented Jun 30, 2020

I will investigate. Thank you for reporting this!

@jmacd jmacd self-assigned this Jun 30, 2020
@jmacd
Contributor

jmacd commented Jun 30, 2020

Can you double-check that you're running the 0.7 release? I ask because the push.WithStateful() option has disappeared in 0.7. I suspect the "exporter disconnected" bit is related to the breaking OTLP change that hit us in this release cycle.

	pusher := push.New(
		simple.NewWithExactDistribution(),
		exporter,
		push.WithStateful(true),
		push.WithPeriod(2*time.Second),
	)
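
A minimal sketch of the same call with that option dropped, assuming nothing else needs to change for 0.7:

	pusher := push.New(
		simple.NewWithExactDistribution(),
		exporter,
		push.WithPeriod(2*time.Second),
	)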

@zeyadkhaled
Author

I can confirm that, as of my previous comment about "exporter disconnected", I am using the 0.7 release; I did have to remove push.WithStateful() while writing that comment.

@jmacd
Contributor

jmacd commented Jun 30, 2020

Would you state the OTel-Collector version that you tested with, just to help isolate this?

@zeyadkhaled
Author

I tested this using otel/opentelemetry-collector:latest which happens to be v0.4.0

@jmacd
Contributor

jmacd commented Jul 1, 2020

@zeyadkhaled I tried to reproduce your example, and got stuck at the gomigrate step. Where does this tool come from?

@zeyadkhaled
Author

zeyadkhaled commented Jul 1, 2020 via email

@jmacd
Contributor

jmacd commented Jul 1, 2020

Thanks. I un-stuck myself on that point, and now managed to get everything running, but no metrics are arriving and I'm trying to diagnose the problem. (This is a good exercise for me, so I'll just keep at it.)

@zeyadkhaled
Author

I don't know if I should open a new issue for this or not, but I updated to the latest collector version (0.5.0) and OTEL 0.7.0, and now, in addition to the missing Recorder and Observer values, the counter metrics are being overwritten with every new incoming value.

I created a branch for this issue to be able to reproduce it:

https://github.com/zeyadkhaled/openversion/tree/upgrade-collector-and-otel-versions

@jmacd
Contributor

jmacd commented Jul 2, 2020

I don't think we need a new issue. There are now several well-known release coordination problems between the collector's OTLP protocol and the various language SDKs. I will work toward fixing this in the latest configuration, and thanks for the branch.

@jmacd
Contributor

jmacd commented Jul 2, 2020

One thing I'm aware of, in this code, is that the call to simple.NewWithExactDistribution() has to be replaced with simple.NewWithInexpensiveDistribution() for the OTLP metric exporter. The OTLP exporter supports only Sum and MinMaxSumCount aggregation in the current release.
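
A sketch of the adjusted construction, assuming the rest of the arguments stay as in the original snippet:

	pusher := push.New(
		simple.NewWithInexpensiveDistribution(),
		exporter,
		push.WithPeriod(2*time.Second),
	)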

I apologize, but the OTLP protocol is really unstable and one of the issues is that we haven't sorted out how to represent raw data (which is implied by Exact aggregation). I'm sure that doesn't explain everything going on here, but I wanted to pass it along while I investigate.

@jmacd
Contributor

jmacd commented Jul 2, 2020

I stumbled over a few more issues while debugging this, and I now think that the only necessary change in the client side is to switch to simple.NewWithInexpensiveDistribution. This encodes a SummaryDataPoint in OTLP which I see printed in the logging exporter on the collector side. (Related improvements: see open-telemetry/oteps#117 and open-telemetry/oteps#118).

Then in the collector, somehow in the translation from OTLP into OpenCensus data, which is used by the Prometheus exporters, the ValueRecorder data is dropped and the Counter metadata becomes "gauge", not "counter". I will have to dig in on the collector side now.

I do have a number of notes on the things that gave me trouble; I will batch those up and send them once I'm able to get the end-to-end flow working.
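
For reference, the two instrument kinds at play might be created roughly like this (a sketch assuming the v0.7-era Go metric API; the meter and metric names come from the logging output above, the recorded values are illustrative):

	meter := global.Meter("service.meter")

	// Counter: arrives at the collector as an INT64 sum and currently shows up in Prometheus.
	apiHits := metric.Must(meter).NewInt64Counter("api.hit.count")

	// ValueRecorder: arrives as a SUMMARY and is what gets dropped in the OTLP -> OpenCensus translation.
	procDuration := metric.Must(meter).NewFloat64ValueRecorder("process.duration")

	apiHits.Add(ctx, 1)
	procDuration.Record(ctx, 0.042)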

@jmacd
Contributor

jmacd commented Jul 2, 2020

I've reported an issue in the collector to hopefully reduce the number of people who stumble into this, while we fix it.

open-telemetry/opentelemetry-collector#1255

@zeyadkhaled
Author

If you find an issue that might be a good first issue, I would love to help and contribute.

@zeyadkhaled
Author

zeyadkhaled commented Jul 3, 2020

I tried upgrading the OTEL version on the master branch of my repo, which uses a Prometheus receiver and Prometheus exporter for the metrics pipeline (the code uses a Prometheus exporter directly, and the OTLP exporter is only used for traces). After sending a request, I noticed that the api.hit counter, for example, starts from "0" rather than 1, then roughly doubles every 5 seconds and keeps increasing indefinitely.

I made a branch for this issue:
https://github.com/zeyadkhaled/openversion/tree/prom-exporter-otel0.7.0-issue

@jmacd
Contributor

jmacd commented Jul 3, 2020

@zeyadkhaled Yes, you've discovered a duplicate of #887. I am investigating this.

@jmacd
Contributor

jmacd commented Jul 16, 2020

This should be fixed in the 0.8 release.

@jmacd jmacd closed this as completed Jul 16, 2020
@jmacd
Contributor

jmacd commented Jul 16, 2020

However note that the OTLP receiver is still not working in this case (open-telemetry/opentelemetry-collector#1255).
