
otlpmetrichttp exporter is sending generic error msg instead of actual collector msg in case of using loadbalancer exporter #5536

Closed

Preeti-Dewani opened this issue Jun 24, 2024 · 1 comment · Fixed by #5541
Labels: bug (Something isn't working)
Milestone: v1.29.0
Description

If a user forgets to add the service.name attribute while creating the resource with resource.New, and uses otlpmetrichttp against a collector setup that includes the loadbalancing exporter (which routes by service name), the error message the user receives says nothing about the actual problem. This is the error the user gets:

failed to upload metrics: context deadline exceeded: retry-able request failure

This does not happen with the otlpmetricgrpc exporter, which returns the actual error message coming from the collector:

failed to upload metrics: context deadline exceeded: rpc error: code = Unavailable desc = unable to get service name

Issue: it is difficult to figure out the actual reason behind the failure. The collector sends the real reason in the response body, but otlpmetrichttp discards the body for retryable status codes, so the root cause of the failure is lost.

Part 1: otlpmetrichttp ---(API call)---> otel-collector (loadbalancing exporter)

Part 2: otel-collector ---(error response)---> otlpmetric UploadMetrics func

Part 3: otlpmetric UploadMetrics func ---(discards body)---> returns the generic "retry-able request failure" error instead of the actual reason
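
To make Part 3 concrete, here is a minimal, hypothetical Go sketch (not the exporter's actual code; the package and function names are illustrative) of the difference between discarding the collector's response body on a retryable status and folding it into the returned error:

// Illustrative sketch only: it contrasts dropping the response body on a
// retryable status with keeping it in the returned error.
package errbody

import (
	"fmt"
	"io"
	"net/http"
)

func handleResponse(resp *http.Response) error {
	if resp.StatusCode == http.StatusOK {
		return nil
	}
	// The collector often puts the real reason in the body,
	// e.g. "unable to get service name".
	body, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))

	// Current behavior (simplified): the body is dropped and only a
	// generic message survives:
	//   return errors.New("retry-able request failure")

	// What this issue asks for: keep the collector's message.
	return fmt.Errorf("retry-able request failure: %s", body)
}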

Setup Details

[Screenshot of the docker-compose setup (2024-06-24) omitted]

Environment

  • OS: macOS
  • Architecture: arm64
  • Go Version: go1.21.11
  • opentelemetry-go version: 1.27.0

Steps To Reproduce

  1. Create a sample Go application with the following setup. Note that the resource deliberately carries a custom "service_name" attribute instead of the standard "service.name" attribute, which is what triggers the issue:

	options := []otlpmetrichttp.Option{
		otlpmetrichttp.WithEndpoint(*collectorEndpoint),
		otlpmetrichttp.WithURLPath(*collectorURL),
	}
	if !*isSecure {
		options = append(options, otlpmetrichttp.WithInsecure())
	}
	metricExporter, _ := otlpmetrichttp.New(ctx, options...)
	reader := metric.NewPeriodicReader(metricExporter, metric.WithInterval(*pushInterval))

	// The standard service.name resource attribute is not set, so the
	// loadbalancing exporter cannot determine which service to route by.
	resourceConfig, _ := resource.New(ctx, resource.WithAttributes(
		attribute.String("service_name", "myapp"),
		attribute.String("job", "sample-job"),
		attribute.String("instance", "sample-instance")))
	meterProvider := metric.NewMeterProvider(
		metric.WithResource(resourceConfig),
		metric.WithReader(reader),
	)

  2. Export the environment variables:

    export ENDPOINT_DOMAIN='otel-gateway:4317'
    export REMOTE_WRITE_URL='http://victoriametrics:8429/api/v1/write'

  3. Create the Go application's Dockerfile and build it as the myapp service in docker compose:

docker-compose.yaml

version: '3.7'
services:
  myapp:
    build:
      context: .
      dockerfile: Dockerfile
    command:
      - "--endpointDomain=${ENDPOINT_DOMAIN}"
      - "--ingestPath="
      - "--isSecure=false"
    ports:
      - "8081:8081"

  otel-collector-1:
    container_name: otel-collector-1
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    environment:
      - REMOTE_WRITE_URL=${REMOTE_WRITE_URL}
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317"  # OTLP grpc receiver

  otel-collector-2:
    container_name: otel-collector-2
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    environment:
      - REMOTE_WRITE_URL=${REMOTE_WRITE_URL}
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317"   # OTLP grpc receiver

  otel-collector-3:
    container_name: otel-collector-3
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    environment:
      - REMOTE_WRITE_URL=${REMOTE_WRITE_URL}
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317"   # OTLP grpc receiver

  # OTel gateway (running the loadbalancing exporter)
  otel-gateway:
    container_name: otel-gateway
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-gateway-config.yaml"]
    volumes:
      - ./otel-gateway-config.yaml:/etc/otel-gateway-config.yaml
    ports:
      - "4317:4317"        # OTLP http receiver
    depends_on:
      - otel-collector-1
      - otel-collector-2
      - otel-collector-3

  victoriametrics:
    container_name: victoriametrics
    image: victoriametrics/victoria-metrics
    ports:
      - "8439:8429"
    volumes:
      - victoriametricsdata:/victoriametricsdata
    command:
      - "-storageDataPath=/victoriametricsdata"
      - "-retentionPeriod=30"
      - "-httpListenAddr=:8429"
    restart: always

volumes:
  victoriametricsdata: { }

otel-gateway-config.yaml

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4317

processors:

exporters:
  debug:
    verbosity: detailed
  loadbalancing:
    protocol:
      otlp:
        timeout: 5s
        tls:
          insecure: true
    resolver:
      static:
        hostnames:
          - otel-collector-1:4317
          - otel-collector-2:4317
          - otel-collector-3:4317

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: []
      exporters: [loadbalancing]

otel-collector-config.yaml

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:

exporters:
  debug:
    verbosity: detailed
  prometheusremotewrite: # the PRW exporter, to ingest metrics to backend
    endpoint: ${REMOTE_WRITE_URL}

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: []
      exporters: [prometheusremotewrite]

  4. Run docker compose up

  5. Make an API call to myapp at http://localhost:8081 (see the handler sketch below for an example that records a metric).
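
Step 5 assumes the application records at least one metric when the endpoint is hit, so the PeriodicReader has something to export. A minimal sketch of such a handler (the instrument name, route, and the otel.SetMeterProvider call are assumptions, not part of the original repro):

// Minimal handler sketch that records a metric on each request. It assumes
// otel.SetMeterProvider was called with the MeterProvider from step 1.
package main

import (
	"net/http"

	"go.opentelemetry.io/otel"
)

func main() {
	meter := otel.Meter("myapp")
	requestCounter, _ := meter.Int64Counter("http.requests")

	http.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		// Each recorded value is pushed by the PeriodicReader at the
		// configured interval via otlpmetrichttp.
		requestCounter.Add(r.Context(), 1)
		_, _ = w.Write([]byte("ok"))
	})
	_ = http.ListenAndServe(":8081", nil)
}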

Expected behavior

The exporter should return the error message exactly as received from the collector, for example:

failed to upload metrics: unable to get service name
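
As an application-side workaround, setting the standard service.name resource attribute (rather than a custom service_name key) gives the loadbalancing exporter what it needs for routing and avoids the failure entirely. A minimal sketch, assuming the semconv helper package (the exact semconv version is an assumption):

// Sketch of creating the resource with the well-known service.name attribute
// so the loadbalancing exporter can route by service.
package app

import (
	"context"

	"go.opentelemetry.io/otel/sdk/resource"
	semconv "go.opentelemetry.io/otel/semconv/v1.24.0"
)

func newResource(ctx context.Context) (*resource.Resource, error) {
	return resource.New(ctx,
		// semconv.ServiceName sets the "service.name" key.
		resource.WithAttributes(semconv.ServiceName("myapp")),
	)
}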
Preeti-Dewani added the bug label on Jun 24, 2024

Preeti-Dewani (Author) commented on Jun 24, 2024

PR for the fix: #5541

MrAlias added this to the v1.29.0 milestone on Aug 7, 2024
mark-pictor-csec added a commit to mark-pictor-csec/opentelemetry-go that referenced this issue Oct 29, 2024
PR open-telemetry#5541 (and issue open-telemetry#5536) enhances error handling, returning body text as part
of the error. However, this is only done for retryable errors and focuses on a
load balancer configuration; error text still does not propagate to clients,
and any text for non-retryable errors is always discarded.

This PR adds handling of non-retryable errors, ensuring any body text is part
of the message returned to the user's code. There is no change to when errors
are reported, just an enhancement of the content of such an error.
dmathieu added a commit that referenced this issue Nov 20, 2024
PR #5541 (and issue #5536) enhances error handling, returning body text
as part of the error. However, this is only done for retryable errors;
if non-retryable, error text still does not propagate to clients.

This PR adds handling of non-retryable errors, ensuring any body text is
part of the message returned to the user's code. There is no change to
the circumstances under which errors are reported, just an enhancement
of the content of such an error.

---------

Co-authored-by: Damien Mathieu <42@dmathieu.com>
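
In the same spirit, a hedged sketch (hypothetical names, not the exporter's actual code) of what the follow-up describes: the response body is kept in the returned error whether or not the status is retryable:

// Illustrative sketch of the follow-up change: the collector's response body
// is included in the error for both retryable and non-retryable statuses.
package errbody

import (
	"fmt"
	"io"
	"net/http"
)

func wrapHTTPError(resp *http.Response, retryable bool) error {
	body, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))
	if retryable {
		return fmt.Errorf("retry-able request failure: %s: %s", resp.Status, body)
	}
	return fmt.Errorf("request failure: %s: %s", resp.Status, body)
}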