
[Connector/Servicegraph] Servicegraph Connector are not giving correct metrics of spans #34170

Open
VijayPatil872 opened this issue Jul 19, 2024 · 5 comments


VijayPatil872 commented Jul 19, 2024

Component(s)

connector/servicegraph

What happened?

Description

I am using the servicegraph connector to generate a service graph and metrics from spans. The metrics emitted by the connector fluctuate up and down.
We have deployed a layer of Collectors running the load-balancing exporter in front of the trace Collectors that do the span metrics and service graph connector processing. The load-balancing exporter consistently hashes the trace ID to determine which Collector backend should receive the spans for that trace; a sketch of this layer is shown below.
The service graph metrics are exported to Grafana Mimir with the prometheusremotewrite exporter.
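
A minimal sketch of that front load-balancing layer, assuming the standard loadbalancing exporter with a DNS resolver (the hostname and port here are placeholders, not the actual deployment):

  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
  exporters:
    loadbalancing:
      routing_key: "traceID"   # route all spans of a trace to the same backend Collector
      protocol:
        otlp:
          tls:
            insecure: true
      resolver:
        dns:
          hostname: otel-collector-headless.example.svc.cluster.local
  service:
    pipelines:
      traces:
        receivers: [otlp]
        exporters: [loadbalancing]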

Steps to Reproduce

Expected Result

The metrics emitted by the connector should be correct; counter metrics such as traces_service_graph_request_total should increase monotonically.

Actual Result

(screenshot attached showing the metrics fluctuating up and down)

Collector version

0.104.0

Environment information

No response

OpenTelemetry Collector configuration

config:        
  exporters:


    prometheusremotewrite/mimir-default-processor-spanmetrics:
      endpoint: 
      headers:
        x-scope-orgid: ********
      resource_to_telemetry_conversion:
        enabled: true
      timeout: 30s
      tls:
        insecure: true
      remote_write_queue:
        enabled: true
        queue_size: 100000
        num_consumers: 500        

    prometheusremotewrite/mimir-default-servicegraph:
      endpoint: 
      headers:
        x-scope-orgid: **********
      resource_to_telemetry_conversion:
        enabled: true
      timeout: 30s  
      tls:
        insecure: true
      remote_write_queue:
        enabled: true
        queue_size: 100000
        num_consumers: 500


  connectors:
    spanmetrics:
      histogram:
        explicit:
          buckets: [100ms, 500ms, 2s, 5s, 10s, 20s, 30s]
      aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"
      metrics_flush_interval: 15s
      metrics_expiration: 5m
      exemplars:
        enabled: false
      dimensions:
        - name: http.method
        - name: http.status_code
        - name: cluster
        - name: collector.hostname
      events:
        enabled: true
        dimensions:
          - name: exception.type
      resource_metrics_key_attributes:
        - service.name
        - telemetry.sdk.language
        - telemetry.sdk.name
    servicegraph:
      latency_histogram_buckets: [100ms, 250ms, 1s, 5s, 10s]
      store:
        ttl: 2s
        max_items: 10

  receivers:
    otlp:
      protocols:
        http:
          endpoint: ${env:MY_POD_IP}:*****
        grpc:
          endpoint: ${env:MY_POD_IP}:*****
  service:


    pipelines:
      traces/connector-pipeline:
        exporters:
          - otlphttp/tempo-processor-default
          - spanmetrics
          - servicegraph
        processors:
          - batch          
          - memory_limiter
        receivers:
          - otlp
     
      metrics/spanmetrics:
        exporters:
          - debug
          - prometheusremotewrite/mimir-default-processor-spanmetrics
        processors:
          - batch          
          - memory_limiter
        receivers:
          - spanmetrics

      metrics/servicegraph:
        exporters:
          - debug
          - prometheusremotewrite/mimir-default-servicegraph
        processors:
          - batch          
          - memory_limiter
        receivers:
          - servicegraph

Log output

No response

Additional context

No response

VijayPatil872 added the bug and needs triage labels Jul 19, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

VijayPatil872 (Author) commented:

Any update on the issue?

mapno (Contributor) commented Oct 3, 2024

Can you provide more information on why the metrics are incorrect? A test or test data that reproduces the behaviour would be very helpful.

VijayPatil872 (Author) commented Oct 10, 2024

@mapno If we consider the traces_service_graph_request_total or traces_service_graph_request_failed_total metrics, these should be counters, but they are seen fluctuating up and down.
Similarly, the calls_total metric from the spanmetrics connector should also be a counter, but its graph goes up and down at times.
Also, can you explain what kind of test or test data you need, given the configuration applied above? Let me know if additional details are required.
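
One possible way to capture such data (a sketch, not part of the configuration above): the debug exporter already attached to both metrics pipelines could be set to detailed verbosity, so the raw datapoints emitted by the connectors are written to the Collector logs and can be compared with what Mimir shows:

  exporters:
    debug:
      verbosity: detailed   # logs every metric datapoint the connectors emit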

atoulme removed the needs triage label Oct 12, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label Dec 12, 2024