
Metric queueSize twice in Prometheus output #4382

Closed
cbos opened this issue Apr 14, 2022 · 4 comments · Fixed by #4386
Labels
Bug Something isn't working

Comments

cbos commented Apr 14, 2022

The Prometheus endpoint produces invalid output.

The Java application is started with:
OTEL_METRICS_EXPORTER=prometheus
OTEL_TRACES_EXPORTER=otlp
OTEL_LOGS_EXPORTER=otlp

After a while, the produced output contains this information:

# .... lines above this removed ....
# TYPE queueSize gauge
# HELP queueSize The number of logs queued
queueSize{logProcessorType="BatchLogProcessor"} 0.0 1649922810502
# TYPE processedLogs_total counter
# HELP processedLogs_total The number of logs processed by the BatchLogProcessor. [dropped=true if they were dropped due to high throughput]
processedLogs_total{dropped="false",logProcessorType="BatchLogProcessor"} 11.0 1649922810502
# TYPE runtime_jvm_gc_count_total counter
# HELP runtime_jvm_gc_count_total The number of collections that have occurred for a given JVM garbage collector.
runtime_jvm_gc_count_total{gc="Copy"} 230.0 1649922810502
runtime_jvm_gc_count_total{gc="MarkSweepCompact"} 13.0 1649922810502
# TYPE runtime_jvm_gc_time_total counter
# HELP runtime_jvm_gc_time_total Time spent in a given JVM garbage collector in milliseconds.
runtime_jvm_gc_time_total{gc="Copy"} 4150.0 1649922810502
runtime_jvm_gc_time_total{gc="MarkSweepCompact"} 6399.0 1649922810502
# TYPE otlp_exporter_seen_total counter
# HELP otlp_exporter_seen_total 
otlp_exporter_seen_total{type="log"} 11.0 1649922810502
otlp_exporter_seen_total{type="span"} 9371.0 1649922810502
# TYPE otlp_exporter_exported_total counter
# HELP otlp_exporter_exported_total 
otlp_exporter_exported_total{success="true",type="log"} 11.0 1649922810502
otlp_exporter_exported_total{success="true",type="span"} 9371.0 1649922810502
# TYPE processedSpans_total counter
# HELP processedSpans_total The number of spans processed by the BatchSpanProcessor. [dropped=true if they were dropped due to high throughput]
processedSpans_total{dropped="false",spanProcessorType="BatchSpanProcessor"} 9371.0 1649922810502
# TYPE queueSize gauge
# HELP queueSize The number of spans queued
queueSize{spanProcessorType="BatchSpanProcessor"} 1.0 1649922810502

We read the Prometheus endpoint with Telegraf, and we get this error:

[inputs.prometheus] Error in plugin: error reading metrics for http://localhost:9088/metrics: reading text format failed: text format parsing error in line 115: second TYPE line for metric name "queueSize", or TYPE reported after samples

The queueSize metric appears twice in the output, once for logs and once for spans.

This should be grouped together, like this:

# TYPE queueSize gauge
# HELP queueSize The number of logs queued
queueSize{logProcessorType="BatchLogProcessor"} 0.0 1649922810502
queueSize{spanProcessorType="BatchSpanProcessor"} 1.0 1649922810502

But it now appears as two separate metrics, which is not valid.

cbos added the Bug (Something isn't working) label Apr 14, 2022
@mateuszrzeszutek
Member

I believe that's a problem with how BatchSpanProcessor and BatchLogProcessor use the metrics API (same instrument name, different description) - @anuraaga @jkwatson, can you move this issue over to the SDK repo?

@jkwatson jkwatson transferred this issue from open-telemetry/opentelemetry-java-instrumentation Apr 14, 2022
@jkwatson
Contributor

@jack-berg duplicate async callbacks here. What's the right solution to this?

@jack-berg
Member

jack-berg commented Apr 14, 2022

@mateuszrzeszutek is sort of correct. BatchLogProcessor adds queueSize under the io.opentelemetry.sdk.logs meter, while BatchSpanProcessor adds queueSize under the io.opentelemetry.sdk.traces meter. This is perfectly acceptable in the OTel data model but presents problems in Prometheus.
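
For illustration, a minimal, hypothetical sketch of how the collision arises (not the actual SDK internals): two meters each register an observable gauge named queueSize with a different description, which is valid in the OTel data model but yields two TYPE/HELP blocks for the same metric name in the Prometheus exposition format.

    import io.opentelemetry.api.metrics.Meter;
    import io.opentelemetry.sdk.metrics.SdkMeterProvider;

    public class QueueSizeCollision {
      public static void main(String[] args) {
        SdkMeterProvider meterProvider = SdkMeterProvider.builder().build();

        // A meter scoped to the log SDK registers queueSize with one description...
        Meter logsMeter = meterProvider.get("io.opentelemetry.sdk.logs");
        logsMeter
            .gaugeBuilder("queueSize")
            .setDescription("The number of logs queued")
            .buildWithCallback(measurement -> measurement.record(0));

        // ...and a meter scoped to the trace SDK registers the same instrument
        // name with a different description.
        Meter tracesMeter = meterProvider.get("io.opentelemetry.sdk.traces");
        tracesMeter
            .gaugeBuilder("queueSize")
            .setDescription("The number of spans queued")
            .buildWithCallback(measurement -> measurement.record(1));
      }
    }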

The spec is unclear on how meter name / version manifest in the Prometheus data model. The closed spec issue #2035 sheds some light on the discussion that took place around this issue, but gives no definitive answer.

We could resolve this in the short term by treating the BatchLogProcessor and BatchSpanProcessor instruments as part of the same meter, using the same description, and adding an attribute for the type of data being processed.

This is an important issue to address in a general sense though: an application with two instrumented HTTP clients recording http.client.duration would produce the same problem.

@jack-berg
Member

You can also get around this in the short term by configuring the view API to drop metrics named queueSize:

    SdkMeterProvider.builder()
        .registerView(
            InstrumentSelector.builder().setName("queueSize").build(),
            View.builder().setAggregation(Aggregation.drop()).build())
        .build();
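
Note that a view registered this way applies to the whole SdkMeterProvider, so the queueSize data is dropped from all readers, not just the Prometheus endpoint; it sidesteps the parse error at the cost of losing the queue-depth measurement until the underlying instrument registration is fixed.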
