High cardinality http_server_* metrics from otelcol.receiver.zipkin
#4764
Labels
enhancement
frozen-due-to-age
What's wrong?
We've recently migrated to agent flow mode, and as part of that migration switched to a setup with a single dedicated agent for ingesting traces. We ingest traces via both OTLP gRPC and Zipkin (and sometimes Jaeger). We noticed that the job scraping metrics from this dedicated traces agent was becoming heavy, and found that this was due to some very high-cardinality metrics exported by the Zipkin receiver.
Example:
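Schematically (this is a reconstruction rather than a verbatim copy of our output, and the exact metric name and label set may differ), the exposition ends up with one `http_server_duration` histogram series per unique client IP and ephemeral source port:

```
http_server_duration_bucket{http_client_ip="<client-ip>",http_method="POST",net_sock_peer_port="<ephemeral-port-1>",le="..."} ...
http_server_duration_bucket{http_client_ip="<client-ip>",http_method="POST",net_sock_peer_port="<ephemeral-port-2>",le="..."} ...
http_server_duration_bucket{http_client_ip="<other-client-ip>",http_method="POST",net_sock_peer_port="<ephemeral-port-3>",le="..."} ...
```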
Notice the `http_client_ip` and `net_sock_peer_port` labels, which quickly explode in the number of distinct values. This seems to be due to this upstream issue: open-telemetry/opentelemetry-go-contrib#3765.

Even though we've configured our scrape job to drop all `http_server_*` metrics, just parsing the `/metrics` endpoint gradually becomes unmanageable as the exposition keeps growing. Just now I tested with curl and found it to be 232 MB on our traces agent. I'm opening this issue in the hope that a workaround can be implemented in grafana-agent until the issue has been fixed upstream.
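For context, the drop we have in place is roughly equivalent to the following flow-mode components (component names, addresses and the remote_write target below are illustrative, not our real configuration):

```river
prometheus.scrape "traces_agent" {
  // Scrape the dedicated traces agent's own /metrics endpoint
  // (placeholder address).
  targets    = [{"__address__" = "traces-agent.example.svc:12345"}]
  forward_to = [prometheus.relabel.drop_http_server.receiver]
}

prometheus.relabel "drop_http_server" {
  forward_to = [prometheus.remote_write.default.receiver]

  // Drop the high-cardinality otelhttp server metrics before remote_write.
  rule {
    source_labels = ["__name__"]
    regex         = "http_server_.*"
    action        = "drop"
  }
}

prometheus.remote_write "default" {
  endpoint {
    // Placeholder remote_write target.
    url = "https://prometheus.example/api/v1/write"
  }
}
```

This keeps the series out of remote_write, but the scraper still has to fetch and parse the full exposition every interval, which is where the cost is.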
Steps to reproduce
Run the agent in flow mode with an `otelcol.receiver.zipkin` component and start ingesting Zipkin traces, then watch the `http_server_*` metrics exposed by the agent explode in cardinality.

System information
Linux x86; GKE 1.24
Software version
Grafana Agent v0.35.0
Configuration
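Not our exact configuration, but an illustrative, trimmed-down sketch of the relevant trace pipeline (the listen endpoint and exporter target are placeholders):

```river
otelcol.receiver.otlp "default" {
  grpc { }

  output {
    traces = [otelcol.processor.batch.default.input]
  }
}

otelcol.receiver.zipkin "default" {
  // Default Zipkin listen address; this receiver's HTTP server
  // instrumentation is what produces the http_server_* metrics.
  endpoint = "0.0.0.0:9411"

  output {
    traces = [otelcol.processor.batch.default.input]
  }
}

otelcol.processor.batch "default" {
  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}

otelcol.exporter.otlp "tempo" {
  client {
    // Placeholder backend address.
    endpoint = "tempo.example.svc:4317"
  }
}
```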
Logs
No response