calls_total metric count not matching the number of traces seen in Jaeger #33857
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Hi @samuelchrist, can you provide more context about your issue? Screenshots or more detailed content are welcome to help us understand your question.
Hi @Frapschen, I am using otel-collector-agent (running as a DaemonSet on all nodes) and otel-collector-gateway (running as a Deployment with a single pod). The applications (mainly Java) are instrumented with the OpenTelemetry SDK 1.37.0. All telemetry data from the applications is sent to the otel-agent and forwarded to the otel-collector-gateway over OTLP. From the gateway the data is sent to different backends: metrics to Prometheus and traces to Jaeger. Data is sent to Prometheus using the Prometheus exporter (not remote write), and Prometheus scrapes every 15s. I am using the spanmetrics connector to get metrics about the traces. The number of traces seen directly in Jaeger matches what we see in the Datadog dashboard (which will be removed), but the calls_total metric is producing lower counts, which makes it unreliable. Below is a screenshot where I queried the same time window, same service_name, same url/endpoints, etc. (all tags identical), but the total number of hits to the endpoints is lower in calls_total than in Jaeger.

What I tried
Nothing has worked so far. Any input is much appreciated.
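For context, here is a minimal sketch of what such a gateway pipeline typically looks like with the spanmetrics connector and the Prometheus exporter. The reporter's actual configuration is not included in the issue, so the endpoints, dimensions, and intervals below are illustrative assumptions only:

```yaml
# Hypothetical gateway pipeline; endpoints, dimensions, and intervals are
# assumptions for illustration, not the reporter's actual configuration.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

connectors:
  spanmetrics:
    dimensions:
      - name: url_path           # span attribute promoted to a metric label
    metrics_flush_interval: 15s  # how often the connector emits accumulated counts

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889       # scraped by Prometheus every 15s
  otlp/jaeger:
    endpoint: jaeger-collector:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/jaeger, spanmetrics]  # the connector consumes spans here
    metrics:
      receivers: [spanmetrics]               # ...and emits calls_total here
      exporters: [prometheus]
```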
Hi @Frapschen, Any inputs. Like what other configs change in Span metrics connector config can I try changing which you feel can help me debug the root cause? |
@samuelchrist try debugging by graphing
@swar8080, thanks for the input. I did try, even for a larger time window.
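If graphing on the Prometheus side alone is inconclusive, one extra cross-check (my own suggestion, not something proposed in this thread) is to tap the gateway's metrics pipeline with the debug exporter, so the data points produced by the spanmetrics connector can be compared against Jaeger before Prometheus scraping is involved:

```yaml
# Sketch: add the debug exporter alongside prometheus on the metrics pipeline.
exporters:
  debug:
    verbosity: detailed   # logs every metric data point the connector emits

service:
  pipelines:
    metrics:
      receivers: [spanmetrics]
      exporters: [prometheus, debug]
```

If the counts logged by the debug exporter already fall short of Jaeger, the gap is introduced in the connector or earlier; if they match, the problem is more likely on the scrape/export side.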
@samuelchrist I suspect the … If you could, maybe write a specific SQL (or other query-language) query against the span storage directly.
Hi @swar8080 and @Frapschen, do we have any detailed documentation on how spanmetrics works under the hood and on what each parameter of the spanmetrics connector does? I went through the spanmetrics README, but it is still not clear to me.
Hi @swar8080 and @Frapschen, I found another thing which might be causing the count difference. I added a url_path tag under dimensions along with the other dimensions, and I noticed that the url_path for one client is showing up for a different client. I am not sure what is causing this. I went through the spanmetrics documentation but could not find the root cause. I am working on a tight timeline, so any suggestion is highly appreciated.
@Frapschen I have verified it is counting only the traces, not spans from other traces. The issue is that labels are getting tagged incorrectly: as mentioned above, the url_path of client A is getting attached to client B.
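One way to check which spans a given calls_total series is actually counting (a debugging sketch, not a confirmed fix for the mislabeled url_path) is to enable exemplars on the spanmetrics connector, so each data point carries a trace_id that can be looked up in Jaeger to confirm which client it came from:

```yaml
connectors:
  spanmetrics:
    dimensions:
      - name: url_path   # the attribute reported above as showing up on the wrong client
    exemplars:
      enabled: true      # attach trace_id/span_id exemplars to calls_total data points
```

For the exemplars to be visible end to end, the Prometheus exporter needs enable_open_metrics: true and Prometheus itself has to run with --enable-feature=exemplar-storage (both to the best of my knowledge; double-check against the current docs).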
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping …
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Component(s)
connector/spanmetrics
What happened?
Description
In our current setup, we have otel-collector-agent running as a DaemonSet; the agents forward traffic to the otel-collector-gateway, and the gateway forwards it to Prometheus and Jaeger.
I noticed the number of traces in Jaeger is consistently higher than calls_total. I checked for …
Steps to Reproduce
Expected Result
The calls_total count matches the trace count seen in Jaeger.
Actual Result
The calls_total count does not match the Jaeger trace count.
Collector version
0.100.0
Environment information
Environment
Running on k8s as pods
OpenTelemetry Collector configuration
Log output
No response
Additional context
No response