[connector/spanmetrics] - Spanmetrics connector is not producing correct metrics #32043
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Can you give an example to explain which part of the connector you think is not correct?
@ramanjaneyagupta I would expect the counters to grow from collector start (with AGGREGATION_TEMPORALITY_CUMULATIVE); this includes the histogram buckets https://prometheus.io/docs/concepts/metric_types/#histogram. If you still believe the produced series to be incorrect, please share more example data.
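As an aside, with cumulative temporality the raw counters are expected to grow monotonically from collector start, so a rate query is usually what you want to graph. A hedged sketch (the metric and label names `calls_total`, `service_name`, and `span_name` assume the spanmetrics connector defaults):

```promql
# Request rate per service/span, derived from the cumulative calls_total counter.
# Label names (service_name, span_name) assume spanmetrics connector defaults.
sum by (service_name, span_name) (rate(calls_total[5m]))
```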
Hi @portertech and @Frapschen, attached are screenshots; it seems this AGGREGATION_TEMPORALITY_CUMULATIVE has some issue. My application metrics show different numbers (mostly correct, as they match my test results), but the span metrics show different numbers for the same HTTP calls. I would like to get more details, but at this point I am not sure what else needs to be debugged/verified. Please let me know any specific things you are expecting and I will try to get them. Thanks!
@ramanjaneyagupta The query results of the span metrics are very strange. I think the calls_total metric should never go down; can you query calls_total without any functions?
One guess here is that Prometheus is scraping each of your collectors, but the collectors are reporting the same series, so Prometheus jumps around between them. I think #32042 has a similar issue (except that one is a remote write, so the symptoms are slightly different). A clue is the shape of the graphs. Here's an illustration of what I mean: I've circled each disjoint section of the series. Each color represents a different collector that Prometheus is scraping. The series starts off with a scrape at the red collector. Then it switches to the orange collector. Then it scrapes the red collector again. Then it goes to the yellow collector, etc. There's a similar pattern in your other graph. Once again, this is just a guess. If you can confirm that each of these collectors is exporting the same series with the exact same labels, that would probably confirm this theory.
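One way to check this guess is to query the raw series while keeping Prometheus's per-target labels, so each collector's copy shows up separately. A sketch (the `job` label value is an assumption about your scrape config):

```promql
# Raw calls_total series, one per scrape target; if the only differing label
# across the results is "instance", the collectors are exporting identical series.
calls_total{job="otel-gateway"}
```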
Hi, we are running OTel Collectors as a gateway. All agent collectors and deployments running on the VMs and Kubernetes send their data to this central gateway (a set of collectors). In the gateway we calculate span metrics, and in another layer apply tail sampling before sending the data to our storage. So yes, in the gateway, server 1 and server 2 may receive parts of similar data from different instances or at different timestamps. If I am running as a gateway, is there any better way to calculate these metrics?
I don't know if this is the right venue for this discussion - this is turning into more of a question on how to deploy the collector / how to make this work with your datastore rather than a bug report in the collector codebase. The CNCF slack might be a better venue for such a discussion. That being said - you may find the recommendations in https://opentelemetry.io/docs/collector/scaling/#how-to-scale helpful - there are several recommendations there for workloads similar to yours.
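For this kind of fan-in topology, one pattern from the scaling docs is to put a routing tier in front of the spanmetrics tier, so all spans for a given service land on the same collector instance and its counters stay consistent. A hedged sketch using the contrib loadbalancing exporter (the DNS hostname is hypothetical, and `routing_key: service` support depends on your distribution and version):

```yaml
exporters:
  loadbalancing:
    routing_key: service        # keep each service's spans on one backend collector
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        hostname: spanmetrics-tier.example.internal  # hypothetical DNS name

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]
```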
But it is related to span metrics, and it clearly shows that in gateway mode the metrics are not calculated properly. Or at least, I think it needs better documentation on configuring spanmetrics when it runs in gateway mode, as configuring spanmetrics with the current documentation does not work properly in gateway mode.
Agreed that better docs would be helpful here! @ramanjaneyagupta since this is a more general issue, not only pertaining to the spanmetrics connector but to any other component that produces telemetry without necessarily handling tags, I have filed open-telemetry/opentelemetry.io/issues/4368 and I am going to close this issue in favor of that one. We can work with the documentation team to improve the Collector documentation about this. |
Hi @mx-psi and @ankitpatel96, sorry to reopen the issue here. As I am thinking there is some problem with span metrics, I tried a couple of approaches:
I am seeing correct results with Tempo Metrics Generator but not with Span Metrics Connector. |
Same issue. When I use Tempo's metrics generator, the metrics are correct, but when I use the spanmetrics connector at the layer-2 OTel Collector, the metrics are strange.
Hi @mx-psi, @ankitpatel96 any update on this issue? |
I have no issue now; see below.
Component(s)
connector/spanmetrics
What happened?
Description
Setup: Agents -> Gateway(OtelCollectors) -> Storage.
The gateway contains multiple servers which calculate the span metrics and export them to Prometheus.
Steps to Reproduce
Set up the spanmetrics connector and the Prometheus exporter.
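A minimal configuration matching these steps might look like the following (the endpoint and receiver choice are illustrative, not taken from the reporter's actual config):

```yaml
receivers:
  otlp:
    protocols:
      grpc:

connectors:
  spanmetrics:

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"   # scraped by Prometheus

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [spanmetrics]
    metrics:
      receivers: [spanmetrics]
      exporters: [prometheus]
```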
Expected Result
The spanmetrics connector should calculate metrics properly (calls_total, counts, histograms, etc.).
Actual Result
Spanmetrics is not producing correct results (some of the metrics keep increasing).
Collector version
v0.96.0
Environment information
Environment
OS: Linux (RHEL)
OpenTelemetry Collector configuration
Log output
Additional context
No response