obsreport's TracesDropped/MetricsDropped/LogsDropped are not actually used #5056

Comments
There used to be usage; we might have lost it. Seems related to #5038.
As discussed on Mar 23, we should keep them and fix the inconsistencies in them. It would be good to understand where/when their usage changed.
Clarification on this would be excellent, as we do not currently observe these metrics and presume, per #5038, that they will appear once such a situation happens.
I came to this issue after realizing a related problem exists. When the exporterhelper's queue overflows, it counts the data as dropped. Conceptually, however, it is an exporter, not a processor, and it crosses a logical barrier for exporterhelper to be counting drops. The only way exporters currently have for counting is success/failure; dropping is done in processors. I'm aware of ongoing work to migrate batchprocessor into exporterhelper, but note that batchprocessor never drops data.

Having recently deployed a trace sampler processor, this has become a serious matter for me. When a processor drops data, it can be bad or it can be good; it's not always a failure. When the sampler processor drops a span, it's a normal situation. When the memorylimiter processor drops a span, it's a failure.

I am beginning to think that, given how non-functional the existing metrics are, we have a good opportunity to redesign them now. Please bring your attention to open-telemetry/semantic-conventions#184, which is a discussion of how OTel will count success/failure/drops consistently across the entire telemetry pipeline. One thing that the collector brings to this (larger) discussion worth mentioning is the notion of metrics level. My current thinking is that we should view "drops because of queue overflow" as a detailed form of "send failed", and we should only be able to distinguish local-queue-failure from remote-timeout and other retryable errors via metrics when the metrics level is set to detailed.
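The distinction drawn above (a sampler drop is normal, a memorylimiter drop is a failure, and a queue-overflow drop is a detailed form of "send failed") can be sketched as a small classification. This is a self-contained toy, not the collector's actual API; all type and constant names here are hypothetical.

```go
package main

import "fmt"

// Outcome classifies what happened to an item of telemetry in the pipeline.
// Illustrative only: these names do not exist in the collector codebase.
type Outcome int

const (
	OutcomeSent               Outcome = iota
	OutcomeSendFailed                 // includes queue-overflow drops, per the view above
	OutcomeDroppedIntentional         // e.g. a sampler deciding not to keep a span
	OutcomeDroppedFailure             // e.g. memorylimiter shedding load
)

// isFailure shows why a bare "dropped" counter is ambiguous:
// only some kinds of drops represent failures.
func isFailure(o Outcome) bool {
	switch o {
	case OutcomeSendFailed, OutcomeDroppedFailure:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(isFailure(OutcomeDroppedIntentional)) // sampler drop: not a failure
	fmt.Println(isFailure(OutcomeDroppedFailure))     // memorylimiter drop: a failure
}
```

Under this framing, a single `dropped` metric would need a reason attribute (or separate metrics) before it could be alerted on as data loss.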
This was the last change to reference the Traces* functions in a processor: ccee4eb#diff-e88817c5bceb5f48f1d38dfdc37163fd2a1745d496b1597bce17ef27e47fa183
I was confused, in the comment above, by the issue identified here: #8674 (review). Exporters that use exporterhelper are counting drops.
Am I right in thinking that this is still relevant? Looking at https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/processorhelper/obsreport.go it seems these functions are still unused outside of tests. Probably related to #7460, as there may be places where telemetry is currently dropped (e.g. by the exporter after retries are exhausted) but shouldn't be. Also related to the docs that mention these metrics. If the collector aims to keep retrying (after #7460), can we assume then that the metric to monitor for data loss should be the exporter's send-failed metric?
Is your feature request related to a problem? Please describe.

I was asked in which circumstances the `*_dropped_spans` metric can be observed, and while searching for the answer I noticed there are no references in code to obsreport/obsreport_processor.go's `TracesDropped`/`MetricsDropped`/`LogsDropped` functions (except for unit tests). They can be used to generate a processor's `_dropped_spans`/`_dropped_metric_points`/`_dropped_log_records` metrics respectively. We use the "Refused" counterparts but not the "Dropped" ones. Perhaps we should remove those?

Also, `TracesAccepted` and `TracesRefused` refer to spans rather than traces, and `MetricsAccepted`/`MetricsRefused` refer to metric points.

Describe the solution you'd like

- `SpansDropped`/`MetricsDropped`/`LogsDropped` removed
- `SpansAccepted`/`SpansRefused` in `obsreport` rather than `TracesAccepted`/`TracesRefused`
- `MetricsDataPointsAccepted`/`MetricsDataPointsRefused` rather than `MetricsAccepted`/`MetricsRefused`
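The renaming proposal above can be sketched as a toy counter type whose method names say what is actually being counted (spans, not traces). This is a self-contained illustration under the proposed naming, not the collector's real `obsreport` API; the struct and its fields are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// spanCounters is an illustrative sketch of per-signal counters using the
// naming proposed in this issue: methods named after the unit counted
// (spans), not the container (traces). Not the actual obsreport API.
type spanCounters struct {
	accepted atomic.Int64
	refused  atomic.Int64
	dropped  atomic.Int64
}

// SpansAccepted records n spans successfully passed downstream.
func (c *spanCounters) SpansAccepted(n int64) { c.accepted.Add(n) }

// SpansRefused records n spans rejected (e.g. backpressure).
func (c *spanCounters) SpansRefused(n int64) { c.refused.Add(n) }

// SpansDropped records n spans discarded; this is the counter the issue
// proposes removing, since nothing in the codebase calls it.
func (c *spanCounters) SpansDropped(n int64) { c.dropped.Add(n) }

func main() {
	var c spanCounters
	c.SpansAccepted(10)
	c.SpansRefused(2)
	c.SpansDropped(1) // would back a _dropped_spans metric, if kept
	fmt.Println(c.accepted.Load(), c.refused.Load(), c.dropped.Load())
}
```

A parallel type for metrics would use `MetricsDataPointsAccepted`/`MetricsDataPointsRefused`, making explicit that data points, not metric streams, are counted.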