obsreport's TracesDropped/MetricsDropped/LogsDropped are not actually used #5056

Comments
There used to be usage; we might have lost it. Seems related to #5038.
As discussed on Mar 23, we should keep them and fix the inconsistencies in them. It would be good to understand where/when their usage changed.
Clarification on this would be excellent, as we do not currently observe these metrics and presume, per #5038, that they will appear once such a situation happens.
I came to this issue after realizing a related problem exists. When the exporterhelper's queue overflows, it counts the data as dropped. Conceptually, however, it is an exporter, not a processor, and it crosses a logical barrier for exporterhelper to be counting drops. The only way exporters currently have for counting is success/failure; dropping is done in processors. I'm aware of ongoing work to migrate batchprocessor into exporterhelper, but note that batchprocessor never drops data.

Having recently deployed a trace sampler processor, this has become a serious matter for me. When a processor drops data, it can be bad or it can be good; it's not always a failure. When the sampler processor drops a span, it's a normal situation. When the memorylimiter processor drops a span, it's a failure.

I am beginning to think that, given how non-functional the existing metrics are, we have a good opportunity to redesign them now. Please bring your attention to open-telemetry/semantic-conventions#184, which is a discussion of how OTel will count success/failure/drops consistently across the entire telemetry pipeline. One thing that the collector brings to this (larger) discussion worth mentioning is the notion of metrics level. My current thinking is that we should view "drops because of queue overflow" as a detailed form of "send failed", and we should only be able to distinguish local-queue-failure from remote-timeout and other retryable errors via metrics when the metrics level is set to detailed.
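The distinction drawn above (a sampler drop is normal, a memorylimiter drop is a failure, and a queue-overflow drop is a detailed form of "send failed") can be sketched as a small classification. This is a self-contained toy, not the collector's actual API; all type and constant names here are hypothetical.

```go
package main

import "fmt"

// Outcome classifies what happened to an item of telemetry in the pipeline.
// Illustrative only: these names do not exist in the collector codebase.
type Outcome int

const (
	OutcomeSent               Outcome = iota
	OutcomeSendFailed                 // includes queue-overflow drops, per the view above
	OutcomeDroppedIntentional         // e.g. a sampler deciding not to keep a span
	OutcomeDroppedFailure             // e.g. memorylimiter shedding load
)

// isFailure shows why a bare "dropped" counter is ambiguous:
// only some kinds of drops represent failures.
func isFailure(o Outcome) bool {
	switch o {
	case OutcomeSendFailed, OutcomeDroppedFailure:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(isFailure(OutcomeDroppedIntentional)) // sampler drop: not a failure
	fmt.Println(isFailure(OutcomeDroppedFailure))     // memorylimiter drop: a failure
}
```

Under this framing, a single `dropped` metric would need a reason attribute (or separate metrics) before it could be alerted on as data loss.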
This was the last change to reference the Traces* functions in a processor: ccee4eb#diff-e88817c5bceb5f48f1d38dfdc37163fd2a1745d496b1597bce17ef27e47fa183
I was confused, in the comment above, by the issue identified here: #8674 (review). Exporters that use exporterhelper are counting drops.
Am I right in thinking that this is still relevant? Looking at https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/processorhelper/obsreport.go it seems these functions are still unused outside of tests. Probably related to #7460, as there may be places where telemetry is currently dropped (e.g. by the exporter after retries are exhausted) but shouldn't be. Also related to the docs that mention these metrics. If the collector aims to keep retrying (after #7460), can we assume then that the metric to monitor for data loss should be the exporter's send-failed metric?
Is your feature request related to a problem? Please describe.

I was asked in which circumstances the `*_dropped_spans` metric can be observed, and while searching for the answer I noticed there are no references in code to obsreport/obsreport_processor.go's `TracesDropped`/`MetricsDropped`/`LogsDropped` functions (except for unit tests). They can be used to generate a processor's `_dropped_spans`/`_dropped_metric_points`/`_dropped_log_records` metrics respectively. We use the "Refused" counterparts but not the "Dropped" ones. Perhaps we should remove those?

Also, `TracesAccepted` and `TracesRefused` refer to spans rather than traces, and `MetricsAccepted`/`MetricsRefused` refer to metric points.

Describe the solution you'd like

- `SpansDropped`/`MetricsDropped`/`LogsDropped` removed
- `SpansAccepted`/`SpansRefused` in `obsreport` rather than `TracesAccepted`/`TracesRefused`
- `MetricsDataPointsAccepted`/`MetricsDataPointsRefused` rather than `MetricsAccepted`/`MetricsRefused`
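The renaming proposal above can be sketched as a toy counter type whose method names say what is actually being counted (spans, not traces). This is a self-contained illustration under the proposed naming, not the collector's real `obsreport` API; the struct and its fields are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// spanCounters is an illustrative sketch of per-signal counters using the
// naming proposed in this issue: methods named after the unit counted
// (spans), not the container (traces). Not the actual obsreport API.
type spanCounters struct {
	accepted atomic.Int64
	refused  atomic.Int64
	dropped  atomic.Int64
}

// SpansAccepted records n spans successfully passed downstream.
func (c *spanCounters) SpansAccepted(n int64) { c.accepted.Add(n) }

// SpansRefused records n spans rejected (e.g. backpressure).
func (c *spanCounters) SpansRefused(n int64) { c.refused.Add(n) }

// SpansDropped records n spans discarded; this is the counter the issue
// proposes removing, since nothing in the codebase calls it.
func (c *spanCounters) SpansDropped(n int64) { c.dropped.Add(n) }

func main() {
	var c spanCounters
	c.SpansAccepted(10)
	c.SpansRefused(2)
	c.SpansDropped(1) // would back a _dropped_spans metric, if kept
	fmt.Println(c.accepted.Load(), c.refused.Load(), c.dropped.Load())
}
```

A parallel type for metrics would use `MetricsDataPointsAccepted`/`MetricsDataPointsRefused`, making explicit that data points, not metric streams, are counted.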