
[processor/filter] Add metrics for dropped telemetry #13169

Closed
ajsaclayan opened this issue Aug 10, 2022 · 11 comments

@ajsaclayan
Contributor

ajsaclayan commented Aug 10, 2022

Is your feature request related to a problem? Please describe.
We have a dashboard that shows the success rate of data flowing through a collector pipeline. We use four metrics to determine success rate:

  1. Receiver accepted data point
  2. Receiver refused data point
  3. Processor dropped data point
  4. Exporter sent data point

We have a filter processor configured to drop telemetry we don't care about. When telemetry is dropped by the filter processor, we have no metric to identify that it was dropped. Because the dropped data never reaches the exporter, our success rate calculation is lower than expected.

Describe the solution you'd like
As a developer, it would be nice to know how many traces/spans/logs are getting dropped by a filter processor.

Describe alternatives you've considered
N/A

Additional context
N/A

@mx-psi mx-psi added the enhancement and processor/filter labels Aug 11, 2022
@github-actions
Contributor

Pinging code owners: @boostchicken @pmm-sumo. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@dmitryax dmitryax added the priority:p3 label Aug 12, 2022
@dmitryax
Member

We are moving towards using the OpenTelemetry SDK for reporting Collector metrics. I'm not sure we want to add more metrics reported by OpenCensus at this point. Maybe we can wait for adoption of the OTel SDK before tackling this issue, but it's not clear how long that will take.

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Nov 10, 2022
@ajsaclayan
Contributor Author

ajsaclayan commented Nov 18, 2022

Bumping, can this be reconsidered?

@github-actions github-actions bot removed the Stale label May 27, 2023
@danelson
Contributor

danelson commented Jun 1, 2023

@dmitryax In the absence of this, I am curious whether you have any ideas on how we can accurately determine the success rate. We calculate it as

$$\frac{otelcol\_exporter\_sent_{signal}}{otelcol\_receiver\_accepted_{signal}+otelcol\_receiver\_refused_{signal}}$$

After we add the filter processor, we need the following, but it does not seem possible:

$$\frac{otelcol\_exporter\_sent_{signal}}{otelcol\_receiver\_accepted_{signal}+otelcol\_receiver\_refused_{signal}-otelcol\_processor\_filtered_{signal}}$$
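
For concreteness, here is a sketch of how that corrected calculation could look as a Prometheus recording rule, using spans as the example signal. The `otelcol_processor_filtered_spans` counter is hypothetical (it is the metric this issue asks for); the rule name and 5m rate window are illustrative:

```yaml
# Hypothetical sketch: pipeline success rate for spans.
# otelcol_processor_filtered_spans does not exist yet; the receiver and
# exporter counters are the standard Collector metrics used above.
groups:
  - name: otelcol-success-rate
    rules:
      - record: otelcol:spans:success_rate
        expr: |
          sum(rate(otelcol_exporter_sent_spans[5m]))
          /
          (
            sum(rate(otelcol_receiver_accepted_spans[5m]))
            + sum(rate(otelcol_receiver_refused_spans[5m]))
            - sum(rate(otelcol_processor_filtered_spans[5m]))
          )
```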

@github-actions
Contributor

github-actions bot commented Aug 1, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@TylerHelmuth
Member

I think adding some metrics for the processor when it drops a span is a good idea. We (Honeycomb) include similar metrics for Refinery and they are very useful.

@TylerHelmuth TylerHelmuth added the help wanted and good first issue labels Aug 1, 2023
@github-actions
Contributor

github-actions bot commented Oct 2, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

  • processor/filter: @TylerHelmuth @boostchicken

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Oct 2, 2023
@TylerHelmuth TylerHelmuth removed the Stale label Oct 2, 2023
@TylerHelmuth TylerHelmuth added the Contribfest label and removed the help wanted and good first issue labels Oct 23, 2023
@MacroPower
Contributor

I'm interested in working on this 🙂

@codeboten
Contributor

Thanks @MacroPower! Assigned

TylerHelmuth added a commit that referenced this issue Dec 19, 2023
…#29081)

**Description:**

Adds telemetry for metrics, logs, and spans that were intentionally
dropped via a `filterprocessor`. Specifically, the following metrics are
added:

`otelcol_processor_filter_datapoints_filtered`
`otelcol_processor_filter_logs_filtered`
`otelcol_processor_filter_spans_filtered`

Please let me know any feedback/thoughts on the naming or anything else!

**Link to tracking Issue:** #13169

**Testing:** I used batchprocessor as an example for a couple of tests,
Filter*ProcessorTelemetryWithOC. I kept the wrapping code so that OTel
versions can easily be added once that is ready in contrib. Overall the
tests are not very comprehensive and I could improve them if needed, but
as-is they were helpful for debugging.

<details>
<summary><i>Additionally, here's some stuff you can use for manually
testing.</i></summary>

There might be a better way to do this, but I just used hostmetrics,
filelog, and [this article from
honeycomb](https://www.honeycomb.io/blog/test-span-opentelemetry-collector)
with otlp/http.

Note, this should be run from the root of the contrib repo.

Add/overwrite `local/config.yaml`, `local/span.json`, and run:

```bash
mkdir -p local

cat >local/config.yaml <<EOL
receivers:
  hostmetrics:
    collection_interval: 30s
    initial_delay: 1s
    scrapers:
      load:
  filelog:
    include:
      ## echo '{"timestamp":"2023-12-18 12:00:00","msg":"foo"}' >> /tmp/otel-test.log
      ## echo '{"timestamp":"2023-12-18 12:00:00","msg":"bar"}' >> /tmp/otel-test.log
      ## echo '{"timestamp":"2023-12-18 12:00:00","msg":"baz"}' >> /tmp/otel-test.log
      - /tmp/otel-test.log
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.timestamp
          layout: "%Y-%m-%d %H:%M:%S"
  otlp:
    protocols:
      ## curl -i http://localhost:4318/v1/traces -X POST -H "Content-Type: application/json" -d @local/span.json
      http:

processors:
  filter/test:
    metrics:
      metric:
        # Should drop 2 of the 3 metrics, 5m average remains
        - 'name=="system.cpu.load_average.1m"'
        - 'name=="system.cpu.load_average.15m"'
    logs:
      log_record:
        # Should filter out "bar" and "baz"
        - 'IsMatch(body, ".*ba.*")'
    traces:
      span:
        # Should drop 1 of the 2 spans
        - 'name == "foobar"'

exporters:
  debug:
    verbosity: detailed
    sampling_initial: 5
    sampling_thereafter: 200

service:
  extensions: []
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [filter/test]
      exporters: [debug]
    logs:
      receivers: [filelog]
      processors: [filter/test]
      exporters: [debug]
    traces:
      receivers: [otlp]
      processors: [filter/test]
      exporters: [debug]

  telemetry:
    logs:
      level: debug
    metrics:
      level: detailed
      address: 0.0.0.0:8888
EOL

cat >local/span.json <<EOL
{
  "resourceSpans": [
    {
      "resource": {
        "attributes": [
          {
            "key": "service.name",
            "value": {
              "stringValue": "test-with-curl"
            }
          }
        ]
      },
      "scopeSpans": [
        {
          "scope": {
            "name": "manual-test"
          },
          "spans": [
            {
              "traceId": "71699b6fe85982c7c8995ea3d9c95df2",
              "spanId": "3c191d03fa8be065",
              "name": "spanitron",
              "kind": 2,
              "droppedAttributesCount": 0,
              "events": [],
              "droppedEventsCount": 0,
              "status": {
                "code": 1
              }
            },
            {
              "traceId": "71699b6fe85982c7c8995ea3d9c95df2",
              "spanId": "2f357b34d32f77b4",
              "name": "foobar",
              "kind": 2,
              "droppedAttributesCount": 0,
              "events": [],
              "droppedEventsCount": 0,
              "status": {
                "code": 1
              }
            }
          ]
        }
      ]
    }
  ]
}
EOL

make run
```

Send some data to the receivers:

```bash
# Write some logs
echo '{"timestamp":"2023-12-18 12:00:00","msg":"foo"}' >> /tmp/otel-test.log
echo '{"timestamp":"2023-12-18 12:00:00","msg":"bar"}' >> /tmp/otel-test.log
echo '{"timestamp":"2023-12-18 12:00:00","msg":"baz"}' >> /tmp/otel-test.log

# Write some spans
curl -i http://localhost:4318/v1/traces -X POST -H "Content-Type: application/json" -d @local/span.json
```

Check the results:

```console
$ curl http://localhost:8888/metrics | grep filtered
# HELP otelcol_processor_filter_datapoints_filtered Number of metric data points dropped by the filter processor
# TYPE otelcol_processor_filter_datapoints_filtered counter
otelcol_processor_filter_datapoints_filtered{filter="filter/test",service_instance_id="a99d9078-548b-425f-8466-3e9e2e9bf3b1",service_name="otelcontribcol",service_version="0.91.0-dev"} 2
# HELP otelcol_processor_filter_logs_filtered Number of logs dropped by the filter processor
# TYPE otelcol_processor_filter_logs_filtered counter
otelcol_processor_filter_logs_filtered{filter="filter/test",service_instance_id="a99d9078-548b-425f-8466-3e9e2e9bf3b1",service_name="otelcontribcol",service_version="0.91.0-dev"} 2
# HELP otelcol_processor_filter_spans_filtered Number of spans dropped by the filter processor
# TYPE otelcol_processor_filter_spans_filtered counter
otelcol_processor_filter_spans_filtered{filter="filter/test",service_instance_id="a99d9078-548b-425f-8466-3e9e2e9bf3b1",service_name="otelcontribcol",service_version="0.91.0-dev"} 1
```

</details>
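
With these counters in place, it is also possible to sketch a per-instance drop share, since each series carries a `filter` label (see the scrape output above). This is illustrative only and not part of this change; the rule name, the 5m rate window, and the use of `scalar()` are arbitrary choices:

```yaml
# Illustrative recording rule: fraction of received spans dropped by each
# filter processor instance. Uses the new counter's `filter` label together
# with the standard receiver counter; not part of this change.
groups:
  - name: otelcol-filter-drop-share
    rules:
      - record: otelcol:filter:span_drop_share
        expr: |
          sum by (filter) (rate(otelcol_processor_filter_spans_filtered[5m]))
          /
          scalar(sum(rate(otelcol_receiver_accepted_spans[5m])))
```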

**Documentation:** I do not believe we document telemetry exposed by
components, but I could add this if needed.

---------

Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
Contributor

github-actions bot commented Jan 8, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jan 8, 2024
cparkins pushed a commit to AmadeusITGroup/opentelemetry-collector-contrib that referenced this issue Jan 10, 2024
…open-telemetry#29081)
