Geoip processor: dropping logs when IP not found #35047

Closed
grioda01 opened this issue Sep 6, 2024 · 10 comments
Labels
bug Something isn't working processor/geoip

Comments

grioda01 commented Sep 6, 2024

Component(s)

processor/geoip

What happened?

Description

When source.address is present and contains an IP that is not in the database (such as an internal network IP), the processor reports an error and the log entry is dropped.

Steps to Reproduce

Kubernetes operator installation with the collector in daemonset mode. The receiver is filelog and the processor is geoip (alpha stability).

Expected Result

No geolocation field should be populated, and the log entry should not be dropped.

Actual Result

The log entry is dropped and an error is issued in the OpenTelemetry Collector pod log:
2024-09-06T08:34:02.523Z error consumerretry/logs.go:87 Max elapsed time expired. Dropping data. {"kind": "receiver", "name": "filelog", "data_type": "logs", "error": "no geo IP metadata found", "dropped_items": 100}
github.com/open-telemetry/opentelemetry-collector-contrib/internal/coreinternal/consumerretry.(*logsConsumer).ConsumeLogs
github.com/open-telemetry/opentelemetry-collector-contrib/internal/coreinternal@v0.108.0/consumerretry/logs.go:87
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/adapter.(*receiver).consumerLoop
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.108.0/adapter/receiver.go:126

Collector version

v0.108.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

    exporters:
      debug:
        verbosity: detailed
      elasticsearch:
        auth:
          authenticator: basicauth
        endpoint: https://xxxx
        logs_index: xxxxx
        mapping:
          mode: raw
    extensions:
      basicauth:
        client_auth:
          password: xxxx
          username: xxxx
    processors:
      batch: {}
      geoip:
        providers:
          maxmind:
            database_path: /tmp/geoipdb/GeoLite2-City.mmdb
      k8sattributes:
        extract:
          labels:
          - from: node
            key: kubernetes.azure.com/cluster
            tag_name: k8s.cluster.name
          metadata:
          - k8s.pod.name
          - k8s.pod.uid
          - k8s.deployment.name
          - k8s.namespace.name
          - k8s.node.name
          - k8s.pod.start_time
          - k8s.cluster.uid
        pod_association:
        - sources:
          - from: resource_attribute
            name: k8s.pod.name
          - from: resource_attribute
            name: k8s.namespace.name
          - from: resource_attribute
            name: k8s.pod.uid
      transform/geoip_after:
        error_mode: ignore
        log_statements:
        - context: log
          statements:
          - set(attributes["client.geo.city_name"], resource.attributes["geo.city_name"])
          - set(attributes["client.geo.postal_code"], resource.attributes["geo.postal_code"])
          - set(attributes["client.geo.country_name"], resource.attributes["geo.country_name"])
          - set(attributes["client.geo.country_iso_code"], resource.attributes["geo.country_iso_code"])
          - set(attributes["client.geo.location"], Concat([resource.attributes["geo.location.lat"],
            resource.attributes["geo.location.lon"]], ",")) where resource.attributes["geo.location.lat"]
            != nil
      transform/geoip_before:
        error_mode: ignore
        log_statements:
        - context: log
          statements:
          - set(resource.attributes["source.address"], attributes["client"]["ip"])
            where IsMap(attributes["client"]) == true
          - set(resource.attributes["source.address"], attributes["client.ip"]) where
            attributes["client.ip"] != nil
      transform/geoip_del:
        error_mode: ignore
        log_statements:
        - context: log
          statements:
          - delete_key(resource.attributes, "geo.city_name")
          - delete_key(resource.attributes, "geo.continent_name")
          - delete_key(resource.attributes, "geo.region_iso_code")
          - delete_key(resource.attributes, "geo.country_iso_code")
          - delete_key(resource.attributes, "geo.timezone")
          - delete_key(resource.attributes, "geo.country_name")
          - delete_key(resource.attributes, "geo.continent_code")
          - delete_key(resource.attributes, "geo.location")
          - delete_key(resource.attributes, "geo.region_name")
          - delete_key(resource.attributes, "geo.postal_code")
          - delete_key(resource.attributes, "source.address")
      transform/k8s_del:
        error_mode: ignore
        log_statements:
        - context: log
          statements:
          - delete_key(resource.attributes, "k8s.namespace.name")
          - delete_key(resource.attributes, "k8s.pod.name")
          - delete_key(resource.attributes, "k8s.pod.start_time")
          - delete_key(resource.attributes, "k8s.cluster.name")
          - delete_key(resource.attributes, "k8s.cluster.uid")
          - delete_key(resource.attributes, "k8s.container.name")
          - delete_key(resource.attributes, "k8s.container.restart_count")
          - delete_key(resource.attributes, "k8s.deployment.name")
          - delete_key(resource.attributes, "k8s.node.name")
          - delete_key(resource.attributes, "k8s.node.uid")
          - delete_key(resource.attributes, "k8s.pod.uid")
      transform/k8s_up:
        error_mode: ignore
        log_statements:
        - context: log
          statements:
          - set(attributes["kubernetes.namespace"], resource.attributes["k8s.namespace.name"])
          - set(attributes["kubernetes.pod"], resource.attributes["k8s.pod.name"])
          - set(attributes["kubernetes.pod_started"], resource.attributes["k8s.pod.start_time"])
          - set(attributes["kubernetes.cluster"], resource.attributes["k8s.cluster.name"])
          - set(attributes["kubernetes.uid"], resource.attributes["k8s.cluster.uid"])
          - set(attributes["container.name"], resource.attributes["k8s.container.name"])
          - set(attributes["container.restart_count"], resource.attributes["k8s.container.restart_count"])
          - set(attributes["kubernetes.deployment"], resource.attributes["k8s.deployment.name"])
          - set(attributes["kubernetes.host"], resource.attributes["k8s.node.name"])
      transform/k8s_upcluster:
        error_mode: ignore
        log_statements:
        - context: log
          statements:
          - replace_pattern(attributes["kubernetes.cluster"],"^mc_(.*)$","$$1")
    receivers:
      filelog:
        include:
        - /var/log/pods/apps*/*/*.log
        include_file_name: false
        include_file_path: true
        operators:
        - cache:
            size: 128
          parse_from: attributes["log.file.path"]
          regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]{36})\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
          type: regex_parser
        - from: attributes.container_name
          to: resource["k8s.container.name"]
          type: move
        - from: attributes.namespace
          to: resource["k8s.namespace.name"]
          type: move
        - from: attributes.pod_name
          to: resource["k8s.pod.name"]
          type: move
        - from: attributes.restart_count
          to: resource["k8s.container.restart_count"]
          type: move
        - from: attributes.uid
          to: resource["k8s.pod.uid"]
          type: move
        - field: attributes.cloud
          type: add
          value:
            provider: azure
        - field: attributes.cloud
          type: add
          value:
            region: East US
        retry_on_failure:
          enabled: true
        start_at: beginning
    service:
      extensions:
      - basicauth
      pipelines:
        logs:
          exporters:
          - elasticsearch
          - debug
          processors:
          - k8sattributes
          - transform/k8s_up
          - transform/k8s_upcluster
          - transform/k8s_del
          - transform/geoip_before
          - geoip
          - transform/geoip_after
          - transform/geoip_del
          receivers:
          - filelog

Log output

2024-09-06T08:34:02.523Z	error	consumerretry/logs.go:87	Max elapsed time expired. Dropping data.	{"kind": "receiver", "name": "filelog", "data_type": "logs", "error": "no geo IP metadata found", "dropped_items": 100}
github.com/open-telemetry/opentelemetry-collector-contrib/internal/coreinternal/consumerretry.(*logsConsumer).ConsumeLogs
	github.com/open-telemetry/opentelemetry-collector-contrib/internal/coreinternal@v0.108.0/consumerretry/logs.go:87
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/adapter.(*receiver).consumerLoop
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.108.0/adapter/receiver.go:126

Additional context

It works well if the IP is found in the MaxMind database. It's almost as if the failed lookup creates a delay so long that the receiver times out and drops the logs.

grioda01 added the bug and needs triage labels on Sep 6, 2024
Contributor

github-actions bot commented Sep 6, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@grioda01
Author

grioda01 commented Sep 6, 2024

Note that I got the error above only after enabling the retry_on_failure parameter of the filelog receiver. Before that, I was getting a different message, though it was the same type of geo IP error with dropped logs.
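
For context, the "Max elapsed time expired. Dropping data." message is consistent with the retry settings of the filelog receiver: if the same error comes back on every attempt, the batch would be retried until the time budget runs out and then dropped. A minimal sketch of the relevant block follows; the three timing values are what I understand the documented defaults to be, and they should be checked against the README for v0.108.0 rather than taken as authoritative.

```yaml
receivers:
  filelog:
    retry_on_failure:
      enabled: true
      # Assumed defaults; verify against the receiver documentation.
      initial_interval: 1s    # wait before the first retry
      max_interval: 30s       # upper bound on the backoff between retries
      max_elapsed_time: 5m    # stop retrying and drop the batch after this long
```

Tuning these values only changes how long the collector retries; it would not make a lookup miss succeed.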

@rogercoll
Contributor

@grioda01 Thanks for raising this. I think we could tackle this issue by allowing the user to specify the error level: #35069

crobert-1 removed the needs triage label on Sep 9, 2024
@grioda01
Author

Thank you @rogercoll
For now I have found a workaround: a transform processor that detects an internal IP in the source.address attribute and replaces it with one of the company's real external IPs.
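
A minimal sketch of what such a workaround might look like, assuming the internal addresses fall in the RFC 1918 private ranges. The processor name `transform/geoip_internal` and the address `203.0.113.10` are placeholders, not values from this issue. The placeholder is a documentation-range address that is itself unlikely to be in the database, so it would need to be replaced with a real public IP. The processor would sit between `transform/geoip_before` and `geoip` in the logs pipeline.

```yaml
processors:
  transform/geoip_internal:   # hypothetical name
    error_mode: ignore
    log_statements:
    - context: log
      statements:
      # Overwrite private (RFC 1918) addresses before the geoip lookup.
      # 203.0.113.10 is a placeholder; substitute a real external IP.
      - set(resource.attributes["source.address"], "203.0.113.10") where IsMatch(resource.attributes["source.address"], "^(10[.]|192[.]168[.]|172[.](1[6-9]|2[0-9]|3[01])[.])")
```

The character class `[.]` is used instead of an escaped dot only to sidestep string-escaping questions in the statement. Note that this approach attaches the company's location to internal traffic, which may or may not be desirable.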

@andrzej-stencel
Member

andrzej-stencel commented Sep 18, 2024

This is a good workaround @grioda01, thanks for describing it.

@rogercoll Before we reach a conclusion in the discussion on error mode configuration, how about we change the current behavior to not error out when the IP is not found in the geo database? I believe the default behavior should be to not error out anyway. What do you think?

@grioda01
Author

I completely agree. That is the standard behavior in the industry, after all; that's how other log shippers behave. Besides, it keeps things simpler.
But beware: I am not sure the plugin itself is raising the error. It looks more like a filelog timeout error, almost as if the geoip plugin takes too long when it encounters that type of IP, and that affects the whole pipeline iteration.

@rogercoll
Contributor

> how about we change the current behavior to not error out when the IP is not found in the geo database?

Sounds good to me. I agree that this should not be an error and should instead output a log (for debugging). I can work on a PR to skip this error for the MaxMind provider.

@rogercoll
Contributor

@andrzej-stencel @grioda01 I opened this PR to stop returning an error when an IP is not found. Please let me know your thoughts: #35278
Thanks!

andrzej-stencel pushed a commit that referenced this issue Oct 1, 2024
**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.--> 
If a provider does not find any metadata associated with the given IP,
the processor will continue processing instead of returning the
error. Nonetheless, the error will be logged when the debug telemetry
level is enabled.

**Link to tracking Issue:** <Issue number if applicable>
#35047

**Testing:** <Describe what testing was performed and which tests were
added.> Add a testdata case for IP `1.2.3.5`, which is not available in
any of the providers (neither MaxMind nor the mocked provider)

**Documentation:** <Describe the documentation added.>
@rogercoll
Contributor

@grioda01 The fix was merged in #35278. I think we can close this issue.
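
Since the merged change logs lookup misses only at the debug telemetry level, seeing them would require raising the collector's own log level. A minimal sketch using the standard `service::telemetry::logs::level` setting (the exact wording of the logged message is not shown in this issue, so none is assumed here):

```yaml
service:
  telemetry:
    logs:
      level: debug   # collector's internal log level
```

Debug logging is verbose, so this is probably best enabled only while troubleshooting.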

jriguera pushed a commit to springernature/opentelemetry-collector-contrib that referenced this issue Oct 4, 2024
@grioda01
Author

I tested it last week in a live prod environment. It worked and it saved the day! Thanks so much, it was published just in time.
