[processor/k8sattributes] agent passthrough, gateway k8s.pod.ip configuration broken post v0.55.0 release #13765

Closed
evantorrie opened this issue Aug 31, 2022 · 7 comments · Fixed by #13993
Labels
bug Something isn't working priority:p2 Medium processor/k8sattributes k8s Attributes processor

Comments

@evantorrie
Contributor

Describe the bug

In an agent/gateway k8s deployment, the k8sattributes processor no longer adds k8s metadata to the Resource associated with traces emitted by a pod elsewhere in the cluster. I suspect this broke as a result of PR #8465.

Steps to reproduce

Create a daemonset agent/deployment gateway collector setup in a k8s cluster with the configuration files shown below. Generate traces from a pod that sends to the local daemonset agent pod. Observe the debug logs on the gateway collector.

What did you expect to see?
Logs for each ResourceSpans indicating each Resource with a preexisting k8s.pod.ip attribute had the Resource augmented with k8s.namespace.name, k8s.pod.uid and k8s.pod.name.

What did you see instead?
Logs showing that the incoming traces have a Resource with the correct pod IP in the k8s.pod.ip attribute, but none of k8s.namespace.name, k8s.pod.uid, or k8s.pod.name.

What version did you use?
Version: v0.56.0

What config did you use?

On the agent side:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  k8sattributes:
    passthrough: true
  batch:

exporters:
  otlp:
    endpoint: otel-gateway:4317

service:
  pipelines:
    traces:
      receivers: [ otlp ]
      processors: [ k8sattributes, batch ]
      exporters: [ otlp ]

And on the gateway side:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
  k8sattributes:
    pod_association:
      - sources:
         - from: resource_attribute
           name: k8s.pod.ip
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.pod.uid
       
exporters:
  logging:
    loglevel: debug

service:
  pipelines:
    traces:
      receivers: [ otlp ]
      processors: [ k8sattributes, batch ]
      exporters: [ logging ]

Environment
OS: RHEL8 Linux
Compiler (if manually compiled): go1.17

Additional context

From the extension debug logs, the incoming Resource is already augmented by the agent pod with k8s.pod.ip=<podIP>.

In kubernetesprocessor.processResource(), the debug message "evaluating pod identifier [ {Source: { From: "resource_attribute", Name: "k8s.pod.ip" }, Value: "<podIp>" }, ..]" is printed, but it appears to return false from WatchClient.GetPod(<podIdentifier>). This seems to indicate that there is no entry in c.Pods[] for that specific PodIdentifier.

The primary place in the code where pods are added to the c.Pods[] map is in WatchClient.addOrUpdatePod().

Prior to the aforementioned PR, the keys of the Pods map were strings containing either the IP address or the pod UID, and there was no distinction between a value that came from the "connection" and one that came from a "resource_attribute".
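
To make the consequence of that keying change concrete, here is a minimal, self-contained sketch. The `source` and `identifier` types and both helper functions are hypothetical simplifications written for this illustration; they are not the processor's actual definitions. The only point is that once a map key records where a value came from, the same IP stored under one source is not found when looked up under another.

```go
package main

import "fmt"

// source and identifier are hypothetical, simplified stand-ins for the
// structured keys described in this issue; they are NOT the real types.
type source struct {
	From string
	Name string
}

type identifier struct {
	Source source
	Value  string
}

// hitWithStringKeys models the assumed pre-PR scheme: the pod is stored
// under its bare IP, so any lookup by that IP succeeds.
func hitWithStringKeys(ip string) bool {
	pods := map[string]string{ip: "frontend"}
	_, ok := pods[ip]
	return ok
}

// hitWithIdentifierKeys models the assumed post-PR scheme: the pod is
// stored under one structured key and looked up under another.
func hitWithIdentifierKeys(stored, lookup identifier) bool {
	pods := map[identifier]string{stored: "frontend"}
	_, ok := pods[lookup]
	return ok
}

func main() {
	ip := "10.1.126.154"
	fromConn := identifier{Source: source{From: "connection"}, Value: ip}
	fromAttr := identifier{
		Source: source{From: "resource_attribute", Name: "k8s.pod.ip"},
		Value:  ip,
	}

	fmt.Println("string keys:", hitWithStringKeys(ip))
	fmt.Println("mismatched sources:", hitWithIdentifierKeys(fromConn, fromAttr))
	fmt.Println("matching sources:", hitWithIdentifierKeys(fromAttr, fromAttr))
}
```

Struct keys in Go maps compare field by field, so the lookup only succeeds when the `From`, `Name`, and `Value` fields all match what was inserted.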

After the PR, the key(s) returned from WatchClient.getIdentifiersFromAssoc() are inserted into the Pods map, with each key pointing to the pod info. To match the key printed in the "evaluating pod identifier ..." debug message, getIdentifiersFromAssoc() should return the key {Source: { From: "resource_attribute", Name: "k8s.pod.ip" }, Value: "<podip>" } when given the input association specification, namely:

    pod_association:
      - sources:
         - from: resource_attribute
           name: k8s.pod.ip

Based on code inspection, it appears this does not happen. I suspect changing the switch statement in getIdentifiersFromAssoc() from:

case conventions.AttributeHostName:
		attr = pod.Address

to

case conventions.AttributeHostName, "k8s.pod.ip":
		attr = pod.Address

would probably fix this problem. But I have only ascertained this via code inspection, not compiling and running the code!
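
The effect of the proposed change can be sketched in isolation. Everything below is a hypothetical reduction for illustration: the `pod` struct, the `valueBefore`/`valueAfter` helpers, and the local constants are assumptions, not the processor's real code. In particular, `attrHostName` is assumed to carry the same value as `conventions.AttributeHostName`.

```go
package main

import "fmt"

const (
	// Assumed to equal conventions.AttributeHostName.
	attrHostName = "host.name"
	// The resource attribute named in the gateway's pod_association.
	attrPodIP = "k8s.pod.ip"
)

// pod is a hypothetical stand-in holding only the field used here.
type pod struct {
	Address string
}

// valueBefore mirrors the switch as quoted above: only the host name
// attribute resolves to the pod address.
func valueBefore(name string, p pod) string {
	switch name {
	case attrHostName:
		return p.Address
	}
	return "" // no value, so no key can be built for this source
}

// valueAfter mirrors the suggested fix: k8s.pod.ip also resolves to
// the pod address.
func valueAfter(name string, p pod) string {
	switch name {
	case attrHostName, attrPodIP:
		return p.Address
	}
	return ""
}

func main() {
	p := pod{Address: "10.1.126.154"}
	fmt.Printf("before: %q\n", valueBefore(attrPodIP, p))
	fmt.Printf("after:  %q\n", valueAfter(attrPodIP, p))
}
```

With the original switch, a `k8s.pod.ip` source yields an empty value, which would be consistent with no matching entry ever being inserted into the Pods map.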

Possibly related: #13119

@evantorrie evantorrie added the bug Something isn't working label Aug 31, 2022
@evan-bradley evan-bradley added priority:p2 Medium processor/k8sattributes k8s Attributes processor labels Sep 1, 2022
@github-actions
Contributor

github-actions bot commented Sep 1, 2022

Pinging code owners: @owais @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@sumo-drosiek
Member

Hi @evantorrie

I made a fix based on your suggestion, but I need to reproduce the issue in order to confirm the solution.

@evantorrie
Contributor Author

Do we have any existing integration tests which set up a collector in an agent/gateway deployment in a k8s cluster?

@sumo-drosiek
Member

I am not aware of any.
cc: @dmitryax

@evantorrie
Contributor Author

evantorrie commented Sep 8, 2022

@sumo-drosiek I created a repo with instructions (and appropriate files) to reliably reproduce the problem of Resources failing to get augmented with k8s metadata. Reproduction instructions demonstrate it working with opentelemetry-collector-contrib:0.54.0, but failing as soon as it's upgraded to opentelemetry-collector-contrib:0.55.0.

Here: https://github.com/evantorrie/k8sattributes-repro

@sumo-drosiek
Member

sumo-drosiek commented Sep 9, 2022

@evantorrie It helped a lot. I can confirm that after the fix, the output looks like this:

Resource attributes:
     -> service.name: STRING(frontend)
     -> telemetry.sdk.language: STRING(nodejs)
     -> telemetry.sdk.name: STRING(opentelemetry)
     -> telemetry.sdk.version: STRING(1.5.0)
     -> k8s.namespace.name: STRING(default)
     -> k8s.node.name: STRING(sumologic-kubernetes-collection)
     -> k8s.pod.name: STRING(my-otel-demo-frontend-6f48595b7c-vj8jc)
     -> process.pid: INT(17)
     -> process.executable.name: STRING(node)
     -> process.command: STRING(/app/server.js)
     -> process.command_line: STRING(/usr/local/bin/node /app/server.js)
     -> process.runtime.version: STRING(16.16.0)
     -> process.runtime.name: STRING(nodejs)
     -> process.runtime.description: STRING(Node.js)
     -> k8s.pod.ip: STRING(10.1.126.154)
     -> k8s.pod.start_time: STRING(2022-09-09 05:23:24 +0000 UTC)
     -> k8s.pod.uid: STRING(c2ffaf84-549e-479d-b931-50162b93b19f)
     -> k8s.deployment.name: STRING(my-otel-demo-frontend)

The image is available as:

image:
  # tag: 0.54.0
  tag: test
  repository: sumodrosiek/otc

if you want to test it on your own

@evantorrie
Contributor Author

if you want to test it on your own

Yes. Works for me!
