Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

k8sprocessor: add container.name field #1167

Conversation

pmalek-sumo
Copy link
Contributor

@pmalek-sumo pmalek-sumo commented Oct 1, 2020

Description: This PR adds option to tag data with container.name field.

Link to tracking Issue: #141

Testing: Unit tests added.

@pmalek-sumo
Copy link
Contributor Author

Not sure why the tests failed:

--- FAIL: TestHappyPathIntegration (5.00s)
    integration_test.go:110: 
        	Error Trace:	integration_test.go:110
        	Error:      	Condition never satisfied
        	Test:       	TestHappyPathIntegration
FAIL
coverage: 98.3% of statements
FAIL	github.com/open-telemetry/opentelemetry-collector-contrib/extension/jmxmetricsextension/subprocess	5.030s
FAIL
make[2]: *** [../../Makefile.Common:56: do-unit-tests-with-cover] Error 1
make[2]: Leaving directory '/home/circleci/project/extension/jmxmetricsextension'
make[1]: *** [Makefile:63: for-all] Error 2
make[1]: Leaving directory '/home/circleci/project'
make: *** [Makefile:38: unit-tests-with-cover] Error 2

Copy link
Contributor

@owais owais left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some nits.

processor/k8sprocessor/kube/client.go Outdated Show resolved Hide resolved
Cluster bool
StartTime bool
HostName bool
ContainerName bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would ContainerNames make more sense as we add all names we see?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed this but taking into account @dmitryax's comment I'm not sure it will fit since the tag k8s.container.name is in singular.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep it singular here for consistency, even if it may have multiple values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻 done. Singular it is.

}

sort.Strings(names)
tags[conventions.AttributeContainerName] = strings.Join(names, ",")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to semantic conventions
https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/resource/semantic_conventions/k8s.md#container, if we are using k8s spec as the source, it's not a real container name. container.name supposed to be provided by container engine. We should use k8s.container.name here instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done but please comment whether it makes sense to have configuration option in plural and the tag in singular.

Comment on lines 244 to 248
if pod.Spec.Hostname == "" {
tags[conventions.AttributeHostName] = pod.Name
} else {
tags[conventions.AttributeHostName] = pod.Spec.Hostname
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hostname :=  pod.Spec.Hostname
if hostname == "" {
  hostname = pod.Name
}
tags[conventions.AttributeHostName] = hostname

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reasons for this? (other than maybe accessing pod.Spec.Hostname just once)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmalek-sumo, I think such approach makes the intention more clear. Currently, you check pod.Spec.Hostname and assign pod.Name which is not obvious why, unless the line below is being read

Of course, that might be a small thing, but one where clarity could be improved easily

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻 done

@codecov
Copy link

codecov bot commented Oct 1, 2020

Codecov Report

Merging #1167 into master will decrease coverage by 0.02%.
The diff coverage is 89.47%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1167      +/-   ##
==========================================
- Coverage   89.89%   89.86%   -0.03%     
==========================================
  Files         276      276              
  Lines       13621    13629       +8     
==========================================
+ Hits        12244    12248       +4     
- Misses       1011     1014       +3     
- Partials      366      367       +1     
Flag Coverage Δ
#integration 76.01% <ø> (ø)
#unit 89.01% <89.47%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
processor/k8sprocessor/options.go 98.24% <84.61%> (-1.76%) ⬇️
processor/k8sprocessor/kube/client.go 100.00% <100.00%> (ø)
receiver/k8sclusterreceiver/watcher.go 95.29% <0.00%> (-2.36%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f52fd7...0848767. Read the comment docs.

@pmalek-sumo
Copy link
Contributor Author

Added missing test

if hostname == "" {
hostname = pod.Name
}
tags[conventions.AttributeHostName] = hostname
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's safe to reuse host.name attribute here. It can be easily confused with hostname of the node where the pod is running on. What is the use case? Do we have to reuse host.name attribute? if not, we might want to use another attribute with a more specific name like k8s.pod.hostname. what do you think?

cc @tigrannajaryan @jrcamp

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmitryax+1 on k8s.pod.hostname(I think there are several attributes that could be easily added to semantic conventions)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed @dmitryax.

We'll need to add it to https://github.com/open-telemetry/opentelemetry-collector/blob/master/translator/conventions/opentelemetry.go.

@pmm-sumo mentioned he'd like to contribute that so let's wait for that and use it here.

Copy link
Contributor

@pmm-sumo pmm-sumo Oct 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have started with opening a PR to specs: open-telemetry/opentelemetry-specification#1072

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a discussion under the PR. I am wondering if maybe host.hostname is a better fit here. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmm-sumo the host.hostname was recently removed from the conventions open-telemetry/opentelemetry-specification#838

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to recap, we had a discussion under the specification PR. Since hostname is ambiguous in this setting (containers host names vs the machine where it runs host name), it is unfortunately not a simple problem to solve. Additionally, it is present on all containerised environments and need to be addressed in a generic way.

To avoid taking focus out of the GA, the suggestion is to skip adding this feature for now and go back to it later, with a solid discussion involving several parties.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, most of the time, the node hostname will reside in k8s.node.name and pod hostname in k8s.pod.name unless they are overwritten (which I believe is not a very common scenario).

@pmalek-sumo pmalek-sumo force-pushed the k8s-processor-hostname-and-containername-tags-pr branch from a65f246 to 9502d95 Compare October 7, 2020 08:00
@tigrannajaryan tigrannajaryan assigned owais and unassigned dmitryax Oct 14, 2020
@tigrannajaryan
Copy link
Member

@owais you know k8s processor best, please review.

@pmm-sumo
Copy link
Contributor

@pmalek-sumo could you remove the host.name capability from this PR (as we discussed in the thread)?

@pmalek-sumo
Copy link
Contributor Author

@pmalek-sumo could you remove the host.name capability from this PR (as we discussed in the thread)?

Sure thing, done.

@pmalek-sumo pmalek-sumo changed the title k8sprocessor: add host.name and container.name fields k8sprocessor: add container.name field Oct 14, 2020
}

sort.Strings(names)
tags[conventions.AttributeK8sContainer] = strings.Join(names, ",")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps ", " (with a space after the comma) to make it more clear? Or using the array (AnyValue_ArrayValue) type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k8s.container.name should be about a single container. I assume we can't do that here because we correlate based on IP and the IP is shared across all the containers in the pod. If we make this an array (seems better) I think we should also consider adding a new semantic convention of k8s.pod.containers for this purpose.

Copy link
Contributor

@pmm-sumo pmm-sumo Oct 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked around semantic conventions to see if there's any other attribute troubled by a similar property (usually being a single value, but sometimes being a collection). The closest I get is process.command_line, but there are really no other examples I could find.

The same problem will apply to Kubernetes Service - there might be several of them pointing to a single pod.

I am not sure if this ambiguity could be solved in any way currently (@dashpole, perhaps you might have some suggestions?).

This leads to several specification-related questions, such as:

  • should there bet two attributes (one singular, one plural) depending if there's (typically) one or (sometimes) multiple values or, only one of them?
  • if there's only one attribute name, should it be plural or singular when there might be a collection of values for a key?
  • if there's only one attribute name, what should be the type of attribute value (always an array or perhaps a string when there's only single entry)?

For reference, there's a an ongoing discussion on pluralization guidelines for metrics

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a metric about a pod (e.g. is correlated with the Pod's IP), then IMO it shouldn't have a container attribute. If it is about a container, then it should have a single container name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dashpole so essentially when a single container cannot be accurately assigned then the information should be not included, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, at least in my opinion.

From monitoring systems i'm familiar with, I haven't seen a good way to handle lists of items in label (attribute) values.

The other thing to consider is that technically containers can be added to a running pod with the debug container feature, so this attribute wouldn't necessarily be constant over the lifetime of a pod.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other thing to consider is that technically containers can be added to a running pod with the debug container feature, so this attribute wouldn't necessarily be constant over the lifetime of a pod.

Even though this seems like a niche use case I agree that it might introduce incorrect data when we'd tag as above.

From monitoring systems i'm familiar with, I haven't seen a good way to handle lists of items in label (attribute) values.

As for this I was trying to approach tags rewrite to not be of map[string]string type but rather a map of string to AnyValue to have the tags already in the type that they will be serialized to.

This could make it easier as having a plural tag as key would already indicate there's an array in the value.

@jmacd
Copy link
Contributor

jmacd commented Oct 27, 2020

@pmalek-sumo @pmm-sumo can you clarify when we would expect to see an annotation with multiple container names? I ask because I agree with @dashpole here that we should either use a singlar value or not set this attribute, especially when considering metrics about pods and containers.

I'd lean in the direction of a singular attribute name. I tend to think of this as a single list or array containing multiple values, but I'm curious about the specifics in this case.

@pmm-sumo
Copy link
Contributor

@pmalek-sumo @pmm-sumo can you clarify when we would expect to see an annotation with multiple container names? I ask because I agree with @dashpole here that we should either use a singlar value or not set this attribute, especially when considering metrics about pods and containers.

I agree that would be the preferred approach.

One thing I am wondering if in such case the attribute should be set only when a single container exists (and not set when there are multiple containers) or never set

@github-actions
Copy link
Contributor

github-actions bot commented Nov 4, 2020

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 4, 2020
@github-actions
Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Nov 12, 2020
@pmalek-sumo
Copy link
Contributor Author

Since there has been no conclusion on this and the API as of now revolves around pods (not containers, hence we can't map 1:1 from which container the resource is coming from) it's hard to tag with container information consistently.

In case the discussion get traction again, I'm willing to continue work on this.

ljmsc referenced this pull request in ljmsc/opentelemetry-collector-contrib Feb 21, 2022
* Add more semantic conventions for k8s

* Update CHANGELOG

Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants