Add container metric fields (from ECS) #282

Merged
merged 15 commits on Mar 27, 2024

Conversation

@ChrsMark (Member) commented Aug 24, 2023

This PR adds container-related metric fields as part of #72.

Continues #87 on top of the project restructuring.

Based on the comparison between ECS and the OTel Collector, the result is the following (see #282 (comment)).

Summary of added fields

  1. container.cpu.time (OK for the Collector implementation: see #282 (comment))
  2. container.memory.usage (more accurate and aligned with system metrics)
  3. container.disk.io, which uses the already existing disk.io.direction attribute as a dimension (the OTel Collector only provides the Linux-specific metric, so a generic one that can also cover Windows would be better)
  4. container.network.io, which uses the already existing network.io.direction attribute as a dimension (better alignment with disk.io and its use of a dimension).

All of these also use container.id as an attribute.
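
A rough sketch of how these could be expressed in model/metrics/container.yaml (the ids, briefs, units, and instrument choices below are illustrative assumptions, not the merged definitions):

```yaml
groups:
  - id: metric.container.cpu.time
    type: metric
    metric_name: container.cpu.time
    brief: Total CPU time consumed by the container.
    instrument: counter
    unit: "s"

  - id: metric.container.memory.usage
    type: metric
    metric_name: container.memory.usage
    brief: Memory usage of the container.
    instrument: counter
    unit: "By"

  - id: metric.container.disk.io
    type: metric
    metric_name: container.disk.io
    brief: Disk bytes for the container.
    instrument: counter
    unit: "By"
    attributes:
      - ref: disk.io.direction

  - id: metric.container.network.io
    type: metric
    metric_name: container.network.io
    brief: Network bytes for the container.
    instrument: counter
    unit: "By"
    attributes:
      - ref: network.io.direction

# container.id is expected alongside each of these (see the note above),
# typically as a resource attribute on the emitting entity.
```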

@ChrsMark ChrsMark requested review from a team August 24, 2023 09:10
@ChrsMark ChrsMark force-pushed the container_metrics branch 2 times, most recently from b7d690a to 451f3b2 on August 24, 2023 12:31
@jsuereth (Contributor) left a comment:

Is there a prototype/implementation of these metrics somewhere (e.g. in collector contrib?)

Review comment on model/metrics/container.yaml (outdated, resolved)
@ChrsMark (Member, Author) commented Oct 4, 2023

> Is there a prototype/implementation of these metrics somewhere (e.g. in collector contrib?)

The Collector implementation can be found at https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/dockerstatsreceiver/documentation.md, although the naming differs somewhat. The respective Beats implementation, which honors the ECS fields, can be found as an example at https://github.com/elastic/beats/blob/89bcc33a9a90cbacceaa61fceec0e8d23073574c/metricbeat/module/docker/cpu/data.go#L65.

I will put together a comparison matrix soon to help us understand what our alternatives/options are here.

@ChrsMark (Member, Author):

So I have checked what ECS provides, in combination with the OTel Collector's implementation. Here is a summary:

| ECS | OTel Collector | Notes |
|---|---|---|
| container.cpu.usage | container.cpu.utilization | In the OTel Collector, container.cpu.usage is a namespace |
| container.memory.usage | container.memory.percent | container.memory.usage is a namespace |
| container.disk.read/write.bytes | container.blockio.io_service_bytes_recursive | The OTel Collector only retrieves the Linux-specific metric |
| container.network.ingress.bytes | container.network.io.usage.rx_bytes | |
| container.network.egress.bytes | container.network.io.usage.tx_bytes | |

So based on the above I propose the following naming:

  1. container.cpu.utilization (does not break the Collector)
  2. container.memory.utilization (more accurate and aligned with system metrics and cpu.utilization)
  3. container.disk.io.bytes and container.disk.io.direction (the OTel Collector only provides the Linux-specific metric, so a generic one that can also cover Windows would be better)
  4. container.network.io.bytes and container.network.io.direction (better alignment with disk.io and use of a dimension).

@ChrsMark ChrsMark requested a review from a team October 13, 2023 09:12
@ChrsMark (Member, Author):

Pushed latest changes based on #282 (comment).
@jsuereth @trask feel free to have another look.

@ChrsMark (Member, Author):

Will need to use the network and disk attributes from #530

@ChrsMark ChrsMark requested a review from a team November 23, 2023 10:40
@ChrsMark (Member, Author) commented Jan 2, 2024

@rmfitzpatrick @jamesmoessis since you have been maintaining the dockerstatsreceiver could you provide your feedback here?

@open-telemetry/semconv-container-approvers please have a look as well.

@ChrsMark ChrsMark self-assigned this Jan 9, 2024
@dmitryax (Member) commented Jan 9, 2024

Why don't we start with usage metrics for memory and CPU instead of utilization? Utilization seems less important because it can always be derived from usage divided by the limit. container.cpu.utilization is also confusing without knowing what the limit is. There is a PR in the kubelet receiver to migrate away from it.
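
For illustration, assuming the relevant limit is known to the consumer, the relationship is roughly:

$$\text{utilization} = \frac{\text{usage}}{\text{limit}}, \qquad \text{CPU utilization over an interval} \approx \frac{\Delta\ \text{container.cpu.time}}{\Delta t \times \text{CPU limit}}$$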

> In the OTel Collector, container.cpu.usage is a namespace

I see it's implemented in the docker receiver for metrics like container.cpu.usage.kernelmode, but I don't think it's the right approach. I think the metric should be container.cpu.usage with kernelmode as an attribute value.

Also, usage for CPU is an average aggregation over time, so it may be more important to define container.cpu.time first, as a cumulative sum. That would be consistent with the system metric system.cpu.time.

@ChrsMark (Member, Author):

Thanks @dmitryax, I like the idea of being aligned with kubeletstats as well.
If I understand this correctly your proposal is to add the following:

  1. container.cpu.time and container.cpu.state
  2. container.memory.usage

Also, do you propose skipping the container.cpu/memory.utilization metrics completely for now, or can we add them additionally? I would lean towards adding them anyway, even as optional.

@dmitryax (Member):

Right. But I'm not sure about container.cpu.state. The docker stats receiver reports the following metrics:

  container.cpu.usage.system
  container.cpu.usage.total
  container.cpu.usage.kernelmode
  container.cpu.usage.usermode
  container.cpu.usage.percpu

All these metrics correspond 1:1 to the info that Docker provides. We cannot get more granular data to provide metrics with state and logical number attributes as we do for system.cpu.time.

The kubelet, on the other hand, doesn't provide any data other than plain container time/usage values. So we don't have any attributes for the container.cpu.time and container.cpu.usage metrics emitted by the kubelet receiver.

I'm curious how https://www.elastic.co/guide/en/ecs/current/ecs-container.html#field-container-cpu-usage is used in the existing implementations. Is it used to represent the docker stats?

Maybe we can just define container.cpu.time without any attributes defined in the semantic convention?

Another option is to change the docker receiver to emit container.cpu.time but without an option to have both state and logical number attributes: only one of them or none.

@ChrsMark (Member, Author):

> I'm curious how https://www.elastic.co/guide/en/ecs/current/ecs-container.html#field-container-cpu-usage is used in the existing implementations. Is it used to represent the docker stats?

Yes @dmitryax, this metric is used to represent the total CPU usage calculated from docker stats.
In Metricbeat, which implements these metrics, we have the following:

docker.cpu.kernel.*
docker.cpu.system.*
docker.cpu.user.*
docker.cpu.total.*
docker.cpu.core.*

The implementation can be found at https://github.com/elastic/beats/blob/13ec0b30f5502720ff5079853e6f437478348ec7/metricbeat/module/docker/cpu/data.go#L32-L61

A sample document can be found at https://github.com/elastic/beats/blob/main/metricbeat/module/docker/cpu/_meta/data.json.

From these, docker.cpu.total.norm.pct is also exposed as the container.cpu.usage ECS field, as we can see at https://github.com/elastic/beats/blob/13ec0b30f5502720ff5079853e6f437478348ec7/metricbeat/module/docker/cpu/data.go#L65. The reason is that only this one was considered interesting enough to be promoted to an ECS field/metric so far.

So I think all implementations can emit this base metric, container.cpu.usage, with container.cpu.state set to total as the default. With that, kubeletstats could use it without an issue, as it would always emit container.cpu.state: total. Would that make sense?
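
A rough sketch of what such a state attribute could look like in the model (the id, enum members, and wording below are assumptions for the sake of discussion, not agreed definitions):

```yaml
groups:
  - id: registry.container
    type: attribute_group
    brief: Container attributes.
    attributes:
      - id: container.cpu.state
        brief: The CPU state for this data point.
        type:
          allow_custom_values: true
          members:
            - id: total
              value: "total"
              brief: Aggregate CPU usage across all modes; the default when no breakdown is available.
            - id: user
              value: "user"
              brief: Time spent executing in user mode.
            - id: kernel
              value: "kernel"
              brief: Time spent executing in kernel mode.
        examples: ["total", "user", "kernel"]
```

With that in place, a producer like kubeletstats that only has an aggregate value would simply always attach container.cpu.state: total.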

@jamesmoessis:

I agree with what @dmitryax is saying. We can change the dockerstats receiver to emit container.cpu.time; in fact, this is how I would've wanted it to be implemented in the first place so that it aligns with the system semconv, but I didn't change it since it was carried over from older existing implementations. Since it's still in alpha, it wouldn't be a bad idea to change it sooner rather than later.

In terms of the state attribute, would it make sense to mark it as conditionally required if available? Then implementations can choose whether to add the attribute.

I think for the CPU logical_number we can either:

  1. Have this in a separate metric from container.cpu.time, for example container.cpu.time_percpu. This is similar to how the docker stats receiver works currently. I'm aware this deviates somewhat from the system semconv.
  2. Have it as an optional attribute (see the sketch below). In my experience I disable this, because the cost of cardinality at scale is not worth the seldom-useful granularity it provides.
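
As an illustration of these options, the attribute references on the metric could carry requirement levels roughly like this (a sketch only; the attribute names, levels, and wording are assumptions, not agreed text):

```yaml
groups:
  - id: metric.container.cpu.time
    type: metric
    metric_name: container.cpu.time
    brief: Total CPU time consumed by the container.
    instrument: counter
    unit: "s"
    attributes:
      - ref: container.cpu.state
        requirement_level:
          conditionally_required: If the container runtime exposes a per-state breakdown.
      - ref: cpu.logical_number
        # High cardinality at scale, so opt-in rather than recommended by default.
        requirement_level: opt_in
```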

> Also, do you propose skipping the container.cpu/memory.utilization metrics completely for now, or can we add them additionally?

I propose to skip this for now. Reducing the scope of this PR means it will be closed out faster. This PR should contain the bare minimum; we can then add more in later PRs.

ChrsMark added 13 commits March 21, 2024 12:25
@ChrsMark ChrsMark force-pushed the container_metrics branch from 068af39 to 9f1a8f0 on March 21, 2024 11:25
@ChrsMark ChrsMark force-pushed the container_metrics branch from 2875b5f to b5aa89a on March 21, 2024 11:36
@ChrsMark (Member, Author):

@joaopgrassi @lmolkova thanks for reviewing! I have tried to address or answer your comments. Could you have another look please?

@lmolkova (Contributor) left a comment:

Left some minor comments, but LGTM.

Heads up on #820 - we want to merge it (hopefully very soon), and once it's in, stability will be required for all metrics, attributes, and enum members.

@ChrsMark (Member, Author):

I have filed #840 to keep track of #282 (comment). Beyond this, I believe it's ready to go now @open-telemetry/specs-semconv-maintainers :)
