Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add process.cpu.count metric #2392

Closed
wants to merge 5 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,8 @@ release.
([#2290](https://github.com/open-telemetry/opentelemetry-specification/pull/2290))
- Add semantic conventions for [CloudEvents](https://cloudevents.io).
([#1978](https://github.com/open-telemetry/opentelemetry-specification/pull/1978))
- Add `process.cpu.count` metric.
([#2392](https://github.com/open-telemetry/opentelemetry-specification/pull/2392))

### Compatibility

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ Below is a table of Process metric instruments.
| Name | Instrument | Units | Description | Labels |
|------|------------|-------|-------------|--------|
| `process.cpu.time` | Asynchronous Counter | s | Total CPU seconds broken down by different states. | `state`, if specified, SHOULD be one of: `system`, `user`, `wait`. A process SHOULD be characterized _either_ by data points with no `state` labels, _or only_ data points with `state` labels. |
| `process.cpu.count` | Asynchronous UpDownCounter | 1 | The number of logical CPUs available to the process. | |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that utilization is what users generally want to see. But utilization collected client-side is just a gauge and can't be aggregated over time, so I think it's nice to capture process.cpu.time and process.cpu.count where possible and display utilization based on those two metrics.

Copy link
Member

@bogdandrutu bogdandrutu Mar 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had the same debate for system.cpu.utilization and the result of the discussion is that system.cpu.utilization was simpler for some backends than calculating from two metrics.

But utilization collected client-side is just a gauge and can't be aggregated over time

Not sure this is true, since if you calculate "delta utilization" if process.cpu.count does not change, you can correctly merge them by doing sum:

Timestamp 0 -> cpu.time0/count
Timestamp 1 -> cpu.time1/count -> report (cpu.time 1 - cpu.time 0) / count / (Timestamp 1 - Timestamp 0)
Timestamp 2 -> cpu.time1/count -> report (cpu.time 2 - cpu.time 1) / count / (Timestamp 2 - Timestamp 1) -> percent of total cpu / second.

The idea was that if we report roughly every same interval (also we have start and current time) you can average them two reported value to get the utilization over (Timestamp 2 - Timestamp 0).

If the argument is wrong for the system.cpu.utilization we should change that as well, I want consistency :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I don't have any objection to that approach. From a JVM metrics perspective, there was some interest in capturing available cpu count separate from utilization because it gives clues about GC and common thread pool sizing, but we haven't mapped out GC or thread pool metrics yet, so will revisit that then if/when we have a more specific need. I'll send a new PR to propose adding process.cpu.utilization, with this definition of process.cpu.time divided by elapsed time divided by "available processor count"

| `process.memory.usage` | Asynchronous UpDownCounter | By | The amount of physical memory in use. | |
| `process.memory.virtual` | Asynchronous UpDownCounter | By | The amount of committed virtual memory. | |
| `process.disk.io` | Asynchronous Counter | By | Disk bytes transferred. | `direction` SHOULD be one of: `read`, `write` |
Expand Down