Standardize process metric reporting #4232
I've completed an initial diff of the two metric sources. Here are my notes, for those interested. I will continue this task soon by proposing specific changes:

Process metrics collected by hostmetricsreceiver
Name | Instrument | Units | Description | Labels
---|---|---|---|---
`process.cpu.time` | monotonic sum | s | Total CPU seconds broken down by different states. | `state` SHOULD be one of: `system`, `user`, `wait`
`process.memory.physical_usage` | non-monotonic sum | By | The amount of physical memory in use. | |
`process.memory.virtual_usage` | non-monotonic sum | By | Virtual memory size. | |
`process.disk.io` | monotonic sum | By | Disk bytes transferred. | `direction` SHOULD be one of: `read`, `write`
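For context, the hostmetrics receiver reads these values through gopsutil. Below is a minimal sketch, not the receiver's actual code, of pulling the same underlying values for the current process with gopsutil v3 (import path and field availability vary by gopsutil version and OS; `Iowait`, for example, is only populated on Linux):

```go
package main

import (
	"fmt"
	"os"

	"github.com/shirou/gopsutil/v3/process"
)

func main() {
	proc, err := process.NewProcess(int32(os.Getpid()))
	if err != nil {
		panic(err)
	}

	// process.cpu.time: cumulative CPU seconds, broken down by state.
	if times, err := proc.Times(); err == nil {
		fmt.Printf("process.cpu.time{state=user}=%.2f {state=system}=%.2f {state=wait}=%.2f\n",
			times.User, times.System, times.Iowait)
	}

	// process.memory.physical_usage (RSS) and process.memory.virtual_usage (VMS), in bytes.
	if mem, err := proc.MemoryInfo(); err == nil {
		fmt.Printf("process.memory.physical_usage=%d process.memory.virtual_usage=%d\n", mem.RSS, mem.VMS)
	}

	// process.disk.io: cumulative bytes transferred, broken down by direction.
	if io, err := proc.IOCounters(); err == nil {
		fmt.Printf("process.disk.io{direction=read}=%d {direction=write}=%d\n", io.ReadBytes, io.WriteBytes)
	}
}
```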
Process metrics collected for internal telemetry
Name | Instrument | Units | Description | Labels
---|---|---|---|---
`process/cpu_seconds` | monotonic sum | s | Total CPU user and system time in seconds | |
`process/memory/rss` | non-monotonic sum | By | Total physical memory (resident set size) | |
`process/uptime` | monotonic sum | s | Uptime of the process | |
`process/runtime/heap_alloc_bytes` | non-monotonic sum | By | Bytes of allocated heap objects (see 'go doc runtime.MemStats.HeapAlloc') | |
`process/runtime/total_alloc_bytes` | monotonic sum | By | Cumulative bytes allocated for heap objects (see 'go doc runtime.MemStats.TotalAlloc') | |
`process/runtime/total_sys_memory_bytes` | non-monotonic sum | By | Total bytes of memory obtained from the OS (see 'go doc runtime.MemStats.Sys') | |
Notes:

- `process.cpu.time` and `process/cpu_seconds` are nearly equivalent, but:
  - `process.cpu.time` is broken down by state, and the available states depend on the OS:
    - linux: `system`, `user`, `wait`
    - windows: `system`, `user`
    - other: metric is not collected
  - `process/cpu_seconds` represents a total
    - `total != system + user + wait`, because there are additional states
- `process.memory.physical_usage` is exactly equivalent to `process/memory/rss`
- `process.memory.virtual_usage` has no equivalent in internal telemetry, but would be trivial to add if desired
- `process.disk.io` has no equivalent in internal telemetry, but should be easy enough to add if desired
- `process/uptime` has no equivalent in hostmetricsreceiver. It is tracked from process start using `time.Now()`, so should be easy to add if desired
  - might be slightly inaccurate if the initial timestamp is captured within hostmetricsreceiver (probably <1s in most cases, so close enough for most use cases)
- `process/runtime/*` metrics have no equivalent in hostmetricsreceiver. They are pulled from `runtime.MemStats`, so should be easy to add if desired (a sketch of this and `process/uptime` follows these notes)
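To make the last two notes concrete, here is a minimal sketch, not the Collector's actual implementation, of deriving the internal-telemetry values described above: uptime from a timestamp captured at process start with `time.Now()`, and the heap metrics from `runtime.MemStats`:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// Captured once, as early as possible in the process lifetime.
var processStart = time.Now()

func main() {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)

	// process/uptime: seconds elapsed since the start timestamp was captured.
	fmt.Printf("process/uptime=%.2fs\n", time.Since(processStart).Seconds())

	// process/runtime/* metrics map directly onto runtime.MemStats fields.
	fmt.Printf("process/runtime/heap_alloc_bytes=%d\n", ms.HeapAlloc)   // runtime.MemStats.HeapAlloc
	fmt.Printf("process/runtime/total_alloc_bytes=%d\n", ms.TotalAlloc) // runtime.MemStats.TotalAlloc
	fmt.Printf("process/runtime/total_sys_memory_bytes=%d\n", ms.Sys)   // runtime.MemStats.Sys
}
```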
[Proposed] Unified process metrics
Notes on proposed changes
Detailed proposed changes
Open Questions

In particular, I'd appreciate scrutiny on the heap metric names. I believe they are organized in a logical hierarchy, but they are very verbose, and perhaps unnecessarily so:
I've updated the above proposal w/ a change to

@djaglowski thanks, I'll take a look and comment.

@dmitryax please review this to see how it can affect SignalFx translations, assuming we adopt these changes for the hostmetrics receiver. Do we have a good way to handle this such that it is not a breaking change for our customers?

I assume we mimic system.cpu.time, right? I don't see that system.cpu.time allows the

Please use the instrument terminology defined in the spec. I think

(Meta: it may be worth putting the proposal in a Google doc to make commenting and discussion easier.)

Does this violate the Prometheus rule of thumb about aggregations?

Neither sum() nor avg() seems to be a meaningful aggregation in this case. I am also not sure the dimension name

I think

I think that our two alternate implementations for

If labels are always required, then I think we have two options:

I don't feel strongly that my initial proposal is the only option, but it seemed to me the best one given the alternatives I was able to identify. I'd definitely appreciate other opinions on this.

I think you're right. I'll update the proposal to keep them separate. This also has the benefit of preserving all metrics in

I understand the proposal and it seems reasonable, but I am not an expert in metrics, so I would like to hear some other opinions. This needs to be posted as an OTel spec proposal anyway, so we should see more feedback there. Let's go with what you suggest, but explicitly ask for feedback on this issue.

Ok, will do. Thanks for advising.
Resolves #1817. Prerequisite for [(collector) #4232](open-telemetry/opentelemetry-collector#4232).
The Collector has a hostmetrics receiver which emits process metrics. The Collector also emits its own process metrics.
These two sources of process metrics do not currently follow the same conventions for metric names. We need to come up with standard semantic conventions for emitting process metrics, make them part of the specification (see this issue), and then make sure both the hostmetrics receiver and the Collector's own metrics are emitted according to the defined conventions.