-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hwmon: Provide annotation metric to link chip sysfs paths to human-readable chip types #359
hwmon: Provide annotation metric to link chip sysfs paths to human-readable chip types #359
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
Can we avoid the word "type"?
…On 29 Nov 2016 11:11, "Ben Kochie" ***@***.***> wrote:
***@***.**** approved this pull request.
Nice
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#359 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AGyTdqBSRuWAeBb5xmru7qnWWiZkBnxVks5rDAhbgaJpZM4K-1su>
.
|
Do these map in any way to kernel module names? |
I am not invested in any naming here, I just found that "type" seems more fitting than "name" (it is not the name of the chip, but some kind of type is fitting). I’m not sure how those relate to kernel modules. It might be that those are kernel module names, at least I can find "coretemp" and "nouveau" loaded on my machine and "amdgpu" on another. I’m happy to adjust the commits when a consensus is reached. I don’t have a strong opinion in any direction. The sysfs documentation states:
|
|
The chip label generation has been changed in prometheus#334 to prefer the unique device path (e.g. the location on the PCI bus) due to prometheus#333. Here, a new annotation metric ``node_hwmon_chip_names`` is introduced which allows to link the unique chip sysfs path to a human-readable chip name which may not be unique among chip sysfs paths (for example, dual-slot systems have multiple chipType="coretemp" sensors). This allows to mitigate the downsides of the solution to prometheus#333 (namely that the device path may not be stable across kernels and reboots) for cases where it does not matter that multiple devices may have the same human-readable name (e.g. aggregation or where at most one device with a common chip name is present). For cases where no human-readable name can be derived, the annotation metric is not emitted.
11020d1
to
3efaa1a
Compare
@brian-brazil @SuperQ Done! It’s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some small nitpicks, beside that it looks good.
@@ -143,6 +144,26 @@ func (c *hwMonCollector) updateHwmon(ch chan<- prometheus.Metric, dir string) (e | |||
} | |||
} | |||
|
|||
hwmonChipName, err := c.hwmonHumanReadableChipName(dir) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove newline, it's uncommon to have a newline between returning and error and checking it.
@@ -351,6 +372,27 @@ func (c *hwMonCollector) hwmonName(dir string) (string, error) { | |||
return "", errors.New("Could not derive a monitoring name for " + dir) | |||
} | |||
|
|||
func (c *hwMonCollector) hwmonHumanReadableChipName(dir string) (string, error) { | |||
// this is similar to the methods in hwmonName, but with different |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this above the function and start with "hwmonHumanReadableChipName is similar to ...."?
I'll just fix the comments etc myself. Thanks for your contribution! |
The chip label generation has been changed in #334 to prefer the unique device path (e.g. the location on the PCI bus) due to #333.
After discussion in #355, we learnt that the way forward is to provide an annotation metric (instead of a new label on the existing metrics). The annotation metric allows to link the unique chip sysfs path to a human-readable chip type which may not be unique among chip sysfs paths (for example, dual-slot systems have multiple chipType="coretemp" sensors).
This allows to mitigate the downsides of the solution to #333 (namely that the device path may not be stable across kernels and reboots) for cases where it does not matter that multiple devices may have the same human-readable type (e.g. aggregation or where at most one device of a type is present).
For cases where no human-readable type can be derived, the annotation metric is not emitted.
(Note: The below examples have been modified to match the most recent commits in this pull request.)
The annotation metric can be used in queries like this:
Find the voltages of a nouveau-driven GPU:
Find the temperatures of a GPU using the amdgpu driver:
Find all CPU temperatures: