hwmon duplicate core temps #333
Comments
It looks like this is an issue for dual socket hardware, there are two |
I think we need to include the device path as a label. For example:
|
Ah, I suspected this might happen. @brian-brazil I removed the path-derived name, which is guaranteed to be unique. Should I re-add it to the labels? |
So that'd be coretemp.1 here rather than coretemp? |
Both solutions can lead to problems.
Using path fragments. Pro: guaranteed to be unique.
Using names as exported. Pro: human readable, stable across reboots. (I actually tried to come up with a situation where it is non-unique, but failed to come up with the one shown here.)
My feeling right now would be to use the device path/name if a device link exists, then fall back to the exported name, and finally to the hwmon name. |
And that would use platform-coretemp.0 as the name. Which is always unique. |
Sounds like a plan. |
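The fallback order agreed on above can be sketched roughly as follows. This is a minimal illustration, not node_exporter's actual code: the function `chipLabel`, its parameters, and the rule of joining the last two path components with a hyphen are assumptions chosen so that a device link ending in `platform/coretemp.0` yields the `platform-coretemp.0` name mentioned in the thread.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// chipLabel is a hypothetical helper (not taken from node_exporter).
// It returns the first available identifier in the order proposed in
// the thread: device path, then exported name, then hwmon name.
func chipLabel(devicePath, exportedName, hwmonName string) string {
	if devicePath != "" {
		// Assumed naming rule: join the last two path components,
		// e.g. /sys/devices/platform/coretemp.0 -> platform-coretemp.0
		parent := filepath.Base(filepath.Dir(devicePath))
		return parent + "-" + filepath.Base(devicePath)
	}
	if exportedName != "" {
		return exportedName
	}
	return hwmonName
}

func main() {
	// Two sockets export the same name "coretemp" but have distinct device paths.
	fmt.Println(chipLabel("/sys/devices/platform/coretemp.0", "coretemp", "hwmon0"))
	fmt.Println(chipLabel("/sys/devices/platform/coretemp.1", "coretemp", "hwmon1"))
	// Without a device link, fall back to the exported name, then the hwmon name.
	fmt.Println(chipLabel("", "coretemp", "hwmon0"))
	fmt.Println(chipLabel("", "", "hwmon0"))
}
```

With the sample inputs in `main`, the two sockets get the distinct labels `platform-coretemp.0` and `platform-coretemp.1`, while the last two calls fall back to `coretemp` and `hwmon0`.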
The chip label generation was changed in prometheus#334 to prefer the unique device path (e.g. the location on the PCI bus) because of prometheus#333. This change introduces a new label, chipName, which again carries the human-readable sensor name (e.g. coretemp) and is used in addition to the existing labels. It mitigates the downside of the fix for prometheus#333 (namely that the device path may not be stable across kernels and reboots) in cases where it does not matter that multiple devices may share the same human-readable name (e.g. aggregation, or where at most one device of a type is present).
The chip label generation was changed in prometheus#334 to prefer the unique device path (e.g. the location on the PCI bus) because of prometheus#333. This change introduces a new annotation metric, ``node_hwmon_chip_names``, which links the unique chip sysfs path to a human-readable chip name that may not be unique among chip sysfs paths (for example, dual-socket systems have multiple chipType="coretemp" sensors). It mitigates the downside of the fix for prometheus#333 (namely that the device path may not be stable across kernels and reboots) in cases where it does not matter that multiple devices may share the same human-readable name (e.g. aggregation, or where at most one device with a given chip name is present). Where no human-readable name can be derived, the annotation metric is not emitted.
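To illustrate what such an annotation mapping enables, here is a small sketch of aggregating readings by human-readable chip name. It is only an illustration under stated assumptions: the `sample` type, the `maxByChipName` function, and all values are hypothetical, and the map stands in for the chip-to-name link the annotation metric provides. It does not use the Prometheus client library.

```go
package main

import "fmt"

// sample is a hypothetical temperature reading keyed by the unique chip label.
type sample struct {
	chip  string
	value float64
}

// maxByChipName groups samples by human-readable chip name using a
// chip -> name mapping and keeps the maximum value per name.
// Chips without a known name are skipped, mirroring the case where
// no annotation is emitted.
func maxByChipName(samples []sample, names map[string]string) map[string]float64 {
	out := make(map[string]float64)
	for _, s := range samples {
		name, ok := names[s.chip]
		if !ok {
			continue
		}
		if cur, seen := out[name]; !seen || s.value > cur {
			out[name] = s.value
		}
	}
	return out
}

func main() {
	// Made-up example data: two sockets sharing the name "coretemp".
	names := map[string]string{
		"platform-coretemp.0": "coretemp",
		"platform-coretemp.1": "coretemp",
	}
	samples := []sample{
		{"platform-coretemp.0", 47},
		{"platform-coretemp.1", 52},
		{"unnamed-chip", 30}, // no human-readable name available
	}
	fmt.Println(maxByChipName(samples, names)["coretemp"])
}
```

The unique chip label keeps the two sockets apart in the raw series, while the name mapping still allows a query across everything called `coretemp`.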
On node_exporter 13.0-rc.1 on Linux (Debian 8), I'm getting the following warnings from prometheus:
A sample from /metrics:
CPU is a Xeon E5-2680 v3
Let me know if there is any other information I can provide.