-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consolidate metrics names to adhere to our guide lines #150
Comments
One thing to keep in mind is that we're not going to be able to apply this to all metrics that we're exporting here, as it's not practical to determine which are/aren't counters and what the units are (e.g. netstat). There's a balance with all exporters between maintainability and exposition perfection. Anything procedurally generated usually ends up on the maintainability end of things. We should be able to fix up most of the core metrics though. |
it is quite unfortunate that difference exist even in a single collector between platforms, such as the longer the differences exist, the more painful it will be to switch... |
@mischief That's one of the ones we can't fix unfortunately. I'd imagine memory is also one of the areas where the semantics will be subtly different. |
@brian-brazil what does 'cant fix' mean here? not willing to change because of breaking compat? |
@mischief What we have to work with is |
What if in this case we at least converted CamelCase to under_scores (and downcase everything in general)? There are already some other underscores in that file, but the result should still be ok. Or too risky? |
Generally it's a little risky to go to underscores for a dump like that, as the original name is no longer apparent to be searched for online. In this case we've also a few that'd end up messy like |
The original name can be put in the Help text. I'd vote to export metrics according to our naming standards. |
Hmm, even with the name in the help text, |
The challenge is that requires hand choosing metric names in the case of exporters, so it's not practical when more than a handful of metrics are involved (and that's presuming they don't change much over time). This problem is not unique to the node exporter, it's a tradeoff we have to make with all exporters. |
Hum thought I commented after setting to accepted: The remaining question is just when and how to change it. It's a lot of work and IMO requires some test infra to make sure all collectors still work. |
There are over 450 metrics exposed by default, and we don't even know the names of all of them in advance. That's more than a little maintenance. |
@brian-brazil We don't necessarily need to change all but we should do it at least for the most common ones (cpu/memory/disk/net/). |
Part of the consolidation should be having consistent metric names for the same things, see #414 |
We can only sanely fix things where we fully decide the name in the first place. CPU is one of those, memory is not. |
Would it be useful to investigate how some of the older host monitoring solutions (Zabbix, etc) do this? They may have figured out a practical approach which also lends itself well to cross platform capable dashboards. |
Based on what I've seen, I imagine they're hand picking 10-20 metrics. We're at a scale where that doesn't work. Cross-platform is very unlikely to work for anything beyond the simplest of metrics. Even something like load average varies significantly across platforms. |
No worries at all. 😄 |
I'd like to point out that the decision to rename node exporter metrics only to better follow naming conventions, might make sense from a developer point of view, but from a user point of view this is going to cause a catastrophic experience. I wish not breaking stuff for users would be considered more important than cosmetic considerations. How are people supposed to deploy these changes? Deploy new dashboards and new node exporters on all hosts at the same time? |
@schweikert Dashboards like Grafana can be configured to make multiple queries. You can set them up to query both the old and new names, which will provide a reasonably seamless Same goes for recording rules, you can setup rules to deal with both names, and even record the new name into the old rule name. For example:
|
You could also use |
@SuperQ, @brian-brazil: thanks for the tips, that's good to know. I still worry about how users are going to take this. The reputation cost there is going to be quite significant. |
@schweikert It's definitely painful, but it's a tradeoff between having broken metric names forever or having this temporary pain once (and as long as it's still a 0.x version, might not be possible later, depending on metrics stability guarantees in a 1.x version). From a usage and teaching perspective the |
Maybe it would help the transition to have an option to publish the old metric names as well, in addition to the new ones. I am thinking of our company, where we publish a node exporter RPM package for consumption by various teams. If we would publish a new package with the new metric names in the yum repo, it would immediately start breaking things for our users. If we had an option to expose both metric names, we could communicate that there is a transition period, and remove that option only a few months later in our package. |
Doubling the load on user's Prometheus servers would likely take many of them out. There's always metric_relabel_configs if you want a smoother transition. |
The |
Signed-off-by: prombot <prometheus-team@googlegroups.com>
There are currently many metrics which don't follow our naming guidelines, e.g. they're missing
_total
suffixes or units like_bytes
.This will be a breaking change, but for the better. As node_exporter is one of the most popular exporters, it should also lead by example.
@juliusv @brian-brazil
The text was updated successfully, but these errors were encountered: