-
-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide capability to scrape multiple resources under on metrics, aggregated with labels per resource #513
Comments
I just wanted to add a bit more of a background on why this is important to us to @ResDiaryLewis's description. If we have the following SQL Azure databases configured:
We can label them with the region and database name, giving us something like the following:
What this means is that we can then create a single generic Grafana dashboard, and we can add variables to it to allow us to filter based on the region or database name. For example, to add a drop down to choose the region you could use the following query in Grafana:
And to add a drop down to choose the database name you could use the following query:
Then to create a graph showing the DTU % for the selected database, you can use the following query:
It also means that we could aggregate across multiple resources (not that it makes sense for the DTU % metric in the example). If we were able to set aside time for this, would you be interested in a contribution? |
This is certainly an interesting scenario, thanks for sharing! So you are actually asking for 2 things here:
Is that a good summary? Would it be ok if you could do this? metrics:
- name: active_message_count
description: "Amount of active messages in our queue"
resourceType: ServiceBusQueue
namespace: promitor-messaging
queueName: orders
azureMetricConfiguration:
metricName: ActiveMessages
aggregation:
type: Total
- name: active_message_count
description: "Amount of active messages in our queue"
resourceType: ServiceBusQueue
namespace: promitor-messaging
queueName: invoices
azureMetricConfiguration:
metricName: ActiveMessages
aggregation:
type: Total
- name: active_message_count
description: "Amount of active messages in our queue"
resourceType: ServiceBusQueue
namespace: promitor-messaging
queueName: shipments
azureMetricConfiguration:
metricName: ActiveMessages
aggregation:
type: Total Or are you more looking for the following? (excluding additional tag requirement for now) metrics:
- name: active_message_count
description: "Amount of active messages in our queue"
resources:
- resourceType: ServiceBusQueue
namespace: promitor-messaging
queueName: orders
- resourceType: ServiceBusQueue
namespace: promitor-messaging
queueName: invoices
- resourceType: ServiceBusQueue
namespace: promitor-messaging
queueName: shipments
azureMetricConfiguration:
metricName: ActiveMessages
aggregation:
type: Total
Certainly, I'm happy to receive contributions! Let's spec this out and see if you have the bandwidth, does that sound reasonable? |
Closed by accident, sorry |
We think the second example makes sense as we're aggregating multiple resources under the same metrics:
- name: promitor_db_dtu_consumption_percent_average
description: "DTU percentage"
resources:
- resourceGroupName: uk-group
resourceType: Generic
resourceUri: Microsoft.Sql/servers/uk-server/databases/uk-database
- resourceGroupName: au-group
resourceType: Generic
resourceUri: Microsoft.Sql/servers/au-server/databases/au-database
azureMetricConfiguration:
metricName: dtu_consumption_percent
aggregation:
type: Average Which would produce these metrics:
In other scenarios where a Provider like Is there anything else you think that we should write specs for? Also, if one of use were to work on this, do you have any pointers on where we could start? |
That is certainly a good idea and I'll add this to v1.0.0 timeline since it's a breaking change. It's a fairly "big" refactoring of the current scraping but our docs on adding new scrapers should walk you through the areas that will have to change. From the top of my mind I'm thinking of:
Because of this, I can prioritize it given the impact unless you're up for it - I'm always welcoming PRs if you have the time! Here is an example of some contributed scrapers: (and how picky I am, sorry!): Feel free to have a look and let me know if you're willing to contribute it or not, just to know if I need to work on this or not. Thanks in advance! /fyi @brusMX @jsturtevant |
In case something is not clear, feel free to contact me at kerkhove.tom@gmail.com. Tracking docs improvements here as well: Feel free to post flaws there as well. |
That's a fair ask @marcinbudny, but I would rather say that's a different request as this does not relate to the restructuring but rather just how the ServiceBusQueue scraper works. This was moved to #529 and will be done in v1.0.0. |
@tomkerkhove I can't be completely certain of the timescale for when we could look at this, but what I can say is that we've been given the ok to contribute when we can fit it in. Realistically, we definitely won't be able to start this within the next couple of weeks, and I think we'd be talking at least a month from now in reality. If you're happy to just go ahead and make the changes before we have the time then obviously we're not going to stand in the way and would be really happy with that, but otherwise I'll make sure to provide an update here before we start doing any work on this to make sure we're not wasting effort. In the meantime we're going to run a PoC using Prometheus relabelling for a few weeks so that we can check we're happy with the metrics we're getting from Promitor. Does that work for you? |
The "splitting" seems to be a generic mechanism, not related to Service Bus itself. |
That's great to hear @adam-resdiary, thanks for the support! Since I'm trying to close 1.0.0 asap I'll probably already start working on this but feel free to circle back here before you start to check if I already started or not. Feel free to let me know how the test goes and feel free to open more issues if you're unsatisfied! |
@marcinbudny Yes and no - Splitting is a generic mechanism but does not work nicely with our current Service Bus Queue scraper, hence why there is a separate issue for this. We are adding general support for multi-dimensional metrics which enables you to split on a given sub-dimension and track that in #81. I'll extend it a bit so that it's more clear and make sure you'll be able to get labels for the dimensions. Is that fair to you? |
@tomkerkhove Sure. Thank you for the explanation. |
I'll start working on this in the next coming days @adam-resdiary @ResDiaryLewis . Wouldn't it make more sense to have the resource type on the root since it should always be the same? metrics:
- name: active_shipments_message_count
description: "Active message count for shipments"
resourceType: ServiceBusQueue
resources:
- namespace: promitor-messaging-we
queueName: shipments
- namespace: promitor-messaging-us
queueName: shipments
azureMetricConfiguration:
metricName: ActiveMessages
aggregation:
type: Average Certainly given the metric name should be the same for all of the resources anyway. Or are there scenarios where you want to mix SB queues with storage queues for example? I think not? No promise on this sample above though will see how things go. |
👍 @tomkerkhove, that's a great point - it should always be the same. Obviously with the |
I think I'll move it out for now and can still make it more flexible later on if people really want to (although I think we shouldn't) |
As things keep on getting delayed I'm introducing #646 which allows us to version metrics declaration. If this feature does not make it for v1.0 we can still easily provide this in v1.x without breaking changes. |
@tomkerkhove I've started to take a look at this over the past couple of days. I didn't want to mention it before now until I had an idea of whether I thought I'd be able to make the change or not. So far what I've done is:
At this point, in theory I haven't altered the functionality at all, but I've put things in place to support the next stage. Before I go any further, I just want to check whether it's worth me continuing, and if so I think it would be good for you to take a look at the work in progress. The changes so far are pretty huge even though I've been trying to minimise what I'm changing where possible. |
Thanks for jumping in and helping on this @adamconnelly!
You have read my mind, I was about to split the configuration model from our internal model as well so I'm happy to see you went that route as well! How are you handling the mapping between configuration & internal model? Are you using something like AutoMapper? Can you elaborate on the following please?
For the rest, it sounds very promising and given you mention that it just works as before I'd say an early PR for just changing the internals would be a good start. That way we can keep the changes minimal and get PRs in more easily. Note: I'm on holiday so responses might be slow |
@tomkerkhove I've put up a PR with the first set of changes (i.e. adding the new v1 config objects to separate the internal model from the serialization objects). In case you take a look before I've fixed this yet, I'll sort the TODO and the fact the CLI hasn't been signed. As for the mapping between the models - I decided against using AutoMapper. Given how few places we actually need to do the mapping in, I figured that adding the complication of AutoMapper probably wasn't worth the effort. I've used it extensively in other projects and it's great, but I think it would give us more pain than it's worth in this case. Instead I opted to rename all the objects to XyzBuilder, and add a As for the MetricDefinition refactoring - once you're happy with the first PR I can push up the next set of changes that includes that, but basically I've altered the class to look something like this: public class MetricDefinition
{
public AzureMetricConfiguration AzureMetricConfiguration { get; set; }
// Contains name, description and labels
public PrometheusMetricDefinition PrometheusMetricDefinition { get; set; }
public Scraping Scraping { get; set; };
public ResourceType ResourceType { get; }
// Contains a collection of resources to be scraped.
public List<AzureResourceDefinition> Resources { get; set; }
} So previously |
I've pushed up a branch with the second set of changes (supporting multiple resources in the model, but without the v2 config). I'm going to hold off creating a PR until you get a chance to take a look at the first one, because it's dependent on the first set of changes. If you want to take a look anyway, you can see the changes here: adamconnelly/promitor@create-v1-configs...adamconnelly:multi-resource-metrics Also - I totally appreciate you're on your holiday, so no pressure. |
Thanks for the update, I'll have a look at the first PR now! In terms of the metric configuration, I'd love to keep it as simple and use something like the following: metrics:
- name: active_shipments_message_count
description: "Active message count for shipments"
resourceType: ServiceBusQueue
resources:
- namespace: promitor-messaging-we
queueName: shipments
- namespace: promitor-messaging-us
queueName: shipments
azureMetricConfiguration:
metricName: ActiveMessages
aggregation:
type: Average It looks like it's similar to what you have, without (assuming |
@tomkerkhove that's exactly what it's like. The
As an aside - I think there's maybe a bug with the existing code (or maybe it was deliberate). It looks to me like although all of the resource types have a |
Awesome, thanks!
Woops, maybe best to create a dedicated issue for this? |
Sure - I've made a note for myself about the ResourceGroupName problem and I'll create an issue after finishing this issue. |
@tomkerkhove just to keep you in the loop, I'll aim to put up a PR for the last part later this week, depending on how my week goes. |
Thanks @adamconnelly ! Take your time, no rush! |
Open work:
|
- I've updated the documentation and the guide on adding a new scraper to match the changes made to support multiple resources per metric. - I also tweaked the wording about adding docs in the new scraper guide so it doesn't sound optional. I figured if someone goes to the bother of adding a scraper they can easily add some docs based on the examples that are there. - Tweaked some of the wording where I spotted a few typos / mistakes while proof-reading. - I've also tweaked the example metric names used. Some of them didn't make sense now that multiple resources are supported, and I figured it made sense to just do a pass and make them all consistent. Part of tomkerkhove#513.
- I've updated the documentation and the guide on adding a new scraper to match the changes made to support multiple resources per metric. - I also tweaked the wording about adding docs in the new scraper guide so it doesn't sound optional. I figured if someone goes to the bother of adding a scraper they can easily add some docs based on the examples that are there. - Tweaked some of the wording where I spotted a few typos / mistakes while proof-reading. - I've also tweaked the example metric names used. Some of them didn't make sense now that multiple resources are supported, and I figured it made sense to just do a pass and make them all consistent. Part of #513.
Thanks for contributing this feature @adamconnelly! I'll have a look at the Helm chart but we should be good to go. Will release the RC once prom-client-net/prom-client#65 is shipped and #546 is finalized! |
Sounds good. Excited to try it out now! |
Everything looks good and is implemented - Thank you @adamconnelly ! Currently we only have a bug with helm chart which is tracked in #695 |
Those who want to use it already, feel free to use |
We want to be able to use Prometheus labels on our metrics. We are currently using Azure Metrics Exporter. If we scrape two databases for their DTU consumption using a metric called
dtu_consumption_percent_percent_average
, it will label them withresource_group
andsub_resource_name
. This allows us to aggregate these metrics with PromQL queries.As far as I know, Promitor currently does not allow you to re-use the same metric name or use labels. This leaves us having to add suffixes to metrics, which can't be aggregated. e.g.:
Specification
The text was updated successfully, but these errors were encountered: