Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide capability to scrape multiple resources under on metrics, aggregated with labels per resource #513

Closed
2 tasks
ResDiaryLewis opened this issue Apr 19, 2019 · 36 comments
Assignees
Labels
feature All issues that are new features metric-labelling All issues related to metric labelling scraping All issues related to scraping specs-defined All issues where the specifications are defined and ready to be implemented
Milestone

Comments

@ResDiaryLewis
Copy link

We want to be able to use Prometheus labels on our metrics. We are currently using Azure Metrics Exporter. If we scrape two databases for their DTU consumption using a metric called dtu_consumption_percent_percent_average, it will label them with resource_group and sub_resource_name. This allows us to aggregate these metrics with PromQL queries.

As far as I know, Promitor currently does not allow you to re-use the same metric name or use labels. This leaves us having to add suffixes to metrics, which can't be aggregated. e.g.:

- name: promitor_db_dtu_percent_my_database
  description: "DTUs consumed"
  resourceGroupName: my_group
  resourceType: Generic
  resourceUri: Microsoft.Sql/servers/my_server/databases/my_database
  filter: ""
  azureMetricConfiguration:
    metricName: dtu_consumption_percent
    aggregation:
      type: Average

Specification

  • Allow us to define Prometheus labels in metric configurations
  • Allow duplicate metric names if the label values are different
@tomkerkhove tomkerkhove added the feature-request New feature requests label Apr 19, 2019
@adam-resdiary
Copy link
Contributor

I just wanted to add a bit more of a background on why this is important to us to @ResDiaryLewis's description.

If we have the following SQL Azure databases configured:

  • uk-database
  • usnc-database
  • euw-database

We can label them with the region and database name, giving us something like the following:

azure_sql_dtu_percent_average{region="uk", database_name="uk-database"} 59
azure_sql_dtu_percent_average{region="usnc", database_name="usnc-database"} 42
azure_sql_dtu_percent_average{region="euw", database_name="euw-database"} 12

What this means is that we can then create a single generic Grafana dashboard, and we can add variables to it to allow us to filter based on the region or database name.

For example, to add a drop down to choose the region you could use the following query in Grafana:

label_values(azure_sql_dtu_percent_average, region)

And to add a drop down to choose the database name you could use the following query:

label_values(azure_sql_dtu_percent_average{region=~"$region", database_name)

Then to create a graph showing the DTU % for the selected database, you can use the following query:

azure_sql_dtu_percent_average{region=~"$region", database_name=~"$databaseName"}

It also means that we could aggregate across multiple resources (not that it makes sense for the DTU % metric in the example).

If we were able to set aside time for this, would you be interested in a contribution?

@tomkerkhove
Copy link
Owner

This is certainly an interesting scenario, thanks for sharing!

So you are actually asking for 2 things here:

  1. Be able to scrape multiple instances with the same metric and tag it with a label shich has the instance name let's say
  2. Be able to specify additional metadata to a metrics which is added as additional labels to the metric

Is that a good summary?

Would it be ok if you could do this?

metrics:
  - name: active_message_count
    description: "Amount of active messages in our queue"
    resourceType: ServiceBusQueue
    namespace: promitor-messaging
    queueName: orders
    azureMetricConfiguration:
      metricName: ActiveMessages
      aggregation:
        type: Total
  - name: active_message_count
    description: "Amount of active messages in our queue"
    resourceType: ServiceBusQueue
    namespace: promitor-messaging
    queueName: invoices
    azureMetricConfiguration:
      metricName: ActiveMessages
      aggregation:
        type: Total
  - name: active_message_count
    description: "Amount of active messages in our queue"
    resourceType: ServiceBusQueue
    namespace: promitor-messaging
    queueName: shipments
    azureMetricConfiguration:
      metricName: ActiveMessages
      aggregation:
        type: Total

Or are you more looking for the following? (excluding additional tag requirement for now)

metrics:
  - name: active_message_count
    description: "Amount of active messages in our queue"
    resources:
      - resourceType: ServiceBusQueue
        namespace: promitor-messaging
        queueName: orders
      - resourceType: ServiceBusQueue
        namespace: promitor-messaging
        queueName: invoices
      - resourceType: ServiceBusQueue
        namespace: promitor-messaging
        queueName: shipments
    azureMetricConfiguration:
        metricName: ActiveMessages
        aggregation:
            type: Total

If we were able to set aside time for this, would you be interested in a contribution?

Certainly, I'm happy to receive contributions!

Let's spec this out and see if you have the bandwidth, does that sound reasonable?

@tomkerkhove
Copy link
Owner

Closed by accident, sorry

@ResDiaryLewis
Copy link
Author

We think the second example makes sense as we're aggregating multiple resources under the same metricName and aggregation.type. Here's an example we mocked up for a Generic resource:

metrics:
- name: promitor_db_dtu_consumption_percent_average
  description: "DTU percentage"
  resources:
  - resourceGroupName: uk-group
    resourceType: Generic
    resourceUri: Microsoft.Sql/servers/uk-server/databases/uk-database
  - resourceGroupName: au-group
    resourceType: Generic
    resourceUri: Microsoft.Sql/servers/au-server/databases/au-database
  azureMetricConfiguration:
    metricName: dtu_consumption_percent
    aggregation:
      type: Average

Which would produce these metrics:

promitor_db_dtu_consumption_percent_average{resource_group="uk-group", resource_uri="Microsoft.Sql/servers/uk-server/databases/uk-database"} 29
promitor_db_dtu_consumption_percent_average{resource_group="au-group", resource_uri="Microsoft.Sql/servers/au-server/databases/au-database"} 29

In other scenarios where a Provider like ServiceBusQueue supports YAML fields like namespace and queueName, these could be added as the labels.

Is there anything else you think that we should write specs for? Also, if one of use were to work on this, do you have any pointers on where we could start?

@tomkerkhove
Copy link
Owner

tomkerkhove commented Apr 27, 2019

That is certainly a good idea and I'll add this to v1.0.0 timeline since it's a breaking change.

It's a fairly "big" refactoring of the current scraping but our docs on adding new scrapers should walk you through the areas that will have to change.

From the top of my mind I'm thinking of:

  • How we deserialize the config
  • How we consume Azure Monitor REST API and report metrics

Because of this, I can prioritize it given the impact unless you're up for it - I'm always welcoming PRs if you have the time!

Here is an example of some contributed scrapers: (and how picky I am, sorry!):

Feel free to have a look and let me know if you're willing to contribute it or not, just to know if I need to work on this or not.

Thanks in advance!

/fyi @brusMX @jsturtevant

@tomkerkhove tomkerkhove added feature All issues that are new features scraping All issues related to scraping specs-defined All issues where the specifications are defined and ready to be implemented and removed feature-request New feature requests labels Apr 27, 2019
@tomkerkhove tomkerkhove added this to the v1.0.0 milestone Apr 27, 2019
@tomkerkhove tomkerkhove changed the title Aggregating metrics with labels Provide capability to scrape multiple resources under on metrics, aggregated with labels per resource Apr 27, 2019
@tomkerkhove
Copy link
Owner

tomkerkhove commented Apr 27, 2019

In case something is not clear, feel free to contact me at kerkhove.tom@gmail.com.

Tracking docs improvements here as well:

Feel free to post flaws there as well.

@marcinbudny
Copy link

From my perspective, it would be very useful to implement the "splitting" mechanism available in Azure Monitor. For example, I would like to scrape all the queues in a Service Bus namespace without configuring them individually. Then I would a have metric named something like servicebus_queue_size with label entity_name specifying name of the queue / topic. This is how many Prometheus exporters work (e.g. RabbitMQ exporter).

In Azure Monitor this looks like this:
image
As a result I get a chart with metric per entity name.

@tomkerkhove
Copy link
Owner

That's a fair ask @marcinbudny, but I would rather say that's a different request as this does not relate to the restructuring but rather just how the ServiceBusQueue scraper works.

This was moved to #529 and will be done in v1.0.0.

@adam-resdiary
Copy link
Contributor

@tomkerkhove I can't be completely certain of the timescale for when we could look at this, but what I can say is that we've been given the ok to contribute when we can fit it in. Realistically, we definitely won't be able to start this within the next couple of weeks, and I think we'd be talking at least a month from now in reality.

If you're happy to just go ahead and make the changes before we have the time then obviously we're not going to stand in the way and would be really happy with that, but otherwise I'll make sure to provide an update here before we start doing any work on this to make sure we're not wasting effort.

In the meantime we're going to run a PoC using Prometheus relabelling for a few weeks so that we can check we're happy with the metrics we're getting from Promitor.

Does that work for you?

@marcinbudny
Copy link

The "splitting" seems to be a generic mechanism, not related to Service Bus itself.

@tomkerkhove
Copy link
Owner

That's great to hear @adam-resdiary, thanks for the support! Since I'm trying to close 1.0.0 asap I'll probably already start working on this but feel free to circle back here before you start to check if I already started or not.

Feel free to let me know how the test goes and feel free to open more issues if you're unsatisfied!

@tomkerkhove
Copy link
Owner

The "splitting" seems to be a generic mechanism, not related to Service Bus itself.

@marcinbudny Yes and no - Splitting is a generic mechanism but does not work nicely with our current Service Bus Queue scraper, hence why there is a separate issue for this.

We are adding general support for multi-dimensional metrics which enables you to split on a given sub-dimension and track that in #81. I'll extend it a bit so that it's more clear and make sure you'll be able to get labels for the dimensions.

Is that fair to you?

@tomkerkhove tomkerkhove added the metric-labelling All issues related to metric labelling label Apr 29, 2019
@marcinbudny
Copy link

@tomkerkhove Sure. Thank you for the explanation.

@tomkerkhove
Copy link
Owner

I'll start working on this in the next coming days @adam-resdiary @ResDiaryLewis .

Wouldn't it make more sense to have the resource type on the root since it should always be the same?

metrics:
- name: active_shipments_message_count
  description: "Active message count for shipments"
  resourceType: ServiceBusQueue
  resources:
  - namespace: promitor-messaging-we
    queueName: shipments
  - namespace: promitor-messaging-us
    queueName: shipments
  azureMetricConfiguration:
    metricName: ActiveMessages
    aggregation:
      type: Average

Certainly given the metric name should be the same for all of the resources anyway.

Or are there scenarios where you want to mix SB queues with storage queues for example? I think not?

No promise on this sample above though will see how things go.

@ResDiaryLewis
Copy link
Author

ResDiaryLewis commented May 8, 2019

👍 @tomkerkhove, that's a great point - it should always be the same. Obviously with the Generic resource type this could get hard to enforce but I guess that's just inevitable.
As for mixing resources, we don't have a use case for that. Perhaps others might? 👀

@tomkerkhove
Copy link
Owner

I think I'll move it out for now and can still make it more flexible later on if people really want to (although I think we shouldn't)

@tomkerkhove
Copy link
Owner

As things keep on getting delayed I'm introducing #646 which allows us to version metrics declaration. If this feature does not make it for v1.0 we can still easily provide this in v1.x without breaking changes.

@adamconnelly
Copy link
Contributor

@tomkerkhove I've started to take a look at this over the past couple of days. I didn't want to mention it before now until I had an idea of whether I thought I'd be able to make the change or not.

So far what I've done is:

  • Separated the objects used for serialization from the internal model objects. This meant that I could alter the internal model to allow multiple resources per metric, while still being able to serialize/deserialize the v1 format.
  • Refactored the MetricDefinition object to make a clear separation between the Prometheus config, the Azure metric config, and the Azure resources to scrape. At the same time, I've altered it to support multiple resources under the one metric.

At this point, in theory I haven't altered the functionality at all, but I've put things in place to support the next stage.

Before I go any further, I just want to check whether it's worth me continuing, and if so I think it would be good for you to take a look at the work in progress. The changes so far are pretty huge even though I've been trying to minimise what I'm changing where possible.

@tomkerkhove
Copy link
Owner

Thanks for jumping in and helping on this @adamconnelly!

Separated the objects used for serialization from the internal model objects. This meant that I could alter the internal model to allow multiple resources per metric, while still being able to serialize/deserialize the v1 format.

You have read my mind, I was about to split the configuration model from our internal model as well so I'm happy to see you went that route as well!

How are you handling the mapping between configuration & internal model? Are you using something like AutoMapper?

Can you elaborate on the following please?

Refactored the MetricDefinition object to make a clear separation between the Prometheus config, the Azure metric config, and the Azure resources to scrape. At the same time, I've altered it to support multiple resources under the one metric.

For the rest, it sounds very promising and given you mention that it just works as before I'd say an early PR for just changing the internals would be a good start.

That way we can keep the changes minimal and get PRs in more easily.

Note: I'm on holiday so responses might be slow

@adamconnelly
Copy link
Contributor

@tomkerkhove I've put up a PR with the first set of changes (i.e. adding the new v1 config objects to separate the internal model from the serialization objects). In case you take a look before I've fixed this yet, I'll sort the TODO and the fact the CLI hasn't been signed.

#652

As for the mapping between the models - I decided against using AutoMapper. Given how few places we actually need to do the mapping in, I figured that adding the complication of AutoMapper probably wasn't worth the effort. I've used it extensively in other projects and it's great, but I think it would give us more pain than it's worth in this case.

Instead I opted to rename all the objects to XyzBuilder, and add a Build() method to them. Each object then just takes care of creating the correct model. Maybe you can think of a more appropriate name?

As for the MetricDefinition refactoring - once you're happy with the first PR I can push up the next set of changes that includes that, but basically I've altered the class to look something like this:

public class MetricDefinition
{
	public AzureMetricConfiguration AzureMetricConfiguration { get; set; }
	
	// Contains name, description and labels
	public PrometheusMetricDefinition PrometheusMetricDefinition { get; set; }
	public Scraping Scraping { get; set; };
	public ResourceType ResourceType { get; }
	
	// Contains a collection of resources to be scraped.
	public List<AzureResourceDefinition> Resources { get; set; }
}

So previously MetricDefinition was both the Metric and the Resource, but I've altered it so that it represents a Metric, which can contain one or more Resources.

@adamconnelly
Copy link
Contributor

I've pushed up a branch with the second set of changes (supporting multiple resources in the model, but without the v2 config). I'm going to hold off creating a PR until you get a chance to take a look at the first one, because it's dependent on the first set of changes. If you want to take a look anyway, you can see the changes here: adamconnelly/promitor@create-v1-configs...adamconnelly:multi-resource-metrics

Also - I totally appreciate you're on your holiday, so no pressure.

@tomkerkhove
Copy link
Owner

Thanks for the update, I'll have a look at the first PR now!

In terms of the metric configuration, I'd love to keep it as simple and use something like the following:

metrics:
- name: active_shipments_message_count
  description: "Active message count for shipments"
  resourceType: ServiceBusQueue
  resources:
  - namespace: promitor-messaging-we
    queueName: shipments
  - namespace: promitor-messaging-us
    queueName: shipments
  azureMetricConfiguration:
    metricName: ActiveMessages
    aggregation:
      type: Average

It looks like it's similar to what you have, without PrometheusMetricDefinition as this does feel a bit of overhead here, what do you think? Otherwise, it feels like a configuration model that is too complex to me.

(assuming AzureResourceDefinition is a generic type allowing for more specific ones)

@adamconnelly
Copy link
Contributor

@tomkerkhove that's exactly what it's like. The PrometheusMetricDefinition is only used in the internal model - the config looks exactly like your example.

AzureResourceDefinition is a base class for all the resource types - there's a separate class for each type of resource that adds the specific properties they need. I'm about 80% of the way through implementing the v2 config, so I'll be able to show you the full thing soon. I just wanted to try to keep it split into a few separate stages to keep the PR sizes as small as possible.

As an aside - I think there's maybe a bug with the existing code (or maybe it was deliberate). It looks to me like although all of the resource types have a ResourceGroupName property that they inherit, it's not actually being populated for most of them. Maybe you can double check if I'm right about that when you've got a chance and we can potentially sort that?

@tomkerkhove
Copy link
Owner

@tomkerkhove that's exactly what it's like. The PrometheusMetricDefinition is only used in the internal model - the config looks exactly like your example.

AzureResourceDefinition is a base class for all the resource types - there's a separate class for each type of resource that adds the specific properties they need. I'm about 80% of the way through implementing the v2 config, so I'll be able to show you the full thing soon. I just wanted to try to keep it split into a few separate stages to keep the PR sizes as small as possible.

Awesome, thanks!

As an aside - I think there's maybe a bug with the existing code (or maybe it was deliberate). It looks to me like although all of the resource types have a ResourceGroupName property that they inherit, it's not actually being populated for most of them. Maybe you can double check if I'm right about that when you've got a chance and we can potentially sort that?

Woops, maybe best to create a dedicated issue for this?

@adamconnelly
Copy link
Contributor

Sure - I've made a note for myself about the ResourceGroupName problem and I'll create an issue after finishing this issue.

@adamconnelly
Copy link
Contributor

@tomkerkhove just to keep you in the loop, I'll aim to put up a PR for the last part later this week, depending on how my week goes.

@tomkerkhove
Copy link
Owner

Thanks @adamconnelly ! Take your time, no rush!

@tomkerkhove
Copy link
Owner

tomkerkhove commented Aug 23, 2019

Open work:

adamconnelly added a commit to adamconnelly/promitor that referenced this issue Aug 25, 2019
- I've updated the documentation and the guide on adding a new scraper to match the changes made to support multiple resources per metric.
- I also tweaked the wording about adding docs in the new scraper guide so it doesn't sound optional. I figured if someone goes to the bother of adding a scraper they can easily add some docs based on the examples that are there.
- Tweaked some of the wording where I spotted a few typos / mistakes while proof-reading.
- I've also tweaked the example metric names used. Some of them didn't make sense now that multiple resources are supported, and I figured it made sense to just do a pass and make them all consistent.

Part of tomkerkhove#513.
tomkerkhove pushed a commit that referenced this issue Aug 27, 2019
- I've updated the documentation and the guide on adding a new scraper to match the changes made to support multiple resources per metric.
- I also tweaked the wording about adding docs in the new scraper guide so it doesn't sound optional. I figured if someone goes to the bother of adding a scraper they can easily add some docs based on the examples that are there.
- Tweaked some of the wording where I spotted a few typos / mistakes while proof-reading.
- I've also tweaked the example metric names used. Some of them didn't make sense now that multiple resources are supported, and I figured it made sense to just do a pass and make them all consistent.

Part of #513.
@tomkerkhove
Copy link
Owner

Thanks for contributing this feature @adamconnelly! I'll have a look at the Helm chart but we should be good to go.

Will release the RC once prom-client-net/prom-client#65 is shipped and #546 is finalized!

@adamconnelly
Copy link
Contributor

Sounds good. Excited to try it out now!

@tomkerkhove
Copy link
Owner

Everything looks good and is implemented - Thank you @adamconnelly !

Currently we only have a bug with helm chart which is tracked in #695

@tomkerkhove
Copy link
Owner

Those who want to use it already, feel free to use tomkerkhove/promitor-agent-scraper-ci:pr683

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature All issues that are new features metric-labelling All issues related to metric labelling scraping All issues related to scraping specs-defined All issues where the specifications are defined and ready to be implemented
Projects
None yet
Development

No branches or pull requests

5 participants