Provide capability to scrape multiple resources under on metrics, aggregated with labels per resource #513

ResDiaryLewis · 2019-04-19T14:25:47Z

We want to be able to use Prometheus labels on our metrics. We are currently using Azure Metrics Exporter. If we scrape two databases for their DTU consumption using a metric called dtu_consumption_percent_percent_average, it will label them with resource_group and sub_resource_name. This allows us to aggregate these metrics with PromQL queries.

As far as I know, Promitor currently does not allow you to re-use the same metric name or use labels. This leaves us having to add suffixes to metrics, which can't be aggregated. e.g.:

- name: promitor_db_dtu_percent_my_database
  description: "DTUs consumed"
  resourceGroupName: my_group
  resourceType: Generic
  resourceUri: Microsoft.Sql/servers/my_server/databases/my_database
  filter: ""
  azureMetricConfiguration:
    metricName: dtu_consumption_percent
    aggregation:
      type: Average

Specification

Allow us to define Prometheus labels in metric configurations
Allow duplicate metric names if the label values are different

The text was updated successfully, but these errors were encountered:

adam-resdiary · 2019-04-23T16:01:43Z

I just wanted to add a bit more of a background on why this is important to us to @ResDiaryLewis's description.

If we have the following SQL Azure databases configured:

uk-database
usnc-database
euw-database

We can label them with the region and database name, giving us something like the following:

azure_sql_dtu_percent_average{region="uk", database_name="uk-database"} 59
azure_sql_dtu_percent_average{region="usnc", database_name="usnc-database"} 42
azure_sql_dtu_percent_average{region="euw", database_name="euw-database"} 12

What this means is that we can then create a single generic Grafana dashboard, and we can add variables to it to allow us to filter based on the region or database name.

For example, to add a drop down to choose the region you could use the following query in Grafana:

label_values(azure_sql_dtu_percent_average, region)

And to add a drop down to choose the database name you could use the following query:

label_values(azure_sql_dtu_percent_average{region=~"$region", database_name)

Then to create a graph showing the DTU % for the selected database, you can use the following query:

azure_sql_dtu_percent_average{region=~"$region", database_name=~"$databaseName"}

It also means that we could aggregate across multiple resources (not that it makes sense for the DTU % metric in the example).

If we were able to set aside time for this, would you be interested in a contribution?

tomkerkhove · 2019-04-25T09:09:08Z

This is certainly an interesting scenario, thanks for sharing!

So you are actually asking for 2 things here:

Be able to scrape multiple instances with the same metric and tag it with a label shich has the instance name let's say
Be able to specify additional metadata to a metrics which is added as additional labels to the metric

Is that a good summary?

Would it be ok if you could do this?

metrics:
  - name: active_message_count
    description: "Amount of active messages in our queue"
    resourceType: ServiceBusQueue
    namespace: promitor-messaging
    queueName: orders
    azureMetricConfiguration:
      metricName: ActiveMessages
      aggregation:
        type: Total
  - name: active_message_count
    description: "Amount of active messages in our queue"
    resourceType: ServiceBusQueue
    namespace: promitor-messaging
    queueName: invoices
    azureMetricConfiguration:
      metricName: ActiveMessages
      aggregation:
        type: Total
  - name: active_message_count
    description: "Amount of active messages in our queue"
    resourceType: ServiceBusQueue
    namespace: promitor-messaging
    queueName: shipments
    azureMetricConfiguration:
      metricName: ActiveMessages
      aggregation:
        type: Total

Or are you more looking for the following? (excluding additional tag requirement for now)

metrics:
  - name: active_message_count
    description: "Amount of active messages in our queue"
    resources:
      - resourceType: ServiceBusQueue
        namespace: promitor-messaging
        queueName: orders
      - resourceType: ServiceBusQueue
        namespace: promitor-messaging
        queueName: invoices
      - resourceType: ServiceBusQueue
        namespace: promitor-messaging
        queueName: shipments
    azureMetricConfiguration:
        metricName: ActiveMessages
        aggregation:
            type: Total

If we were able to set aside time for this, would you be interested in a contribution?

Certainly, I'm happy to receive contributions!

Let's spec this out and see if you have the bandwidth, does that sound reasonable?

tomkerkhove · 2019-04-25T09:09:18Z

Closed by accident, sorry

ResDiaryLewis · 2019-04-26T09:42:48Z

We think the second example makes sense as we're aggregating multiple resources under the same metricName and aggregation.type. Here's an example we mocked up for a Generic resource:

metrics:
- name: promitor_db_dtu_consumption_percent_average
  description: "DTU percentage"
  resources:
  - resourceGroupName: uk-group
    resourceType: Generic
    resourceUri: Microsoft.Sql/servers/uk-server/databases/uk-database
  - resourceGroupName: au-group
    resourceType: Generic
    resourceUri: Microsoft.Sql/servers/au-server/databases/au-database
  azureMetricConfiguration:
    metricName: dtu_consumption_percent
    aggregation:
      type: Average

Which would produce these metrics:

promitor_db_dtu_consumption_percent_average{resource_group="uk-group", resource_uri="Microsoft.Sql/servers/uk-server/databases/uk-database"} 29
promitor_db_dtu_consumption_percent_average{resource_group="au-group", resource_uri="Microsoft.Sql/servers/au-server/databases/au-database"} 29

In other scenarios where a Provider like ServiceBusQueue supports YAML fields like namespace and queueName, these could be added as the labels.

Is there anything else you think that we should write specs for? Also, if one of use were to work on this, do you have any pointers on where we could start?

tomkerkhove · 2019-04-27T07:36:27Z

That is certainly a good idea and I'll add this to v1.0.0 timeline since it's a breaking change.

It's a fairly "big" refactoring of the current scraping but our docs on adding new scrapers should walk you through the areas that will have to change.

From the top of my mind I'm thinking of:

How we deserialize the config
How we consume Azure Monitor REST API and report metrics

Because of this, I can prioritize it given the impact unless you're up for it - I'm always welcoming PRs if you have the time!

Here is an example of some contributed scrapers: (and how picky I am, sorry!):

Feel free to have a look and let me know if you're willing to contribute it or not, just to know if I need to work on this or not.

Thanks in advance!

/fyi @brusMX @jsturtevant

tomkerkhove · 2019-04-27T07:41:36Z

In case something is not clear, feel free to contact me at kerkhove.tom@gmail.com.

Tracking docs improvements here as well:

Feel free to post flaws there as well.

marcinbudny · 2019-04-29T08:23:18Z

From my perspective, it would be very useful to implement the "splitting" mechanism available in Azure Monitor. For example, I would like to scrape all the queues in a Service Bus namespace without configuring them individually. Then I would a have metric named something like servicebus_queue_size with label entity_name specifying name of the queue / topic. This is how many Prometheus exporters work (e.g. RabbitMQ exporter).

In Azure Monitor this looks like this:

As a result I get a chart with metric per entity name.

tomkerkhove · 2019-04-29T08:53:24Z

That's a fair ask @marcinbudny, but I would rather say that's a different request as this does not relate to the restructuring but rather just how the ServiceBusQueue scraper works.

This was moved to #529 and will be done in v1.0.0.

adam-resdiary · 2019-04-29T12:25:41Z

@tomkerkhove I can't be completely certain of the timescale for when we could look at this, but what I can say is that we've been given the ok to contribute when we can fit it in. Realistically, we definitely won't be able to start this within the next couple of weeks, and I think we'd be talking at least a month from now in reality.

If you're happy to just go ahead and make the changes before we have the time then obviously we're not going to stand in the way and would be really happy with that, but otherwise I'll make sure to provide an update here before we start doing any work on this to make sure we're not wasting effort.

In the meantime we're going to run a PoC using Prometheus relabelling for a few weeks so that we can check we're happy with the metrics we're getting from Promitor.

Does that work for you?

marcinbudny · 2019-04-29T13:35:56Z

The "splitting" seems to be a generic mechanism, not related to Service Bus itself.

tomkerkhove · 2019-04-29T14:12:39Z

That's great to hear @adam-resdiary, thanks for the support! Since I'm trying to close 1.0.0 asap I'll probably already start working on this but feel free to circle back here before you start to check if I already started or not.

Feel free to let me know how the test goes and feel free to open more issues if you're unsatisfied!

tomkerkhove · 2019-04-29T14:17:53Z

The "splitting" seems to be a generic mechanism, not related to Service Bus itself.

@marcinbudny Yes and no - Splitting is a generic mechanism but does not work nicely with our current Service Bus Queue scraper, hence why there is a separate issue for this.

We are adding general support for multi-dimensional metrics which enables you to split on a given sub-dimension and track that in #81. I'll extend it a bit so that it's more clear and make sure you'll be able to get labels for the dimensions.

Is that fair to you?

marcinbudny · 2019-04-29T20:41:02Z

@tomkerkhove Sure. Thank you for the explanation.

tomkerkhove · 2019-05-07T21:31:25Z

I'll start working on this in the next coming days @adam-resdiary @ResDiaryLewis .

Wouldn't it make more sense to have the resource type on the root since it should always be the same?

metrics:
- name: active_shipments_message_count
  description: "Active message count for shipments"
  resourceType: ServiceBusQueue
  resources:
  - namespace: promitor-messaging-we
    queueName: shipments
  - namespace: promitor-messaging-us
    queueName: shipments
  azureMetricConfiguration:
    metricName: ActiveMessages
    aggregation:
      type: Average

Certainly given the metric name should be the same for all of the resources anyway.

Or are there scenarios where you want to mix SB queues with storage queues for example? I think not?

No promise on this sample above though will see how things go.

ResDiaryLewis · 2019-05-08T07:48:59Z

👍 @tomkerkhove, that's a great point - it should always be the same. Obviously with the Generic resource type this could get hard to enforce but I guess that's just inevitable.
As for mixing resources, we don't have a use case for that. Perhaps others might? 👀

tomkerkhove · 2019-05-08T08:41:24Z

I think I'll move it out for now and can still make it more flexible later on if people really want to (although I think we shouldn't)

tomkerkhove · 2019-08-01T11:02:28Z

As things keep on getting delayed I'm introducing #646 which allows us to version metrics declaration. If this feature does not make it for v1.0 we can still easily provide this in v1.x without breaking changes.

adamconnelly · 2019-08-06T13:39:03Z

@tomkerkhove I've started to take a look at this over the past couple of days. I didn't want to mention it before now until I had an idea of whether I thought I'd be able to make the change or not.

So far what I've done is:

Separated the objects used for serialization from the internal model objects. This meant that I could alter the internal model to allow multiple resources per metric, while still being able to serialize/deserialize the v1 format.
Refactored the MetricDefinition object to make a clear separation between the Prometheus config, the Azure metric config, and the Azure resources to scrape. At the same time, I've altered it to support multiple resources under the one metric.

At this point, in theory I haven't altered the functionality at all, but I've put things in place to support the next stage.

Before I go any further, I just want to check whether it's worth me continuing, and if so I think it would be good for you to take a look at the work in progress. The changes so far are pretty huge even though I've been trying to minimise what I'm changing where possible.

tomkerkhove · 2019-08-06T16:05:50Z

Thanks for jumping in and helping on this @adamconnelly!

Separated the objects used for serialization from the internal model objects. This meant that I could alter the internal model to allow multiple resources per metric, while still being able to serialize/deserialize the v1 format.

You have read my mind, I was about to split the configuration model from our internal model as well so I'm happy to see you went that route as well!

How are you handling the mapping between configuration & internal model? Are you using something like AutoMapper?

Can you elaborate on the following please?

Refactored the MetricDefinition object to make a clear separation between the Prometheus config, the Azure metric config, and the Azure resources to scrape. At the same time, I've altered it to support multiple resources under the one metric.

For the rest, it sounds very promising and given you mention that it just works as before I'd say an early PR for just changing the internals would be a good start.

That way we can keep the changes minimal and get PRs in more easily.

Note: I'm on holiday so responses might be slow

adamconnelly · 2019-08-06T16:37:53Z

@tomkerkhove I've put up a PR with the first set of changes (i.e. adding the new v1 config objects to separate the internal model from the serialization objects). In case you take a look before I've fixed this yet, I'll sort the TODO and the fact the CLI hasn't been signed.

#652

As for the mapping between the models - I decided against using AutoMapper. Given how few places we actually need to do the mapping in, I figured that adding the complication of AutoMapper probably wasn't worth the effort. I've used it extensively in other projects and it's great, but I think it would give us more pain than it's worth in this case.

Instead I opted to rename all the objects to XyzBuilder, and add a Build() method to them. Each object then just takes care of creating the correct model. Maybe you can think of a more appropriate name?

As for the MetricDefinition refactoring - once you're happy with the first PR I can push up the next set of changes that includes that, but basically I've altered the class to look something like this:

public class MetricDefinition
{
	public AzureMetricConfiguration AzureMetricConfiguration { get; set; }
	
	// Contains name, description and labels
	public PrometheusMetricDefinition PrometheusMetricDefinition { get; set; }
	public Scraping Scraping { get; set; };
	public ResourceType ResourceType { get; }
	
	// Contains a collection of resources to be scraped.
	public List<AzureResourceDefinition> Resources { get; set; }
}

So previously MetricDefinition was both the Metric and the Resource, but I've altered it so that it represents a Metric, which can contain one or more Resources.

adamconnelly · 2019-08-07T13:00:57Z

I've pushed up a branch with the second set of changes (supporting multiple resources in the model, but without the v2 config). I'm going to hold off creating a PR until you get a chance to take a look at the first one, because it's dependent on the first set of changes. If you want to take a look anyway, you can see the changes here: adamconnelly/promitor@create-v1-configs...adamconnelly:multi-resource-metrics

Also - I totally appreciate you're on your holiday, so no pressure.

tomkerkhove · 2019-08-09T09:28:56Z

Thanks for the update, I'll have a look at the first PR now!

In terms of the metric configuration, I'd love to keep it as simple and use something like the following:

metrics:
- name: active_shipments_message_count
  description: "Active message count for shipments"
  resourceType: ServiceBusQueue
  resources:
  - namespace: promitor-messaging-we
    queueName: shipments
  - namespace: promitor-messaging-us
    queueName: shipments
  azureMetricConfiguration:
    metricName: ActiveMessages
    aggregation:
      type: Average

It looks like it's similar to what you have, without PrometheusMetricDefinition as this does feel a bit of overhead here, what do you think? Otherwise, it feels like a configuration model that is too complex to me.

(assuming AzureResourceDefinition is a generic type allowing for more specific ones)

adamconnelly · 2019-08-09T09:38:51Z

@tomkerkhove that's exactly what it's like. The PrometheusMetricDefinition is only used in the internal model - the config looks exactly like your example.

AzureResourceDefinition is a base class for all the resource types - there's a separate class for each type of resource that adds the specific properties they need. I'm about 80% of the way through implementing the v2 config, so I'll be able to show you the full thing soon. I just wanted to try to keep it split into a few separate stages to keep the PR sizes as small as possible.

As an aside - I think there's maybe a bug with the existing code (or maybe it was deliberate). It looks to me like although all of the resource types have a ResourceGroupName property that they inherit, it's not actually being populated for most of them. Maybe you can double check if I'm right about that when you've got a chance and we can potentially sort that?

tomkerkhove · 2019-08-09T09:55:37Z

@tomkerkhove that's exactly what it's like. The PrometheusMetricDefinition is only used in the internal model - the config looks exactly like your example.

AzureResourceDefinition is a base class for all the resource types - there's a separate class for each type of resource that adds the specific properties they need. I'm about 80% of the way through implementing the v2 config, so I'll be able to show you the full thing soon. I just wanted to try to keep it split into a few separate stages to keep the PR sizes as small as possible.

Awesome, thanks!

As an aside - I think there's maybe a bug with the existing code (or maybe it was deliberate). It looks to me like although all of the resource types have a ResourceGroupName property that they inherit, it's not actually being populated for most of them. Maybe you can double check if I'm right about that when you've got a chance and we can potentially sort that?

Woops, maybe best to create a dedicated issue for this?

adamconnelly · 2019-08-09T10:09:01Z

Sure - I've made a note for myself about the ResourceGroupName problem and I'll create an issue after finishing this issue.

adamconnelly · 2019-08-19T19:07:16Z

@tomkerkhove just to keep you in the loop, I'll aim to put up a PR for the last part later this week, depending on how my week goes.

tomkerkhove · 2019-08-20T06:03:42Z

Thanks @adamconnelly ! Take your time, no rush!

tomkerkhove · 2019-08-23T09:50:41Z

Open work:

Verify if Helm chart needs updates
Update contribution guide for new scrapers
Provide documentation for new metric approach in v1 docs
Change runtime to use new format (Allow multiple resources to be specified per metric #683 )

- I've updated the documentation and the guide on adding a new scraper to match the changes made to support multiple resources per metric. - I also tweaked the wording about adding docs in the new scraper guide so it doesn't sound optional. I figured if someone goes to the bother of adding a scraper they can easily add some docs based on the examples that are there. - Tweaked some of the wording where I spotted a few typos / mistakes while proof-reading. - I've also tweaked the example metric names used. Some of them didn't make sense now that multiple resources are supported, and I figured it made sense to just do a pass and make them all consistent. Part of tomkerkhove#513.

- I've updated the documentation and the guide on adding a new scraper to match the changes made to support multiple resources per metric. - I also tweaked the wording about adding docs in the new scraper guide so it doesn't sound optional. I figured if someone goes to the bother of adding a scraper they can easily add some docs based on the examples that are there. - Tweaked some of the wording where I spotted a few typos / mistakes while proof-reading. - I've also tweaked the example metric names used. Some of them didn't make sense now that multiple resources are supported, and I figured it made sense to just do a pass and make them all consistent. Part of #513.

tomkerkhove · 2019-08-27T18:20:06Z

Thanks for contributing this feature @adamconnelly! I'll have a look at the Helm chart but we should be good to go.

Will release the RC once prom-client-net/prom-client#65 is shipped and #546 is finalized!

adamconnelly · 2019-08-27T18:39:29Z

Sounds good. Excited to try it out now!

tomkerkhove · 2019-09-02T16:12:34Z

Everything looks good and is implemented - Thank you @adamconnelly !

Currently we only have a bug with helm chart which is tracked in #695

tomkerkhove · 2019-09-02T16:13:12Z

Those who want to use it already, feel free to use tomkerkhove/promitor-agent-scraper-ci:pr683

tomkerkhove added the feature-request New feature requests label Apr 19, 2019

tomkerkhove closed this as completed Apr 25, 2019

tomkerkhove reopened this Apr 25, 2019

tomkerkhove added feature All issues that are new features scraping All issues related to scraping specs-defined All issues where the specifications are defined and ready to be implemented and removed feature-request New feature requests labels Apr 27, 2019

tomkerkhove added this to the v1.0.0 milestone Apr 27, 2019

tomkerkhove changed the title ~~Aggregating metrics with labels~~ Provide capability to scrape multiple resources under on metrics, aggregated with labels per resource Apr 27, 2019

tomkerkhove mentioned this issue Apr 29, 2019

Service Bus Queues should be able to scrape all queues in namespace, not just one #529

Closed

3 tasks

tomkerkhove added the metric-labelling All issues related to metric labelling label Apr 29, 2019

tomkerkhove mentioned this issue May 14, 2019

[Question] Scaler Metric Extensibility kedacore/keda#194

Closed

tomkerkhove mentioned this issue Jun 27, 2019

[WIP] Pipeline integration testing using helm & minikube (SHOULD FAIL CI) #470

Closed

tomkerkhove mentioned this issue Aug 1, 2019

Provide versioning for metrics declaration spec #646

Closed

2 tasks

adamconnelly mentioned this issue Aug 6, 2019

Seperate internal metrics declaration model from the v1 (de)serialization model #652

Merged

adamconnelly mentioned this issue Aug 15, 2019

Update runtime model to support multiple resources per metric #666

Merged

tomkerkhove mentioned this issue Aug 19, 2019

Update "Deploying Promitor, Prometheus, and Grafana on an AKS Cluster" walkthrough for v1.0 #658

Closed

adamconnelly mentioned this issue Aug 23, 2019

Allow multiple resources to be specified per metric #683

Merged

adamconnelly mentioned this issue Aug 25, 2019

Update documentation for multi-resource changes #686

Merged

tomkerkhove self-assigned this Aug 29, 2019

tomkerkhove mentioned this issue Sep 2, 2019

Helm chart does not inject resources correctly #695

Closed

tomkerkhove closed this as completed Sep 2, 2019

adamconnelly mentioned this issue Oct 12, 2019

Provide more unit tests for configuration deserialization #283

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Provide capability to scrape multiple resources under on metrics, aggregated with labels per resource #513

Provide capability to scrape multiple resources under on metrics, aggregated with labels per resource #513

ResDiaryLewis commented Apr 19, 2019

adam-resdiary commented Apr 23, 2019

tomkerkhove commented Apr 25, 2019

tomkerkhove commented Apr 25, 2019

ResDiaryLewis commented Apr 26, 2019

tomkerkhove commented Apr 27, 2019 •

edited

Loading

tomkerkhove commented Apr 27, 2019 •

edited

Loading

marcinbudny commented Apr 29, 2019

tomkerkhove commented Apr 29, 2019

adam-resdiary commented Apr 29, 2019

marcinbudny commented Apr 29, 2019

tomkerkhove commented Apr 29, 2019

tomkerkhove commented Apr 29, 2019

marcinbudny commented Apr 29, 2019

tomkerkhove commented May 7, 2019

ResDiaryLewis commented May 8, 2019 •

edited

Loading

tomkerkhove commented May 8, 2019

tomkerkhove commented Aug 1, 2019

adamconnelly commented Aug 6, 2019

tomkerkhove commented Aug 6, 2019

adamconnelly commented Aug 6, 2019

adamconnelly commented Aug 7, 2019

tomkerkhove commented Aug 9, 2019

adamconnelly commented Aug 9, 2019

tomkerkhove commented Aug 9, 2019

adamconnelly commented Aug 9, 2019

adamconnelly commented Aug 19, 2019

tomkerkhove commented Aug 20, 2019

tomkerkhove commented Aug 23, 2019 •

edited

Loading

tomkerkhove commented Aug 27, 2019

adamconnelly commented Aug 27, 2019

tomkerkhove commented Sep 2, 2019

tomkerkhove commented Sep 2, 2019

Provide capability to scrape multiple resources under on metrics, aggregated with labels per resource #513

Provide capability to scrape multiple resources under on metrics, aggregated with labels per resource #513

Comments

ResDiaryLewis commented Apr 19, 2019

Specification

adam-resdiary commented Apr 23, 2019

tomkerkhove commented Apr 25, 2019

tomkerkhove commented Apr 25, 2019

ResDiaryLewis commented Apr 26, 2019

tomkerkhove commented Apr 27, 2019 • edited Loading

tomkerkhove commented Apr 27, 2019 • edited Loading

marcinbudny commented Apr 29, 2019

tomkerkhove commented Apr 29, 2019

adam-resdiary commented Apr 29, 2019

marcinbudny commented Apr 29, 2019

tomkerkhove commented Apr 29, 2019

tomkerkhove commented Apr 29, 2019

marcinbudny commented Apr 29, 2019

tomkerkhove commented May 7, 2019

ResDiaryLewis commented May 8, 2019 • edited Loading

tomkerkhove commented May 8, 2019

tomkerkhove commented Aug 1, 2019

adamconnelly commented Aug 6, 2019

tomkerkhove commented Aug 6, 2019

adamconnelly commented Aug 6, 2019

adamconnelly commented Aug 7, 2019

tomkerkhove commented Aug 9, 2019

adamconnelly commented Aug 9, 2019

tomkerkhove commented Aug 9, 2019

adamconnelly commented Aug 9, 2019

adamconnelly commented Aug 19, 2019

tomkerkhove commented Aug 20, 2019

tomkerkhove commented Aug 23, 2019 • edited Loading

tomkerkhove commented Aug 27, 2019

adamconnelly commented Aug 27, 2019

tomkerkhove commented Sep 2, 2019

tomkerkhove commented Sep 2, 2019

tomkerkhove commented Apr 27, 2019 •

edited

Loading

tomkerkhove commented Apr 27, 2019 •

edited

Loading

ResDiaryLewis commented May 8, 2019 •

edited

Loading

tomkerkhove commented Aug 23, 2019 •

edited

Loading