
Allow multiple resources to be specified per metric #683

Merged

Conversation

@adamconnelly (Contributor) commented Aug 22, 2019

Work towards #513

Proposed Changes

This set of changes adds the ability to specify multiple resources against a single metric, so that we can make use of Prometheus labels to aggregate metrics together.

For example, instead of the following configuration:

- name: azure_redis_cache-1_hits
  description: "The number of successful key lookups to cache-1."
  resourceType: RedisCache
  cacheName: cache-1
  azureMetricConfiguration:
    metricName: CacheHits
    aggregation:
      type: Total
      interval: 00:01:00
- name: azure_redis_cache-2_hits
  description: "The number of successful key lookups to cache-2."
  resourceType: RedisCache
  cacheName: cache-2
  azureMetricConfiguration:
    metricName: CacheHits
    aggregation:
      type: Total
      interval: 00:01:00

We can now write the following:

- name: azure_redis_cache_hits
  description: "The number of successful key lookups to a redis cache."
  resourceType: RedisCache
  azureMetricConfiguration:
    metricName: CacheHits
    aggregation:
      type: Total
      interval: 00:01:00
  resources:
  - cacheName: cache-1
  - cacheName: cache-2

This will return results similar to the following:

azure_redis_cache_hits{resource_group="...",subscription_id="...",resource_uri="subscriptions/.../Redis/cache-1",instance_name="cache-1"} 0
azure_redis_cache_hits{resource_group="...",subscription_id="...",resource_uri="subscriptions/.../Redis/cache-2",instance_name="cache-2"} 15
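
Because each resource now becomes a separate labeled series on the same metric, the results can be aggregated at query time. A couple of illustrative PromQL queries (not part of this PR; the metric and label names match the sample output above):

# Total hits across every cache
sum(azure_redis_cache_hits)

# Hits per resource group
sum by (resource_group) (azure_redis_cache_hits)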

Since this is such a big change, I've left it split up into multiple commits. Most of the commits add a v2 version of the config format, and at the end I've removed the original v1 version and then renamed the v2 version back to v1.

I've taken a slightly different approach to the tests than the original code. Instead of performing a full serialization and deserialization, I've written unit tests for each deserializer that don't rely on the serialization logic working. My reasoning is that this keeps the tests very simple, and it also means that if both the serialization and deserialization were broken in the same way (for example, if we accidentally renamed a yaml tag from cacheName to cName), the tests would still catch it, because they contain the actual yaml written out explicitly instead of relying on the serialization being correct.

I've also included an overall integration test that builds a set of config objects, serializes them, deserializes them, and then checks that the runtime model is correct.

I haven't updated the docs yet - @tomkerkhove, just let me know if you'd like me to update this PR with the docs changes, or if you'd rather have it in a separate PR. I've updated the metric-config.yaml to match the new format.

Here's an example config file:

version: v1
azureMetadata:
  tenantId: 00000000-0000-0000-0000-000000000000
  subscriptionId: 00000000-0000-0000-0000-000000000000
  resourceGroupName: resource-group
metricDefaults:
  aggregation:
    interval: 00:05:00
  scraping:
    # Every minute
    schedule: "0 * * ? * *"
metrics:
  - name: azure_redis_cache_hits
    description: "The number of successful key lookups during the specified reporting interval. This maps to keyspace_hits from the Redis INFO command."
    resourceType: RedisCache
    labels:
      env: dev
      purpose: dev-stuff
    azureMetricConfiguration:
      metricName: CacheHits
      aggregation:
        type: Total
        interval: 00:01:00
    resources:
    - cacheName: api-redis
    - cacheName: diary-redis
    - cacheName: login-redis
  - name: azure_redis_cache_misses
    description: "The number of failed key lookups during the specified reporting interval. This maps to keyspace_misses from the Redis INFO command."
    resourceType: RedisCache
    azureMetricConfiguration:
      metricName: CacheMisses
      aggregation:
        type: Total
        interval: 00:01:00
    resources:
    - cacheName: api-redis
    - cacheName: diary-redis
    - cacheName: login-redis
  - name: azure_network_interface_bytes_received_average
    description: "The number of bytes received on average."
    resourceType: NetworkInterface
    azureMetricConfiguration:
      metricName: BytesReceivedRate
      aggregation:
        type: Average
        interval: 00:05:00
    resources:
    - networkInterfaceName: vm-1-nic
    - networkInterfaceName: vm-2-nic
    - networkInterfaceName: vm-3-nic
    - networkInterfaceName: vm-4-nic

Here's an example of the output that produces:

# HELP azure_network_interface_bytes_received_average The number of bytes received on average.
# TYPE azure_network_interface_bytes_received_average gauge
azure_network_interface_bytes_received_average{resource_group="resource-group",subscription_id="00000000-0000-0000-0000-000000000000",resource_uri="subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resource-group/providers/Microsoft.Network/networkInterfaces/vm-3-nic",instance_name="vm-3-nic"} 36144.8
azure_network_interface_bytes_received_average{resource_group="resource-group",subscription_id="00000000-0000-0000-0000-000000000000",resource_uri="subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resource-group/providers/Microsoft.Network/networkInterfaces/vm-4-nic",instance_name="vm-4-nic"} 36682
azure_network_interface_bytes_received_average{resource_group="resource-group",subscription_id="00000000-0000-0000-0000-000000000000",resource_uri="subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resource-group/providers/Microsoft.Network/networkInterfaces/vm-1-nic",instance_name="vm-1-nic"} 2308584.2
azure_network_interface_bytes_received_average{resource_group="resource-group",subscription_id="00000000-0000-0000-0000-000000000000",resource_uri="subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resource-group/providers/Microsoft.Network/networkInterfaces/vm-2-nic",instance_name="vm-2-nic"} 35092.2
# HELP azure_redis_cache_hits The number of successful key lookups during the specified reporting interval. This maps to keyspace_hits from the Redis INFO command.
# TYPE azure_redis_cache_hits gauge
azure_redis_cache_hits{resource_group="resource-group",subscription_id="00000000-0000-0000-0000-000000000000",resource_uri="subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resource-group/providers/Microsoft.Cache/Redis/login-redis",instance_name="login-redis",env="dev",purpose="dev-stuff"} 0
azure_redis_cache_hits{resource_group="resource-group",subscription_id="00000000-0000-0000-0000-000000000000",resource_uri="subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resource-group/providers/Microsoft.Cache/Redis/api-redis",instance_name="api-redis",env="dev",purpose="dev-stuff"} 15
azure_redis_cache_hits{resource_group="resource-group",subscription_id="00000000-0000-0000-0000-000000000000",resource_uri="subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resource-group/providers/Microsoft.Cache/Redis/diary-redis",instance_name="diary-redis",env="dev",purpose="dev-stuff"} 199
# HELP azure_redis_cache_misses The number of failed key lookups during the specified reporting interval. This maps to keyspace_misses from the Redis INFO command.
# TYPE azure_redis_cache_misses gauge
azure_redis_cache_misses{resource_group="resource-group",subscription_id="00000000-0000-0000-0000-000000000000",resource_uri="subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resource-group/providers/Microsoft.Cache/Redis/api-redis",instance_name="api-redis"} 3
azure_redis_cache_misses{resource_group="resource-group",subscription_id="00000000-0000-0000-0000-000000000000",resource_uri="subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resource-group/providers/Microsoft.Cache/Redis/diary-redis",instance_name="diary-redis"} 18
azure_redis_cache_misses{resource_group="resource-group",subscription_id="00000000-0000-0000-0000-000000000000",resource_uri="subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resource-group/providers/Microsoft.Cache/Redis/login-redis",instance_name="login-redis"} 0
# HELP promitor_ratelimit_arm Indication how many calls are still available before Azure Resource Manager is going to throttle us.
# TYPE promitor_ratelimit_arm gauge
promitor_ratelimit_arm{tenant_id="00000000-0000-0000-0000-000000000000",subscription_id="00000000-0000-0000-0000-000000000000",app_id="00000000-0000-0000-0000-000000000000"} 11996
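
With hits and misses exposed as labeled series, derived values can also be computed at query time. For example, an overall cache hit ratio across all three caches (illustrative PromQL, not part of this PR):

# Overall hit ratio across all Redis caches
sum(azure_redis_cache_hits)
  / (sum(azure_redis_cache_hits) + sum(azure_redis_cache_misses))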

@promitor-bot

Docker image for this PR was built and is available on Docker Hub.

You can pull it locally via the CLI:

docker pull tomkerkhove/promitor-agent-scraper-ci:pr683

Want to verify the new version? Run it locally:

docker run -d -p 8999:80 --name promitor-agent-scraper-pr683 \
                         --env PROMITOR_AUTH_APPID='<azure-ad-app-id>' \
                         --env-file C:/Promitor/az-mon-auth.creds \
                         --volume C:/Promitor/metrics-declaration.yaml:/config/metrics-declaration.yaml  \
                         --volume C:/Promitor/runtime-config.yaml:/config/runtime.yaml \
                         tomkerkhove/promitor-agent-scraper-ci:pr683

You can find a CI version of our Helm chart on hub.helm.sh


@tomkerkhove (Owner) left a comment

Thanks for making the change @adamconnelly! I've added some comments but most of them are just suggestions or nitpicking to align with what we already have.

Happy to pick up the docs in a separate PR!

@adamconnelly (Contributor, Author) commented

@tomkerkhove I've now pushed a change to address your feedback. I've marked everything I've fixed as resolved, but left a few items open because I'm not completely sure what to do with them and wanted to hear your thoughts first.


@tomkerkhove (Owner) commented

@adamconnelly Thanks, I'll do a new sweep early next week!


@tomkerkhove (Owner) left a comment

Looks good, let's get rid of the TODO & code quality issue and we can ship it!

Commits in this pull request:

- Renamed the v1 serializer to `V1Serializer` to avoid having multiple classes all called `ConfigurationSerializer`.
- Added the initial framework for the `V2Serializer`.
- Updated all the v2 deserializers.
- Moved `IDeserializer` next to `Deserializer`, and adjusted the access levels from internal to public to allow unit testing.
- Updated the `V2Serializer` to inherit from `Deserializer`.
- Pulled a few utility methods up to the `Deserializer` object, and updated the v2 subclasses to use them.
- Pulled the two most common assertions (that a property had been set, or that it was null) out into a helper class, and refactored the tests to use that helper.
- Added the deserializer.
- Added some extra assertions to `DeserializerTestHelpers` to help with the situation where a deserializer returns a base type, but we want to assert against another type.
- Implemented the container registry deserializer.
- Created a base class for Azure resource deserializers to make sure `ResourceGroupName` is always deserialized.
- Created a base class for the resource deserializer tests.
- Added a `GetTimespan()` method to `Deserializer`, and updated the `AggregationDeserializer` to use it.
- Removed `GetNullableEnum()` and altered `GetEnum()` to return a nullable enum, since the configuration model should have null properties where nothing was supplied in the yaml.
- Added a test that creates a v2 model, serializes it, deserializes it using the v2 deserializer, and then verifies the result is correct.
- Implemented the `AzureResourceDeserializerFactory`.
- Fixed a small typo where "metrics" had been used instead of "resources" as a yaml tag name in the `MetricDefinitionDeserializer`.
- Defaulted the version in `MetricsDeclarationV2` so you don't need to specify it explicitly when creating a v2 model.
- Removed the property type argument from the `AssertPropertyNull()` methods, since we can assert null against an object.
- Renamed both serializers to `{Version}Deserializer`, since they both handle deserialization, and renamed the v1 `InterpretYamlStream()` method to `Deserialize()`, since that's what it does.
- Hooked the v2 format into the application so that it can actually be used.
- Converted the example config to use the new multi-resource format.
- Pulled `Deserializer` methods out into extension methods.
- Fixed some places where names had been left as `v2`.
- Removed the duplicated code for creating the deserializer in test classes, fixing the CodeFactor issues.
- Added missing doc-comments to resources.
- Renamed `DeserializerTestHelpers` to `YamlAssert`.
- Replaced mock loggers with `NullLogger.Instance`.
- Various other fixes.
- Removed the TODO.
- Removed unnecessary using directives.
@adamconnelly (Contributor, Author) commented

@tomkerkhove hopefully that's everything sorted now.


@tomkerkhove tomkerkhove merged commit e457df4 into tomkerkhove:master Aug 27, 2019
@adamconnelly adamconnelly deleted the multi-resource-metrics-config branch February 3, 2020 17:42