feat: Provide Azure Resource Graph rate limiting info & throttling status metrics (#1737)
tomkerkhove authored Aug 24, 2021
1 parent 6169e1a commit 0b80fea
Showing 18 changed files with 409 additions and 39 deletions.
6 changes: 6 additions & 0 deletions changelog/content/experimental/unreleased.md
@@ -8,6 +8,8 @@ version:

- {{% tag added %}} Provide system metrics related to agent performance & resources ([docs](https://promitor.io/operations/#performance)
| [#341](https://github.com/tomkerkhove/promitor/issues/341))
- {{% tag added %}} Provide system metrics indicating ARM throttling status ([docs](https://promitor.io/operations/#azure-resource-manager-api---consumption--throttling)
| [#1738](https://github.com/tomkerkhove/promitor/issues/1738))

#### Resource Discovery

@@ -20,3 +22,7 @@ version:
| [#1716](https://github.com/tomkerkhove/promitor/issues/1716))
- {{% tag added %}} Provide system metrics with discovered resource group information ([docs](https://promitor.io/operations/#discovery)
| [#1716](https://github.com/tomkerkhove/promitor/issues/1716))
- {{% tag added %}} Provide system metrics indicating Azure Resource Graph throttling status ([docs](https://promitor.io/operations/#azure-resource-graph)
| [#1739](https://github.com/tomkerkhove/promitor/issues/1739))
- {{% tag added %}} Provide system metrics with insights on Azure Resource Graph rate limiting ([docs](https://promitor.io/operations/#azure-resource-graph)
| [#973](https://github.com/tomkerkhove/promitor/issues/973))
51 changes: 51 additions & 0 deletions docs/operations/index.md
@@ -18,6 +18,7 @@ Here is an overview of how you can operate Promitor.
- [Exploring our REST APIs](#exploring-our-rest-apis)
- [Integrations](#integrations)
- [Azure Resource Manager API - Consumption & Throttling](#azure-resource-manager-api---consumption--throttling)
- [Azure Resource Graph](#azure-resource-graph)
- [Azure Monitor](#azure-monitor)

## Health
@@ -212,8 +213,58 @@ Azure Resource Manager API:
  - `subscription_id` - _Id of the subscription that is being interacted with_
  - `app_id` - _Id of the application that is being used to interact with the API_

- `promitor_ratelimit_arm_throttled` - Indicates whether Promitor is being throttled by Azure Resource Manager
  (ARM). The metric provides the following labels:
  - `tenant_id` - _Id of the tenant that is being interacted with_
  - `subscription_id` - _Id of the subscription that is being interacted with_
  - `app_id` - _Id of the application that is being used to interact with the API_

```text
# HELP promitor_ratelimit_arm Indication how many calls are still available before Azure Resource Manager (ARM) is going to throttle us.
# TYPE promitor_ratelimit_arm gauge
promitor_ratelimit_arm{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0329dd2a-59dc-4493-aa54-cb01cb027dc2",app_id="ceb249a3-44ce-4c90-8863-6776336f5b7e"} 11995 1629719527020
promitor_ratelimit_arm{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",app_id="ceb249a3-44ce-4c90-8863-6776336f5b7e"} 11989 1629719532626
# HELP promitor_ratelimit_arm_throttled Indication concerning Azure Resource Manager are being throttled. (1 = yes, 0 = no).
# TYPE promitor_ratelimit_arm_throttled gauge
promitor_ratelimit_arm_throttled{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0329dd2a-59dc-4493-aa54-cb01cb027dc2",app_id="ceb249a3-44ce-4c90-8863-6776336f5b7e"} 0 1629719527020
promitor_ratelimit_arm_throttled{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",app_id="ceb249a3-44ce-4c90-8863-6776336f5b7e"} 0 1629719532626
```

You can read more about the Azure Resource Manager limitations on [docs.microsoft.com](https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-request-limits).

### Azure Resource Graph

![Resource Discovery Support Badge](https://img.shields.io/badge/Support%20for%20Resource%20Discovery-Yes-green.svg)
![Scraper Support Badge](https://img.shields.io/badge/Support%20for%20Scraper-No-red.svg)

Promitor exposes runtime metrics to provide insights on the API consumption of
Azure Resource Graph:

- `promitor_ratelimit_resource_graph_remaining` - Indicates how many calls are still available before
  Azure Resource Graph starts throttling Promitor. The metric provides the following labels:
  - `tenant_id` - _Id of the tenant that is being interacted with_
  - `cloud` - _Name of the cloud_
  - `auth_mode` - _Authentication mode to authenticate with_
  - `app_id` - _Id of the application that is being used to interact with the API_

- `promitor_ratelimit_resource_graph_throttled` - Indicates whether Promitor is being throttled by Azure Resource
  Graph. The metric provides the following labels:
  - `tenant_id` - _Id of the tenant that is being interacted with_
  - `cloud` - _Name of the cloud_
  - `auth_mode` - _Authentication mode to authenticate with_
  - `app_id` - _Id of the application that is being used to interact with the API_

```text
# HELP promitor_ratelimit_resource_graph_remaining Indication how many calls are still available before Azure Resource Graph is going to throttle us.
# TYPE promitor_ratelimit_resource_graph_remaining gauge
promitor_ratelimit_resource_graph_remaining{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",cloud="Global",auth_mode="ServicePrincipal",app_id="67882a00-21d3-4ee7-b32a-430ea0768cd3"} 9 1629719863738
# HELP promitor_ratelimit_resource_graph_throttled Indication concerning Azure Resource Graph are being throttled. (1 = yes, 0 = no).
# TYPE promitor_ratelimit_resource_graph_throttled gauge
promitor_ratelimit_resource_graph_throttled{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",cloud="Global",auth_mode="ServicePrincipal",app_id="67882a00-21d3-4ee7-b32a-430ea0768cd3"} 0 1629719863738
```

You can read more about Azure Resource Graph throttling on [docs.microsoft.com](https://docs.microsoft.com/en-us/azure/governance/resource-graph/overview#throttling).

### Azure Monitor

Promitor interacts with Azure Monitor API to scrape all the required metrics.
@@ -0,0 +1,72 @@
using System.Collections.Generic;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using GuardNet;
using Microsoft.Extensions.Logging;
using Promitor.Core.Metrics.Prometheus.Collectors.Interfaces;

namespace Promitor.Agents.Core.RequestHandlers
{
    public abstract class ThrottlingRequestHandler : DelegatingHandler
    {
        public abstract string DependencyName { get; }

        protected ILogger Logger { get; }
        protected IPrometheusMetricsCollector PrometheusMetricsCollector { get; }

        /// <summary>
        ///     Constructor
        /// </summary>
        /// <param name="prometheusMetricsCollector">Metrics collector for Prometheus</param>
        /// <param name="logger">Logger to write telemetry to</param>
        protected ThrottlingRequestHandler(IPrometheusMetricsCollector prometheusMetricsCollector, ILogger logger)
        {
            Guard.NotNull(prometheusMetricsCollector, nameof(prometheusMetricsCollector));
            Guard.NotNull(logger, nameof(logger));

            Logger = logger;
            PrometheusMetricsCollector = prometheusMetricsCollector;
        }

        protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
        {
            request = BeforeSendingRequest(request);

            var response = await base.SendAsync(request, cancellationToken);

            var wasRequestThrottled = (int)response.StatusCode == 429;
            if (wasRequestThrottled)
            {
                LogArmThrottling();
            }

            await AvailableRateLimitingCallsAsync(response);
            AvailableThrottlingStatusAsync(wasRequestThrottled);

            return response;
        }

        private void AvailableThrottlingStatusAsync(bool wasRequestThrottled)
        {
            var metricValue = wasRequestThrottled ? 1 : 0;
            var metricLabels = GetMetricLabels();
            PrometheusMetricsCollector.WriteGaugeMeasurement(GetThrottlingStatusMetricName(), GetThrottlingStatusMetricDescription(), metricValue, metricLabels, includeTimestamp: true);
        }

        protected abstract Dictionary<string, string> GetMetricLabels();
        protected abstract string GetThrottlingStatusMetricName();
        protected abstract string GetThrottlingStatusMetricDescription();
        protected abstract Task AvailableRateLimitingCallsAsync(HttpResponseMessage response);

        protected virtual HttpRequestMessage BeforeSendingRequest(HttpRequestMessage request)
        {
            return request;
        }

        protected void LogArmThrottling()
        {
            Logger.LogWarning($"{DependencyName} rate limit reached.");
        }
    }
}
8 changes: 8 additions & 0 deletions src/Promitor.Agents.ResourceDiscovery/Docs/Open-Api.xml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

16 changes: 14 additions & 2 deletions src/Promitor.Agents.ResourceDiscovery/Graph/AzureResourceGraph.cs
@@ -16,15 +16,18 @@
 using Promitor.Agents.ResourceDiscovery.Graph.Exceptions;
 using Promitor.Agents.ResourceDiscovery.Graph.Interfaces;
 using Promitor.Agents.ResourceDiscovery.Graph.Model;
+using Promitor.Agents.ResourceDiscovery.Graph.RequestHandlers;
 using Promitor.Core;
 using Promitor.Core.Extensions;
+using Promitor.Core.Metrics.Prometheus.Collectors.Interfaces;
 using Promitor.Integrations.Azure.Authentication;

 namespace Promitor.Agents.ResourceDiscovery.Graph
 {
     public class AzureResourceGraph : IAzureResourceGraph
     {
         private readonly IOptionsMonitor<ResourceDeclaration> _resourceDeclarationMonitor;
+        private readonly IPrometheusMetricsCollector _prometheusMetricsCollector;
         private readonly ILogger<AzureResourceGraph> _logger;

         private ResourceGraphClient _graphClient;
@@ -39,8 +42,9 @@ public class AzureResourceGraph : IAzureResourceGraph

         private readonly AzureAuthenticationInfo _azureAuthenticationInfo;

-        public AzureResourceGraph(IOptionsMonitor<ResourceDeclaration> resourceDeclarationMonitor, IConfiguration configuration, ILogger<AzureResourceGraph> logger)
+        public AzureResourceGraph(IPrometheusMetricsCollector prometheusMetricsCollector, IOptionsMonitor<ResourceDeclaration> resourceDeclarationMonitor, IConfiguration configuration, ILogger<AzureResourceGraph> logger)
         {
+            Guard.NotNull(prometheusMetricsCollector, nameof(prometheusMetricsCollector));
             Guard.NotNull(resourceDeclarationMonitor, nameof(resourceDeclarationMonitor));
             Guard.NotNull(resourceDeclarationMonitor.CurrentValue, nameof(resourceDeclarationMonitor.CurrentValue));
             Guard.NotNull(resourceDeclarationMonitor.CurrentValue.AzureLandscape, nameof(resourceDeclarationMonitor.CurrentValue.AzureLandscape));
@@ -49,6 +53,7 @@ public AzureResourceGraph(IOptionsMonitor<ResourceDeclaration> resourceDeclarati

             _logger = logger;
             _resourceDeclarationMonitor = resourceDeclarationMonitor;
+            _prometheusMetricsCollector = prometheusMetricsCollector;
             _azureAuthenticationInfo = AzureAuthenticationFactory.GetConfiguredAzureAuthentication(configuration);
         }

@@ -265,7 +270,14 @@ private async Task<ResourceGraphClient> CreateClientAsync()
             var credentials = await AzureAuthenticationFactory.GetTokenCredentialsAsync(azureEnvironment.ManagementEndpoint, TenantId, _azureAuthenticationInfo, azureAuthorityHost);
             var resourceManagerBaseUri = new Uri(azureEnvironment.ResourceManagerEndpoint);

-            var resourceGraphClient = new ResourceGraphClient(resourceManagerBaseUri, credentials);
+            var metricLabels = new Dictionary<string, string>
+            {
+                {"tenant_id", TenantId},
+                {"cloud", azureEnvironment.GetDisplayName()},
+                {"app_id", _azureAuthenticationInfo.IdentityId},
+                {"auth_mode", _azureAuthenticationInfo.Mode.ToString()},
+            };
+            var resourceGraphClient = new ResourceGraphClient(resourceManagerBaseUri, credentials, new AzureResourceGraphThrottlingRequestHandler(_prometheusMetricsCollector, metricLabels, _logger));

             var version = Promitor.Core.Version.Get();
             var promitorUserAgent = UserAgent.Generate("Resource-Discovery", version);
@@ -0,0 +1,59 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using GuardNet;
using Microsoft.Extensions.Logging;
using Promitor.Agents.Core.RequestHandlers;
using Promitor.Core;
using Promitor.Core.Metrics.Prometheus.Collectors.Interfaces;

namespace Promitor.Agents.ResourceDiscovery.Graph.RequestHandlers
{
    internal class AzureResourceGraphThrottlingRequestHandler : ThrottlingRequestHandler
    {
        private readonly Dictionary<string, string> _metricLabels;

        private const string ThrottlingHeaderName = "x-ms-user-quota-remaining";
        private const string AvailableCallsMetricDescription = "Indication how many calls are still available before Azure Resource Graph is going to throttle us.";
        private const string ThrottledMetricDescription = "Indication concerning Azure Resource Graph are being throttled. (1 = yes, 0 = no).";

        public override string DependencyName => "Azure Resource Graph";

        /// <summary>
        ///     Constructor
        /// </summary>
        /// <param name="prometheusMetricsCollector">Metrics collector to write metrics to Prometheus</param>
        /// <param name="metricLabels"></param>
        /// <param name="logger">Logger to write telemetry to</param>
        public AzureResourceGraphThrottlingRequestHandler(IPrometheusMetricsCollector prometheusMetricsCollector, Dictionary<string, string> metricLabels, ILogger logger)
            : base(prometheusMetricsCollector, logger)
        {
            Guard.NotNull(metricLabels, nameof(metricLabels));

            _metricLabels = metricLabels;
        }

        protected override Task AvailableRateLimitingCallsAsync(HttpResponseMessage response)
        {
            // Source:
            // - https://docs.microsoft.com/en-us/azure/governance/resource-graph/overview#throttling
            // - https://docs.microsoft.com/en-us/azure/governance/resource-graph/concepts/guidance-for-throttled-requests#understand-throttling-headers
            if (response.Headers.Contains(ThrottlingHeaderName))
            {
                var remainingApiCalls = response.Headers.GetValues(ThrottlingHeaderName).FirstOrDefault();
                var subscriptionReadLimit = Convert.ToInt16(remainingApiCalls);

                // Report metric
                PrometheusMetricsCollector.WriteGaugeMeasurement(RuntimeMetricNames.RateLimitingForResourceGraph, AvailableCallsMetricDescription, subscriptionReadLimit, _metricLabels, includeTimestamp: true);
            }

            return Task.CompletedTask;
        }

        protected override Dictionary<string, string> GetMetricLabels() => _metricLabels;
        protected override string GetThrottlingStatusMetricName() => RuntimeMetricNames.ResourceGraphThrottled;
        protected override string GetThrottlingStatusMetricDescription() => ThrottledMetricDescription;
    }
}
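
Reviewer sketch, not part of this commit: `AvailableRateLimitingCallsAsync` above converts the `x-ms-user-quota-remaining` header with `Convert.ToInt16`, which throws a `FormatException` on a non-numeric value and an `OverflowException` on a value outside the 16-bit range. A more defensive variant could look roughly like the following. The class name `ThrottlingHeaderParser` and the method `TryGetRemainingCalls` are hypothetical and only illustrate the idea; the header name is the one used in the handler.

```csharp
using System.Globalization;
using System.Linq;
using System.Net.Http;

// Hypothetical helper, not part of the Promitor code base.
public static class ThrottlingHeaderParser
{
    // Header name copied from AzureResourceGraphThrottlingRequestHandler.
    private const string ThrottlingHeaderName = "x-ms-user-quota-remaining";

    // Returns the remaining quota, or null when the header is missing
    // or does not contain a valid integer.
    public static int? TryGetRemainingCalls(HttpResponseMessage response)
    {
        if (!response.Headers.TryGetValues(ThrottlingHeaderName, out var values))
        {
            return null;
        }

        var rawValue = values.FirstOrDefault();
        return int.TryParse(rawValue, NumberStyles.Integer, CultureInfo.InvariantCulture, out var remaining)
            ? remaining
            : (int?)null;
    }
}
```

With such a helper, the handler would only write the gauge when a value is returned, so a malformed header would skip the measurement instead of failing the request pipeline.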
@@ -43,15 +43,14 @@ public async Task ExecuteAsync(CancellationToken cancellationToken)

         private void ReportDiscoveredAzureInfo(AzureResourceGroupInformation resourceGroupInformation)
         {
-            var managedByLabel = string.IsNullOrWhiteSpace(resourceGroupInformation.ManagedBy) ? "n/a" : resourceGroupInformation.ManagedBy;
             var labels = new Dictionary<string, string>
             {
                 { "tenant_id", resourceGroupInformation.TenantId },
                 { "subscription_id", resourceGroupInformation.SubscriptionId },
                 { "resource_group_name", resourceGroupInformation.Name },
-                { "provisioning_state", resourceGroupInformation.ProvisioningState },
-                { "managed_by", managedByLabel },
-                { "region", resourceGroupInformation.Region }
+                { "provisioning_state", GetValueOrDefault(resourceGroupInformation.ProvisioningState, "n/a") },
+                { "managed_by", GetValueOrDefault(resourceGroupInformation.ManagedBy, "n/a") },
+                { "region", GetValueOrDefault(resourceGroupInformation.Region, "n/a") }
             };

             // Report metric in Prometheus endpoint
@@ -49,10 +49,10 @@ private void ReportDiscoveredAzureInfo(AzureSubscriptionInformation azureLandsca
                 { "tenant_id", azureLandscapeInformation.TenantId },
                 { "subscription_id", azureLandscapeInformation.Id },
                 { "subscription_name", azureLandscapeInformation.Name},
-                { "quota_id", azureLandscapeInformation.QuotaId},
-                { "spending_limit", azureLandscapeInformation.SpendingLimit},
-                { "state", azureLandscapeInformation.State},
-                { "authorization", azureLandscapeInformation.AuthorizationSource}
+                { "quota_id", GetValueOrDefault(azureLandscapeInformation.QuotaId, "n/a")},
+                { "spending_limit", GetValueOrDefault(azureLandscapeInformation.SpendingLimit, "n/a")},
+                { "state", GetValueOrDefault(azureLandscapeInformation.State, "n/a")},
+                { "authorization", GetValueOrDefault(azureLandscapeInformation.AuthorizationSource, "n/a")}
             };

             // Report metric in Prometheus endpoint
@@ -27,5 +27,15 @@ protected void WritePrometheusMetric(string metricName, string metricDescription
         {
             _prometheusMetricsCollector.WriteGaugeMeasurement(metricName, metricDescription, value, labels, includeTimestamp: true);
         }
+
+        protected string GetValueOrDefault(string preferredValue, string alternative)
+        {
+            if (string.IsNullOrWhiteSpace(preferredValue))
+            {
+                return alternative;
+            }
+
+            return preferredValue;
+        }
     }
 }
18 changes: 18 additions & 0 deletions src/Promitor.Core/Extensions/AzureEnvironmentExtensions.cs
@@ -0,0 +1,18 @@
using Humanizer;
using Microsoft.Azure.Management.ResourceManager.Fluent;

namespace Promitor.Core.Extensions
{
    public static class AzureEnvironmentExtensions
    {
        /// <summary>
        ///     Get Azure environment information
        /// </summary>
        /// <param name="azureCloud">Microsoft Azure cloud</param>
        /// <returns>Azure environment information for specified cloud</returns>
        public static string GetDisplayName(this AzureEnvironment azureCloud)
        {
            return azureCloud.Name.Replace("Azure", "").Replace("Cloud", "").Humanize(LetterCasing.Title);
        }
    }
}
@@ -2,7 +2,7 @@

 namespace Promitor.Core.Metrics.Prometheus.Collectors.Interfaces
 {
-    public interface IAzureScrapingPrometheusMetricsCollector
+    public interface IAzureScrapingPrometheusMetricsCollector : IPrometheusMetricsCollector
     {
         /// <summary>
         ///     Sets a new value for a measurement on a gauge