feat: Provide Azure Resource Graph rate limiting info & throttling status metrics #1737

Merged
merged 4 commits into from
Aug 24, 2021
6 changes: 6 additions & 0 deletions changelog/content/experimental/unreleased.md
@@ -8,6 +8,8 @@ version:

- {{% tag added %}} Provide system metrics related to agent performance & resources ([docs](https://promitor.io/operations/#performance)
| [#341](https://github.com/tomkerkhove/promitor/issues/341))
- {{% tag added %}} Provide system metrics indicating ARM throttling status ([docs](https://promitor.io/operations/#azure-resource-manager-api---consumption--throttling)
| [#1738](https://github.com/tomkerkhove/promitor/issues/1738))

#### Resource Discovery

@@ -20,3 +22,7 @@ version:
| [#1716](https://github.com/tomkerkhove/promitor/issues/1716))
- {{% tag added %}} Provide system metrics with discovered resource group information ([docs](https://promitor.io/operations/#discovery))
| [#1716](https://github.com/tomkerkhove/promitor/issues/1716))
- {{% tag added %}} Provide system metrics indicating Azure Resource Graph throttling status ([docs](https://promitor.io/operations/#azure-resource-graph)
| [#1739](https://github.com/tomkerkhove/promitor/issues/1739))
- {{% tag added %}} Provide system metrics providing insights on Azure Resource Graph rate limiting ([docs](https://promitor.io/operations/#azure-resource-graph)
| [#973](https://github.com/tomkerkhove/promitor/issues/973))
51 changes: 51 additions & 0 deletions docs/operations/index.md
@@ -18,6 +18,7 @@ Here is an overview of how you can operate Promitor.
- [Exploring our REST APIs](#exploring-our-rest-apis)
- [Integrations](#integrations)
- [Azure Resource Manager API - Consumption & Throttling](#azure-resource-manager-api---consumption--throttling)
- [Azure Resource Graph](#azure-resource-graph)
- [Azure Monitor](#azure-monitor)

## Health
@@ -212,8 +213,58 @@ Azure Resource Manager API:
- `subscription_id` - _Id of the subscription that is being interacted with_
- `app_id` - _Id of the application that is being used to interact with API_

- `promitor_ratelimit_arm_throttled` - Indicates whether we are being throttled by Azure Resource Manager
(ARM). The metric provides the following labels:
- `tenant_id` - _Id of the tenant that is being interacted with_
- `subscription_id` - _Id of the subscription that is being interacted with_
- `app_id` - _Id of the application that is being used to interact with API_

```text
# HELP promitor_ratelimit_arm Indication how many calls are still available before Azure Resource Manager (ARM) is going to throttle us.
# TYPE promitor_ratelimit_arm gauge
promitor_ratelimit_arm{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0329dd2a-59dc-4493-aa54-cb01cb027dc2",app_id="ceb249a3-44ce-4c90-8863-6776336f5b7e"} 11995 1629719527020
promitor_ratelimit_arm{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",app_id="ceb249a3-44ce-4c90-8863-6776336f5b7e"} 11989 1629719532626
# HELP promitor_ratelimit_arm_throttled Indication concerning Azure Resource Manager are being throttled. (1 = yes, 0 = no).
# TYPE promitor_ratelimit_arm_throttled gauge
promitor_ratelimit_arm_throttled{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0329dd2a-59dc-4493-aa54-cb01cb027dc2",app_id="ceb249a3-44ce-4c90-8863-6776336f5b7e"} 0 1629719527020
promitor_ratelimit_arm_throttled{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",app_id="ceb249a3-44ce-4c90-8863-6776336f5b7e"} 0 1629719532626
```

You can read more about the Azure Resource Manager limitations on [docs.microsoft.com](https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-request-limits).

### Azure Resource Graph

![Resource Discovery Support Badge](https://img.shields.io/badge/Support%20for%20Resource%20Discovery-Yes-green.svg)
![Scraper Support Badge](https://img.shields.io/badge/Support%20for%20Scraper-No-red.svg)

Promitor exposes runtime metrics to provide insights on the API consumption of
Azure Resource Graph:

- `promitor_ratelimit_resource_graph_remaining` - Indicates how many calls are still available before
Azure Resource Graph is going to throttle us. The metric provides the following labels:
- `tenant_id` - _Id of the tenant that is being interacted with_
- `cloud` - _Name of the cloud_
- `auth_mode` - _Authentication mode to authenticate with_
- `app_id` - _Id of the application that is being used to interact with the API_

- `promitor_ratelimit_resource_graph_throttled` - Indicates whether we are being throttled by Azure Resource
Graph. The metric provides the following labels:
- `tenant_id` - _Id of the tenant that is being interacted with_
- `cloud` - _Name of the cloud_
- `auth_mode` - _Authentication mode to authenticate with_
- `app_id` - _Id of the application that is being used to interact with the API_

```text
# HELP promitor_ratelimit_resource_graph_remaining Indication how many calls are still available before Azure Resource Graph is going to throttle us.
# TYPE promitor_ratelimit_resource_graph_remaining gauge
promitor_ratelimit_resource_graph_remaining{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",cloud="Global",auth_mode="ServicePrincipal",app_id="67882a00-21d3-4ee7-b32a-430ea0768cd3"} 9 1629719863738
# HELP promitor_ratelimit_resource_graph_throttled Indication concerning Azure Resource Graph are being throttled. (1 = yes, 0 = no).
# TYPE promitor_ratelimit_resource_graph_throttled gauge
promitor_ratelimit_resource_graph_throttled{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",cloud="Global",auth_mode="ServicePrincipal",app_id="67882a00-21d3-4ee7-b32a-430ea0768cd3"} 0 1629719863738
```

You can read more about the Azure Resource Graph throttling on [docs.microsoft.com](https://docs.microsoft.com/en-us/azure/governance/resource-graph/overview#throttling).

### Azure Monitor

Promitor interacts with Azure Monitor API to scrape all the required metrics.
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
using System.Collections.Generic;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using GuardNet;
using Microsoft.Extensions.Logging;
using Promitor.Core.Metrics.Prometheus.Collectors.Interfaces;

namespace Promitor.Agents.Core.RequestHandlers
{
    public abstract class ThrottlingRequestHandler : DelegatingHandler
    {
        public abstract string DependencyName { get; }

        protected ILogger Logger { get; }
        protected IPrometheusMetricsCollector PrometheusMetricsCollector { get; }

        /// <summary>
        ///     Constructor
        /// </summary>
        /// <param name="prometheusMetricsCollector">Metrics collector for Prometheus</param>
        /// <param name="logger">Logger to write telemetry to</param>
        protected ThrottlingRequestHandler(IPrometheusMetricsCollector prometheusMetricsCollector, ILogger logger)
        {
            Guard.NotNull(prometheusMetricsCollector, nameof(prometheusMetricsCollector));
            Guard.NotNull(logger, nameof(logger));

            Logger = logger;
            PrometheusMetricsCollector = prometheusMetricsCollector;
        }

        protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
        {
            request = BeforeSendingRequest(request);

            var response = await base.SendAsync(request, cancellationToken);

            var wasRequestThrottled = (int)response.StatusCode == 429; // HTTP 429 = Too Many Requests
            if (wasRequestThrottled)
            {
                LogArmThrottling();
            }

            await AvailableRateLimitingCallsAsync(response); // Remaining quota, reported by the concrete handler
            ReportThrottlingStatus(wasRequestThrottled);

            return response;
        }

        private void ReportThrottlingStatus(bool wasRequestThrottled) // Synchronous, so no "Async" suffix
        {
            var metricValue = wasRequestThrottled ? 1 : 0;
            var metricLabels = GetMetricLabels();
            PrometheusMetricsCollector.WriteGaugeMeasurement(GetThrottlingStatusMetricName(), GetThrottlingStatusMetricDescription(), metricValue, metricLabels, includeTimestamp: true);
        }

        protected abstract Dictionary<string, string> GetMetricLabels();
        protected abstract string GetThrottlingStatusMetricName();
        protected abstract string GetThrottlingStatusMetricDescription();
        protected abstract Task AvailableRateLimitingCallsAsync(HttpResponseMessage response);

        protected virtual HttpRequestMessage BeforeSendingRequest(HttpRequestMessage request)
        {
            return request;
        }

        protected void LogArmThrottling()
        {
            Logger.LogWarning("{DependencyName} rate limit reached.", DependencyName);
        }
    }
}
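The handler above follows the `DelegatingHandler` pattern: let the request pass through, then inspect the response for HTTP 429. A minimal, self-contained sketch of that pattern could look as follows. It is not Promitor's implementation: `AlwaysThrottledHandler` and `RecordingThrottlingHandler` are hypothetical names, the backend is a stub that always answers 429, and the throttling status is stored in a property instead of being written to a Prometheus gauge.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the real backend: always answers HTTP 429.
public class AlwaysThrottledHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        return Task.FromResult(new HttpResponseMessage((HttpStatusCode)429));
    }
}

// Hypothetical, simplified handler: records the throttling status (1 = yes, 0 = no)
// in a property rather than reporting it as a metric.
public class RecordingThrottlingHandler : DelegatingHandler
{
    public int LastThrottledValue { get; private set; } = -1;

    public RecordingThrottlingHandler(HttpMessageHandler innerHandler) : base(innerHandler)
    {
    }

    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = await base.SendAsync(request, cancellationToken);
        LastThrottledValue = (int)response.StatusCode == 429 ? 1 : 0;
        return response;
    }
}

public static class Program
{
    public static async Task Main()
    {
        var handler = new RecordingThrottlingHandler(new AlwaysThrottledHandler());
        using var client = new HttpClient(handler);

        // No network call is made: the stub handler answers directly.
        await client.GetAsync("https://example.invalid/");

        Console.WriteLine(handler.LastThrottledValue);
    }
}
```

Because the status is reported for every response, the value drops back to 0 as soon as a later request succeeds, which matches the gauge semantics described in the docs.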
8 changes: 8 additions & 0 deletions src/Promitor.Agents.ResourceDiscovery/Docs/Open-Api.xml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
@@ -16,15 +16,18 @@
using Promitor.Agents.ResourceDiscovery.Graph.Exceptions;
using Promitor.Agents.ResourceDiscovery.Graph.Interfaces;
using Promitor.Agents.ResourceDiscovery.Graph.Model;
using Promitor.Agents.ResourceDiscovery.Graph.RequestHandlers;
using Promitor.Core;
using Promitor.Core.Extensions;
using Promitor.Core.Metrics.Prometheus.Collectors.Interfaces;
using Promitor.Integrations.Azure.Authentication;

namespace Promitor.Agents.ResourceDiscovery.Graph
{
public class AzureResourceGraph : IAzureResourceGraph
{
private readonly IOptionsMonitor<ResourceDeclaration> _resourceDeclarationMonitor;
private readonly IPrometheusMetricsCollector _prometheusMetricsCollector;
private readonly ILogger<AzureResourceGraph> _logger;

private ResourceGraphClient _graphClient;
@@ -39,8 +42,9 @@ public class AzureResourceGraph : IAzureResourceGraph

private readonly AzureAuthenticationInfo _azureAuthenticationInfo;

-public AzureResourceGraph(IOptionsMonitor<ResourceDeclaration> resourceDeclarationMonitor, IConfiguration configuration, ILogger<AzureResourceGraph> logger)
+public AzureResourceGraph(IPrometheusMetricsCollector prometheusMetricsCollector, IOptionsMonitor<ResourceDeclaration> resourceDeclarationMonitor, IConfiguration configuration, ILogger<AzureResourceGraph> logger)
{
+Guard.NotNull(prometheusMetricsCollector, nameof(prometheusMetricsCollector));
Guard.NotNull(resourceDeclarationMonitor, nameof(resourceDeclarationMonitor));
Guard.NotNull(resourceDeclarationMonitor.CurrentValue, nameof(resourceDeclarationMonitor.CurrentValue));
Guard.NotNull(resourceDeclarationMonitor.CurrentValue.AzureLandscape, nameof(resourceDeclarationMonitor.CurrentValue.AzureLandscape));
@@ -49,6 +53,7 @@ public AzureResourceGraph(IOptionsMonitor<ResourceDeclaration> resourceDeclarati

_logger = logger;
_resourceDeclarationMonitor = resourceDeclarationMonitor;
_prometheusMetricsCollector = prometheusMetricsCollector;
_azureAuthenticationInfo = AzureAuthenticationFactory.GetConfiguredAzureAuthentication(configuration);
}

@@ -265,7 +270,14 @@ private async Task<ResourceGraphClient> CreateClientAsync()
var credentials = await AzureAuthenticationFactory.GetTokenCredentialsAsync(azureEnvironment.ManagementEndpoint, TenantId, _azureAuthenticationInfo, azureAuthorityHost);
var resourceManagerBaseUri = new Uri(azureEnvironment.ResourceManagerEndpoint);

-var resourceGraphClient = new ResourceGraphClient(resourceManagerBaseUri, credentials);
+var metricLabels = new Dictionary<string, string>
+{
+{"tenant_id", TenantId},
+{"cloud", azureEnvironment.GetDisplayName()},
+{"app_id", _azureAuthenticationInfo.IdentityId},
+{"auth_mode", _azureAuthenticationInfo.Mode.ToString()},
+};
+var resourceGraphClient = new ResourceGraphClient(resourceManagerBaseUri, credentials, new AzureResourceGraphThrottlingRequestHandler(_prometheusMetricsCollector, metricLabels, _logger));

var version = Promitor.Core.Version.Get();
var promitorUserAgent = UserAgent.Generate("Resource-Discovery", version);
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using GuardNet;
using Microsoft.Extensions.Logging;
using Promitor.Agents.Core.RequestHandlers;
using Promitor.Core;
using Promitor.Core.Metrics.Prometheus.Collectors.Interfaces;

namespace Promitor.Agents.ResourceDiscovery.Graph.RequestHandlers
{
    internal class AzureResourceGraphThrottlingRequestHandler : ThrottlingRequestHandler
    {
        private readonly Dictionary<string, string> _metricLabels;

        private const string ThrottlingHeaderName = "x-ms-user-quota-remaining";
        private const string AvailableCallsMetricDescription = "Indication how many calls are still available before Azure Resource Graph is going to throttle us.";
        private const string ThrottledMetricDescription = "Indication concerning Azure Resource Graph are being throttled. (1 = yes, 0 = no).";

        public override string DependencyName => "Azure Resource Graph";

        /// <summary>
        ///     Constructor
        /// </summary>
        /// <param name="prometheusMetricsCollector">Metrics collector to write metrics to Prometheus</param>
        /// <param name="metricLabels">Labels to add to every reported metric</param>
        /// <param name="logger">Logger to write telemetry to</param>
        public AzureResourceGraphThrottlingRequestHandler(IPrometheusMetricsCollector prometheusMetricsCollector, Dictionary<string, string> metricLabels, ILogger logger)
            : base(prometheusMetricsCollector, logger)
        {
            Guard.NotNull(metricLabels, nameof(metricLabels));

            _metricLabels = metricLabels;
        }

        protected override Task AvailableRateLimitingCallsAsync(HttpResponseMessage response)
        {
            // Source:
            // - https://docs.microsoft.com/en-us/azure/governance/resource-graph/overview#throttling
            // - https://docs.microsoft.com/en-us/azure/governance/resource-graph/concepts/guidance-for-throttled-requests#understand-throttling-headers
            if (response.Headers.Contains(ThrottlingHeaderName))
            {
                var rawRemainingCalls = response.Headers.GetValues(ThrottlingHeaderName).FirstOrDefault();
                if (int.TryParse(rawRemainingCalls, out var remainingCalls))
                {
                    // Report metric; a malformed header value is skipped instead of failing the request
                    PrometheusMetricsCollector.WriteGaugeMeasurement(RuntimeMetricNames.RateLimitingForResourceGraph, AvailableCallsMetricDescription, remainingCalls, _metricLabels, includeTimestamp: true);
                }
            }

            return Task.CompletedTask;
        }

        protected override Dictionary<string, string> GetMetricLabels() => _metricLabels;
        protected override string GetThrottlingStatusMetricName() => RuntimeMetricNames.ResourceGraphThrottled;
        protected override string GetThrottlingStatusMetricDescription() => ThrottledMetricDescription;
    }
}
Original file line number Diff line number Diff line change
@@ -43,15 +43,14 @@ public async Task ExecuteAsync(CancellationToken cancellationToken)

private void ReportDiscoveredAzureInfo(AzureResourceGroupInformation resourceGroupInformation)
{
-var managedByLabel = string.IsNullOrWhiteSpace(resourceGroupInformation.ManagedBy) ? "n/a" : resourceGroupInformation.ManagedBy;
var labels = new Dictionary<string, string>
{
{ "tenant_id", resourceGroupInformation.TenantId },
{ "subscription_id", resourceGroupInformation.SubscriptionId },
{ "resource_group_name", resourceGroupInformation.Name },
-{ "provisioning_state", resourceGroupInformation.ProvisioningState },
-{ "managed_by", managedByLabel },
-{ "region", resourceGroupInformation.Region }
+{ "provisioning_state", GetValueOrDefault(resourceGroupInformation.ProvisioningState, "n/a") },
+{ "managed_by", GetValueOrDefault(resourceGroupInformation.ManagedBy, "n/a") },
+{ "region", GetValueOrDefault(resourceGroupInformation.Region, "n/a") }
};

// Report metric in Prometheus endpoint
Original file line number Diff line number Diff line change
@@ -49,10 +49,10 @@ private void ReportDiscoveredAzureInfo(AzureSubscriptionInformation azureLandsca
{ "tenant_id", azureLandscapeInformation.TenantId },
{ "subscription_id", azureLandscapeInformation.Id },
{ "subscription_name", azureLandscapeInformation.Name},
-{ "quota_id", azureLandscapeInformation.QuotaId},
-{ "spending_limit", azureLandscapeInformation.SpendingLimit},
-{ "state", azureLandscapeInformation.State},
-{ "authorization", azureLandscapeInformation.AuthorizationSource}
+{ "quota_id", GetValueOrDefault(azureLandscapeInformation.QuotaId, "n/a")},
+{ "spending_limit", GetValueOrDefault(azureLandscapeInformation.SpendingLimit, "n/a")},
+{ "state", GetValueOrDefault(azureLandscapeInformation.State, "n/a")},
+{ "authorization", GetValueOrDefault(azureLandscapeInformation.AuthorizationSource, "n/a")}
};

// Report metric in Prometheus endpoint
Original file line number Diff line number Diff line change
@@ -27,5 +27,15 @@ protected void WritePrometheusMetric(string metricName, string metricDescription
{
_prometheusMetricsCollector.WriteGaugeMeasurement(metricName, metricDescription, value, labels, includeTimestamp: true);
}

protected string GetValueOrDefault(string preferredValue, string alternative)
{
if (string.IsNullOrWhiteSpace(preferredValue))
{
return alternative;
}

return preferredValue;
}
}
}
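The effect of the `GetValueOrDefault` fallback on metric labels can be shown with a small standalone sketch. It assumes a hypothetical discovery result where Azure returned no `managedBy` value; the class name `LabelFallbackSketch` and the sample values are illustrative only, and the helper is made static here so that the example runs on its own.

```csharp
using System;
using System.Collections.Generic;

public static class LabelFallbackSketch
{
    // Same logic as GetValueOrDefault in the diff above, made static for a standalone demo.
    public static string GetValueOrDefault(string preferredValue, string alternative)
    {
        return string.IsNullOrWhiteSpace(preferredValue) ? alternative : preferredValue;
    }

    public static void Main()
    {
        // Hypothetical input: no "managed by" information is available for this resource group.
        string managedBy = null;

        var labels = new Dictionary<string, string>
        {
            { "region", GetValueOrDefault("westeurope", "n/a") },
            { "managed_by", GetValueOrDefault(managedBy, "n/a") }
        };

        // Every label ends up with a non-empty value, so the label set stays consistent across scrapes.
        Console.WriteLine(labels["region"]);
        Console.WriteLine(labels["managed_by"]);
    }
}
```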
18 changes: 18 additions & 0 deletions src/Promitor.Core/Extensions/AzureEnvironmentExtensions.cs
@@ -0,0 +1,18 @@
using Humanizer;
using Microsoft.Azure.Management.ResourceManager.Fluent;

namespace Promitor.Core.Extensions
{
    public static class AzureEnvironmentExtensions
    {
        /// <summary>
        ///     Get a short display name for the Azure cloud, without the "Azure" and "Cloud" parts of its name
        /// </summary>
        /// <param name="azureCloud">Microsoft Azure cloud</param>
        /// <returns>Display name of the specified cloud</returns>
        public static string GetDisplayName(this AzureEnvironment azureCloud)
        {
            return azureCloud.Name.Replace("Azure", "").Replace("Cloud", "").Humanize(LetterCasing.Title);
        }
    }
}
Original file line number Diff line number Diff line change
@@ -2,7 +2,7 @@

namespace Promitor.Core.Metrics.Prometheus.Collectors.Interfaces
{
-public interface IAzureScrapingPrometheusMetricsCollector
+public interface IAzureScrapingPrometheusMetricsCollector : IPrometheusMetricsCollector
{
/// <summary>
/// Sets a new value for a measurement on a gauge