Upgrading to v0.5.0 breaks writing some metrics #20

Closed
tomkerkhove opened this issue Jul 4, 2021 · 24 comments

@tomkerkhove

I've had issues where only a subset of metrics were emitted:

# HELP promitor_runtime_http_request_duration_seconds duration histogram of http responses labeled with: status_code, method, path
# TYPE promitor_runtime_http_request_duration_seconds histogram
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="0.005"} 2
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="0.01"} 2
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="0.025"} 2
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="0.05"} 3
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="0.075"} 3
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="0.1"} 3
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="0.25"} 5
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="0.5"} 5
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="0.75"} 6
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="1"} 6
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="2.5"} 6
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="5"} 6
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="7.5"} 6
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="10"} 6
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="+Inf"} 6
promitor_runtime_http_request_duration_seconds_sum{status_code="200",method="GET",path="/scrape"} 0.9737883
promitor_runtime_http_request_duration_seconds_count{status_code="200",method="GET",path="/scrape"} 6
# HELP promitor_scrape_error Provides an indication that the scraping of the resource has failed
# TYPE promitor_scrape_error gauge
promitor_scrape_error{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="AutomationAccount",resource_name="promitor-sandbox",resource_group="promitor-sources",metric_name="promitor_demo_automation_update_deployment_runs"} 0 1625405461479
promitor_scrape_error{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="MonitorAutoscale",resource_name="app-service-autoscaling-rules",resource_group="demo",metric_name="promitor_demo_appplan_autoscale_observed_capacity"} 0 1625405463492
promitor_scrape_error{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="FrontDoor",resource_name="promitor-landscape",resource_group="promitor-landscape",metric_name="promitor_demo_frontdoor_backend_health_per_backend"} 1 1625405457335
promitor_scrape_error{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="Generic",resource_name="Microsoft.Insights/Components/docker-hub-metrics",resource_group="docker-hub-metrics",metric_name="promitor_demo_app_insights_dependency_duration_200_OK"} 1 1625405463515
promitor_scrape_error{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="Generic",resource_name="Microsoft.Insights/Components/docker-hub-metrics",resource_group="docker-hub-metrics",metric_name="promitor_demo_app_insights_dependency_duration"} 1 1625405457465
promitor_scrape_error{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="AppPlan",resource_name="promitor-app-plan",resource_group="promitor-sources",metric_name="promitor_demo_appplan_percentage_cpu"} 1 1625405457465
promitor_scrape_error{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="AutomationAccount",resource_name="promitor-sandbox",resource_group="promitor-sources",metric_name="promitor_demo_automation_update_deployment_machine_runs"} 0 1625405461410
promitor_scrape_error{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="AutomationAccount",resource_name="promitor-sandbox",resource_group="promitor-sources",metric_name="promitor_demo_automation_job_count"} 0 1625405461513
# HELP promitor_scrape_success Provides an indication that the scraping of the resource was successful
# TYPE promitor_scrape_success gauge
promitor_scrape_success{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="AutomationAccount",resource_name="promitor-sandbox",resource_group="promitor-sources",metric_name="promitor_demo_automation_update_deployment_runs"} 1 1625405460976
promitor_scrape_success{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="MonitorAutoscale",resource_name="app-service-autoscaling-rules",resource_group="demo",metric_name="promitor_demo_appplan_autoscale_observed_capacity"} 1 1625405463296
promitor_scrape_success{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="FrontDoor",resource_name="promitor-landscape",resource_group="promitor-landscape",metric_name="promitor_demo_frontdoor_backend_health_per_backend"} 0 1625405456969
promitor_scrape_success{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="Generic",resource_name="Microsoft.Insights/Components/docker-hub-metrics",resource_group="docker-hub-metrics",metric_name="promitor_demo_app_insights_dependency_duration_200_OK"} 0 1625405463347
promitor_scrape_success{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="Generic",resource_name="Microsoft.Insights/Components/docker-hub-metrics",resource_group="docker-hub-metrics",metric_name="promitor_demo_app_insights_dependency_duration"} 0 1625405456968
promitor_scrape_success{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="AppPlan",resource_name="promitor-app-plan",resource_group="promitor-sources",metric_name="promitor_demo_appplan_percentage_cpu"} 0 1625405456968
promitor_scrape_success{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="AutomationAccount",resource_name="promitor-sandbox",resource_group="promitor-sources",metric_name="promitor_demo_automation_update_deployment_machine_runs"} 1 1625405460869
promitor_scrape_success{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="AutomationAccount",resource_name="promitor-sandbox",resource_group="promitor-sources",metric_name="promitor_demo_automation_job_count"} 1 1625405461162

So I reverted to v0.4.0, which gave me the full metrics again:

# HELP azure_logic_apps_failed_run Total amount of failed runs for Azure Logic Apps
# TYPE azure_logic_apps_failed_run gauge
azure_logic_apps_failed_run{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_uri="subscriptions/0f9d7fea-99e8-4768-8672-06a28514f77e/resourceGroups/promitor/providers/Microsoft.Logic/workflows/promitor-automation-github-ci-scraper",resource_group="promitor",instance_name="promitor-automation-github-ci-scraper",geo="china",environment="dev"} 0 1625405275265
# HELP promitor_demo_appplan_autoscale_observed_capacity Average amount of current instances for an Azure App Plan with Azure Monitor Autoscale
# TYPE promitor_demo_appplan_autoscale_observed_capacity gauge
promitor_demo_appplan_autoscale_observed_capacity{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_uri="subscriptions/0f9d7fea-99e8-4768-8672-06a28514f77e/resourceGroups/demo/providers/Microsoft.Insights/autoscalesettings/app-service-autoscaling-rules",resource_group="demo",metrictriggerrule="unknown",instance_name="app-service-autoscaling-rules",geo="china",environment="dev",app="promitor"} -1 1625405275052
# HELP promitor_demo_automation_job_count Amount of jobs per Azure Automation account
# TYPE promitor_demo_automation_job_count gauge
promitor_demo_automation_job_count{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_uri="subscriptions/0f9d7fea-99e8-4768-8672-06a28514f77e/resourceGroups/promitor-sources/providers/Microsoft.Automation/automationAccounts/promitor-sandbox",resource_group="promitor-sources",instance_name="promitor-sandbox",geo="china",environment="dev"} -1 1625405275266
# HELP promitor_demo_automation_update_deployment_machine_runs Amount of jobs per Azure Automation account
# TYPE promitor_demo_automation_update_deployment_machine_runs gauge
promitor_demo_automation_update_deployment_machine_runs{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_uri="subscriptions/0f9d7fea-99e8-4768-8672-06a28514f77e/resourceGroups/promitor-sources/providers/Microsoft.Automation/automationAccounts/promitor-sandbox",resource_group="promitor-sources",instance_name="promitor-sandbox",geo="china",environment="dev"} -1 1625405275265
# HELP promitor_demo_automation_update_deployment_runs Amount of jobs per Azure Automation account
# TYPE promitor_demo_automation_update_deployment_runs gauge
promitor_demo_automation_update_deployment_runs{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_uri="subscriptions/0f9d7fea-99e8-4768-8672-06a28514f77e/resourceGroups/promitor-sources/providers/Microsoft.Automation/automationAccounts/promitor-sandbox",resource_group="promitor-sources",instance_name="promitor-sandbox",geo="china",environment="dev"} -1 1625405275264
# HELP promitor_demo_servicebus_messagecount_discovered Average percentage of memory usage on an Azure App Plan
# TYPE promitor_demo_servicebus_messagecount_discovered gauge
promitor_demo_servicebus_messagecount_discovered{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_uri="subscriptions/0f9d7fea-99e8-4768-8672-06a28514f77e/resourceGroups/promitor/providers/Microsoft.ServiceBus/namespaces/promitor-messaging",resource_group="promitor",instance_name="promitor-messaging",geo="europe",environment="dev",entity_name="queue-07",app="promitor"} 0 1625405279176
promitor_demo_servicebus_messagecount_discovered{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_uri="subscriptions/0f9d7fea-99e8-4768-8672-06a28514f77e/resourceGroups/promitor/providers/Microsoft.ServiceBus/namespaces/promitor-messaging",resource_group="promitor",instance_name="promitor-messaging",geo="europe",environment="dev",entity_name="queue-15",app="promitor"} 0 1625405280144
promitor_demo_servicebus_messagecount_discovered{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_uri="subscriptions/0f9d7fea-99e8-4768-8672-06a28514f77e/resourceGroups/promitor/providers/Microsoft.ServiceBus/namespaces/promitor-messaging",resource_group="promitor",instance_name="promitor-messaging",geo="europe",environment="dev",entity_name="queue-03",app="promitor"} 2 1625405278677
promitor_demo_servicebus_messagecount_discovered{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_uri="subscriptions/0f9d7fea-99e8-4768-8672-06a28514f77e/resourceGroups/promitor/providers/Microsoft.ServiceBus/namespaces/promitor-messaging",resource_group="promitor",instance_name="promitor-messaging",geo="europe",environment="dev",entity_name="queue-12",app="promitor"} 0 1625405279803
promitor_demo_servicebus_messagecount_discovered{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_uri="subscriptions/0f9d7fea-99e8-4768-8672-06a28514f77e/resourceGroups/promitor/providers/Microsoft.ServiceBus/namespaces/promitor-messaging",resource_group="promitor",instance_name="promitor-messaging",geo="europe",environment="dev",entity_name="queue-05",app="promitor"} 0 1625405279325
promitor_demo_servicebus_messagecount_discovered{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_uri="subscriptions/0f9d7fea-99e8-4768-8672-06a28514f77e/resourceGroups/promitor/providers/Microsoft.ServiceBus/namespaces/promitor-messaging",resource_group="promitor",instance_name="promitor-messaging",geo="europe",environment="dev",entity_name="shipment-requests",app="promitor"} 1 1625405278834
promitor_demo_servicebus_messagecount_discovered{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_uri="subscriptions/0f9d7fea-99e8-4768-8672-06a28514f77e/resourceGroups/promitor/providers/Microsoft.ServiceBus/namespaces/promitor-messaging",resource_group="promitor",instance_name="promitor-messaging",geo="europe",environment="dev",entity_name="queue-01",app="promitor"} 0 1625405279944
promitor_demo_servicebus_messagecount_discovered{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_uri="subscriptions/0f9d7fea-99e8-4768-8672-06a28514f77e/resourceGroups/promitor/providers/Microsoft.ServiceBus/namespaces/promitor-messaging",resource_group="promitor",instance_name="promitor-messaging",geo="europe",environment="dev",entity_name="queue-13",app="promitor"} 0 1625405279646
promitor_demo_servicebus_messagecount_discovered{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_uri="subscriptions/0f9d7fea-99e8-4768-8672-06a28514f77e/resourceGroups/promitor/providers/Microsoft.ServiceBus/namespaces/promitor-messaging",resource_group="promitor",instance_name="promitor-messaging",geo="europe",environment="dev",entity_name="queue-11",app="promitor"} 0 1625405279004
promitor_demo_servicebus_messagecount_discovered{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_uri="subscriptions/0f9d7fea-99e8-4768-8672-06a28514f77e/resourceGroups/promitor/providers/Microsoft.ServiceBus/namespaces/promitor-messaging",resource_group="promitor",instance_name="promitor-messaging",geo="europe",environment="dev",entity_name="queue-08",app="promitor"} 0 1625405279478
# HELP promitor_demo_servicebus_messagecount_limited Average percentage of memory usage on an Azure App Plan
# TYPE promitor_demo_servicebus_messagecount_limited gauge
promitor_demo_servicebus_messagecount_limited{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_uri="subscriptions/0f9d7fea-99e8-4768-8672-06a28514f77e/resourceGroups/promitor/providers/Microsoft.ServiceBus/namespaces/promitor-messaging",resource_group="promitor",instance_name="promitor-messaging",geo="europe",environment="dev",entity_name="queue-07",app="promitor"} 0 1625405278025
promitor_demo_servicebus_messagecount_limited{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_uri="subscriptions/0f9d7fea-99e8-4768-8672-06a28514f77e/resourceGroups/promitor/providers/Microsoft.ServiceBus/namespaces/promitor-messaging",resource_group="promitor",instance_name="promitor-messaging",geo="europe",environment="dev",entity_name="queue-03",app="promitor"} 2 1625405277102
promitor_demo_servicebus_messagecount_limited{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_uri="subscriptions/0f9d7fea-99e8-4768-8672-06a28514f77e/resourceGroups/promitor/providers/Microsoft.ServiceBus/namespaces/promitor-messaging",resource_group="promitor",instance_name="promitor-messaging",geo="europe",environment="dev",entity_name="queue-05",app="promitor"} 0 1625405277780
promitor_demo_servicebus_messagecount_limited{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_uri="subscriptions/0f9d7fea-99e8-4768-8672-06a28514f77e/resourceGroups/promitor/providers/Microsoft.ServiceBus/namespaces/promitor-messaging",resource_group="promitor",instance_name="promitor-messaging",geo="europe",environment="dev",entity_name="queue-12",app="promitor"} 0 1625405277568
promitor_demo_servicebus_messagecount_limited{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_uri="subscriptions/0f9d7fea-99e8-4768-8672-06a28514f77e/resourceGroups/promitor/providers/Microsoft.ServiceBus/namespaces/promitor-messaging",resource_group="promitor",instance_name="promitor-messaging",geo="europe",environment="dev",entity_name="shipment-requests",app="promitor"} 1 1625405277329
# HELP promitor_ratelimit_arm Indication how many calls are still available before Azure Resource Manager is going to throttle us.
# TYPE promitor_ratelimit_arm gauge
promitor_ratelimit_arm{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",app_id="ceb249a3-44ce-4c90-8863-6776336f5b7e"} 11991 1625405273416
# HELP promitor_runtime_http_request_duration_seconds duration histogram of http responses labeled with: status_code, method, path
# TYPE promitor_runtime_http_request_duration_seconds histogram
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="0.005"} 8
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="0.01"} 8
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="0.025"} 9
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="0.05"} 9
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="0.075"} 11
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="0.1"} 11
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="0.25"} 11
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="0.5"} 11
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="0.75"} 11
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="1"} 11
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="2.5"} 11
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="5"} 11
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="7.5"} 11
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="10"} 11
promitor_runtime_http_request_duration_seconds_bucket{status_code="200",method="GET",path="/scrape",le="+Inf"} 11
promitor_runtime_http_request_duration_seconds_sum{status_code="200",method="GET",path="/scrape"} 0.1317863
promitor_runtime_http_request_duration_seconds_count{status_code="200",method="GET",path="/scrape"} 11
# HELP promitor_scrape_error Provides an indication that the scraping of the resource has failed
# TYPE promitor_scrape_error gauge
promitor_scrape_error{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="AutomationAccount",resource_name="promitor-sandbox",resource_group="promitor-sources",metric_name="promitor_demo_automation_job_count"} 0 1625405276354
promitor_scrape_error{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="AutomationAccount",resource_name="promitor-sandbox",resource_group="promitor-sources",metric_name="promitor_demo_automation_update_deployment_machine_runs"} 0 1625405276333
promitor_scrape_error{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="FrontDoor",resource_name="promitor-landscape",resource_group="promitor-landscape",metric_name="promitor_demo_frontdoor_backend_health_per_backend"} 1 1625405272471
promitor_scrape_error{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="Generic",resource_name="Microsoft.Insights/Components/docker-hub-metrics",resource_group="docker-hub-metrics",metric_name="promitor_demo_app_insights_dependency_duration_200_OK"} 1 1625405272861
promitor_scrape_error{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="AppPlan",resource_name="promitor-app-plan",resource_group="promitor-sources",metric_name="promitor_demo_appplan_percentage_cpu"} 1 1625405272860
promitor_scrape_error{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="AutomationAccount",resource_name="promitor-sandbox",resource_group="promitor-sources",metric_name="promitor_demo_automation_update_deployment_runs"} 0 1625405276304
promitor_scrape_error{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="Generic",resource_name="Microsoft.Insights/Components/docker-hub-metrics",resource_group="docker-hub-metrics",metric_name="promitor_demo_app_insights_dependency_duration"} 1 1625405272833
promitor_scrape_error{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="MonitorAutoscale",resource_name="app-service-autoscaling-rules",resource_group="demo",metric_name="promitor_demo_appplan_autoscale_observed_capacity"} 0 1625405275862
# HELP promitor_scrape_success Provides an indication that the scraping of the resource was successful
# TYPE promitor_scrape_success gauge
promitor_scrape_success{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="AutomationAccount",resource_name="promitor-sandbox",resource_group="promitor-sources",metric_name="promitor_demo_automation_job_count"} 1 1625405275895
promitor_scrape_success{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="AutomationAccount",resource_name="promitor-sandbox",resource_group="promitor-sources",metric_name="promitor_demo_automation_update_deployment_machine_runs"} 1 1625405275898
promitor_scrape_success{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="FrontDoor",resource_name="promitor-landscape",resource_group="promitor-landscape",metric_name="promitor_demo_frontdoor_backend_health_per_backend"} 0 1625405271989
promitor_scrape_success{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="Generic",resource_name="Microsoft.Insights/Components/docker-hub-metrics",resource_group="docker-hub-metrics",metric_name="promitor_demo_app_insights_dependency_duration_200_OK"} 0 1625405272081
promitor_scrape_success{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="AppPlan",resource_name="promitor-app-plan",resource_group="promitor-sources",metric_name="promitor_demo_appplan_percentage_cpu"} 0 1625405271989
promitor_scrape_success{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="AutomationAccount",resource_name="promitor-sandbox",resource_group="promitor-sources",metric_name="promitor_demo_automation_update_deployment_runs"} 1 1625405275892
promitor_scrape_success{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="Generic",resource_name="Microsoft.Insights/Components/docker-hub-metrics",resource_group="docker-hub-metrics",metric_name="promitor_demo_app_insights_dependency_duration"} 0 1625405271988
promitor_scrape_success{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_type="MonitorAutoscale",resource_name="app-service-autoscaling-rules",resource_group="demo",metric_name="promitor_demo_appplan_autoscale_observed_capacity"} 1 1625405275438

However, when doing the diff I cannot spot the issue: v0.4.0...v0.5.0

@phnx47 phnx47 self-assigned this Jul 5, 2021
@phnx47 phnx47 added the bug label Jul 5, 2021
@phnx47

phnx47 commented Jul 5, 2021

I will check it

@tomkerkhove

Thank you! I don't really get it but ok... 😐

@phnx47

phnx47 commented Jul 10, 2021

@tomkerkhove I downgraded Microsoft.Extensions.DependencyInjection.Abstractions to 3.1.16 in v0.6.0. It should resolve the problem.

@tomkerkhove

Thank you!

@phnx47

phnx47 commented Jul 12, 2021

@tomkerkhove I downgraded Microsoft.Extensions.DependencyInjection.Abstractions to 3.1.13. I don't know how to test Promitor on my local machine, but I still think the problem is a version conflict.

@tomkerkhove

I don't get it either; v0.6.2 also doesn't work for me, but I don't see why :/

@phnx47

phnx47 commented Jul 12, 2021

Another thing: the only difference is this:

Before v0.5.0

services.AddSingleton(new CollectorRegistry() as ICollectorRegistry); // the actual code differs slightly, but the behaviour should be the same
services.AddSingleton<IMetricFactory, MetricFactory>();

After

services.AddSingleton<ICollectorRegistry, CollectorRegistry>();
services.AddSingleton<IMetricFactory, MetricFactory>();

@tomkerkhove

🤯

@sanych-sun

Hi @tomkerkhove! Is there any chance you start scraping before all metrics are configured/created? I've found a theoretical race condition that could lead to problems if a metric is being created while scraping.

@sanych-sun

@tomkerkhove @phnx47 I think I can reproduce a similar problem in a scenario where multiple threads try to create metrics while a scrape is happening on another thread. I've created a unit test to reproduce the issue and will prepare a fix soon.

However, the affected code was introduced more than a year ago, so this could be a similar issue rather than the root cause of this exact problem.

@tomkerkhove

Hi @tomkerkhove! Is there any chance you start scraping before all metrics are configured/created? I've found a theoretical race condition that could lead to problems if a metric is being created while scraping.

So the tricky thing is that this is where it's not working as it should:
https://github.com/tomkerkhove/promitor/blob/master/src/Promitor.Core.Scraping/Scraper.cs#L87-L89

The metrics inside ReportScrapingOutcome will be reported, but await _metricSinkWriter.ReportMetricAsync will not.

Looking in detail, they use the same approach though:

Both are configured with the same lifecycle:

@sanych-sun

sanych-sun commented Jul 19, 2021

@tomkerkhove as we cannot reproduce the issue on our side, could you please help with the investigation? It would be perfect to have something runnable that reproduces the issue on our side. Is there any way to create a Promitor version configured to use some mocks, so I can pull that version and debug the problem?

Or could you please try to bump the Prometheus.Client libraries one at a time so we can isolate the problem:

  1. bump Prometheus.Client library to 4.5.2
  2. check if problem appears
  3. bump Prometheus.Client.DependencyInjection to 0.6.2
  4. check again.

Also, I suspect the problem might be somewhere around the creation of multiple collector registries (I have no idea how, but if the issue is consistently reproducible on your side, it could mean that different parts of the code use different collector registries... somehow). To ensure that there is only one, could you please create a component that injects IEnumerable and check how many objects are injected? And finally, please check whether any collectors are registered in Metrics.DefaultCollectorRegistry

@tomkerkhove

I'm able to successfully bump the client & ASP.NET version through tomkerkhove/promitor#1691 with good results.

However, when bumping the DI package to v0.5.0 or latest it is still failing. I'll try to dive deeper into this but the easiest way to run this yourself is through https://github.com/tomkerkhove/promitor/blob/master/development-guide.md#net-development

@tomkerkhove

I'm scratching my head on this one 😅

To ensure that there is only one, could you please create a component that injects IEnumerable and check how many objects are injected? And finally, please check whether any collectors are registered in Metrics.DefaultCollectorRegistry

Inject an IEnumerable of what type exactly? Happy to give it a try.

@sanych-sun

sanych-sun commented Aug 2, 2021

Hi @tomkerkhove, sorry, that was GitHub trying to "help" me by hiding tags in the text :)

IEnumerable<ICollectorRegistry>

I'll try to run your code locally to debug the issue.
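The diagnostic described above could be sketched roughly as follows. This is a hypothetical illustration, not Promitor code: the class name and logging are made up, and the `Prometheus.Client.Collectors` namespace for `ICollectorRegistry` is an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.Extensions.Logging;
using Prometheus.Client.Collectors;

// Hypothetical diagnostic component: inject every registered
// ICollectorRegistry and log how many distinct instances resolve.
// More than one would confirm the "multiple registries" suspicion.
public class RegistryCountDiagnostic
{
    public RegistryCountDiagnostic(
        IEnumerable<ICollectorRegistry> registries,
        ILogger<RegistryCountDiagnostic> logger)
    {
        logger.LogInformation(
            "Resolved {Count} distinct ICollectorRegistry instance(s)",
            registries.Distinct().Count());
    }
}
```

Registering this as a singleton and resolving it once at startup would be enough to get the count into the logs.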

@tomkerkhove

I gave it a try and I always get 1 instance of the registry, which eventually has multiple collectors.

The difference is:

  • The working approach injects the registry into the class every time
  • The non-working version reuses it for its entire lifetime

However, this is identical to how it worked before v0.5

@sanych-sun

Magic =)

@tomkerkhove could you please let me know how I can debug Promitor? I get the following error when I run the Scraper project:

Unable to determine the configuration folder. Please ensure that the 'PROMITOR_CONFIG_FOLDER' environment variable is set

@tomkerkhove

How did you run it? Through Docker Compose?

@sanych-sun

Just by running the project. Is there any way to run it without Docker?

@tomkerkhove

Yes, but then you'll have to configure a few things 😅

This should help you get started: https://github.com/tomkerkhove/promitor/blob/master/development-guide.md#running-promitor

@sanych-sun

sanych-sun commented Sep 14, 2021

Hi @tomkerkhove! Sorry for the long delay in updates, but I think I've finally found the root cause of the problem; it makes sense and perfectly explains what happened.

The problem is that the Promitor code explicitly calls services.BuildServiceProvider(). According to the documentation, this creates a brand-new DI container on each call:

Calling BuildServiceProvider creates a second container, which can create torn singletons and cause references to object graphs across multiple containers.

So it means Promitor works with at least 2 DI containers: the one explicitly created by the code and the default one (created by the ASP.NET runtime). It also means each singleton is tracked separately in each container. So now it's obvious why simply changing from register-by-instance to register-by-type broke the metrics:

  • Register (by instance): the singleton resolves to the pre-created instance, which is shared between both containers
  • Register (by type): the singleton is created by the container at runtime, so each container produces its own instance

I hope this makes sense.

So my suggestion here: remove the explicit creation of the container, as it affects not just the Prometheus.Client infrastructure but the whole application lifetime logic.

Update: the bootstrap logic that schedules the syncs should probably be moved into a new service implementing IHostedService, so it will be started automatically by the runtime.
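The torn-singleton effect can be demonstrated with a minimal console sketch. IRegistry and Registry below are hypothetical stand-ins, not the real Prometheus.Client types; the point is only how AddSingleton behaves across two containers built from the same ServiceCollection.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

// Stand-ins for illustration only (not Prometheus.Client types).
interface IRegistry { }
class Registry : IRegistry { }

class Program
{
    static void Main()
    {
        // Register by type: each container builds its own singleton.
        var byType = new ServiceCollection();
        byType.AddSingleton<IRegistry, Registry>();
        var c1 = byType.BuildServiceProvider();  // first container
        var c2 = byType.BuildServiceProvider();  // second ("torn") container
        Console.WriteLine(ReferenceEquals(
            c1.GetRequiredService<IRegistry>(),
            c2.GetRequiredService<IRegistry>())); // False: two registries

        // Register by instance: the pre-created object is baked into
        // the service descriptor, so both containers share it.
        var byInstance = new ServiceCollection();
        byInstance.AddSingleton<IRegistry>(new Registry());
        var c3 = byInstance.BuildServiceProvider();
        var c4 = byInstance.BuildServiceProvider();
        Console.WriteLine(ReferenceEquals(
            c3.GetRequiredService<IRegistry>(),
            c4.GetRequiredService<IRegistry>())); // True: one shared registry
    }
}
```

This matches the symptom in the thread: the instance registration used before v0.5.0 happened to survive the extra BuildServiceProvider call, while the type registration introduced in v0.5.0 produced a second, separate registry.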

@tomkerkhove

Good point; I didn't think about that, but I did it for a reason.

I'll check when I'm back from holiday and come back to it.

@tomkerkhove

Probably the bootstrap logic that schedules the syncs should be moved into some new service implementing IHostedService, so it will be automatically started by runtime.

Can you elaborate on which piece you are referring to here, please?

@sanych-sun

I meant the code in the ScheduleMetricScraping method.

Anyway, could you please double-check whether my explanation is right, so we can close the issue on our side? And have a nice holiday =)
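The suggested move into a hosted service could look roughly like this. It's only a sketch under assumptions: MetricScrapingHostedService and the delay interval are hypothetical, and the placeholder comment stands in for the real ScheduleMetricScraping logic.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

// Hypothetical sketch: run the scraping bootstrap as a BackgroundService,
// so the ASP.NET runtime starts it from the single shared DI container
// instead of the code calling services.BuildServiceProvider() itself.
public class MetricScrapingHostedService : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            // The ScheduleMetricScraping logic would run here, resolving
            // its dependencies from the same container as the /metrics
            // endpoint, which avoids the torn-singleton registry.
            await Task.Delay(TimeSpan.FromMinutes(1), stoppingToken);
        }
    }
}

// Registration: services.AddHostedService<MetricScrapingHostedService>();
```

Because the runtime constructs the hosted service from the default container, any ICollectorRegistry it depends on is the same instance the metrics endpoint sees.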

@sanych-sun sanych-sun added question and removed bug labels Sep 15, 2021