use endpoints discovery to find all available control plane nodes #273

Open: movence wants to merge 1 commit into aws-cwa-dev


@movence movence commented Feb 3, 2025

Description:
Currently, ContainerInsights uses static target discovery when initializing the Prometheus scrape configuration for the control plane. The static target is the cluster IP of the k8s service that points to the control plane nodes, so each scrape is load balanced to only one of the available control plane nodes. As a result, the agent fetches data from a single node at a time, leading to incomplete metrics collection.

This PR addresses the issue by switching the Prometheus configuration for the control plane to endpoint discovery. This native Prometheus mechanism queries endpoints (e.g., the kubernetes endpoint in the default namespace for the control plane) and creates one target for each IP associated with the endpoint, per port. Consequently, the agent scrapes control plane metrics from all nodes. In addition, when the control plane scales up, endpoint discovery automatically detects the newly added nodes and begins scraping them as new targets.
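The per-IP, per-port expansion described above can be sketched as follows. This is only an illustration of the idea, not the agent's or Prometheus's actual code: the `expand_endpoints` helper, the IP addresses, and the port are hypothetical, and the dictionary only mimics the general shape of a Kubernetes Endpoints object.

```python
from typing import Dict, List


def expand_endpoints(endpoints: Dict) -> List[str]:
    """Return one "ip:port" target per address/port pair in each subset."""
    targets = []
    for subset in endpoints.get("subsets", []):
        for address in subset.get("addresses", []):
            for port in subset.get("ports", []):
                targets.append(f"{address['ip']}:{port['port']}")
    return targets


# Hypothetical `kubernetes` endpoint in the default namespace,
# backed by two control plane nodes (IPs and port are made up).
example = {
    "metadata": {"name": "kubernetes", "namespace": "default"},
    "subsets": [
        {
            "addresses": [{"ip": "10.0.1.10"}, {"ip": "10.0.2.20"}],
            "ports": [{"name": "https", "port": 443}],
        }
    ],
}

print(expand_endpoints(example))
# ['10.0.1.10:443', '10.0.2.20:443']
```

With a static cluster-IP target there would be a single entry regardless of how many nodes back the service; here, adding a third address to the subset would yield a third target without any configuration change.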

Changes:

  • Change staticConfig to k8s endpoints discovery in the scraping configuration for the control plane
  • Add default labels including ClusterName, Sources, NodeName, Type, and Version

Testing:
The attached screenshot shows that the sum of the apiserver_request_total metrics scraped by the agent now matches the aggregated values reported by EKS vended metrics. This confirms that the new endpoint discovery mechanism captures data from all control plane nodes.

Screenshot 2025-02-03 at 10 07 12 AM
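The reasoning behind this check can be illustrated with a small worked example. The node names and counter values below are invented for illustration and are not taken from the screenshot; the point is only that a sum over every control plane node can match an aggregate, while a single-node scrape cannot.

```python
from typing import Dict, Iterable

# Hypothetical per-node values of apiserver_request_total (made-up numbers).
per_node_requests: Dict[str, float] = {
    "node-a": 1200.0,
    "node-b": 950.0,
    "node-c": 1010.0,
}


def scraped_total(per_node: Dict[str, float], scraped_nodes: Iterable[str]) -> float:
    """Sum the counter over the nodes that were actually scraped."""
    return sum(per_node[node] for node in scraped_nodes)


# Aggregate across the whole control plane, standing in for the reference value.
reference_total = sum(per_node_requests.values())

# Static cluster-IP target: only one node answers the scrape.
single_node_total = scraped_total(per_node_requests, ["node-a"])

# Endpoint discovery: every node is its own target.
all_nodes_total = scraped_total(per_node_requests, per_node_requests.keys())

print(single_node_total, all_nodes_total, reference_total)
# 1200.0 3160.0 3160.0
```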

Documentation:
