Filter out non-running pods in Prometheus #3073

Merged: 1 commit, Sep 6, 2022

Conversation

acondrat
Contributor

@acondrat acondrat commented Sep 5, 2022

The Prometheus scrape job generated by the PodMonitor does not exclude non-running pods, so all "completed" Pods are still listed as targets in Prometheus and marked as down. The issue stems from the PodMonitor implementation and is discussed in prometheus-operator/prometheus-operator#4816

Signed-off-by: Arcadie Condrat <arcadie.condrat@gmail.com>
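
For readers unfamiliar with relabeling, here is a minimal sketch of how a `keep` rule on the pod phase would drop completed pods from the target list. It is a plain Python model of the relabeling semantics, not the manifest changed in this PR (the diff is not shown in this thread). The rule contents and pod names are assumptions made for illustration; `__meta_kubernetes_pod_phase` is the label Prometheus Kubernetes service discovery attaches to pod targets.

```python
import re

# Hypothetical rule, written as a dict for illustration only.
# The actual relabeling added by the PR is not shown in this thread.
KEEP_RUNNING = {
    "source_labels": ["__meta_kubernetes_pod_phase"],
    "regex": "Running",
    "action": "keep",
}


def apply_keep(targets, rule):
    """Keep only targets whose joined source label values fully match the regex."""
    pattern = re.compile(rule["regex"])
    kept = []
    for labels in targets:
        # Prometheus joins the source label values with ";" by default.
        value = ";".join(labels.get(name, "") for name in rule["source_labels"])
        # Relabel regexes are anchored at both ends, hence fullmatch.
        if pattern.fullmatch(value):
            kept.append(labels)
    return kept


# Hypothetical discovered targets (pod names are made up).
discovered = [
    {"pod": "source-controller-1", "__meta_kubernetes_pod_phase": "Running"},
    {"pod": "finished-job-1", "__meta_kubernetes_pod_phase": "Succeeded"},
    {"pod": "failed-job-1", "__meta_kubernetes_pod_phase": "Failed"},
]

kept_pods = [t["pod"] for t in apply_keep(discovered, KEEP_RUNNING)]
print(kept_pods)
```

Only the running pod remains a scrape target in this model; the completed ones never reach Prometheus, so they cannot show up as down.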

@acondrat
Contributor Author

acondrat commented Sep 5, 2022

Same change for the Flux2 helm chart - fluxcd-community/helm-charts#121

@stefanprodan
Member

@acondrat would this prevent users from being notified when a Flux controller is in crash loop?

@acondrat
Contributor Author

acondrat commented Sep 5, 2022

A pod that is in a crash loop would still be captured by the up == 0 query. I suppose most (if not all) users have an alerting rule built around that. This is also captured by the kube-state-metrics exporter with a KubeDeploymentReplicasMismatch alert.
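
The reasoning above can be sketched as a small model. It assumes that a crash-looping pod still reports the `Running` phase (Kubernetes keeps a pod in `Running` while its containers are being restarted), so a phase filter does not remove it from the targets and its failed scrapes still produce `up == 0`. The pod names and scrape results below are made up for illustration.

```python
# Hypothetical pods: the phase seen by service discovery and whether the
# last scrape succeeded (Prometheus records this as up = 1 or up = 0).
pods = [
    {"pod": "kustomize-controller-1", "phase": "Running", "scrape_ok": True},
    {"pod": "helm-controller-1", "phase": "Running", "scrape_ok": False},  # crash-looping
    {"pod": "old-job-1", "phase": "Succeeded", "scrape_ok": False},  # completed
]

ALL_PHASES = {"Pending", "Running", "Succeeded", "Failed", "Unknown"}


def up_samples(pods, keep_phases):
    """Model the `up` series for pods that survive the phase filter."""
    return {
        p["pod"]: 1 if p["scrape_ok"] else 0
        for p in pods
        if p["phase"] in keep_phases
    }


def up_equals_zero(samples):
    """Model the `up == 0` query: return the targets that are down."""
    return sorted(name for name, value in samples.items() if value == 0)


before = up_equals_zero(up_samples(pods, ALL_PHASES))
after = up_equals_zero(up_samples(pods, {"Running"}))
print(before)
print(after)
```

In this model the completed pod stops matching `up == 0` once the filter is applied, while the crash-looping controller still matches.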

@stefanprodan stefanprodan added the area/monitoring label (Monitoring related issues and pull requests) Sep 5, 2022
Member

@stefanprodan stefanprodan left a comment

LGTM

Thanks @acondrat

@stefanprodan stefanprodan merged commit 73668d1 into fluxcd:main Sep 6, 2022
@SuperQ

SuperQ commented Sep 16, 2022

You may also want to include Pending. I think in some cases, the Pod IP has been assigned, but not all containers have started.

Including Pending should also shorten the time between a Pod starting and Prometheus getting the configuration updated for the first scrape.
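
A rough sketch of what including Pending could look like, again as a Python model of the relabel matching rather than an actual manifest. The regex value is an assumption; the phase names are the standard Kubernetes pod phases.

```python
import re

# Hypothetical regex for a `keep` rule on __meta_kubernetes_pod_phase.
KEEP_REGEX = "Pending|Running"


def is_kept(phase, regex=KEEP_REGEX):
    """Return True if a target with this pod phase would survive the keep rule."""
    # Prometheus anchors relabel regexes at both ends, so use fullmatch.
    return re.fullmatch(regex, phase) is not None


phases = ["Pending", "Running", "Succeeded", "Failed", "Unknown"]
kept = [p for p in phases if is_kept(p)]
print(kept)
```

With this variant a pod becomes a target while it is still Pending, which is the shorter path to the first scrape described above.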
