Rejected Advertisement Metric #1885

Merged
merged 4 commits into from
Jan 18, 2025

Conversation

patrickbrophy
Collaborator

This PR addresses issue #1774. This metric is a simple counter that tracks the number of times the director has rejected an advertisement from an origin or cache.

@patrickbrophy added the enhancement (New feature or request) and director (Issue relating to the director component) labels on Jan 13, 2025
@patrickbrophy added this to the v7.13.0 milestone on Jan 13, 2025
@patrickbrophy linked an issue on Jan 13, 2025 that may be closed by this pull request
Member

@jhiemstrawisc left a comment

I'm a bit worried about this as is. Assume we set up an origin that, for some reason, hasn't yet been approved by the federation administrators. The origin is hardcoded to advertise with the Director once every minute, which will cause this metric to start ticking up pretty fast, but with no way of telling:
a) what server is having advertisement issues
b) whether that server eventually succeeded but another server started failing at the same time

Is there a way you can restructure the metric such that we'd have an easy way to answer "Which servers, identified by their hostname, produced the most advertisement failures in the last week?" The challenge here would be finding a way to treat the hostnames safely, since those are an arbitrary part of some JSON we get.

Thoughts?
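One way to treat hostnames safely before using them as a label value is to normalize and validate them against standard hostname syntax, and to fold everything else into a single fallback bucket so that malformed input cannot create unbounded label values. The following is only a minimal sketch of that idea in Python; the function name `safe_hostname_label` and the `"invalid"` fallback value are hypothetical and are not taken from this PR or from the Pelican codebase.

```python
import re

# One DNS label: 1-63 characters, letters/digits/hyphens,
# with no leading or trailing hyphen (RFC 1123 style).
_LABEL_RE = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")

# Hypothetical bucket for anything that does not look like a hostname.
FALLBACK = "invalid"


def safe_hostname_label(raw):
    """Return a normalized hostname, or FALLBACK if `raw` is not a plausible one."""
    if not isinstance(raw, str):
        return FALLBACK
    host = raw.strip().lower()
    if host.endswith("."):
        host = host[:-1]  # drop a single trailing dot (fully qualified form)
    if not host or len(host) > 253:
        return FALLBACK
    # fullmatch rejects embedded newlines and other stray characters.
    if all(_LABEL_RE.fullmatch(part) for part in host.split(".")):
        return host
    return FALLBACK
```

Validation alone does not cap the number of distinct valid hostnames; if that is a concern, the director could additionally restrict the label to servers it already knows about.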

@patrickbrophy
Collaborator Author

For part (a), I think the server name could be a label on this metric, sourced from the advertisement. For part (b), I think a query could answer that. As for the latter part of your comment:

Is there a way you can restructure the metric such that we'd have an easy way to answer "Which servers, identified by their hostname, produced the most advertisement failures in the last week?"

I think you could achieve this via a query like the following:
topk(10, sum by (hostname) (increase(pelican_director_rejected_advertisements_total[1w])))
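To make the intent of that ranking concrete, here is a toy sketch in Python of a counter keyed by a `hostname` label, plus a helper that ranks hosts by how much the counter grew between two snapshots. It is not a Prometheus client and not the code in this PR: the class `RejectedAdvertisements`, the helper `top_rejecting_hosts`, and the example hostnames are all hypothetical, and counter resets are ignored.

```python
from collections import Counter


class RejectedAdvertisements:
    """Toy stand-in for a monotonically increasing counter with a `hostname` label."""

    def __init__(self):
        self._counts = Counter()

    def inc(self, hostname):
        # One rejected advertisement from this server.
        self._counts[hostname] += 1

    def snapshot(self):
        # Copy of the current per-hostname totals.
        return dict(self._counts)


def top_rejecting_hosts(start, end, k=10):
    """Rank hosts by counter growth between two snapshots, largest first."""
    growth = Counter({host: end[host] - start.get(host, 0) for host in end})
    return [(host, n) for host, n in growth.most_common(k) if n > 0]


metric = RejectedAdvertisements()
week_start = metric.snapshot()
for host in ["origin-a.example.org"] * 3 + ["cache-b.example.org"]:
    metric.inc(host)
print(top_rejecting_hosts(week_start, metric.snapshot(), k=2))
```

Ranking by growth over the window, rather than by the raw total, is what answers "most failures in the last week": a server that failed heavily a month ago but has since recovered would not appear.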

@jhiemstrawisc merged commit fc194ec into PelicanPlatform:main on Jan 18, 2025
22 checks passed
Labels
director Issue relating to the director component enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Metric For Rejected Advertisements
2 participants