Only query active ingesters #5342
LGTM
I suggest adding the faillint -paths=github.com/grafana/dskit/ring.{Read} check to the Makefile (lint target) to ensure we don't use ring.Read anymore. It will find a few other places where it's used, and here are my suggestions:
- Distributor.AllUserStats(): I think we can just use readNoExtend here.
- Querier tests: I think we should use storegateway.BlocksRead instead, because that's the ring used there.
I'm going to merge this PR and create a separate PR for the linting and subsequent changes you've suggested @pracucci
What this PR does
This PR modifies the behaviour of queriers to only query ingesters in the ACTIVE state in the ring.
Previously, we'd query ingesters in the ACTIVE, PENDING or LEAVING states. We believe this can lead to increased query latency during ingester restarts when combined with request minimisation (#5202): during ingester restarts (e.g. due to a rollout), we'll try to query ingesters that are in the process of restarting, and then wait for them to respond, even though it's likely they won't respond. After timing out, we'll then initiate requests to another zone in order to reach quorum.
When we query only ACTIVE ingesters, it's still possible that a querier will initiate a request to an ingester that is restarting if the ingester's state is out of date in the querier's view of the ring, but this should only be the case for a short period of time. This can be mitigated further by reducing the ingester client connection timeout.
Which issue(s) this PR fixes or relates to
(none)
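As a rough illustration of the behaviour change described under "What this PR does", the sketch below filters a querier's view of the ring by instance state. It is not the dskit or Mimir implementation: the `InstanceState`, `Instance` and `filterByState` names, and the instance IDs and zones, are hypothetical stand-ins invented for this example.

```go
package main

import "fmt"

// InstanceState is a hypothetical stand-in for the ring's instance state.
type InstanceState int

const (
	Active InstanceState = iota
	Pending
	Leaving
)

// Instance is a hypothetical, simplified ring entry (assumption, not the dskit type).
type Instance struct {
	ID    string
	Zone  string
	State InstanceState
}

// filterByState returns the instances whose state is one of allowed,
// preserving their original order.
func filterByState(instances []Instance, allowed ...InstanceState) []Instance {
	ok := make(map[InstanceState]bool, len(allowed))
	for _, s := range allowed {
		ok[s] = true
	}
	var out []Instance
	for _, inst := range instances {
		if ok[inst.State] {
			out = append(out, inst)
		}
	}
	return out
}

func main() {
	// The querier's (possibly slightly stale) view of the ring during a rollout.
	ringView := []Instance{
		{ID: "ingester-zone-a-0", Zone: "zone-a", State: Leaving},
		{ID: "ingester-zone-b-0", Zone: "zone-b", State: Active},
		{ID: "ingester-zone-c-0", Zone: "zone-c", State: Active},
	}

	// Previous behaviour: ACTIVE, PENDING and LEAVING ingesters are all candidates.
	before := filterByState(ringView, Active, Pending, Leaving)
	// New behaviour: only ACTIVE ingesters are candidates.
	after := filterByState(ringView, Active)

	fmt.Println("previous candidates:", len(before))
	fmt.Println("new candidates:", len(after))
}
```

With the restarting zone-a ingester excluded up front, the querier does not have to wait for it to time out before falling back to another zone.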
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]