-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
limit for concurrent tail requests #1562
limit for concurrent tail requests #1562
Conversation
When a new request comes in, querier would ask one of the active ingesters for count of active tail requests for the user If the count exceeds the configured limits for the user then tail request would be rejected right away.
@sandlis I'm currently too busy on the Cortex side to accept Loki reviews in my backlog. For this reason I've removed myself from the reviewers list. Sorry for that! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
pkg/querier/querier.go
Outdated
|
||
// we want to check count of active tailers with only one of the active ingesters since | ||
// all of them would be having same number of active tail connections | ||
// Note: In worst-case scenario the ingester that we picked would have joined recently which might not have all the tail requests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should check all ingesters to avoid this error case. It doesn't seem like a relatively more expensive option (we're already creating tailers on every ingester for tailing requests). It does however keep us from loading additional queriers with tailers when the only ingester checked is new/under-burdened.
While this seems like it'd work itself out via the law of large numbers, consider a client retrying bad tail requests. This would eventually succeed if ingesters are rolling over as it'd keep checking until it queries a new ingester.
Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried about only checking a single ingester's tailing count because if the client retries, if effectively makes the tailing limit only apply if every ingester is below said limit.
This may be something to consider on the ingesters instead/as well -- they can each enforce their own internal limit.
@owen-d tailing currently works by the querier opening a tail to every ingester, so it's generally safe to ask just one ingester for a tailing count. There are some races here as ingesters are added/removed from the cluster say during a rollout however I generally think that window is small and the impact of allowing someone to open an additional websocket or two is minimal. |
Both @owen-d and @cyriltovena seem to express similar concern here though, the code could fairly easily be changed to just ask all ingesters and take the largest count? Would you both prefer that? It feels a bit unnecessary to me but I'm not that strongly opinionated here if you would both prefer we check all ingesters. |
Signed-off-by: Edward Welch <edward.welch@grafana.com>
Signed-off-by: Edward Welch <edward.welch@grafana.com>
What this PR does / why we need it:
Limit number of concurrent live tailing requests per tenant.
When a new request comes in, querier would ask one of the active ingesters for the count of active tail requests for the user. If the count exceeds the configured limits for the user then the tail request would be rejected right away.
NOTE:
Since we are adding a new RPC call exposed by ingester, we might see an error during rollouts when new queriers try to make that call with one of the old ingesters.
Which issue(s) this PR fixes:
Fixes #1551
Checklist