Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

limit for concurrent tail requests #1562

Merged

Conversation

sandeepsukhani
Copy link
Contributor

@sandeepsukhani sandeepsukhani commented Jan 22, 2020

What this PR does / why we need it:
Limit number of concurrent live tailing requests per tenant.
When a new request comes in, querier would ask one of the active ingesters for the count of active tail requests for the user. If the count exceeds the configured limits for the user then the tail request would be rejected right away.

NOTE:
Since we are adding a new RPC call exposed by ingester, we might see an error during rollouts when new queriers try to make that call with one of the old ingesters.

Which issue(s) this PR fixes:
Fixes #1551

Checklist

  • Documentation added
  • Tests updated

When a new request comes in, querier would ask one of the active ingesters for count of active tail requests for the user
If the count exceeds the configured limits for the user then tail request would be rejected right away.
@pracucci pracucci removed their request for review January 22, 2020 12:54
@pracucci
Copy link
Contributor

@sandlis I'm currently too busy on the Cortex side to accept Loki reviews in my backlog. For this reason I've removed myself from the reviewers list. Sorry for that!

@sandeepsukhani
Copy link
Contributor Author

@sandlis I'm currently too busy on the Cortex side to accept Loki reviews in my backlog. For this reason I've removed myself from the reviewers list. Sorry for that!

@pracucci No problem, I can understand.

@cyriltovena cyriltovena self-assigned this Jan 23, 2020
Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


// we want to check count of active tailers with only one of the active ingesters since
// all of them would be having same number of active tail connections
// Note: In worst-case scenario the ingester that we picked would have joined recently which might not have all the tail requests
Copy link
Member

@owen-d owen-d Jan 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should check all ingesters to avoid this error case. It doesn't seem like a relatively more expensive option (we're already creating tailers on every ingester for tailing requests). It does however keep us from loading additional queriers with tailers when the only ingester checked is new/under-burdened.

While this seems like it'd work itself out via the law of large numbers, consider a client retrying bad tail requests. This would eventually succeed if ingesters are rolling over as it'd keep checking until it queries a new ingester.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Copy link
Member

@owen-d owen-d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about only checking a single ingester's tailing count because if the client retries, if effectively makes the tailing limit only apply if every ingester is below said limit.

This may be something to consider on the ingesters instead/as well -- they can each enforce their own internal limit.

@slim-bean
Copy link
Collaborator

@owen-d tailing currently works by the querier opening a tail to every ingester, so it's generally safe to ask just one ingester for a tailing count.

There are some races here as ingesters are added/removed from the cluster say during a rollout however I generally think that window is small and the impact of allowing someone to open an additional websocket or two is minimal.

@slim-bean
Copy link
Collaborator

Both @owen-d and @cyriltovena seem to express similar concern here though, the code could fairly easily be changed to just ask all ingesters and take the largest count? Would you both prefer that?

It feels a bit unnecessary to me but I'm not that strongly opinionated here if you would both prefer we check all ingesters.

Signed-off-by: Edward Welch <edward.welch@grafana.com>
Signed-off-by: Edward Welch <edward.welch@grafana.com>
@cyriltovena cyriltovena merged commit c7a3ec5 into grafana:master Jan 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Per-tenant max websocket connections.
5 participants