limit for concurrent tail requests #1562

sandeepsukhani · 2020-01-22T12:52:07Z

What this PR does / why we need it:
Limit number of concurrent live tailing requests per tenant.
When a new request comes in, the querier asks one of the active ingesters for the count of active tail requests for the user. If the count exceeds the configured limit for the user, the tail request is rejected right away.

NOTE:
Since we are adding a new RPC exposed by the ingester, we might see errors during rollouts when new queriers make that call to one of the old ingesters.

Which issue(s) this PR fixes:
Fixes #1551

Checklist

Documentation added
Tests updated

pracucci · 2020-01-22T12:55:13Z

@sandlis I'm currently too busy on the Cortex side to accept Loki reviews in my backlog. For this reason I've removed myself from the reviewers list. Sorry for that!

sandeepsukhani · 2020-01-22T12:59:25Z

> @sandlis I'm currently too busy on the Cortex side to accept Loki reviews in my backlog. For this reason I've removed myself from the reviewers list. Sorry for that!

@pracucci No problem, I can understand.

pkg/querier/querier.go

cyriltovena

LGTM

owen-d · 2020-01-24T13:47:43Z

pkg/querier/querier.go

+
+	// we want to check count of active tailers with only one of the active ingesters since
+	// all of them would be having same number of active tail connections
+	// Note: In worst-case scenario the ingester that we picked would have joined recently which might not have all the tail requests


I think we should check all ingesters to avoid this error case. It doesn't seem like a much more expensive option (we're already creating tailers on every ingester for tailing requests). It does, however, keep us from loading additional queriers with tailers when the only ingester checked is new or under-burdened.

While this seems like it would work itself out via the law of large numbers, consider a client retrying rejected tail requests. The retries would eventually succeed while ingesters are rolling over, since the client would keep trying until its check lands on a new ingester.
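To make the retry concern concrete, here is a toy Go simulation (not Loki code). Established ingesters each already see the tenant's 5 tails, while a freshly joined one sees 0. With a limit of 5 and a querier that checks a single ingester per attempt (round-robin picking is an assumption for illustration; the real selection may differ), a retrying client is admitted as soon as it hits the new ingester.

```go
package main

// admitted mimics a single-ingester check: only the count reported by the
// ingester that happened to be picked is compared against the limit.
func admitted(counts []int, picked, limit int) bool {
	return counts[picked] < limit
}

// retriesUntilAdmitted simulates a client that retries a rejected tail
// request, with the querier picking a different ingester on each attempt
// (round-robin, an assumption). It returns the 1-based attempt number that
// gets admitted, or -1 if no ingester would admit the request.
func retriesUntilAdmitted(counts []int, limit int) int {
	for attempt := range counts {
		if admitted(counts, attempt, limit) {
			return attempt + 1
		}
	}
	return -1
}
```

`retriesUntilAdmitted([]int{5, 5, 5, 0}, 5)` returns 4: the three established ingesters reject the request, and the new one, which has not seen any of the tenant's tails, lets it through.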

owen-d

I'm worried about only checking a single ingester's tailing count, because if the client retries, it effectively makes the tailing limit apply only if every ingester is at or above said limit.

This may be something to consider on the ingesters instead/as well -- they can each enforce their own internal limit.

slim-bean · 2020-01-24T14:15:06Z

@owen-d tailing currently works by the querier opening a tail to every ingester, so it's generally safe to ask just one ingester for a tailing count.

There are some races here as ingesters are added to or removed from the cluster, say during a rollout. However, I generally think that window is small, and the impact of allowing someone to open an additional websocket or two is minimal.

slim-bean · 2020-01-24T14:17:30Z

Both @owen-d and @cyriltovena seem to express a similar concern here, though. The code could fairly easily be changed to ask all ingesters and take the largest count. Would you both prefer that?

It feels a bit unnecessary to me, but I'm not strongly opinionated here if you would both prefer we check all ingesters.

limit for concurrent tail requests

3426bf0

When a new request comes in, the querier would ask one of the active ingesters for the count of active tail requests for the user. If the count exceeds the configured limits for the user, then the tail request would be rejected right away.

sandeepsukhani requested review from cyriltovena and pracucci January 22, 2020 12:52

pull-request-size bot added the size/XL label Jan 22, 2020

pracucci removed their request for review January 22, 2020 12:54

cyriltovena self-assigned this Jan 23, 2020

cyriltovena reviewed Jan 24, 2020

pkg/querier/querier.go

cyriltovena approved these changes Jan 24, 2020

owen-d reviewed Jan 24, 2020

cyriltovena added 2 commits January 24, 2020 08:53

Merge remote-tracking branch 'upstream/master' into limit-concurrent-tail-requests

1c3805b

goimports querier_test.go

8eee900

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

owen-d requested changes Jan 24, 2020

slim-bean added 2 commits January 24, 2020 09:40

query all ingesters for their max tail count

1c7936b

Signed-off-by: Edward Welch <edward.welch@grafana.com>

changed to check only ACTIVE ingesters and tweaked error handling some

5342b51

Signed-off-by: Edward Welch <edward.welch@grafana.com>
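The last commit restricts the fan-out to ingesters in the ACTIVE state. A simplified Go sketch of that filter follows; `ingesterDesc` and the state constants are hypothetical simplifications (the real code presumably reads ingester states from the Cortex ring, whose types are not reproduced here).

```go
package main

// ingesterState is a simplified, hypothetical mirror of ring lifecycle states.
type ingesterState int

const (
	pending ingesterState = iota
	joining
	active
	leaving
)

// ingesterDesc is a hypothetical minimal description of one ingester.
type ingesterDesc struct {
	Addr  string
	State ingesterState
}

// activeIngesters keeps only ACTIVE ingesters, skipping ones that are still
// joining or already leaving and may not hold every tail stream.
func activeIngesters(all []ingesterDesc) []ingesterDesc {
	out := make([]ingesterDesc, 0, len(all))
	for _, ing := range all {
		if ing.State == active {
			out = append(out, ing)
		}
	}
	return out
}
```

Only the addresses returned here would then be asked for their tail counts.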

owen-d approved these changes Jan 24, 2020

cyriltovena merged commit c7a3ec5 into grafana:master Jan 24, 2020
