Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding circuit breakers on ingester server side for read path #8285

Merged
merged 9 commits into from
Jun 6, 2024

Conversation

duricanikolic
Copy link
Contributor

@duricanikolic duricanikolic commented Jun 5, 2024

What this PR does

In #8180 we added circuit breakers on ingester server side for write path. This PR extends the feature by implementing the read path counter part.

Circuit breakers on ingester read path behave like this: before any of the read path-related methods (ingester.QueryStream(), ingestser.QueryExemplars(), ingester.LabelValues(), ingester.LabelNames(), ingester.UserStats(), ingester.AllUserStats(), ingester.MetricsForLabelMatchers(), ingester.MetricsMetadata(), ingester.LabelNamesAndValues(), ingester.LabelValuesCardinality(), ingester.ActiveSeries()) is executed, we try to acquire a circuit breaker permit. If a permit is not acquired due to an open circuit breaker, the request is rejected. If a permit is acquired, a read request is started. Once the read request is completed, the result is recorded with the circuit breaker: if the read request ended with a circuit breaker relevant error, a failure is recorded with the circuit breaker. Otherwise, a success is recorded. Errors which are relevant for ingester server side circuit breakers are per-instance limits and timeouts. Timeouts on the read path can occur if a read request lasts longer than the configured maximal duration of read requests.

Circuit breakers can be activated by setting -ingester.circuit-breaker.enabled to true. Maximal duration of read requests can be configured as -ingester.circuit-breaker.read-timeout.

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@duricanikolic duricanikolic force-pushed the yuri/ingester-server-cb branch 2 times, most recently from d1c5604 to ea700ee Compare June 5, 2024 16:00
@duricanikolic duricanikolic self-assigned this Jun 5, 2024
@duricanikolic duricanikolic marked this pull request as ready for review June 5, 2024 16:01
@duricanikolic duricanikolic requested review from jdbaldry and a team as code owners June 5, 2024 16:01
@colega
Copy link
Contributor

colega commented Jun 6, 2024

Can we use a middleware instead of having to explicitly call this in each method of the server? I see that push requests are limited through a call from grpcInflightMethodLimiter middleware. Can we extend that one to also handle all the read requests in a single place?

@pstibrany
Copy link
Member

Can we use a middleware instead of having to explicitly call this in each method of the server? I see that push requests are limited through a call from grpcInflightMethodLimiter middleware. Can we extend that one to also handle all the read requests in a single place?

grpcInflightMethodLimiter is not a grpc middleware, but tap handle, and it is used for a specific reason -- we can stop push requests before they are fully in memory. We don't have the same need on read path -- read path requests are tiny, so I would avoid using grpcInflightMethodLimiter for read path too.

grpc middleware may be a better place here.

pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
@duricanikolic duricanikolic requested a review from colega June 6, 2024 10:40
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
@duricanikolic duricanikolic requested a review from colega June 6, 2024 13:27
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
@duricanikolic duricanikolic merged commit 3d55dbe into main Jun 6, 2024
29 checks passed
@duricanikolic duricanikolic deleted the yuri/ingester-server-cb branch June 6, 2024 18:17
@pstibrany
Copy link
Member

From our internal Slack discussion:

Peter: If I understand it correctly … if read requests are taking longer than 30s, this can trigger opening of CB, and ingester will start rejecting push requests?

Oleg: Yeah, those should be two separate circuit breaker instances. The read one should try to acquire both read & write, but the write should only take the write one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants