-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clientv3: Add retry for round robin load balancer based on grpc-middleware's retry interceptor #16
clientv3: Add retry for round robin load balancer based on grpc-middleware's retry interceptor #16
Conversation
clientv3/client.go
Outdated
@@ -461,6 +469,21 @@ func newClient(cfg *Config) (*Client, error) { | |||
return client, nil | |||
} | |||
|
|||
// roundRobinQuoremBackoff retries against quorem between each backoff. | |||
// This is indended for use with a round robin load balancer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo s/indended/intended/
?
clientv3/client.go
Outdated
quorem := (n/2 + 1) | ||
if attempt%quorem == 0 { | ||
return backoffutils.JitterUp(waitBetween, jitterFraction) | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need } else {
? Just return 0 next line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I keep forgetting this particular go convention.
clientv3/client.go
Outdated
// This is indended for use with a round robin load balancer. | ||
func (c *Client) roundRobinQuoremBackoff(waitBetween time.Duration, jitterFraction float64) grpc_retry.BackoffFunc { | ||
return func(attempt uint) time.Duration { | ||
// after each round robin across quorem, backoff for our wait between duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo s/quorem/quorum
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also s/roundRobinQuoremBackoff/roundRobinQuorumBackoff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦♂️ Thanks!
clientv3/client.go
Outdated
@@ -461,6 +469,21 @@ func newClient(cfg *Config) (*Client, error) { | |||
return client, nil | |||
} | |||
|
|||
// roundRobinQuoremBackoff retries against quorem between each backoff. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/quorem/quorum
de74069
to
365286d
Compare
@gyuho Got two remaining issues (see description) on this and then it should be ready. I'm thinking of implementing auth retry in an interceptor. There are a couple ways I could do this. But I'm a bit unclear why this is needed. It looks like a band-aid for something. Retry of clientv3 functions that do more than just a remote call is messy. I don't see any way around keeping a subset of |
It was introduced to retry on expired tokens and to get assigned a new token, whether it's a simple or JWT token (ref. etcd-io#7110).
Hmm, I will also take a look what the middleware does today. Maybe just replace manual retries on transient error, and keep rest of the retry logic? As long as the interceptor can cover transient error handling, middleware should be worth. I agree that the current retry logic is quite messy. What subset of retry.go is hard to replace with the interceptor? If we can't replace now, I feel like we would still need manual retry logic even with https://github.com/grpc/proposal/blob/master/A6-client-retries.md#retry-policy? |
365286d
to
226d32a
Compare
Gotcha. Thanks for the ref.
The main gaps:
#1 will always be our responsibility but there isn't all that much of it, so I don't consider that a big problem. For now, adding back the manual retry logic does seem to be the safest path forward. So I've done that. The
|
226d32a
to
7c6dd55
Compare
Sounds good.
Yeah, let's keep it as TODO for now, since it's not available in gRPC upstream yet.
Yes. Feel free to refactor our codebase as needed! It's too tightly coupled with old balancer interface, so need to be reworked anyway. |
9c962cb
to
763b92a
Compare
@gyuho This is ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks!
While we ended up keeping old retry code, we can always refactor and remove once official gRPC retry policy is ready. Just having built-in interceptor for transient error and backoffs would still help us a lot.
763b92a
to
9256282
Compare
9256282
to
6ed45be
Compare
@gyuho This is ready for another pass. I've removed the dependency on |
+1
Sounds good. Let's do it in a separate PR. LGTM. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Per discussion with grpc team, this introduces an grpc interceptor based on
go-grpc-middleware/retry
for etcd client side retry.This retry uses grpc interceptors to retry "retriable" errors returned by the etcd server. We've also introduced a backoff routine that attempts a call to quorum count (2 of 3, 3 of 5, ...) of etcd servers between each backoff interval.
This does not yet entirely replace the
clientv3/retry.go
implementation, which is still required for "auth retry" and for retry of grpc streaming requests that utilize client streams, sincego-grpc-middleware/retry
does not support client streams, but it does supersede it for the bulk of intermittent call failures, and while adding this we've teased out the future work required to replaceclientv3/retry.go
entirely.Future work
clientv3/retry.go
once the above are complete..