Backoff retries in the activator. #1814
/test pull-knative-serving-integration-tests
/assign @josephburnett
cmd/activator/main.go
Outdated
@@ -68,6 +68,10 @@ type retryRoundTripper struct {
	start time.Time
}

func (rrt *retryRoundTripper) CalculateDelay(retries int, minRetryInterval time.Duration) time.Duration {
	return time.Duration(int(minRetryInterval/time.Millisecond)*retries*retries) * time.Millisecond
I believe this is quadratic, not exponential. What we want is an aggressive retry during normal activation times, followed by a quickly growing retry interval thereafter. That is easier to achieve with an exponential curve because of its hockey-stick shape.
In my experience a small base like 1.3 is a good start, with the retry index as the exponent and the result multiplied by the min retry interval.
E.g. return time.Duration(float64(minRetryInterval) * math.Pow(1.3, float64(retries)))
The actual numbers should be tuned, but the point is to keep the curve low and fast until we leave normal operating conditions.
Doh. Of course it's quadratic... very much my bad. Thanks for pointing that out, I'll fix accordingly.
Force-pushed from 2c4c6b0 to 65be77e.
Force-pushed from 65be77e to 52d5083.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: josephburnett, markusthoemmes
/retest
Fixes #1229
Proposed Changes
Added an exponential backoff to the activator's retry logic. In the process, I lowered the initial timeout (we might need to adjust it a bit to hit a sweet spot), and the total retry time is now bounded by the elapsed time spent retrying plus requesting.
A table of retry intervals can help determine a good value. Production data on how many retries are actually needed will help to tune it further.
Regarding tests: I didn't find any for this specific file. I'd love to add some, but will need some guidance on how to do so if necessary.
Release Note