Fix exponential backoff on xds client retry #465

Merged: 2 commits, Jan 12, 2022

markmandel
Contributor

What type of PR is this?

/kind bug


What this PR does / Why we need it:

Fixes a bug where, after a failed connection to the xDS server, the client retried every 500ms instead of backing off exponentially.

  • Move the ExponentialBackoff outside the backoff loop, so it is not recreated on each retry.
  • Reset the backoff to its initial state on the first retry.
  • Cap the delay at 30s.
  • Add 0-2s of random jitter to each delay.
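The changes above can be sketched roughly as follows. This is a minimal, synchronous sketch under stated assumptions, not Quilkin's actual (async) code: the `Backoff` type and `connect_with_retry` function are hypothetical, the 500ms initial delay and doubling multiplier are assumed, a tiny xorshift generator stands in for a real random source so the example needs no crates, and resetting after a successful connection is an assumption about where the reset belongs. The 30s cap and 0-2s jitter come from the list above. The key point is that the backoff state is created once, outside the loop, so each failure lengthens the next delay instead of starting over.

```rust
use std::time::Duration;

/// Hypothetical exponential backoff with a delay cap and additive jitter.
/// Assumed: 500ms initial delay, doubling multiplier. From the PR: 30s cap, 0-2s jitter.
struct Backoff {
    initial: Duration,
    current: Duration,
    max: Duration,
    max_jitter: Duration,
    rng_state: u64, // xorshift64 state; must be nonzero
}

impl Backoff {
    fn new(initial: Duration, max: Duration, max_jitter: Duration, seed: u64) -> Self {
        Backoff { initial, current: initial, max, max_jitter, rng_state: seed.max(1) }
    }

    /// Return to the initial delay (assumed here to happen after a successful connect).
    fn reset(&mut self) {
        self.current = self.initial;
    }

    /// xorshift64: a small, non-cryptographic PRNG used only for jitter.
    fn next_random(&mut self) -> u64 {
        let mut x = self.rng_state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.rng_state = x;
        x
    }

    /// Capped base delay plus uniform jitter in [0, max_jitter] (millisecond
    /// granularity); the base then doubles, never exceeding the cap.
    fn next_delay(&mut self) -> Duration {
        let base = self.current.min(self.max);
        let jitter_ms = self.max_jitter.as_millis() as u64;
        let jitter = Duration::from_millis(self.next_random() % (jitter_ms + 1));
        self.current = (self.current * 2).min(self.max);
        base + jitter
    }
}

/// Retry `connect` until it succeeds. The `Backoff` is passed in, so it lives
/// outside the loop and keeps its state across failed attempts.
fn connect_with_retry<T, E>(
    backoff: &mut Backoff,
    mut connect: impl FnMut() -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> T {
    loop {
        match connect() {
            Ok(conn) => {
                backoff.reset();
                return conn;
            }
            Err(_) => sleep(backoff.next_delay()),
        }
    }
}
```

Constructing a fresh `Backoff` inside the `loop` would instead call `next_delay` on a new value every time, so every wait would equal the initial delay, which matches the 500ms symptom described above.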

Which issue(s) this PR fixes:

Closes #461

Special notes for your reviewer:

Sample log output (slog, edited for clarity), with the delay before the next retry included in each entry:

{"msg":"Unable to connect to the XDS server","level":"ERRO","ts":"2022-01-11T12:33:25.094550425-08:00","delay":"1.782s","error":"transport error"}
{"msg":"Unable to connect to the XDS server","level":"ERRO","ts":"2022-01-11T12:33:26.880956106-08:00","delay":"2.903s","error":"transport error"}
{"msg":"Unable to connect to the XDS server","level":"ERRO","ts":"2022-01-11T12:33:29.786490322-08:00","delay":"3.172s","error":"transport error"}
{"msg":"Unable to connect to the XDS server","level":"ERRO","ts":"2022-01-11T12:33:32.962315902-08:00","delay":"4.333s","error":"transport error"}
{"msg":"Unable to connect to the XDS server","level":"ERRO","ts":"2022-01-11T12:33:37.299096949-08:00","delay":"8.757s","error":"transport error"}
{"msg":"Unable to connect to the XDS server","level":"ERRO","ts":"2022-01-11T12:33:46.060135438-08:00","delay":"17.473s","error":"transport error"}
{"msg":"Unable to connect to the XDS server","level":"ERRO","ts":"2022-01-11T12:34:03.535624597-08:00","delay":"30.706s","error":"transport error"}
{"msg":"Unable to connect to the XDS server","level":"ERRO","ts":"2022-01-11T12:34:34.243188450-08:00","delay":"30.808s","error":"transport error"}
{"msg":"Unable to connect to the XDS server","level":"ERRO","ts":"2022-01-11T12:35:05.055083825-08:00","delay":"30.966s","error":"transport error"}
{"msg":"Unable to connect to the XDS server","level":"ERRO","ts":"2022-01-11T12:35:36.025452758-08:00","delay":"31.897s","error":"transport error"}
{"msg":"Unable to connect to the XDS server","level":"ERRO","ts":"2022-01-11T12:36:07.925738599-08:00","delay":"30.413s","error":"transport error"}
{"msg":"Unable to connect to the XDS server","level":"ERRO","ts":"2022-01-11T12:36:38.343017008-08:00","delay":"30.799s","error":"transport error"}
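The sample log can be sanity-checked with a little arithmetic. Each timestamp is within a few milliseconds of the previous timestamp plus the previous entry's `delay`, which suggests the logged `delay` is the wait before the next attempt. The delays grow and then settle between 30s and 32s, consistent with a 30s cap plus 0-2s of jitter. The snippet below encodes the timestamps (as seconds after 12:33:00) and delays copied from the log; the `SAMPLES` constant and `scheduling_overheads` helper are illustrative only, not part of Quilkin.

```rust
/// (seconds after 12:33:00 from `ts`, logged `delay` in seconds), copied from the log.
const SAMPLES: [(f64, f64); 12] = [
    (25.094550425, 1.782),
    (26.880956106, 2.903),
    (29.786490322, 3.172),
    (32.962315902, 4.333),
    (37.299096949, 8.757),
    (46.060135438, 17.473),
    (63.535624597, 30.706),
    (94.243188450, 30.808),
    (125.055083825, 30.966),
    (156.025452758, 31.897),
    (187.925738599, 30.413),
    (218.343017008, 30.799),
];

/// For each consecutive pair of entries: actual gap between timestamps minus
/// the delay logged by the earlier entry (i.e. time not explained by the delay).
fn scheduling_overheads(samples: &[(f64, f64)]) -> Vec<f64> {
    samples
        .windows(2)
        .map(|w| (w[1].0 - w[0].0) - w[0].1)
        .collect()
}
```

Every overhead comes out positive and under 10ms, and no delay exceeds 32s.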

markmandel added the kind/bug (Something isn't working) and area/user-experience (Pertaining to developers trying to use Quilkin, e.g. cli interface, configuration, etc) labels on Jan 11, 2022
@quilkin-bot
Collaborator

Build Succeeded 🥳

Build Id: f189c1ff-7889-4db6-b1dc-dd8e9d1cce10

To build this version:

git fetch git@github.com:googleforgames/quilkin.git pull/465/head:pr_465 && git checkout pr_465
cargo build

XAMPPRocky merged commit e1c2617 into googleforgames:main on Jan 12, 2022
markmandel deleted the bug/xds-retry branch on January 12, 2022 23:28
Labels: area/user-experience, kind/bug, size/s
Successfully merging this pull request may close these issues: XDS backoff is broken