Refresh ahead cache fails after several refreshes #357

Closed
enocom opened this issue Jul 25, 2024 · 0 comments · Fixed by #355
Labels
priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

enocom commented Jul 25, 2024

We use a default parameter value in the function _seconds_until_refresh which determines when to refresh the client certificate. In normal operation, we expect the refresh operation to complete in the background ~4 minutes before the current certificate expires. However, we have discovered:

Default parameter values are evaluated from left to right when the function definition is executed. (source)

In effect, for any client using the default refresh ahead strategy (the "lazy-refresh" strategy is unaffected), the refresh cycle stops refreshing the client cert before it expires once the refresh operation has run for a few cycles. This also means this statement will always be false.
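To make the pitfall concrete, here is a minimal sketch of definition-time evaluation. The names `fake_now` and `read_clock_buggy` are hypothetical and are not taken from the connector's code; an integer counter stands in for the wall clock so the behavior is deterministic.

```python
import itertools

# Hypothetical stand-in for a clock: each call returns the next integer "time".
_ticks = itertools.count()


def fake_now() -> int:
    return next(_ticks)


# The default expression fake_now() runs exactly once, when this `def`
# statement is executed, and its result is stored on the function object.
def read_clock_buggy(now: int = fake_now()) -> int:
    return now


print(read_clock_buggy())  # 0: the value captured at definition time
print(fake_now())          # 1: the clock itself has moved on
print(read_clock_buggy())  # 0: the default is still the captured value
```

The captured value is visible in `read_clock_buggy.__defaults__`, which is one way to confirm that the default never changes between calls.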

Observe:

Startup

now = 00:00 (this should change but doesn't because it's evaluated when the function definition is executed)
retrieve initial certs and start refresh ahead operation loop

Refresh Ahead Operation 0

current time = 00:00
current cert expiration = 01:00
cached cert has expired = no
time till expiration = 60 minutes
refresh = 30 minutes

Refresh Ahead Operation 1

current time = 00:30
current cert expiration = 01:30
cached cert has expired = no
time till expiration = 01:30 - 00:00 = 1.5 hours = 90 minutes
refresh = 45 minutes

Refresh Ahead Operation 2

current time = 01:15
current cert expiration = 02:15
cached cert has expired = no
time till expiration = 02:15 - 00:00 = 2.25 hours = 135 minutes
refresh = 68 minutes

Refresh Ahead Operation 3

current time = 02:23
current cert expiration = 02:15
cached cert has expired = yes!

By refresh ahead operation 3, the existing cached cert will have expired, causing any new connections between 02:15 and 02:23 to fail. Connection pools may not immediately try to recreate connections during this "bad" period, but as time passes, the chances of creating a new connection while the client cert is invalid go up.
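The timeline above can be reproduced with a small simulation. This is a sketch under assumptions read off the walk-through, not the connector's implementation: times are whole minutes since startup, each cert is valid for 60 minutes, and the delay is half the time until expiration, rounded up. All names here are hypothetical.

```python
CERT_LIFETIME = 60  # minutes; assumed from the walk-through
FROZEN_NOW = 0      # minutes since startup; stands in for the default captured at definition time


def refresh_delay(expiration: int, now: int = FROZEN_NOW) -> int:
    """Half the time until expiration, rounded up to whole minutes."""
    return (expiration - now + 1) // 2


def simulate(operations: int, buggy: bool = True):
    """Return one (wake, expiration, delay, next_wake) tuple per refresh operation."""
    rows = []
    clock = 0
    for _ in range(operations):
        expiration = clock + CERT_LIFETIME  # a fresh cert is fetched at each wake-up
        if buggy:
            delay = refresh_delay(expiration)             # measured from 00:00 every time
        else:
            delay = refresh_delay(expiration, now=clock)  # measured from the real "now"
        rows.append((clock, expiration, delay, clock + delay))
        clock += delay
    return rows


for wake, expiration, delay, next_wake in simulate(3):
    expired_for = max(next_wake - expiration, 0)
    print(f"wake={wake} expires={expiration} delay={delay} "
          f"next={next_wake} expired_for={expired_for}")
```

With `buggy=True`, the third row wakes next at minute 143 (02:23) for a cert that expired at minute 135 (02:15). With `buggy=False`, the delay stays at 30 minutes and the next wake-up never passes the expiration.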

How to fix this

Stop using a time value as a default argument and either always pass "now" in, or retrieve it within the function.
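One way the second option (retrieving "now" inside the function) might look is sketched below. The name `seconds_until_refresh_sketch` and its body are illustrative assumptions, not the connector's actual `_seconds_until_refresh`; the body simply waits half the remaining lifetime, as in the walk-through above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_refresh_sketch(
    expiration: datetime, now: Optional[datetime] = None
) -> int:
    """Hypothetical sketch: seconds to wait before the next refresh."""
    # None is a safe default because it is immutable and carries no timestamp.
    # The clock is read on every call, not once at definition time.
    if now is None:
        now = datetime.now(timezone.utc)
    remaining = int((expiration - now).total_seconds())
    # Assumption for illustration: wait half the remaining lifetime, never negative.
    return max(remaining // 2, 0)


# Passing "now" explicitly keeps the function easy to test.
t0 = datetime(2024, 7, 25, 0, 0, tzinfo=timezone.utc)
print(seconds_until_refresh_sketch(t0 + timedelta(hours=1), now=t0))  # 1800
```

Keeping `now` as an optional parameter preserves the ability to inject a fixed time in tests while making the no-argument call safe.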

enocom added the type: bug and priority: p1 labels Jul 25, 2024
enocom closed this as completed Jul 25, 2024
enocom changed the title from "Refresh ahead cache fails after" to "Refresh ahead cache fails after several refreshes" Jul 25, 2024