-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(vault): let shdict secret vault cache presist enough time during resurrect_ttl #13471
Conversation
74ff0b0
to
61a3e20
Compare
LGTM |
changelog/unreleased/kong/fix-vault-resurrect-ttl-multi-worker.yml
Outdated
Show resolved
Hide resolved
kong/pdk/vault.lua
Outdated
if ttl > resurrect_ttl then | ||
return true | ||
|
||
elseif ttl > SECRETS_CACHE_MIN_TTL then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I'm understanding this correctly, but why should ttl > SECRETS_CACHE_MIN_TTL
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now the vault timer is refreshing all the secrets in shared dict with a period that equals to ROTATION_INTERVAL
, and SECRETS_CACHE_MIN_TTL
is 2 * ROTATION_INTERVAL
. This ensures the execution of the secret refreshing will happen at least once within the last remaining time of SECRETS_CACHE_MIN_TTL
, so that it can be refreshed either with a null placholder(NEGATIVELY_CACHED_VALUE
) or a new updated value.
changelog/unreleased/kong/fix-vault-resurrect-ttl-multi-worker.yml
Outdated
Show resolved
Hide resolved
I will take a look on it. |
Signed-off-by: Aapo Talvensaari <aapo.talvensaari@gmail.com>
@bungle I applied the reviews above, do you want to take another look? It seems something is wrong with Github Action's worker, all tests are pending, we may need to wait until all tests passed |
if resurrect and value == nil then | ||
local resurrected_value = SECRETS_CACHE:get(cache_key) | ||
if resurrected_value then | ||
return resurrected_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about adding a log here?
…resurrect_ttl (#13471) This PR fixes an issue that rotate_secret may flush a secret value with NEGATIVE_CACHED_VALUE when vault backend is down and a secret value stored in the shared dict has passed its ttl and hasn't finished consuming its resurrect_ttl. TLDR; this issue happens easily when a reference is being used via the vault PDK function in custom codes(serverless functions, custom plugins, etc.), and some of the worker processes may not be triggered via the service/routes that use these custom codes, and these worker processes do not hold a valid LRU cache for the secret value The issue was first reported in FTI-6137. --------- Signed-off-by: Aapo Talvensaari <aapo.talvensaari@gmail.com> Co-authored-by: Aapo Talvensaari <aapo.talvensaari@gmail.com> (cherry picked from commit 9269195)
Successfully created backport PR for |
Successfully created cherry-pick PR for |
…resurrect_ttl (#13471) This PR fixes an issue that rotate_secret may flush a secret value with NEGATIVE_CACHED_VALUE when vault backend is down and a secret value stored in the shared dict has passed its ttl and hasn't finished consuming its resurrect_ttl. TLDR; this issue happens easily when a reference is being used via the vault PDK function in custom codes(serverless functions, custom plugins, etc.), and some of the worker processes may not be triggered via the service/routes that use these custom codes, and these worker processes do not hold a valid LRU cache for the secret value The issue was first reported in FTI-6137. --------- Signed-off-by: Aapo Talvensaari <aapo.talvensaari@gmail.com> Co-authored-by: Aapo Talvensaari <aapo.talvensaari@gmail.com> (cherry picked from commit 9269195)
Cherry-pick failed for Please cherry-pick the changes locally. git remote add upstream https://github.com/kong/kong-ee
git fetch upstream master
git worktree add -d .worktree/cherry-pick-13471-to-master-to-upstream upstream/master
cd .worktree/cherry-pick-13471-to-master-to-upstream
git checkout -b cherry-pick-13471-to-master-to-upstream
ancref=$(git merge-base 9bc3deb30041dbc45b1fe44fe8069f21ab69caaa 30215e6ff8c7711162107c53a7252f9d78691ca6)
git cherry-pick -x $ancref..30215e6ff8c7711162107c53a7252f9d78691ca6 |
…resurrect_ttl (#13471) This PR fixes an issue that rotate_secret may flush a secret value with NEGATIVE_CACHED_VALUE when vault backend is down and a secret value stored in the shared dict has passed its ttl and hasn't finished consuming its resurrect_ttl. TLDR; this issue happens easily when a reference is being used via the vault PDK function in custom codes(serverless functions, custom plugins, etc.), and some of the worker processes may not be triggered via the service/routes that use these custom codes, and these worker processes do not hold a valid LRU cache for the secret value The issue was first reported in FTI-6137. --------- Signed-off-by: Aapo Talvensaari <aapo.talvensaari@gmail.com> Co-authored-by: Aapo Talvensaari <aapo.talvensaari@gmail.com> (cherry picked from commit 9269195)
Successfully created backport PR for |
…resurrect_ttl (#13471) This PR fixes an issue that rotate_secret may flush a secret value with NEGATIVE_CACHED_VALUE when vault backend is down and a secret value stored in the shared dict has passed its ttl and hasn't finished consuming its resurrect_ttl. TLDR; this issue happens easily when a reference is being used via the vault PDK function in custom codes(serverless functions, custom plugins, etc.), and some of the worker processes may not be triggered via the service/routes that use these custom codes, and these worker processes do not hold a valid LRU cache for the secret value The issue was first reported in FTI-6137. --------- Signed-off-by: Aapo Talvensaari <aapo.talvensaari@gmail.com> Co-authored-by: Aapo Talvensaari <aapo.talvensaari@gmail.com> (cherry picked from commit 9269195)
Successfully created backport PR for |
…resurrect_ttl (#13471) This PR fixes an issue that rotate_secret may flush a secret value with NEGATIVE_CACHED_VALUE when vault backend is down and a secret value stored in the shared dict has passed its ttl and hasn't finished consuming its resurrect_ttl. TLDR; this issue happens easily when a reference is being used via the vault PDK function in custom codes(serverless functions, custom plugins, etc.), and some of the worker processes may not be triggered via the service/routes that use these custom codes, and these worker processes do not hold a valid LRU cache for the secret value The issue was first reported in FTI-6137. --------- Signed-off-by: Aapo Talvensaari <aapo.talvensaari@gmail.com> Co-authored-by: Aapo Talvensaari <aapo.talvensaari@gmail.com> (cherry picked from commit 9269195)
Summary
This PR fixes an issue that
rotate_secret
may flush a secret value withNEGATIVE_CACHED_VALUE
when vault backend is down and a secret value stored in the shared dict has passed itsttl
and hasn't finished consuming itsresurrect_ttl
.TLDR; this issue happens easily when a reference is being used via the vault PDK function in custom codes(serverless functions, custom plugins, etc.), and some of the worker processes may not be triggered via the service/routes that use these custom codes, and these worker processes do not hold a valid LRU cache for the secret value
The issue was first reported in FTI-6137.
Checklist
changelog/unreleased/kong
orskip-changelog
label added on PR if changelog is unnecessary. README.mdIssue reference
FTI-6137