Fix PeerUpstreamEndpoints and UpstreamPeerTrustBundles to only Cancel watch when needed, otherwise keep the watch active #21871
Conversation
agent/proxycfg/upstreams.go
Outdated
}

reconcilePeeringWatches(snap.DiscoveryChain,
I think this would be clearer if it were moved out of resetWatchesFromChain
to be called serially afterwards, since it isn't chain-specific.
There is a dependency because we init the watch inside this function and that needs to be done after calling the reconciliation.
I'm considering extracting the init watch as well and I think in that case I can call the reconciliation outside the disco chain specific func.
7579761 to e4068be
Just a couple of questions
targetUID := NewUpstreamIDFromTargetID(targetID)
if targetUID.Peer != "" {
	snap.PeerUpstreamEndpoints.CancelWatch(targetUID)
	snap.UpstreamPeerTrustBundles.CancelWatch(targetUID.Peer)
}
Is the reconcile handled above instead of in this specific case intentional?
I'm imagining we should have been doing this in all cases before but were not?
Good question!
This was canceling the watch for a specific target, and that is related to the bug. In reality, because those watches can be shared by different targets, we want to reconcile the list of watches every time to make sure that shared watches don't get canceled.
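To make the shared-watch problem concrete, here is a minimal, self-contained sketch (hypothetical names, not the actual ConfigSnapshot types): trust-bundle watches are keyed by peer, so two targets from the same peer share one watch, and canceling it when only one of those targets goes away breaks the other.

```go
package main

import (
	"context"
	"fmt"
)

func main() {
	// Hypothetical stand-in for UpstreamPeerTrustBundles: one shared watch
	// per peer, keyed by peer name, each with its own cancel func.
	trustBundleWatches := map[string]context.CancelFunc{}

	// Two different targets resolve to the same peer and share its watch.
	targetPeers := map[string]string{
		"target-1": "peer-a",
		"target-2": "peer-a",
	}

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	watchCtx, watchCancel := context.WithCancel(ctx)
	trustBundleWatches["peer-a"] = watchCancel

	// Buggy per-target cleanup: target-1 goes away, so its peer's watch is
	// canceled even though target-2 still needs it.
	gone := "target-1"
	if cancelWatch, ok := trustBundleWatches[targetPeers[gone]]; ok {
		cancelWatch()
		delete(trustBundleWatches, targetPeers[gone])
	}

	// The watch that target-2 depended on is now gone.
	fmt.Println("peer-a watch canceled:", watchCtx.Err() != nil) // true
}
```

Reconciling against the full set of upstreams before canceling, as described above, avoids this.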
@@ -479,8 +479,8 @@ func (s *handlerUpstreams) watchUpstreamTarget(ctx context.Context, snap *Config
 	var entMeta acl.EnterpriseMeta
 	entMeta.Merge(opts.entMeta)

-	ctx, cancel := context.WithCancel(ctx)
-	err := s.dataSources.Health.Notify(ctx, &structs.ServiceSpecificRequest{
+	peerCtx, cancel := context.WithCancel(ctx)
I see several ctx renames but wouldn't expect an overwriting issue with the way they were, since they're always assigned inside a new scope. What motivated this change?
This is actually a second bug: the context rewrite meant that we created a chain of contexts, ctx1 -> ctx2 -> ctx3, while we wanted ctx1 -> ctx2 and ctx1 -> ctx3.
ahhh so the innermost one actually needs to be a child of the outermost one and not its enclosing loop?
Yes, we need the data from both watches to be able to generate the right config. If we go with the chain, then every time the one in the middle is canceled we lose the data of the innermost one, because it gets canceled with it.
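To illustrate the two context shapes being discussed (a minimal sketch, not the Consul code itself): when both watch contexts are derived from the same parent, canceling one does not tear down the other, whereas chaining them cancels the innermost one along with the middle one.

```go
package main

import (
	"context"
	"fmt"
)

func main() {
	parent, cancelParent := context.WithCancel(context.Background())
	defer cancelParent()

	// Buggy shape: ctx1 -> ctx2 -> ctx3. Canceling the middle context also
	// cancels the innermost one, so its watch loses its data too.
	chainMid, cancelChainMid := context.WithCancel(parent)
	chainInner, cancelChainInner := context.WithCancel(chainMid)
	defer cancelChainInner()

	// Intended shape: ctx1 -> ctx2 and ctx1 -> ctx3. Each watch only dies
	// with its own cancel func or with the parent.
	siblingA, cancelA := context.WithCancel(parent)
	siblingB, cancelB := context.WithCancel(parent)
	defer cancelA()
	defer cancelB()

	cancelChainMid()
	cancelA()

	fmt.Println("chained inner canceled:", chainInner.Err() != nil) // true
	fmt.Println("sibling A canceled:", siblingA.Err() != nil)       // true (its own cancel)
	fmt.Println("sibling B canceled:", siblingB.Err() != nil)       // false
}
```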
if err := s.dataSources.TrustBundle.Notify(peerCtx, &cachetype.TrustBundleReadRequest{

if !snap.UpstreamPeerTrustBundles.IsWatched(uid.Peer) {
	peerCtx2, cancel2 := context.WithCancel(ctx)
Question above references this
Co-authored-by: Nitya Dhanushkodi <nitya@hashicorp.com>
LGTM, thanks for fixing this @dhiaayachi!
📣 Hi @dhiaayachi! a backport is missing for this PR [21871] for versions [1.15,1.18,1.19] please perform the backport manually and add the following snippet to your backport PR description:
📣 Hi @dhiaayachi! a backport is missing for this PR [21871] for versions [1.15] please perform the backport manually and add the following snippet to your backport PR description:
Description
This PR unifies the way we cancel PeerUpstreamEndpoints and UpstreamPeerTrustBundles and adds more safeguards so that those watches are only canceled when no other data needs them. This is important because those watches can be shared by different upstreams: multiple upstreams can have the same target for PeerUpstreamEndpoints, and multiple targets can belong to the same peer for UpstreamPeerTrustBundles. To avoid canceling a watch for those data sources while another upstream still needs it, we loop through all the upstreams, determine which watches are still needed, and cancel only the ones that are not.
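As a rough illustration of that reconciliation (a minimal sketch with hypothetical types, not the actual proxycfg code): build the set of keys still referenced by any upstream, then cancel only the watches whose key is no longer in that set, so shared watches stay active.

```go
package main

import (
	"context"
	"fmt"
)

// watchSet is a hypothetical stand-in for a set of active watches keyed by
// peer name or target ID, each with its own cancel func.
type watchSet struct {
	cancels map[string]context.CancelFunc
}

// reconcile cancels only the watches whose key is no longer needed by any
// upstream; watches that are still referenced are left untouched.
func (w *watchSet) reconcile(needed map[string]struct{}) {
	for key, cancel := range w.cancels {
		if _, stillNeeded := needed[key]; !stillNeeded {
			cancel()
			delete(w.cancels, key)
		}
	}
}

func main() {
	ws := &watchSet{cancels: map[string]context.CancelFunc{}}
	for _, peer := range []string{"peer-a", "peer-b"} {
		_, cancel := context.WithCancel(context.Background())
		ws.cancels[peer] = cancel
	}

	// Upstreams still reference peer-a; nothing references peer-b anymore.
	needed := map[string]struct{}{"peer-a": {}}
	ws.reconcile(needed)

	fmt.Println("watches still active:", len(ws.cancels)) // 1 (peer-a only)
}
```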
Testing & Reproduction steps
Links
PR Checklist