Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix PeerUpstreamEndpoints and UpstreamPeerTrustBundles to only Cancel watch when needed, otherwise keep the watch active #21871

Merged
merged 4 commits into from
Nov 19, 2024

Conversation

dhiaayachi
Copy link
Collaborator

@dhiaayachi dhiaayachi commented Oct 23, 2024

Description

This PR unify the way we cancel PeerUpstreamEndpoints and UpstreamPeerTrustBundles and add more safeguards to only cancel those watches when no other data needs them.

This is important because those watches could be shared by different upstreams in the scenario where multiple upstreams have the same target for PeerUpstreamEndpoints and in the case where multiple targets belong to the same Peer for UpstreamPeerTrustBundles. To avoid canceling a watch for those data sources while another upstream need them, we loop through all the upstreams and determine which watch is still needed and only cancel those which are not.

Testing & Reproduction steps

Links

PR Checklist

  • updated test coverage
  • external facing docs updated
  • appropriate backport labels added
  • not a security concern

}

reconcilePeeringWatches(snap.DiscoveryChain,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be clearer if it were moved out of resetWatchesFromChain to be called serially afterwards, since it isn't chain-specific.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a dependency because we init the watch inside this function and that needs to be done after calling the reconciliation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm considering extracting the init watch as well and I think in that case I can call the reconciliation outside the disco chain specific func.

@dhiaayachi dhiaayachi added backport/all Apply backports for all active releases per .release/versions.hcl pr/no-changelog PR does not need a corresponding .changelog entry labels Oct 24, 2024
@nathancoleman nathancoleman self-requested a review November 5, 2024 18:21
@dhiaayachi dhiaayachi force-pushed the peering-watch-cancel-fix branch from 7579761 to e4068be Compare November 5, 2024 18:44
Copy link
Member

@nathancoleman nathancoleman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of questions

agent/proxycfg/state.go Outdated Show resolved Hide resolved
Comment on lines -304 to -309

targetUID := NewUpstreamIDFromTargetID(targetID)
if targetUID.Peer != "" {
snap.PeerUpstreamEndpoints.CancelWatch(targetUID)
snap.UpstreamPeerTrustBundles.CancelWatch(targetUID.Peer)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the reconcile handled above instead of in this specific case intentional?

I'm imagining we should have been doing this in all cases before but were not?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question!
This was canceling for a specific target and that is related to the bug. While in reality, because those watches could be shared by different targets, we would like to reconcile the list of watches every time to make sure that shared watches don't get canceled.

@@ -479,8 +479,8 @@ func (s *handlerUpstreams) watchUpstreamTarget(ctx context.Context, snap *Config
var entMeta acl.EnterpriseMeta
entMeta.Merge(opts.entMeta)

ctx, cancel := context.WithCancel(ctx)
err := s.dataSources.Health.Notify(ctx, &structs.ServiceSpecificRequest{
peerCtx, cancel := context.WithCancel(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see several ctx renames but wouldn't expect an overwriting issue with the way they were since they're always assigned inside a new scope. What motivated this change?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a second bug, the context rewrite made it that we created a chain for context ctx1->ctx2->ctx3 while we wanted ctx1->ctx2 ctx1->ctx3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahhh so the innermost one actually needs to be a child of the outermost one and not its enclosing loop?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we need both watches data to be able to generate the right config. If we go with the chain then every time the one in the middle is canceled we loose the data of the innermost one because it get canceled with it.

if err := s.dataSources.TrustBundle.Notify(peerCtx, &cachetype.TrustBundleReadRequest{

if !snap.UpstreamPeerTrustBundles.IsWatched(uid.Peer) {
peerCtx2, cancel2 := context.WithCancel(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question above references this

@dhiaayachi dhiaayachi removed the pr/no-changelog PR does not need a corresponding .changelog entry label Nov 7, 2024
.changelog/21871.txt Outdated Show resolved Hide resolved
Co-authored-by: Nitya Dhanushkodi <nitya@hashicorp.com>
Copy link
Member

@nathancoleman nathancoleman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for fixing this @dhiaayachi !

@dhiaayachi dhiaayachi merged commit 21cca2d into main Nov 19, 2024
94 checks passed
@dhiaayachi dhiaayachi deleted the peering-watch-cancel-fix branch November 19, 2024 14:36
@hc-github-team-consul-core hc-github-team-consul-core added backport/1.20 Changes are backported to 1.20 backport/ent/1.15 Changes are backported to 1.15 ent backport/ent/1.18 Changes are backported to 1.18 ent backport/ent/1.19 Changes are backported to 1.19 ent labels Nov 19, 2024
@hc-github-team-consul-core
Copy link
Collaborator

📣 Hi @dhiaayachi! a backport is missing for this PR [21871] for versions [1.15,1.18,1.19] please perform the backport manually and add the following snippet to your backport PR description:

<details>
	<summary> Overview of commits </summary>
		- <<backport commit 1>>
		- <<backport commit 2>>
		...
</details>

1 similar comment
@hc-github-team-consul-core
Copy link
Collaborator

📣 Hi @dhiaayachi! a backport is missing for this PR [21871] for versions [1.15,1.18,1.19] please perform the backport manually and add the following snippet to your backport PR description:

<details>
	<summary> Overview of commits </summary>
		- <<backport commit 1>>
		- <<backport commit 2>>
		...
</details>

@hc-github-team-consul-core
Copy link
Collaborator

📣 Hi @dhiaayachi! a backport is missing for this PR [21871] for versions [1.15,1.18,1.19] please perform the backport manually and add the following snippet to your backport PR description:

<details>
	<summary> Overview of commits </summary>
		- <<backport commit 1>>
		- <<backport commit 2>>
		...
</details>

@hc-github-team-consul-core
Copy link
Collaborator

📣 Hi @dhiaayachi! a backport is missing for this PR [21871] for versions [1.15] please perform the backport manually and add the following snippet to your backport PR description:

<details>
	<summary> Overview of commits </summary>
		- <<backport commit 1>>
		- <<backport commit 2>>
		...
</details>

2 similar comments
@hc-github-team-consul-core
Copy link
Collaborator

📣 Hi @dhiaayachi! a backport is missing for this PR [21871] for versions [1.15] please perform the backport manually and add the following snippet to your backport PR description:

<details>
	<summary> Overview of commits </summary>
		- <<backport commit 1>>
		- <<backport commit 2>>
		...
</details>

@hc-github-team-consul-core
Copy link
Collaborator

📣 Hi @dhiaayachi! a backport is missing for this PR [21871] for versions [1.15] please perform the backport manually and add the following snippet to your backport PR description:

<details>
	<summary> Overview of commits </summary>
		- <<backport commit 1>>
		- <<backport commit 2>>
		...
</details>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/all Apply backports for all active releases per .release/versions.hcl backport/ent/1.15 Changes are backported to 1.15 ent backport/ent/1.18 Changes are backported to 1.18 ent backport/ent/1.19 Changes are backported to 1.19 ent backport/1.20 Changes are backported to 1.20
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants