Fix PeerUpstreamEndpoints and UpstreamPeerTrustBundles to only Cancel watch when needed, otherwise keep the watch active #21871

dhiaayachi · 2024-10-23T19:12:07Z

Description

This PR unifies the way we cancel PeerUpstreamEndpoints and UpstreamPeerTrustBundles watches and adds safeguards so that those watches are canceled only when no other data needs them.

This is important because those watches can be shared by different upstreams: a PeerUpstreamEndpoints watch is shared when multiple upstreams have the same target, and an UpstreamPeerTrustBundles watch is shared when multiple targets belong to the same peer. To avoid canceling a watch for those data sources while another upstream still needs it, we loop through all the upstreams, determine which watches are still needed, and cancel only those that are not.
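The reconciliation described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the `target` and `watchSet` types and the `reconcilePeerWatches` helper are hypothetical stand-ins for the real proxycfg snapshot structures, assumed here only to show the "collect what is still needed, then cancel the rest" idea.

```go
package main

import "fmt"

// target is a hypothetical, simplified stand-in for a discovery chain target.
type target struct {
	ID   string // key used for the endpoints watch
	Peer string // empty for targets that are not peered
}

// watchSet is a hypothetical stand-in for a watch map: key -> cancel function.
type watchSet map[string]func()

// reconcile cancels and removes every watch whose key is not in needed.
func (w watchSet) reconcile(needed map[string]struct{}) {
	for key, cancel := range w {
		if _, ok := needed[key]; !ok {
			cancel()
			delete(w, key) // deleting during range is safe in Go
		}
	}
}

// reconcilePeerWatches walks every upstream, records which endpoint and
// trust bundle watches are still referenced, and cancels only the others.
func reconcilePeerWatches(upstreams map[string][]target, endpoints, bundles watchSet) {
	neededEndpoints := map[string]struct{}{}
	neededBundles := map[string]struct{}{}
	for _, targets := range upstreams {
		for _, t := range targets {
			if t.Peer == "" {
				continue // only peered targets use these watches
			}
			neededEndpoints[t.ID] = struct{}{}
			neededBundles[t.Peer] = struct{}{}
		}
	}
	endpoints.reconcile(neededEndpoints)
	bundles.reconcile(neededBundles)
}

func main() {
	endpoints := watchSet{
		"db@peer-a":    func() { fmt.Println("canceled endpoints watch db@peer-a") },
		"cache@peer-a": func() { fmt.Println("canceled endpoints watch cache@peer-a") },
	}
	bundles := watchSet{
		"peer-a": func() { fmt.Println("canceled trust bundle watch peer-a") },
	}

	// "cache" is no longer a target of any upstream, but "db" still is,
	// so the trust bundle watch for peer-a must stay active.
	upstreams := map[string][]target{
		"web": {{ID: "db@peer-a", Peer: "peer-a"}},
	}
	reconcilePeerWatches(upstreams, endpoints, bundles)

	fmt.Println("endpoint watches left:", len(endpoints))
	fmt.Println("trust bundle watches left:", len(bundles))
}
```

In this sketch the trust bundle watch for `peer-a` survives because one remaining target still belongs to that peer, which is the shared-watch case the description is concerned with.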

Testing & Reproduction steps

Links

PR Checklist

updated test coverage
external facing docs updated
appropriate backport labels added
not a security concern

rboyer · 2024-10-23T19:29:37Z

agent/proxycfg/upstreams.go

 	}
-
+	reconcilePeeringWatches(snap.DiscoveryChain,


I think this would be clearer if it were moved out of resetWatchesFromChain to be called serially afterwards, since it isn't chain-specific.

There is a dependency: we init the watch inside this function, and that needs to happen after the reconciliation is called.

I'm considering extracting the watch init as well. In that case I think I can call the reconciliation outside the disco-chain-specific func.

nathancoleman

Just a couple of questions

agent/proxycfg/state.go

nathancoleman · 2024-11-05T18:42:10Z

agent/proxycfg/upstreams.go

-
-		targetUID := NewUpstreamIDFromTargetID(targetID)
-		if targetUID.Peer != "" {
-			snap.PeerUpstreamEndpoints.CancelWatch(targetUID)
-			snap.UpstreamPeerTrustBundles.CancelWatch(targetUID.Peer)
-		}


Is the reconcile handled above instead of in this specific case intentional?

I'm imagining we should have been doing this in all cases before but were not?

Good question!
This was canceling the watches for one specific target, and that is what caused the bug. In reality, because those watches can be shared by different targets, we want to reconcile the full list of watches every time to make sure that shared watches don't get canceled.

nathancoleman · 2024-11-05T18:46:49Z

agent/proxycfg/upstreams.go

@@ -479,8 +479,8 @@ func (s *handlerUpstreams) watchUpstreamTarget(ctx context.Context, snap *Config
 	var entMeta acl.EnterpriseMeta
 	entMeta.Merge(opts.entMeta)

-	ctx, cancel := context.WithCancel(ctx)
-	err := s.dataSources.Health.Notify(ctx, &structs.ServiceSpecificRequest{
+	peerCtx, cancel := context.WithCancel(ctx)


I see several ctx renames but wouldn't expect an overwriting issue with the way they were since they're always assigned inside a new scope. What motivated this change?

This is actually a second bug: reassigning ctx meant that we created a context chain ctx1->ctx2->ctx3, while we wanted ctx1->ctx2 and ctx1->ctx3.

ahhh so the innermost one actually needs to be a child of the outermost one and not its enclosing loop?

Yes, we need the data from both watches to be able to generate the right config. If we go with the chain, then every time the one in the middle is canceled we lose the data of the innermost one, because it gets canceled along with it.
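A small standalone sketch of the difference, using only the standard `context` package. The `innerSurvives` helper and the variable names are made up for illustration and are not taken from the PR; `root`, `first`, and `second` play the roles of ctx1, ctx2, and ctx3 in the discussion above.

```go
package main

import (
	"context"
	"fmt"
)

// innerSurvives reports whether the second watch's context is still alive
// after the first watch's context has been canceled.
// With chain=true the contexts form root -> first -> second.
// With chain=false both watch contexts are direct children of root.
func innerSurvives(chain bool) bool {
	root, cancelRoot := context.WithCancel(context.Background())
	defer cancelRoot()

	first, cancelFirst := context.WithCancel(root)

	parent := root
	if chain {
		parent = first // what reassigning ctx produced by accident
	}
	second, cancelSecond := context.WithCancel(parent)
	defer cancelSecond()

	// Cancel only the first watch, as happens when it is no longer needed.
	cancelFirst()

	return second.Err() == nil
}

func main() {
	fmt.Println("chained:", innerSurvives(true))   // prints "chained: false"
	fmt.Println("siblings:", innerSurvives(false)) // prints "siblings: true"
}
```

Deriving each watch context directly from the outer context keeps the two watches independent, so canceling one does not tear down the other.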

nathancoleman · 2024-11-05T18:47:21Z

agent/proxycfg/upstreams.go

-		if err := s.dataSources.TrustBundle.Notify(peerCtx, &cachetype.TrustBundleReadRequest{
+
+	if !snap.UpstreamPeerTrustBundles.IsWatched(uid.Peer) {
+		peerCtx2, cancel2 := context.WithCancel(ctx)


Question above references this

.changelog/21871.txt

Co-authored-by: Nitya Dhanushkodi <nitya@hashicorp.com>

nathancoleman

LGTM, thanks for fixing this @dhiaayachi !

hc-github-team-consul-core · 2024-11-20T09:07:55Z

📣 Hi @dhiaayachi! a backport is missing for this PR [21871] for versions [1.15,1.18,1.19] please perform the backport manually and add the following snippet to your backport PR description:

<details>
	<summary> Overview of commits </summary>
		- <<backport commit 1>>
		- <<backport commit 2>>
		...
</details>

hc-github-team-consul-core · 2024-11-23T09:07:50Z

📣 Hi @dhiaayachi! a backport is missing for this PR [21871] for versions [1.15] please perform the backport manually and add the following snippet to your backport PR description:

<details>
	<summary> Overview of commits </summary>
		- <<backport commit 1>>
		- <<backport commit 2>>
		...
</details>

rboyer reviewed Oct 23, 2024

dhiaayachi added backport/all Apply backports for all active releases per .release/versions.hcl pr/no-changelog PR does not need a corresponding .changelog entry labels Oct 24, 2024

rboyer approved these changes Oct 31, 2024

nathancoleman self-requested a review November 5, 2024 18:21

fix to only reset peering watches when no other target need watching

e4068be

dhiaayachi force-pushed the peering-watch-cancel-fix branch from 7579761 to e4068be Compare November 5, 2024 18:44

nathancoleman reviewed Nov 5, 2024

dhiaayachi added 2 commits November 5, 2024 14:04

remove unused logger

6e3c944

add changelog

48b1103

dhiaayachi removed the pr/no-changelog PR does not need a corresponding .changelog entry label Nov 7, 2024

dhiaayachi requested review from nathancoleman and ndhanushkodi November 7, 2024 14:12

ndhanushkodi approved these changes Nov 18, 2024

Update .changelog/21871.txt

ba9155b

Co-authored-by: Nitya Dhanushkodi <nitya@hashicorp.com>

nathancoleman approved these changes Nov 18, 2024

dhiaayachi merged commit 21cca2d into main Nov 19, 2024
94 checks passed

dhiaayachi deleted the peering-watch-cancel-fix branch November 19, 2024 14:36

hc-github-team-consul-core added backport/1.20 Changes are backported to 1.20 backport/ent/1.15 Changes are backported to 1.15 ent backport/ent/1.18 Changes are backported to 1.18 ent backport/ent/1.19 Changes are backported to 1.19 ent labels Nov 19, 2024

hc-github-team-consul-core mentioned this pull request Nov 19, 2024

Backport of Fix PeerUpstreamEndpoints and UpstreamPeerTrustBundles to only Cancel watch when needed, otherwise keep the watch active into release/1.20.x #21956

Merged

4 tasks

dhiaayachi mentioned this pull request Nov 21, 2024

fix to not cancel Trust Bundle watch when another upstream is available #21867

Closed

4 tasks
