
feat(client): use another goroutine to send config to Konnect #6320

Closed
randmonkey wants to merge 2 commits from the feat/sync_to_konnect_use_different_goroutine branch

Conversation

randmonkey (Contributor) commented on Jul 16, 2024

What this PR does / why we need it:

Spawn a goroutine to send configuration to Konnect in KongClient.maybeSendOutToKonnectClient, so that sending to Konnect no longer blocks syncing configuration to the Kong gateways.
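In effect the update path becomes something like the following self-contained sketch (the types and function names here are illustrative stand-ins, not the controller's actual code):

```go
package main

import (
	"context"
	"fmt"
	"time"
)

type config struct{ sha string }

func sendToGateways(ctx context.Context, cfg config) error {
	fmt.Println("synced to gateways:", cfg.sha)
	return nil
}

func sendToKonnect(ctx context.Context, cfg config) error {
	time.Sleep(2 * time.Second) // simulate a slow Konnect API call
	fmt.Println("synced to Konnect:", cfg.sha)
	return nil
}

// update mirrors the intent of the change: gateways are synced synchronously,
// while the Konnect upload is forked off so it cannot block the gateway path.
func update(ctx context.Context, cfg config) error {
	if err := sendToGateways(ctx, cfg); err != nil {
		return err
	}
	go func() {
		// Errors are only logged: Konnect sync is best-effort and must not
		// affect the controller's core functionality.
		if err := sendToKonnect(ctx, cfg); err != nil {
			fmt.Println("failed to sync to Konnect:", err)
		}
	}()
	return nil
}

func main() {
	_ = update(context.Background(), config{sha: "abc123"})
	time.Sleep(3 * time.Second) // only for this demo: let the Konnect goroutine finish
}
```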

Which issue this PR fixes:

fixes #6188

Special notes for your reviewer:

There are many potential concurrency issues, because some existing code and tests assume the current implementation, in which uploading to Konnect runs serially after sending config to the gateways and both are protected by the same lock. For example, the config status logic needs to change, and test cases must use Eventually when checking whether a certain client has been updated.
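For instance, an assertion that used to hold right after Update() returned now has to poll. A self-contained illustration of the new test shape, using testify's require.Eventually with an atomic flag standing in for "the Konnect client received the latest config" (not an actual test from this PR):

```go
package example

import (
	"sync/atomic"
	"testing"
	"time"

	"github.com/stretchr/testify/require"
)

func TestKonnectIsEventuallyUpdated(t *testing.T) {
	var konnectUpdated atomic.Bool

	// Stand-in for the Konnect upload that now runs in its own goroutine.
	go func() {
		time.Sleep(200 * time.Millisecond) // simulated Konnect latency
		konnectUpdated.Store(true)
	}()

	// Asserting the flag immediately after "Update" returns would flake, so the
	// test polls until the goroutine has done its work (or the timeout expires).
	require.Eventually(t, konnectUpdated.Load, 5*time.Second, 50*time.Millisecond,
		"Konnect should eventually receive the configuration")
}
```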

PR Readiness Checklist:

Complete these before marking the PR as ready to review:

  • the CHANGELOG.md release notes have been updated to reflect any significant (and particularly user-facing) changes introduced by this PR

codecov bot commented Jul 16, 2024

Codecov Report

Attention: Patch coverage is 23.33333% with 46 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@313234c). Learn more about missing BASE report.

File                                        Patch %   Missing lines
internal/dataplane/kong_client_konnect.go     9.5%    38 ⚠️
internal/clients/config_status.go             0.0%     5 ⚠️
internal/adminapi/client.go                  75.0%     2 ⚠️
internal/dataplane/kong_client.go            80.0%     1 ⚠️
Additional details and impacted files
@@          Coverage Diff           @@
##             main   #6320   +/-   ##
======================================
  Coverage        ?   51.9%           
======================================
  Files           ?     199           
  Lines           ?   20057           
  Branches        ?       0           
======================================
  Hits            ?   10415           
  Misses          ?    8560           
  Partials        ?    1082           

☔ View full report in Codecov by Sentry.

randmonkey force-pushed the feat/sync_to_konnect_use_different_goroutine branch from 351397e to fea914a on July 16, 2024 09:52
randmonkey marked this pull request as ready for review on July 16, 2024 10:02
randmonkey requested a review from a team as a code owner on July 16, 2024 10:02
randmonkey force-pushed the feat/sync_to_konnect_use_different_goroutine branch 2 times, most recently from 44d40d4 to c890419 on July 16, 2024 10:22
randmonkey marked this pull request as draft on July 16, 2024 10:23
randmonkey force-pushed the feat/sync_to_konnect_use_different_goroutine branch from c890419 to 17ef7f4 on July 17, 2024 03:39
randmonkey force-pushed the feat/sync_to_konnect_use_different_goroutine branch from 17ef7f4 to 69c6105 on July 17, 2024 03:48
randmonkey marked this pull request as ready for review on July 17, 2024 05:49
rainest (Contributor) left a comment

Re concurrency concerns, could you elaborate on the problem scenarios you expect? I'm not entirely clear on the chain of events in

For example, the config status logic needs to change, and test cases must use Eventually when checking whether a certain client has been updated.

Is there a scenario where we can hit deadlock or block gateway updates for an excessive amount of time, or is it something that can only screw up tests somehow? AFAIK this fixes that problem, so I'm pretty sure the bad case isn't the issue. AFAICT the remaining lock stuff is just "set this status var to some value" that should always promptly update and release the lock.


Idle 🤔 that's probably out of scope here (removing the current deadlock is a more immediate priority even if this would be cleaner).

Could we consider a channel-based Konnect syncer that runs in a separate manager Runnable? AFAIK we only really care about Konnect showing the latest config, correct? We could just have the main client forward its latest successful config off to that other Runnable that will watch for inbound configs, abort any retries when it receives a new one, and try syncing the current one independent of the gateway sync loop.

With that model we maybe don't need the shared locks at all. A brief review of the existing internal/clients/config_status.go status reporting shows it's already channel-based, so it shouldn't be too hard to refactor it to range over inbound events that contain either gateway sync updates or Konnect sync updates, rather than the current "all statuses in one big blob" approach.

With that model there's only one client that has access to the config SHA, and the separate Runnable is just a recipient of it. Refactoring the client code to make it modular in that fashion is unfortunately non-trivial, but maybe worth considering given that we'd like to try and refactor KongClient in general anyway.
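A rough sketch of that shape, for discussion only (KonnectConfig, the channel field, and the retry handling are invented for illustration and are not the controller's actual types; Start has the signature controller-runtime's manager.Runnable expects):

```go
package konnect

import (
	"context"
	"time"
)

// KonnectConfig is an illustrative payload: the latest successfully applied
// gateway configuration that should be mirrored to Konnect.
type KonnectConfig struct {
	SHA     []byte
	Content []byte
}

// Syncer is a sketch of a manager.Runnable that owns Konnect syncing: the main
// client only forwards configs into Configs and never blocks on Konnect itself.
type Syncer struct {
	Configs chan KonnectConfig
}

func (s *Syncer) Start(ctx context.Context) error {
	var (
		pending *KonnectConfig
		retry   <-chan time.Time // nil channel blocks forever, i.e. "no retry scheduled"
	)
	for {
		select {
		case <-ctx.Done():
			return nil
		case cfg := <-s.Configs:
			// A newer config supersedes any in-flight retry of an older one.
			pending, retry = &cfg, nil
		case <-retry:
			retry = nil
		}
		if pending == nil {
			continue
		}
		if err := s.push(ctx, *pending); err != nil {
			// Failed: try again later unless a newer config arrives first.
			retry = time.After(30 * time.Second)
			continue
		}
		pending = nil
	}
}

func (s *Syncer) push(ctx context.Context, cfg KonnectConfig) error {
	// ... send cfg to the Konnect Admin API here ...
	return nil
}
```

The key property is that the gateway sync loop only ever performs a channel send; in practice the channel would be buffered (or the send made non-blocking) so that forwarding a config can never stall the gateway path either.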

@@ -805,6 +740,12 @@ func (c *KongClient) sendToClient(
AppendStubEntityWhenConfigEmpty: !client.IsKonnect() && config.InMemory,
}
targetContent := deckgen.ToDeckContent(ctx, logger, s, deckGenParams)
// Remove consumers in target content when we send to Konnect if `--disable-consumers-sync` flag is set.
// REVIEW: add an option to `deckGenParams` to omit them in `ToDeckContent` to reduce the cost filling the contents?
Probably fine to leave as a "future perf improvement opportunity here" comment. Actually syncing consumers is a pretty big cost because it's network-bound and incurs egress costs, but the in-memory JSON fiddling with them probably isn't too huge a concern in the short term.

Were we better at OSS outreach it'd be a good candidate for "good first issue" territory.
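For reference, the follow-up being hinted at would roughly turn the post-generation filtering into a generation-time option, e.g. (SkipConsumers and disableConsumersSync are hypothetical names, not existing options):

```go
// Hypothetical future shape only: SkipConsumers is not an existing deckgen option.
// Instead of building consumers into the content and stripping them afterwards for
// Konnect, ToDeckContent would simply never materialize them.
deckGenParams.SkipConsumers = client.IsKonnect() && disableConsumersSync
targetContent := deckgen.ToDeckContent(ctx, logger, s, deckGenParams)
```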

// In case of an error, we only log it since we don't want the Konnect to affect the basic functionality
// of the controller.
if errors.As(err, &sendconfig.UpdateSkippedDueToBackoffStrategyError{}) {
c.logger.Info("Skipped pushing configuration to Konnect due to backoff strategy", "explanation", err.Error())

Anecdata from some field team support requests is that this log is a bit confusing, since users don't have an inherent mental model of why backoff happens and what it's doing. IDK how best to improve it at present. Kinda out of scope for this change anyway (this diff is just an existing log that got moved elsewhere), but it's a minor pain point to keep in mind.

randmonkey (Contributor, Author) commented:

(Quoting @rainest's review above, regarding a channel-based Konnect syncer running in a separate manager Runnable.)

I have created issue #6325 for adding another Runnable that runs a separate loop for sending config to Konnect. Another thought: the new Runnable could be consolidated with the existing Konnect NodeAgent.

rainest self-requested a review on July 24, 2024 21:08
rainest (Contributor) left a comment

Minor shuffling of when the goroutine spawns (unless there's some reason we need to keep that within the call; I don't think the client spawn has to happen in the parent goroutine), plus cleanup of the review comment, and this should be good to go.

Comment on lines +31 to +35
// send Kong configuration to Konnect client in a new goroutine
go c.sendOutToKonnectClient(
ctx, konnectClient, s, config, isFallback,
)
}

I think we should move the goroutine spawn out to the call within Update(), with a comment describing the goroutine's lifecycle (AFAIK it can't hang around, so just note that it terminates after either successfully updating or failing and marking the status appropriately). That makes Update() slightly more readable for someone scanning it, so they can see that the Konnect update is being forked off in its own goroutine.
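i.e. roughly this placement (a sketch only; the parameter list and the surrounding Update() body are assumed, not copied from the PR):

```go
// Sketch of the suggested placement inside KongClient.Update(), after the gateway
// sync has succeeded; the parameter list here is assumed, not copied from the PR.
//
// Fork the Konnect update off into its own goroutine so it cannot block the gateway
// sync path. The goroutine does not linger: it terminates after either successfully
// pushing the config or failing and recording the Konnect config status.
go c.maybeSendOutToKonnectClient(ctx, s, config, isFallback)
```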

randmonkey (Contributor, Author) commented:

Thanks for reviewing, but I won't make further changes to this PR because it has been superseded by #6325.

randmonkey closed this on Jul 25, 2024
Successfully merging this pull request may close these issues:

Do Konnect and Gateway reconciliation in different goroutines