Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(proxy) give manager leader control over start #2053

Merged
merged 6 commits into from
Dec 17, 2021
Merged

Conversation

rainest
Copy link
Contributor

@rainest rainest commented Dec 6, 2021

What this PR does / why we need it:

  • Refactors the proxy update and status update subsystems into LeaderElectionRunnables and starts them via manager.Add(). These subsystems now only run when an instance is a leader. This fixes the part of Leader election defaults and implementation do not result in expected behavior #2052 where enabling leader election would actually attempt to push a null configuration because the proxy update loop was running, but the controllers were not. While the status updater caused no issues (it will simply do nothing when given an empty configuration), running multiple instances of it was pointless.
  • Removes the --elect-leader flag in favor of determining whether to elect a leader automatically based on the database mode (DB-less does not use leaders, database-backed does). This is a minor breaking change (the controller will reject the removed flag if present). We could optionally leave it as a deprecated no-op flag if we don't want to make the breaking change immediately.
  • Adds an --election-namespace flag. This is necessary to run integration tests when leader election is enabled, as the controller has no way of determining the election namespace when run outside a cluster. It is unlikely to have any production use.
  • Adds a new multi-replica Postgres controller test. The test confirms leadership behavior based on whether metrics report that the controller has successfully synced configuration.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #2052

Special notes for your reviewer:

  • Backporting this is iffy, since this fails tests without the new --election-namespace flag. We could remove that and the automatic mode detection, but in that case we simply don't test leader election mode in the backport. This is maybe okay since the flag is only needed for test instances (unless you run leader-election-enabled instances outside the cluster in prod for some reason, but you can't do that with current 2.0.x anyway).
  • This is a new E2E test that fails against the current version. An example override run versus 2.0.x is available in postgres_replicas.txt. Merging this will cause nightly tests to fail that test. We already have one such test from Do not block config updates if we cannot not set up the status subsystem #2005. We should decide how we want to handle these better, probably by running nightly E2E against nightly builds.
  • The admission controller behavior is an open question. We make use of the Kubernetes client to load other resources for the consumer/credential checks. I'm unsure how these behave on non-leaders. This PR shouldn't change anything about that, since it's only disabling proxy updates in addition to the controllers, which were already disabled. However, since leader election was previously a manual flag (and was not in 1.x), it's quite possible that nobody had enabled it to observe any problem behavior. I don't think we can simply disable it because the Service used in the webhook configuration points at the entire Deployment.

PR Readiness Checklist:

Complete these before marking the PR as ready to review:

  • the CHANGELOG.md release notes have been updated to reflect any significant (and particularly user-facing) changes introduced by this PR

@rainest rainest requested a review from a team as a code owner December 6, 2021 22:48
@rainest rainest temporarily deployed to Configure ci December 6, 2021 22:48 Inactive
@rainest rainest marked this pull request as draft December 6, 2021 22:48
@rainest rainest temporarily deployed to Configure ci December 6, 2021 22:52 Inactive
@rainest rainest temporarily deployed to Configure ci December 6, 2021 23:15 Inactive
@rainest rainest temporarily deployed to Configure ci December 6, 2021 23:26 Inactive
@rainest rainest temporarily deployed to Configure ci December 14, 2021 21:24 Inactive
@github-actions
Copy link

Licenses differ between commit b53ddc4 and base:

+++ pr_licenses.csv	2021-12-14 21:26:27.679421657 +0000
@@ -86,7 +86,7 @@
 k8s.io/client-go,Unknown,Apache-2.0
 k8s.io/component-base,Unknown,Apache-2.0
 k8s.io/klog/v2,Unknown,Apache-2.0
-k8s.io/kube-openapi/pkg/util/proto,Unknown,Apache-2.0
+k8s.io/kube-openapi/pkg,Unknown,Apache-2.0
 k8s.io/utils,Unknown,Apache-2.0
 knative.dev/networking/pkg,Unknown,Apache-2.0
 knative.dev/pkg,Unknown,Apache-2.0```

@rainest rainest temporarily deployed to Configure ci December 14, 2021 21:38 Inactive
@rainest rainest temporarily deployed to Configure ci December 14, 2021 21:47 Inactive
@rainest rainest temporarily deployed to Configure ci December 14, 2021 21:54 Inactive
@rainest rainest temporarily deployed to Configure ci December 14, 2021 22:07 Inactive
@rainest rainest temporarily deployed to Configure ci December 14, 2021 22:11 Inactive
@rainest rainest temporarily deployed to Configure ci December 14, 2021 22:20 Inactive
@rainest rainest marked this pull request as ready for review December 14, 2021 22:29
@rainest rainest temporarily deployed to Configure ci December 14, 2021 22:35 Inactive
Copy link
Contributor

@shaneutt shaneutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great! I have a few suggestions and comments I would like to see resolved before we merge but mostly LGTM.

CHANGELOG.md Outdated Show resolved Hide resolved
internal/ctrlutils/ingress-status.go Outdated Show resolved Hide resolved
internal/ctrlutils/ingress-status.go Outdated Show resolved Hide resolved
internal/ctrlutils/ingress-status.go Outdated Show resolved Hide resolved
internal/ctrlutils/ingress-status.go Outdated Show resolved Hide resolved
test/e2e/all_in_one_test.go Show resolved Hide resolved
test/e2e/all_in_one_test.go Show resolved Hide resolved
test/e2e/all_in_one_test.go Outdated Show resolved Hide resolved
test/e2e/all_in_one_test.go Outdated Show resolved Hide resolved
test/e2e/all_in_one_test.go Show resolved Hide resolved
@rainest rainest temporarily deployed to Configure ci December 15, 2021 21:26 Inactive
@shaneutt shaneutt dismissed their stale review December 15, 2021 21:28

not going to hard block this.

@rainest rainest temporarily deployed to Configure ci December 15, 2021 21:31 Inactive
Travis Raines added 6 commits December 17, 2021 14:36
Rather than starting the proxy update loop immediately after setting it
up, register the proxy with the manager and let the manager choose
whether or not to start the update loop based on whether it is the
leader, or leader election is disabled.
Add a Postgres-backed test with multiple replicas. The test initially
starts one replica and scales to two. It checks that the second replica
is not acting as a leader. It then destroys the original leader and
confirms that the second replica becomes the leader.

Add a utility function to port-forward to a Pod.
Refactor the status updater to provide a Kubebuilder
LeaderElectionRunnable.

Run the status updater via manager.Add() so that it only runs if the
instance is a leader.
Add an --election-namespace flag. This sets the leader election
namespace when an instance is running outside of the cluster. This is
necessary to run integration tests.
Deprecate the --leader-elect flag. It is no longer used. Leader election
is enabled or disabled based on whether the instance is running in
DB-less mode or not.
@rainest
Copy link
Contributor Author

rainest commented Dec 17, 2021

Rebased to roll PR changes into appropriate commits, since we don't want to squash this/will need to cherry-pick pieces into a 2.0.x backport.

@rainest rainest temporarily deployed to Configure ci December 17, 2021 23:00 Inactive
@rainest rainest temporarily deployed to Configure ci December 17, 2021 23:14 Inactive
@rainest rainest merged commit b31e5cb into main Dec 17, 2021
@rainest rainest deleted the fix/proxy-leader branch December 17, 2021 23:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Leader election defaults and implementation do not result in expected behavior
3 participants