Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(trafficrouting): add nil check for desired annotations map in ALB… #3853

Merged
merged 1 commit into from
Sep 27, 2024

Conversation

y-rabie
Copy link
Contributor

@y-rabie y-rabie commented Sep 26, 2024

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this is a chore.
  • The title of the PR is (a) conventional with a list of types and scopes found here, (b) states what changed, and (c) suffixes the related issues number. E.g. "fix(controller): Updates such and such. Fixes #1234".
  • I've signed my commits with DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My builds are green. Try syncing with master if they are not.
  • My organization is added to USERS.md.

Bug description

Sometimes, the argo-rollouts controller goes into a loop of panicking/recovering from panic when creating a new Rollout object.
The Rollout object is created and is stuck in the Progressing state, the replicaset is created as well, except that it's stuck with .spec.replicas = 0.

To Reproduce

To reproduce, create any simple Rollout object that references an ALB ingress. Make sure that the ingress has no annotations present on it (this is not the case when applying through kubectl, since a kubectl.kubernetes.io/last-applied-configuration is added by default. This is the case however when creating through, for example, ArgoCD, with no desired annotations of mine to add.

You can create through kubectl and manually delete the annotation before creating the Rollout object referencing the ingress to test the behavior.

This is currently happening on the latest v1.7.2

Logs

time="2024-09-25T11:21:31Z" level=info msg="Started syncing rollout" generation=2 namespace=app-rollouts-testing resourceVersion=25941003 rollout=simple-rollout
time="2024-09-25T11:21:31Z" level=info msg="delaying service switch from  to 5ccb4fd965: ReplicaSet not fully available" namespace=app-rollouts-testing rollout=simple-rollout service=nginx-canary
time="2024-09-25T11:21:31Z" level=info msg="Found 1 TrafficRouting Reconcilers" namespace=app-rollouts-testing rollout=simple-rollout
time="2024-09-25T11:21:31Z" level=info msg="Reconciling TrafficRouting with type 'ALB'" namespace=app-rollouts-testing rollout=simple-rollout
time="2024-09-25T11:21:31Z" level=info msg="Reconciliation completed" generation=2 namespace=app-rollouts-testing resourceVersion=25941003 rollout=simple-rollout time_ms=3.53044
time="2024-09-25T11:21:31Z" level=error msg="Recovered from panic: assignment to entry in nil map\ngoroutine 344 [running]:\nruntime/debug.Stack()\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x5e\ngithub.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem.func1.1.1()\n\t/go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:151 +0x49\npanic({0x2c661c0?, 0x390b9c0?})\n\t/usr/local/go/src/runtime/panic.go:914 +0x21f\ngithub.com/argoproj/argo-rollouts/rollout/trafficrouting/alb.getDesiredAnnotations(0x2a9bfa0?, 0xc000f02a00, 0x5?, 0x0?, {0x53d4be0, 0x0, 0x0})\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/trafficrouting/alb/alb.go:412 +0xbc\ngithub.com/argoproj/argo-rollouts/rollout/trafficrouting/alb.(*Reconciler).SetWeightPerIngress(0xc001ae9710, 0x11a33d0?, {0xc0011a33c0?, 0x1, 0x4?}, {0x53d4be0, 0x0, 0x0})\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/trafficrouting/alb/alb.go:105 +0x1a8\ngithub.com/argoproj/argo-rollouts/rollout/trafficrouting/alb.(*Reconciler).SetWeight(0x3969f00?, 0xf02a00?, {0x53d4be0?, 0x1a?, 0xc0011a3520?})\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/trafficrouting/alb/alb.go:82 +0x9b\ngithub.com/argoproj/argo-rollouts/rollout.(*rolloutContext).reconcileTrafficRouting(0xc0005fe400)\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/trafficrouting.go:258 +0x956\ngithub.com/argoproj/argo-rollouts/rollout.(*rolloutContext).rolloutCanary(0xc0005fe400)\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/canary.go:57 +0x1d3\ngithub.com/argoproj/argo-rollouts/rollout.(*rolloutContext).reconcile(0xc0005fe400)\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/context.go:86 +0xd4\ngithub.com/argoproj/argo-rollouts/rollout.(*Controller).syncHandler(0xc000b78000, {0x3944730, 0xc0006c8500}, {0xc0018b5110, 0x23})\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:437 +0x4d3\ngithub.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem.func1.1()\n\t/go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:155 +0x7d\ngithub.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem.func1({0x39574b0?, 0xc00064ef40}, {0x31bf658, 0x7}, 0xc0011fee70, {0x3944730?, 0xc0006c8500}, 0xe?, {0x2a9bfa0, 0xc000c97690})\n\t/go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:159 +0x3ed\ngithub.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem({0x3944730, 0xc0006c8500}, {0x39574b0, 0xc00064ef40}, {0x31bf658, 0x7}, 0xc000aca5a0?, 0x3932650?)\n\t/go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:178 +0xb5\ngithub.com/argoproj/argo-rollouts/utils/controller.RunWorker(...)\n\t/go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:106\ngithub.com/argoproj/argo-rollouts/rollout.(*Controller).Run.func1()\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:358 +0xb2\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/backoff.go:226 +0x33\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x3912c00, 0xc000bf5dd0}, 0x1, 0xc000baeba0)\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/backoff.go:227 +0xaf\nk8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x3921bc0?)\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/backoff.go:204 +0x7f\nk8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0xc000e1d818?)\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/backoff.go:161 +0x1e\ncreated by github.com/argoproj/argo-rollouts/rollout.(*Controller).Run in goroutine 287\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:357 +0xa7\n" namespace=app-rollouts-testing rollout=simple-rollout
time="2024-09-25T11:21:31Z" level=error msg="rollout syncHandler error: Recovered from Panic" namespace=app-rollouts-testing rollout=simple-rollout
time="2024-09-25T11:21:31Z" level=info msg="rollout syncHandler queue retries: 27 : key \"app-rollouts-testing/simple-rollout\"" namespace=app-rollouts-testing rollout=simple-rollout
time="2024-09-25T11:21:31Z" level=error msg="Recovered from Panic" error="<nil>"
time="2024-09-25T11:21:41Z" level=info msg="Started syncing rollout" generation=2 namespace=app-rollouts-testing resourceVersion=25941003 rollout=simple-rollout
time="2024-09-25T11:21:41Z" level=info msg="delaying service switch from  to 5ccb4fd965: ReplicaSet not fully available" namespace=app-rollouts-testing rollout=simple-rollout service=nginx-canary
time="2024-09-25T11:21:41Z" level=info msg="Found 1 TrafficRouting Reconcilers" namespace=app-rollouts-testing rollout=simple-rollout
time="2024-09-25T11:21:41Z" level=info msg="Reconciling TrafficRouting with type 'ALB'" namespace=app-rollouts-testing rollout=simple-rollout
time="2024-09-25T11:21:41Z" level=info msg="Reconciliation completed" generation=2 namespace=app-rollouts-testing resourceVersion=25941003 rollout=simple-rollout time_ms=1.758102
time="2024-09-25T11:21:41Z" level=error msg="Recovered from panic: assignment to entry in nil map\ngoroutine 336 [running]:\nruntime/debug.Stack()\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x5e\ngithub.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem.func1.1.1()\n\t/go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:151 +0x49\npanic({0x2c661c0?, 0x390b9c0?})\n\t/usr/local/go/src/runtime/panic.go:914 +0x21f\ngithub.com/argoproj/argo-rollouts/rollout/trafficrouting/alb.getDesiredAnnotations(0x2a9bfa0?, 0xc000e9a300, 0x5?, 0x0?, {0x53d4be0, 0x0, 0x0})\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/trafficrouting/alb/alb.go:412 +0xbc\ngithub.com/argoproj/argo-rollouts/rollout/trafficrouting/alb.(*Reconciler).SetWeightPerIngress(0xc001633d40, 0x9973d0?, {0xc0009973c0?, 0x1, 0x4?}, {0x53d4be0, 0x0, 0x0})\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/trafficrouting/alb/alb.go:105 +0x1a8\ngithub.com/argoproj/argo-rollouts/rollout/trafficrouting/alb.(*Reconciler).SetWeight(0x3969f00?, 0xe9a300?, {0x53d4be0?, 0x1a?, 0xc000997520?})\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/trafficrouting/alb/alb.go:82 +0x9b\ngithub.com/argoproj/argo-rollouts/rollout.(*rolloutContext).reconcileTrafficRouting(0xc001587c00)\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/trafficrouting.go:258 +0x956\ngithub.com/argoproj/argo-rollouts/rollout.(*rolloutContext).rolloutCanary(0xc001587c00)\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/canary.go:57 +0x1d3\ngithub.com/argoproj/argo-rollouts/rollout.(*rolloutContext).reconcile(0xc001587c00)\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/context.go:86 +0xd4\ngithub.com/argoproj/argo-rollouts/rollout.(*Controller).syncHandler(0xc000b78000, {0x3944730, 0xc0006c8500}, {0xc0018b5110, 0x23})\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:437 +0x4d3\ngithub.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem.func1.1()\n\t/go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:155 +0x7d\ngithub.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem.func1({0x39574b0?, 0xc00064ef40}, {0x31bf658, 0x7}, 0xc000997e70, {0x3944730?, 0xc0006c8500}, 0x0?, {0x2a9bfa0, 0xc00196d980})\n\t/go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:159 +0x3ed\ngithub.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem({0x3944730, 0xc0006c8500}, {0x39574b0, 0xc00064ef40}, {0x31bf658, 0x7}, 0x0?, 0x0?)\n\t/go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:178 +0xb5\ngithub.com/argoproj/argo-rollouts/utils/controller.RunWorker(...)\n\t/go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:106\ngithub.com/argoproj/argo-rollouts/rollout.(*Controller).Run.func1()\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:358 +0xb2\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/backoff.go:226 +0x33\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x3912c00, 0xc000bf5bf0}, 0x1, 0xc000baeba0)\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/backoff.go:227 +0xaf\nk8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/backoff.go:204 +0x7f\nk8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?)\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/backoff.go:161 +0x1e\ncreated by github.com/argoproj/argo-rollouts/rollout.(*Controller).Run in goroutine 287\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:357 +0xa7\n" namespace=app-rollouts-testing rollout=simple-rollout
time="2024-09-25T11:21:41Z" level=error msg="rollout syncHandler error: Recovered from Panic" namespace=app-rollouts-testing rollout=simple-rollout
time="2024-09-25T11:21:41Z" level=info msg="rollout syncHandler queue retries: 28 : key \"app-rollouts-testing/simple-rollout\"" namespace=app-rollouts-testing rollout=simple-rollout

Testing

I have built the image and tested the behavior and it's working as expected.

… ingress

Signed-off-by: Youssef Rabie <youssef.rabie@procore.com>
Copy link

Copy link
Contributor

Published E2E Test Results

  4 files    4 suites   3h 14m 47s ⏱️
113 tests 104 ✅  7 💤 2 ❌
456 runs  424 ✅ 28 💤 4 ❌

For more details on these failures, see this check.

Results for commit 1884643.

Copy link
Contributor

Published Unit Test Results

2 275 tests   2 275 ✅  2m 59s ⏱️
  128 suites      0 💤
    1 files        0 ❌

Results for commit 1884643.

Copy link

codecov bot commented Sep 26, 2024

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 83.84%. Comparing base (ff93786) to head (1884643).
Report is 74 commits behind head on master.

Files with missing lines Patch % Lines
rollout/trafficrouting/alb/alb.go 0.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3853      +/-   ##
==========================================
- Coverage   83.86%   83.84%   -0.02%     
==========================================
  Files         163      163              
  Lines       18560    18562       +2     
==========================================
- Hits        15565    15563       -2     
- Misses       2121     2123       +2     
- Partials      874      876       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zachaller zachaller merged commit f32b2f3 into argoproj:master Sep 27, 2024
23 of 24 checks passed
@y-rabie y-rabie deleted the fix-alb-nil-annotations branch October 13, 2024 10:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants