Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic in NodeRef controller #1104

Closed
ncdc opened this issue Jul 2, 2019 · 7 comments · Fixed by #1117
Closed

Panic in NodeRef controller #1104

ncdc opened this issue Jul 2, 2019 · 7 comments · Fixed by #1117
Labels
good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@ncdc
Copy link
Contributor

ncdc commented Jul 2, 2019

/kind bug

What steps did you take and what happened:

  • Create a cluster using CAPI + CAPA in AWS
  • Add a machine
  • NodeRef controller panicked

What did you expect to happen:

  • No panic

Anything else you would like to add:
I hit this return statement:

return reconcile.Result{RequeueAfter: 10 * time.Second}, nil

which returns a nil error. But it crashed at

klog.Infof("Set Machine's (%q in namespace %q) NodeRef to %q", machine.Name, machine.Namespace, machine.Status.NodeRef.Name)

because it didn't check to see if the reconcile needed to be requeued, and machine.status.nodeRef was nil.

Environment:

  • Cluster-api version: master + v0.1.4
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 2, 2019
@ncdc ncdc added the good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. label Jul 2, 2019
@ncdc ncdc added this to the v1alpha2 milestone Jul 2, 2019
@ncdc
Copy link
Contributor Author

ncdc commented Jul 2, 2019

Once fixed, we'll need to backport to release-0.1.

@pires
Copy link
Contributor

pires commented Jul 2, 2019

Fixing the panic seems to be as easy as not swallowing the error when returning:

return reconcile.Result{RequeueAfter: 10 * time.Second}, nil

Now, I'm not sure this is enough because the code (or its comments) is not clear whether Reconcile is expected to be called again. Assuming it should be, we also need to set Requeue to true as it defaults to false. edited sigs.k8s.io/controller-runtime/pkg/reconcile.Result defines that setting RequeueAfter implicitly sets Requeue to true.

The end result would be:

return reconcile.Result{RequeueAfter: 10 * time.Second}, err

If this makes sense to you, I think the next step would be to figure out tests that prove the fix.

@vincepri
Copy link
Member

vincepri commented Jul 2, 2019

@pires That seems like a good approach, we can return both

@pires
Copy link
Contributor

pires commented Jul 3, 2019

I'll be opening a PR to address this today.

@timothysc timothysc added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jul 3, 2019
@chuckha
Copy link
Contributor

chuckha commented Jul 3, 2019

I just hit this and have a patch. I do not se lifecycle active so I will take it.

/assign
/lifecycle active

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Jul 3, 2019
@ncdc
Copy link
Contributor Author

ncdc commented Jul 3, 2019

@chuckha check with @pires ?

@chuckha
Copy link
Contributor

chuckha commented Jul 3, 2019

ah ok, actually this is a good chance for someone else to get a contribution

/remove-lifecycle active
/unassign

Sorry for the noise @pires, you should for sure work on this

@k8s-ci-robot k8s-ci-robot removed the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Jul 3, 2019
pires added a commit to pires/cluster-api that referenced this issue Jul 3, 2019
Fixes kubernetes-sigs#1104

Signed-off-by: Pires <pjpires@gmail.com>
pires added a commit to pires/cluster-api that referenced this issue Jul 3, 2019
Fixes kubernetes-sigs#1104

Signed-off-by: Pires <pjpires@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Projects
None yet
6 participants