-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
panic in bpfman-agent due to err not being returned #330
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -217,7 +217,7 @@ | |
id, err := bpfmanagentinternal.GetID(bpfProgram) | ||
if err != nil { | ||
r.Logger.Error(err, "Failed to get bpf program ID") | ||
return bpfmaniov1alpha1.BpfProgCondNotLoaded, nil | ||
return bpfmaniov1alpha1.BpfProgCondNotLoaded, err | ||
} | ||
switch shouldBeLoaded { | ||
case true: | ||
|
@@ -232,7 +232,7 @@ | |
r.Logger.V(1).Info("bpf program is in wrong state, unloading and reloading", "reason", reasons, "bpfProgram Name", bpfProgram.Name, "bpf program ID", id) | ||
if err := bpfmanagentinternal.UnloadBpfmanProgram(ctx, r.BpfmanClient, *id); err != nil { | ||
r.Logger.Error(err, "Failed to unload BPF Program") | ||
return bpfmaniov1alpha1.BpfProgCondNotUnloaded, nil | ||
return bpfmaniov1alpha1.BpfProgCondNotUnloaded, err | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It seems reasonable to return an error here. However, as the code stands, I think we will only try this one more time (unless a change to another program kicks off the reconciler again.) If it fails the second time, we'll stop. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We've talked about this being an issue, so I wanted to flag it for discussion. Maybe we should keep trying, or at least try a few times. We could count how many times it has failed in the CRD. Maybe do some type of retry with a back off so we keep trying every once in a while. If we could get a failure reason from bpfman, maybe we can determine whether it's a permanent or possibly transient failure to guide whether/when to do a retry. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok. I think these changes are all good. I also think we should evaluate whether we should have retries when the bpfman calls fail, and how to do it. |
||
} | ||
|
||
r.Logger.Info("Calling bpfman to load bpf program on Node", "bpfProgram Name", bpfProgram.Name) | ||
|
@@ -251,7 +251,7 @@ | |
r.Logger.Info("Calling bpfman to unload program on node", "bpfProgram Name", bpfProgram.Name, "Program ID", id) | ||
if err := bpfmanagentinternal.UnloadBpfmanProgram(ctx, r.BpfmanClient, *id); err != nil { | ||
r.Logger.Error(err, "Failed to unload Program") | ||
return bpfmaniov1alpha1.BpfProgCondNotUnloaded, nil | ||
return bpfmaniov1alpha1.BpfProgCondNotUnloaded, err | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same comment as for the UnloadBpfmanProgram() case above. |
||
} | ||
} | ||
case false: | ||
|
@@ -267,7 +267,7 @@ | |
r.progId, err = bpfmanagentinternal.LoadBpfmanProgram(ctx, r.BpfmanClient, loadRequest) | ||
if err != nil { | ||
r.Logger.Error(err, "Failed to load Program") | ||
return bpfmaniov1alpha1.BpfProgCondNotLoaded, nil | ||
return bpfmaniov1alpha1.BpfProgCondNotLoaded, err | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same comment as for the UnloadBpfmanProgram() case above. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Based on the logs, I think this is the only case that was hit by the user. It failed, r.progId is set to nil, no error returned to calling function and r.progId being nil caused a panic later. I returned err on all other cases just to be safe. |
||
} | ||
case false: | ||
// The program isn't loaded and it shouldn't be loaded. | ||
|
@@ -785,10 +785,10 @@ | |
} | ||
} | ||
|
||
existingId, err := bpfmanagentinternal.GetID(&existingBpfProgram) | ||
if err != nil { | ||
return internal.Requeue, fmt.Errorf("failed to get kernel id from bpfProgram: %v", err) | ||
} | ||
// GetID() will fail if ProgramId is not in the annotations, which is expected on a | ||
// create. In this case existingId will be nil and DeepEqual() will fail and cause | ||
// annotation to be set. | ||
existingId, _ := bpfmanagentinternal.GetID(&existingBpfProgram) | ||
|
||
// If bpfProgram Maps OR the program ID annotation isn't up to date just update it and return | ||
if !reflect.DeepEqual(existingId, r.progId) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the ID didn't get set by now, how is it going to get set? I don't think it is. Did the users scenario hit this case? However, we think the program is loaded becasue it's in the loadedBpfPrograms list, and the element in that list is the gobpfman.ListResponse_ListResult returned by bpfman, and that has all the info on the program, including the ID. So, maybe we should try to pull it out of there.