Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panic in bpfman-agent due to err not being returned #330

Merged
merged 1 commit into from
Oct 23, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 8 additions & 8 deletions controllers/bpfman-agent/common.go
Original file line number Diff line number Diff line change
Expand Up @@ -217,7 +217,7 @@
id, err := bpfmanagentinternal.GetID(bpfProgram)
if err != nil {
r.Logger.Error(err, "Failed to get bpf program ID")
return bpfmaniov1alpha1.BpfProgCondNotLoaded, nil
return bpfmaniov1alpha1.BpfProgCondNotLoaded, err

Check warning on line 220 in controllers/bpfman-agent/common.go

View check run for this annotation

Codecov / codecov/patch

controllers/bpfman-agent/common.go#L220

Added line #L220 was not covered by tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the ID didn't get set by now, how is it going to get set? I don't think it is. Did the users scenario hit this case? However, we think the program is loaded becasue it's in the loadedBpfPrograms list, and the element in that list is the gobpfman.ListResponse_ListResult returned by bpfman, and that has all the info on the program, including the ID. So, maybe we should try to pull it out of there.

}
switch shouldBeLoaded {
case true:
Expand All @@ -232,7 +232,7 @@
r.Logger.V(1).Info("bpf program is in wrong state, unloading and reloading", "reason", reasons, "bpfProgram Name", bpfProgram.Name, "bpf program ID", id)
if err := bpfmanagentinternal.UnloadBpfmanProgram(ctx, r.BpfmanClient, *id); err != nil {
r.Logger.Error(err, "Failed to unload BPF Program")
return bpfmaniov1alpha1.BpfProgCondNotUnloaded, nil
return bpfmaniov1alpha1.BpfProgCondNotUnloaded, err

Check warning on line 235 in controllers/bpfman-agent/common.go

View check run for this annotation

Codecov / codecov/patch

controllers/bpfman-agent/common.go#L235

Added line #L235 was not covered by tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems reasonable to return an error here. However, as the code stands, I think we will only try this one more time (unless a change to another program kicks off the reconciler again.) If it fails the second time, we'll stop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've talked about this being an issue, so I wanted to flag it for discussion. Maybe we should keep trying, or at least try a few times. We could count how many times it has failed in the CRD. Maybe do some type of retry with a back off so we keep trying every once in a while. If we could get a failure reason from bpfman, maybe we can determine whether it's a permanent or possibly transient failure to guide whether/when to do a retry.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I think these changes are all good. I also think we should evaluate whether we should have retries when the bpfman calls fail, and how to do it.

}

r.Logger.Info("Calling bpfman to load bpf program on Node", "bpfProgram Name", bpfProgram.Name)
Expand All @@ -251,7 +251,7 @@
r.Logger.Info("Calling bpfman to unload program on node", "bpfProgram Name", bpfProgram.Name, "Program ID", id)
if err := bpfmanagentinternal.UnloadBpfmanProgram(ctx, r.BpfmanClient, *id); err != nil {
r.Logger.Error(err, "Failed to unload Program")
return bpfmaniov1alpha1.BpfProgCondNotUnloaded, nil
return bpfmaniov1alpha1.BpfProgCondNotUnloaded, err

Check warning on line 254 in controllers/bpfman-agent/common.go

View check run for this annotation

Codecov / codecov/patch

controllers/bpfman-agent/common.go#L254

Added line #L254 was not covered by tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for the UnloadBpfmanProgram() case above.

}
}
case false:
Expand All @@ -267,7 +267,7 @@
r.progId, err = bpfmanagentinternal.LoadBpfmanProgram(ctx, r.BpfmanClient, loadRequest)
if err != nil {
r.Logger.Error(err, "Failed to load Program")
return bpfmaniov1alpha1.BpfProgCondNotLoaded, nil
return bpfmaniov1alpha1.BpfProgCondNotLoaded, err

Check warning on line 270 in controllers/bpfman-agent/common.go

View check run for this annotation

Codecov / codecov/patch

controllers/bpfman-agent/common.go#L270

Added line #L270 was not covered by tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for the UnloadBpfmanProgram() case above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the logs, I think this is the only case that was hit by the user. It failed, r.progId is set to nil, no error returned to calling function and r.progId being nil caused a panic later. I returned err on all other cases just to be safe.

}
case false:
// The program isn't loaded and it shouldn't be loaded.
Expand Down Expand Up @@ -785,10 +785,10 @@
}
}

existingId, err := bpfmanagentinternal.GetID(&existingBpfProgram)
if err != nil {
return internal.Requeue, fmt.Errorf("failed to get kernel id from bpfProgram: %v", err)
}
// GetID() will fail if ProgramId is not in the annotations, which is expected on a
// create. In this case existingId will be nil and DeepEqual() will fail and cause
// annotation to be set.
existingId, _ := bpfmanagentinternal.GetID(&existingBpfProgram)

// If bpfProgram Maps OR the program ID annotation isn't up to date just update it and return
if !reflect.DeepEqual(existingId, r.progId) {
Expand Down
4 changes: 2 additions & 2 deletions controllers/bpfman-agent/internal/bpfman-core.go
Original file line number Diff line number Diff line change
Expand Up @@ -210,12 +210,12 @@
func GetID(p *bpfmaniov1alpha1.BpfProgram) (*uint32, error) {
idString, ok := p.Annotations[internal.IdAnnotation]
if !ok {
return nil, nil
return nil, fmt.Errorf("failed to get program ID because no annotations")
}

id, err := strconv.ParseUint(idString, 10, 32)
if err != nil {
return nil, fmt.Errorf("failed to get program ID: %v", err)
return nil, fmt.Errorf("failed to parse program ID: %v", err)

Check warning on line 218 in controllers/bpfman-agent/internal/bpfman-core.go

View check run for this annotation

Codecov / codecov/patch

controllers/bpfman-agent/internal/bpfman-core.go#L218

Added line #L218 was not covered by tests
}
uid := uint32(id)
return &uid, nil
Expand Down
Loading