
Add core code to support namespace based CRDs #344

Merged: 1 commit merged on Dec 20, 2024


@Billy99 (Contributor) commented Dec 3, 2024

For security reasons, cluster admins may want to limit certain applications to only loading eBPF programs within a given namespace. Currently, all bpfman Custom Resource Definitions (CRDs) are Cluster scoped. To provide cluster admins with tighter controls on eBPF program loading, some of the bpfman CRDs also need to be Namespace scoped.

See Design Doc: bpfman/bpfman#1359
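To illustrate the scoping idea, here is a minimal Go sketch under stated assumptions: the `Pod` type, `selectPods` helper, and label-matching rule are hypothetical stand-ins, not bpfman's actual types. It shows the property a namespace-scoped program resource is meant to guarantee: only pods in the resource's own namespace are eligible targets, whereas a cluster-scoped resource may select pods in any namespace.

```go
package main

import "fmt"

// Pod is a hypothetical, trimmed-down stand-in for a Kubernetes Pod.
type Pod struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// matchesLabels reports whether every key/value in sel is present on the pod.
func matchesLabels(p Pod, sel map[string]string) bool {
	for k, v := range sel {
		if p.Labels[k] != v {
			return false
		}
	}
	return true
}

// selectPods returns "<namespace>/<name>" for each pod a program may target.
// A non-empty progNamespace models a namespace-scoped resource: pods outside
// that namespace are skipped regardless of the label selector. An empty
// progNamespace models a cluster-scoped resource.
func selectPods(pods []Pod, progNamespace string, sel map[string]string) []string {
	var out []string
	for _, p := range pods {
		if progNamespace != "" && p.Namespace != progNamespace {
			continue
		}
		if matchesLabels(p, sel) {
			out = append(out, p.Namespace+"/"+p.Name)
		}
	}
	return out
}

func main() {
	pods := []Pod{
		{Namespace: "acme", Name: "nginx-1", Labels: map[string]string{"app": "nginx"}},
		{Namespace: "other", Name: "nginx-2", Labels: map[string]string{"app": "nginx"}},
	}
	sel := map[string]string{"app": "nginx"}
	fmt.Println(selectPods(pods, "acme", sel)) // namespace-scoped: [acme/nginx-1]
	fmt.Println(selectPods(pods, "", sel))     // cluster-scoped: [acme/nginx-1 other/nginx-2]
}
```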

@Billy99 Billy99 force-pushed the billy99-namespace-scoped branch 2 times, most recently from f776292 to 6344cc7 on December 4, 2024 19:25

codecov bot commented Dec 4, 2024

Codecov Report

Attention: Patch coverage is 45.44025% with 1388 lines in your changes missing coverage. Please review.

Project coverage is 28.13%. Comparing base (6828e71) to head (e4978c9).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| apis/v1alpha1/zz_generated.deepcopy.go | 66.99% | 121 Missing and 14 partials |
| controllers/bpfman-agent/application-ns-program.go | 41.31% | 121 Missing and 4 partials |
| .../clientset/typed/apis/v1alpha1/bpfnsapplication.go | 0.00% | 107 Missing |
| controllers/bpfman-agent/tc-ns-program.go | 58.33% | 70 Missing and 10 partials |
| controllers/bpfman-agent/tcx-ns-program.go | 58.11% | 70 Missing and 10 partials |
| controllers/bpfman-agent/xdp-ns-program.go | 59.25% | 68 Missing and 9 partials |
| controllers/bpfman-agent/uprobe-ns-program.go | 63.73% | 60 Missing and 10 partials |
| cmd/bpfman-agent/main.go | 0.00% | 57 Missing |
| cmd/bpfman-operator/main.go | 0.00% | 55 Missing |
| ...rollers/bpfman-operator/application-ns-programs.go | 24.24% | 48 Missing and 2 partials |

... and 39 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #344      +/-   ##
==========================================
+ Coverage   27.18%   28.13%   +0.94%     
==========================================
  Files          88      128      +40     
  Lines        7786    11202    +3416     
==========================================
+ Hits         2117     3152    +1035     
- Misses       5460     7768    +2308     
- Partials      209      282      +73     
| Flag | Coverage Δ |
|---|---|
| unittests | 28.13% <45.44%> (+0.94%) |
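The Codecov figures can be cross-checked by hand. Each column satisfies hits + misses + partials = lines, and the reported percentages match hits / lines truncated (not rounded) to two decimals; that truncation rule is inferred from these numbers, not taken from Codecov's documentation. A minimal Go sketch of the arithmetic, using integer math to avoid floating-point rounding:

```go
package main

import "fmt"

// coverageBP returns hits/lines in basis points (hundredths of a percent).
// Go integer division truncates toward zero, so 28.137% becomes 2813.
func coverageBP(hits, lines int64) int64 {
	return hits * 10000 / lines
}

// formatBP renders basis points as a percentage string such as "28.13%".
func formatBP(bp int64) string {
	return fmt.Sprintf("%d.%02d%%", bp/100, bp%100)
}

func main() {
	// Figures copied from the Coverage Diff above (base = main, head = #344).
	var baseHits, baseMisses, basePartials, baseLines int64 = 2117, 5460, 209, 7786
	var headHits, headMisses, headPartials, headLines int64 = 3152, 7768, 282, 11202

	// Every line is counted exactly once as a hit, a miss, or a partial.
	fmt.Println(baseHits+baseMisses+basePartials == baseLines) // true
	fmt.Println(headHits+headMisses+headPartials == headLines) // true

	fmt.Println(formatBP(coverageBP(baseHits, baseLines))) // 27.18%
	fmt.Println(formatBP(coverageBP(headHits, headLines))) // 28.13%

	// The delta appears to be computed at higher precision and then truncated:
	// 28.1378% - 27.1898% = 0.9480%, reported as +0.94% rather than 0.95%.
	base := baseHits * 1000000 / baseLines // 271898 millionths
	head := headHits * 1000000 / headLines // 281378 millionths
	fmt.Println(formatBP((head - base) / 100)) // 0.94%
}
```

Subtracting the already-truncated values (2813 - 2718) would give 0.95%, which is why the sketch takes the difference at a finer scale first.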


@Billy99 Billy99 force-pushed the billy99-namespace-scoped branch 2 times, most recently from d903f0a to 5b34117 on December 9, 2024 21:11
Contributor

mergify bot commented Dec 9, 2024

@Billy99, this pull request is now in conflict and requires a rebase.

@mergify mergify bot added the needs-rebase label Dec 9, 2024
@Billy99 Billy99 force-pushed the billy99-namespace-scoped branch from 5b34117 to 3850b25 on December 9, 2024 21:15
@mergify mergify bot removed the needs-rebase label Dec 9, 2024
@Billy99 Billy99 force-pushed the billy99-namespace-scoped branch 3 times, most recently from e67530f to 41b175b on December 11, 2024 21:30
@Billy99 Billy99 marked this pull request as ready for review December 11, 2024 21:30
@Billy99 Billy99 force-pushed the billy99-namespace-scoped branch from 41b175b to b545665 on December 12, 2024 18:21
@Billy99 Billy99 force-pushed the billy99-namespace-scoped branch from b545665 to bb76968 on December 12, 2024 19:46
@Billy99 Billy99 force-pushed the billy99-namespace-scoped branch 2 times, most recently from e03103a to 896aa16 on December 17, 2024 20:37
Contributor

@anfredette anfredette left a comment


This looks great! I'm still planning to test it, but I only saw one minor thing in the review.


// Containers identifies the set of containers in which to attach the eBPF
// program.
Containers *ContainerNsSelector `json:"containers"`
Contributor


Since Containers is required for the namespaced CRDs, it shouldn't be a pointer. This applies to all of them. Then you don't need to check for nil in the controller code.
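A minimal Go sketch of the reviewer's point. The fields of `ContainerNsSelector` below are hypothetical placeholders, not the real bpfman API; the sketch only contrasts a pointer field, which every caller must nil-check, with a value field, whose zero value is always safe to read.

```go
package main

import "fmt"

// ContainerNsSelector is a hypothetical stand-in; the fields are illustrative.
type ContainerNsSelector struct {
	Pods           map[string]string // pod label selector (illustrative)
	ContainerNames []string          // optional container-name filter (illustrative)
}

// specWithPointer mirrors the original field: Containers *ContainerNsSelector.
type specWithPointer struct {
	Containers *ContainerNsSelector `json:"containers"`
}

// specWithValue mirrors the suggested field: Containers ContainerNsSelector.
type specWithValue struct {
	Containers ContainerNsSelector `json:"containers"`
}

// namesFromPointer must guard against nil before dereferencing.
func namesFromPointer(s specWithPointer) []string {
	if s.Containers == nil {
		return nil
	}
	return s.Containers.ContainerNames
}

// namesFromValue needs no guard: a zero-value struct holds a nil slice,
// and reading a nil slice is safe.
func namesFromValue(s specWithValue) []string {
	return s.Containers.ContainerNames
}

func main() {
	fmt.Println(len(namesFromPointer(specWithPointer{}))) // 0
	fmt.Println(len(namesFromValue(specWithValue{})))     // 0
	v := specWithValue{Containers: ContainerNsSelector{ContainerNames: []string{"nginx"}}}
	fmt.Println(namesFromValue(v)) // [nginx]
}
```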

Contributor Author


Done.

@anfredette
Contributor

I did the following test:

Deploy nginx:

kubectl apply -f ./hack/nginx-deployment.yaml

Apply some sample programs:

kubectl apply -f config/samples/bpfman.io_v1alpha1_tc_pass_tcnsprogram.yaml
kubectl apply -f config/samples/bpfman.io_v1alpha1_tcx_pass_tcxnsprogram.yaml
kubectl apply -f config/samples/bpfman.io_v1alpha1_uprobe_uprobensprogram.yaml
kubectl apply -f config/samples/bpfman.io_v1alpha1_xdp_pass_xdpnsprogram.yaml

Delete the nginx deployment:

kubectl delete -f ./hack/nginx-deployment.yaml

The programs should get deleted because there aren't any matching containers anymore. The tcx and uprobe programs are deleted correctly, but I'm seeing errors like the following for tc and xdp:

{"level":"error","ts":"2024-12-18T19:39:36Z","logger":"tc-ns","msg":"Failed to unload eBPF Program","error":"failed to unload bpfProgram via bpfman: rpc error: code = Aborted desc = An error occurred. Failed to get metadata for namespace path: No such file or directory (os error 2)",

This is due to a bug in my bpfman netns PR: after the pods have been deleted I can't find the NSID, but the NSID is part of the dispatcher ID, and I need it to delete the dispatcher. I'll have to figure that one out, but I won't hold this PR up for it.
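The expected behaviour in this test amounts to a set difference: anything loaded for a container that is no longer selected should be unloaded. A minimal Go sketch of that comparison follows; `attachmentsToUnload` and the `<namespace>/<pod>/<container>` keys are hypothetical, not the bpfman-agent's actual data model.

```go
package main

import (
	"fmt"
	"sort"
)

// attachmentsToUnload returns the keys in loaded that are absent from desired.
// Keys are hypothetical identifiers of the form "<namespace>/<pod>/<container>".
func attachmentsToUnload(loaded, desired []string) []string {
	want := make(map[string]bool, len(desired))
	for _, k := range desired {
		want[k] = true
	}
	var out []string
	for _, k := range loaded {
		if !want[k] {
			out = append(out, k)
		}
	}
	sort.Strings(out) // deterministic order for logging and tests
	return out
}

func main() {
	loaded := []string{"acme/nginx-b/nginx", "acme/nginx-a/nginx"}
	// After the nginx deployment is deleted, no containers match the selector.
	var desired []string
	fmt.Println(attachmentsToUnload(loaded, desired)) // [acme/nginx-a/nginx acme/nginx-b/nginx]
}
```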

@Billy99 Billy99 force-pushed the billy99-namespace-scoped branch from 896aa16 to 059a202 on December 18, 2024 20:46
@anfredette
Contributor

Opened bpfman/bpfman#1362 for the nginx delete issue described above.

internal.BpfNsProgramTypePredicate(internal.Xdp.String()),
internal.BpfProgramNodePredicate(r.NodeName)),
),
).
Contributor


Suggested change
).
).
// Watch for changes in Pod resources in case we are using a container selector.
Watches(
&v1.Pod{},
&handler.EnqueueRequestForObject{},
builder.WithPredicates(podOnNodePredicate(r.NodeName)),
).
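The suggested watch filters Pod events with `podOnNodePredicate(r.NodeName)`, whose body is not shown in this thread. As a rough, dependency-free sketch of the idea (the `Pod` struct and `podOnNode` helper are hypothetical; a real controller-runtime version would typically wrap a filter function with `predicate.NewPredicateFuncs`), the agent only needs to react to pods scheduled on its own node:

```go
package main

import "fmt"

// Pod is a hypothetical stand-in holding only the fields the filter needs.
type Pod struct {
	Name     string
	NodeName string // mirrors corev1.Pod.Spec.NodeName
}

// podOnNode returns a filter that accepts only pods scheduled on nodeName.
// Unscheduled pods (empty NodeName) are rejected.
func podOnNode(nodeName string) func(Pod) bool {
	return func(p Pod) bool {
		return p.NodeName != "" && p.NodeName == nodeName
	}
}

func main() {
	onThisNode := podOnNode("worker-1")
	fmt.Println(onThisNode(Pod{Name: "nginx-a", NodeName: "worker-1"})) // true
	fmt.Println(onThisNode(Pod{Name: "nginx-b", NodeName: "worker-2"})) // false
	fmt.Println(onThisNode(Pod{Name: "pending"}))                       // false
}
```

Filtering at the watch keeps each agent from reconciling on Pod churn elsewhere in the cluster, while still letting a Pod deletion on its node trigger the unload path.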

Contributor Author


Done

@anfredette
Contributor

anfredette commented Dec 19, 2024

Also, I'm seeing some errors when deleting the programs:

$ kubectl delete -f config/samples/bpfman.io_v1alpha1_tc_pass_tcnsprogram.yaml
kubectl delete -f config/samples/bpfman.io_v1alpha1_tcx_pass_tcxnsprogram.yaml
kubectl delete -f config/samples/bpfman.io_v1alpha1_uprobe_uprobensprogram.yaml
kubectl delete -f config/samples/bpfman.io_v1alpha1_xdp_pass_xdpnsprogram.yaml
tcnsprogram.bpfman.io "tc-containers" deleted
E1219 09:14:09.930282 3717390 reflector.go:150] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *unstructured.Unstructured: unknown
E1219 09:14:11.505401 3717390 reflector.go:150] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *unstructured.Unstructured: unknown
tcxnsprogram.bpfman.io "tcx-containers" deleted

Then, in the logs, I see errors like the following:

{"level":"info","ts":"2024-12-19T14:14:11Z","logger":"tcx-ns","msg":"bpfman-agent enter: tcx-ns","Namespace":"acme","Name":"tcx-containers"}
{"level":"info","ts":"2024-12-19T14:14:11Z","logger":"tcx-ns","msg":"Calling KubeAPI to remove finalizer from BpfProgram","object name":"tcx-containers-0f0cb13b"}
{"level":"info","ts":"2024-12-19T14:14:11Z","logger":"tcx-ns","msg":"bpfman-agent enter: tcx-ns","Namespace":"acme","Name":"tcx-containers"}
{"level":"info","ts":"2024-12-19T14:14:11Z","logger":"tcx-ns","msg":"Calling KubeAPI to update BpfProgram condition","Name":"tcx-containers-0f0cb13b","condition":"Unloaded"}
{"level":"error","ts":"2024-12-19T14:14:11Z","logger":"tcx-ns","msg":"failed to set BpfProgram object status","error":"Operation cannot be fulfilled on bpfnsprograms.bpfman.io \"tcx-containers-0f0cb13b\": StorageError: invalid object, Code: 4, Key: /registry/bpfman.io/bpfnsprograms/acme/tcx-containers-0f0cb13b, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: dd2fbf03-4b81-4d65-9839-a273c1daf5ea, UID in object meta: ","stacktrace":"github.com/bpfman/bpfman-operator/controllers/bpfman-agent.(*ReconcilerCommon[...]).updateStatus\n\t/usr/src/bpfman-operator/controllers/bpfman-agent/common.go:554\ngithub.com/bpfman/bpfman-operator/controllers/bpfman-agent.(*ReconcilerCommon[...]).handleProgDelete\n\t/usr/src/bpfman-operator/controllers/bpfman-agent/common.go:664\ngithub.com/bpfman/bpfman-operator/controllers/bpfman-agent.(*ReconcilerCommon[...]).reconcileProgram\n\t/usr/src/bpfman-operator/controllers/bpfman-agent/common.go:886\ngithub.com/bpfman/bpfman-operator/controllers/bpfman-agent.(*ReconcilerCommon[...]).reconcileCommon\n\t/usr/src/bpfman-operator/controllers/bpfman-agent/common.go:215\ngithub.com/bpfman/bpfman-operator/controllers/bpfman-agent.(*TcxNsProgramReconciler).Reconcile\n\t/usr/src/bpfman-operator/controllers/bpfman-agent/tcx-ns-program.go:276\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/usr/src/bpfman-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/usr/src/bpfman-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/usr/src/bpfman-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/usr/src/bpfman-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:224"}

I haven't tried to debug it yet, but could it be a missing permission for the namespace-scoped CRD, or the role?

The deletes all succeed despite the error messages.

@msherif1234
Contributor

msherif1234 commented Dec 19, 2024

@Billy99, did you use operator-sdk to create those NS objects? Something like:
operator-sdk create api --group bpfman.io --version v1alpha1 --kind NSBpfApplication --resource --controller --namespaced=false
I don't see the PROJECT file in your changeset.

@anfredette
Contributor

I haven't tried to debug it yet, but could it be a missing permission for the namespace-scoped CRD, or the role?

Actually, it looks like the code is trying to do the final update on the status, but the object is gone, so I suspect it's a finalizer issue.

@Billy99 Billy99 force-pushed the billy99-namespace-scoped branch from 059a202 to e4978c9 on December 19, 2024 16:23
- get
- apiGroups:
- bpfman.io
# resources: ['xdpnsprograms']
Contributor


Delete comment?

- create
- delete
- get
- list
Contributor

@anfredette anfredette Dec 19, 2024


Adding watch here is what fixed it:

Suggested change
- list
- list
- watch
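One plausible reason adding `watch` helped: controller-runtime's default client reads through an informer-backed cache, and an informer performs an initial `list` followed by a long-running `watch`, so a rule granting `get` and `list` but not `watch` leaves the watch forbidden. The sketch below checks a rule against those verbs; `PolicyRule` is a trimmed, hypothetical mirror of `rbacv1.PolicyRule`, and the resource name is an assumption, since the diff context does not show it.

```go
package main

import "fmt"

// PolicyRule is a trimmed, hypothetical mirror of rbacv1.PolicyRule.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// informerVerbs: "list" and "watch" are what an informer issues; "get" is
// included for direct reads of a single object.
var informerVerbs = []string{"get", "list", "watch"}

// missingVerbs returns the informerVerbs not granted by rule, in the order
// listed above. A "*" verb grants everything, so nothing is missing.
func missingVerbs(rule PolicyRule) []string {
	granted := make(map[string]bool, len(rule.Verbs))
	for _, v := range rule.Verbs {
		if v == "*" {
			return nil
		}
		granted[v] = true
	}
	var missing []string
	for _, v := range informerVerbs {
		if !granted[v] {
			missing = append(missing, v)
		}
	}
	return missing
}

func main() {
	// Verbs from the diff context above; the resource name is assumed.
	before := PolicyRule{
		APIGroups: []string{"bpfman.io"},
		Resources: []string{"bpfnsprograms"},
		Verbs:     []string{"create", "delete", "get", "list"},
	}
	after := before
	// Copy the slice before appending so `before` is left unchanged.
	after.Verbs = append(append([]string{}, before.Verbs...), "watch")

	fmt.Println(missingVerbs(before)) // [watch]
	fmt.Println(missingVerbs(after))  // []
}
```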

resources:
- bpfnsprograms/status
verbs:
- get
Contributor


This didn't make a difference for the error I was seeing, but why not include list and watch?

Suggested change
- get
- get
- list
- watch

For security reasons, cluster admins may want to limit certain applications
to only loading eBPF programs within a given namespace. Currently, all bpfman
Custom Resource Definitions (CRDs) are Cluster scoped. To provide cluster admins
with tighter controls on eBPF program loading, some of the bpfman CRDs also need
to be Namespace scoped.

See Design Doc: bpfman/bpfman#1359

Signed-off-by: Billy McFall <22157057+Billy99@users.noreply.github.com>
@Billy99 Billy99 force-pushed the billy99-namespace-scoped branch from e4978c9 to 7bfea5c on December 20, 2024 16:34
Contributor

@anfredette anfredette left a comment


/LGTM

@mergify mergify bot merged commit 9b4813d into bpfman:main Dec 20, 2024
13 checks passed
@Billy99 Billy99 deleted the billy99-namespace-scoped branch December 20, 2024 20:37