bpf: fix policyfilter issue for existing processes #1590
Conversation
Force-pushed from b6782d0 to b5e7bc3
Force-pushed from fb57ba0 to 0419ece
Force-pushed from 0419ece to 37223bd
Nice, let's go for correctness first; we can optimize later if the perf cost ever becomes measurable.
btw @jrfastab, @kkourt: usually those cases will fork and then migrate the tasks to the appropriate cgroup, since they don't fork from pid 1 of the container. To catch those in bpf I planned this https://github.com/cilium/tetragon/blob/pr/tixxdz/cgroup-bpf-full/bpf/cgroup/bpf_cgroup_attach_task.c#L32 but due to lack of time... For this wake_up_new_task() case, even if we pass the new task and fetch its cgroup (not the parent's), down the line, if the fs uses cgroupv2, it switches back to the cgroupv2 helpers, which only work with the current context. Our own bpf cgroup helpers are far superior since they allow working on target tasks https://github.com/cilium/tetragon/blob/main/bpf/lib/bpf_cgroup.h#L377 transparently for both cgroupv1 and cgroupv2; I should just have used those instead of the upstream bpf helper... Anyway, thanks for fixing this, I plan to retake and finish it.
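For illustration, a minimal CO-RE sketch of what reading the cgroup id of a target task (rather than current) boils down to on cgroupv2; the helper name is made up here and this is not the bpf_cgroup.h implementation:

```c
/* Sketch only: assumes cgroupv2 (the default hierarchy) and a CO-RE build
 * against a recent kernel where kernfs_node::id is a u64. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

/* Hypothetical helper: cgroup id of an arbitrary task, not of current. */
static __always_inline __u64 task_cgroup_id_v2(struct task_struct *task)
{
    /* On cgroupv2 the cgroup id is the kernfs node id of the task's
     * cgroup on the default hierarchy. */
    return BPF_CORE_READ(task, cgroups, dfl_cgrp, kn, id);
}
```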
There is an issue with the cgroup tracker id (details can be found below). This issue causes policyfilter to misbehave. Furthermore, the original intention behind the cgroup tracker id was never implemented and so it currently serves no purpose. Its only user is the policyfilter.
This patch removes the cgroup tracker id, and, instead, computes the cgroup id at the time of the policyfilter check. For most common setups (cgroupv2) the overhead of doing so is negligible. Computing the cgroup id at the time of check is more reliable and also simplifies the code.
More details can be found below.
We have a namespaced policy installed that matches connect() calls to 8.8.8.8, and execute the following on a pod in the same namespace:
kubectl exec -it nonqos-demo -- curl 8.8.8.8
We observe two things:
* there is no event from the policy (while there should be one)
* the cgroup tracker id of the curl process is the same as that of the parent runc process (which should not be the case)
The exec event we get from curl has:
.process.pid: 244958
.process.binary: /usr/bin/curl
.parent.pid: 244948
.parent.binary: /usr/bin/runc
Looking at the cgroup tracker id in the execve map for these two pids we see:
"cgrpid_tracker": 2371
"cgrpid_tracker": 2371
This means that the cgroup tracker id is not correct, and the curl process will not be matched by the policy.
The issue might arise from the fact that, in wake_up_new_task(), the cgroup tracker id is computed using get_current_cgroup_id(). At that point, however, we are still executing in the context of the parent, so the get_current_cgroup_id() helper returns the cgroup of the parent, not of the child (which is what we want).
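A minimal sketch of the problematic pattern (the hook is real, but the program and names are illustrative, not the actual Tetragon code):

```c
/* Sketch only: wake_up_new_task() runs in the parent's context, so any
 * "current"-based helper reports the parent's cgroup, not the child's. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

char LICENSE[] SEC("license") = "GPL";

SEC("kprobe/wake_up_new_task")
int BPF_KPROBE(trace_wake_up_new_task, struct task_struct *child)
{
    /* This is the cgroup id of current, i.e. the parent waking up the
     * child, not the cgroup id of the child passed as the argument. */
    __u64 parent_cgid = bpf_get_current_cgroup_id();

    /* Caching this value as the child's cgrpid_tracker is how curl and
     * runc end up with the same value (2371 above) in the execve map. */
    bpf_printk("wake_up_new_task: child pid=%d cgid=%llu",
               BPF_CORE_READ(child, pid), parent_cgid);
    return 0;
}
```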
To reduce code complexity (both in terms of understandability and verification), we instead remove the tracker id and retrieve the current cgroup id at the time of the check. This is much simpler, and no less efficient for the common case (cgroupv2).
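A minimal sketch of the check-time approach (map and function names are hypothetical, not the actual policyfilter code):

```c
/* Sketch only: instead of caching a cgrpid_tracker at fork time, fetch the
 * cgroup id of the current task when the policyfilter check runs. On
 * cgroupv2 this is a single cheap helper call. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, __u64);   /* cgroup id */
    __type(value, __u8);  /* set if the cgroup is selected by the policy */
} policy_cgroups SEC(".maps");

static __always_inline bool policyfilter_match(void)
{
    /* Computed in the context of the task that triggered the hook, at the
     * time of the check, so it cannot go stale or leak the parent's id. */
    __u64 cgid = bpf_get_current_cgroup_id();

    return bpf_map_lookup_elem(&policy_cgroups, &cgid) != NULL;
}
```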
Following the same steps with a version built with this patch produces an event as expected.
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>