
bpf: fix policyfilter issue for existing processes #1590

Merged 1 commit into main on Oct 13, 2023

Conversation

kkourt
Contributor

@kkourt kkourt commented Oct 13, 2023

There is an issue with the cgroup tracker id (details can be found below). This issue causes policyfilter to misbehave. Furthermore, the original intention behind the cgroup tracker id was never implemented and so it currently serves no purpose. Its only user is the policyfilter.

This patch removes the cgroup tracker id, and, instead, computes the cgroup id at the time of the policyfilter check. For most common setups (cgroupv2) the overhead of doing so is negligible. Computing the cgroup id at the time of check is more reliable and also simplifies the code.
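For cgroupv2, the upstream BPF helper bpf_get_current_cgroup_id() provides exactly this check-time lookup. The fragment below is a pseudocode-style sketch of the idea only; the function name and the policy_filter_map are hypothetical, not Tetragon's actual code:

```c
/* Sketch only: illustrative names, not a compilable BPF program. */
int policyfilter_check(void *ctx)
{
	/* Ask for the current task's cgroup id at the time of the
	 * policyfilter check, instead of trusting a value recorded
	 * earlier in wake_up_new_task() (where "current" is still
	 * the parent). Works on cgroupv2. */
	__u64 cgrpid = bpf_get_current_cgroup_id();

	/* Hypothetical map keyed by cgroup id. */
	void *policies = bpf_map_lookup_elem(&policy_filter_map, &cgrpid);
	if (!policies)
		return 0; /* no policy applies to this cgroup */
	/* ... continue matching against the policies ... */
	return 1;
}
```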

More details can be found below.

We have a namespaced policy installed that matches connect() calls to 8.8.8.8, and we execute the following on a pod in the same namespace:

kubectl exec -it nonqos-demo -- curl 8.8.8.8

We observe two things:

  • there is no event from the policy (even though we expected one)
  • the cgroup tracker id of the curl process is the same as the parent runc process (which should not be the case)

The exec event we get from curl has:
.process.pid: 244958
.process.binary: /usr/bin/curl
.parent.pid: 244948
.parent.binary: /usr/bin/runc

Looking at the cgroup tracker id in the execve map for these two pids we see:
"cgrpid_tracker": 2371
"cgrpid_tracker": 2371

Which means that the cgroup tracker id is not correct, and the curl process will not be matched by the policy.

The issue likely arises because, in wake_up_new_task(), the cgroup tracker id is computed using get_current_cgroup_id(). At that point, however, we are still executing in the parent, so the get_current_cgroup_id() helper returns the cgroup of the parent, not of the child (which is what we want).

To reduce code complexity (both in terms of understandability and verification complexity), we instead remove the tracker id and retrieve the current cgroup id at the time of the check. This is much simpler, and no less efficient for the common case (cgroupv2).

Following the same steps with a version built with the current patch produces an event as expected.

@kkourt kkourt force-pushed the pr/kkourt/remove_cgroupid_tracker branch from b6782d0 to b5e7bc3 Compare October 13, 2023 10:31
@kkourt kkourt added the release-note/bug This PR fixes an issue in a previous release of Tetragon. label Oct 13, 2023
@kkourt kkourt force-pushed the pr/kkourt/remove_cgroupid_tracker branch 2 times, most recently from fb57ba0 to 0419ece Compare October 13, 2023 10:59
There is an issue with the cgroup tracker id (details can be found
below). This issue causes policyfilter to misbehave. Furthermore, the
original intention behind the cgroup tracker id was never realized
and so it currently serves no purpose. Its only user is the
policyfilter.

This PR removes the cgroup tracker id, and, instead, computes the cgroup
id at the time of the policyfilter check. For most common setups
(cgroupv2) the overhead of doing so is negligible. Computing the cgroup
id at the time of check is more reliable and also simplifies the code.

More details can be found below.

We have a namespaced policy installed that matches connect() calls to 8.8.8.8, and we execute the
following on a pod in the same namespace:

`kubectl exec -it nonqos-demo -- curl 8.8.8.8`

We observe two things:
 * there is no event from the policy (even though there should be one)
 * the cgroup tracker id of the curl process is the same as the parent runc process (which should
   not be the case)

The exec event we get from curl has:
 .process.pid: 244958
 .process.binary: /usr/bin/curl
 .parent.pid: 244948
 .parent.binary: /usr/bin/runc

Looking at the cgroup tracker id in the execve map for these two pids we see:
"cgrpid_tracker": 2371
"cgrpid_tracker": 2371

Which means that the cgroup tracker id is not correct, and the curl
process will not be matched by the policy.

The issue likely arises because, in wake_up_new_task(), the cgroup
tracker id is computed using get_current_cgroup_id(). At that point,
however, we are still executing in the parent, so the
get_current_cgroup_id() helper returns the cgroup of the parent, not of
the child (which is what we want).

To reduce code complexity (both in terms of understandability and
verification complexity), we instead remove the tracker id and retrieve
the current cgroup id at the time of the check. This is much simpler,
and no less efficient for the common case (cgroupv2).

Following the same steps with a version built with the current patch
produces an event as expected.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
@kkourt kkourt force-pushed the pr/kkourt/remove_cgroupid_tracker branch from 0419ece to 37223bd Compare October 13, 2023 11:25
@kkourt kkourt marked this pull request as ready for review October 13, 2023 12:06
@kkourt kkourt requested a review from a team as a code owner October 13, 2023 12:06
@kkourt kkourt requested a review from olsajiri October 13, 2023 12:06
@jrfastab
Contributor

Nice, let's go for correctness first; we can optimize later if the perf cost is ever measurable.

@jrfastab jrfastab merged commit 2ebcd9a into main Oct 13, 2023
30 checks passed
@jrfastab jrfastab deleted the pr/kkourt/remove_cgroupid_tracker branch October 13, 2023 17:08
@tixxdz
Member

tixxdz commented Oct 14, 2023

btw @jrfastab, @kkourt: usually these cases fork and then migrate the tasks to the appropriate cgroup, since they don't fork from pid 1 of the container. To catch those in bpf I planned this https://github.com/cilium/tetragon/blob/pr/tixxdz/cgroup-bpf-full/bpf/cgroup/bpf_cgroup_attach_task.c#L32 but due to lack of time...
There is also the new way of CLONE into a new cgroup https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.6-rc5&id=ef2c41cf38a7559bbf91af42d5b6a4429db8fc68 , which I'm tracking in a separate clone+namespaces issue...

For this wake_up_new_task() case: even if we pass the new task and fetch its cgroup (not the parent's), down the line, if the fs uses cgroupv2, the code switches back to the cgroupv2 helpers, which only work with the current context...

Our own bpf cgroup helpers are far superior, since they allow working on target tasks https://github.com/cilium/tetragon/blob/main/bpf/lib/bpf_cgroup.h#L377 transparently for both cgroupv1 and cgroupv2; I should just have used those instead of the upstream bpf helper...

Anyway thanks for fixing this, I plan to retake and finish it.

@kkourt kkourt changed the title bpf: remove cgrpid_tracker bpf: fix policyfilter issue for existing processes Nov 1, 2023