cgroup tracking #3170
Conversation
Force-pushed from 2194168 to 7c5baa3.
Checkpatch failure cannot be fixed.
Overall it looks good, though it's a large PR! Cgroup tracking looks cool as well. Thanks, I have a few comments.
not a big fan of the name, could you use something like tester-shell or something a bit more explicit? Also I'm not sure I understand why the code is in pkg/testutils/ and not here, to run the tests without changing everything else?
OK, will rename it to test-helper. I was also not very happy with "tester".
Also I'm not sure I understand why the code is in pkg/testutils/ and not here, to run the tests without changing everything else?
It's easier to develop both the test and the code that uses it in the same file. For example, you can add both the commands and the helper for the tests that use it.
func OpenMap(fname string) (Map, error) {
	m, err := ebpf.LoadPinnedMap(fname, &ebpf.LoadPinOptions{})
do we want to close this map at some point or should it just die with tetragon?
Well, it's on the caller to close the map. I don't see how we could close it from OpenMap.
We should be closing it for the non-global uses, but for global we keep it open until tetragon dies.
yep yep agree, just pointed this line to talk about it
@@ -103,6 +103,7 @@ type config struct {
	EnableCgIDmap      bool
	EnableCgIDmapDebug bool
	EnableCgTrackerID  bool
nit: naming is a bit off because you have ID but Cg, but you have other names with this also.
Not sure I understand 🤔
@@ -78,7 +78,7 @@ func cgfsMkdirTemp(t *testing.T, cgfsPath string) string {
	// create tempdir
	dir, err := os.MkdirTemp(cgfsPath, "cgtracker-test-*")
	if err != nil {
-		t.Fatalf("failed to create test dir: %s", err)
+		t.Skipf("skipping test, failed to create test dir: %s", err)
I understand the decision to skip; maybe what you would need for this case is to create a dir in a subsystem directory, like memory for v1, to make it work, no?
maybe, would need to investigate more.
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
GetTestSensorManager requires the observer, which means that if we try to use pkg/testutils/sensors from something that the observer imports, we will get an import cycle. Move GetTestSensorManager to a new package (pkg/testutils/observer) to avoid this. This is needed for a subsequent commit. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
A future change will introduce an import cycle: package github.com/cilium/tetragon/pkg/cgtracker imports github.com/cilium/tetragon/pkg/sensors imports github.com/cilium/tetragon/pkg/policyfilter imports github.com/cilium/tetragon/pkg/process imports github.com/cilium/tetragon/pkg/cgidmap imports github.com/cilium/tetragon/pkg/cgtracker: import cycle not allowed in test. Fix it by moving GetWorkloadMetaFromPod from process to the podhelpers package. After this change, policyfilter no longer needs to import process, which breaks the cycle. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
The method is to be used in a subsequent patch. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Add a simple tester program. The idea is that tester acts as a simple shell that we can use to execute commands without having to exec again. This allows us, for example, to add tester to a cgroup of our choosing and have everything it executes run in the same cgroup. Tests in subsequent commits will use it. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Add a utility function to copy an exec file to temp. This allows us to do reliable binary filtering. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Force-pushed from 7c5baa3 to b3070ec.
Thanks! Pushed a new version.
Pod association with cgidmap is based on the cgroup id. One limitation of the current implementation is that if a cgroup is created inside the container, processes running in this nested cgroup will not be properly associated with the container. To address this, this commit introduces the cgtracker map that maps these nested cgroup ids (tracked ids) to the container cgroup id (tracker id). There is already some older BPF code (see bpf/cgroup and bpf/lib/bpf_cgroup.h) aimed at a similar idea, but it is only used in tests. The idea behind the old code was to use the cgroup level as a way to identify which cgroups to track. The current code follows a simpler approach and relies on user-space to set up the initial values of the map (e.g., from rthooks). This will be added in subsequent commits. The plan is to migrate the tests to the new approach and remove this old code, but this is left as a followup. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
This commit adds userspace code for cgtracker and a test. The test loads the cgtracker program and maps and checks that the programs operate as expected. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
This commit adds a cgtracker command to the CLI aimed for troubleshooting. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Update the cgtracker map from userspace in two paths in the cgidmap code: (i) in a runtime hook, when we learn of a new container, and (ii) in the cri-resolver, as a result of an unmapped id from a pod event. For case (i), cgtracker is updated before the container starts running, so whatever cgroups are created once the container is created should be tracked by the BPF hooks. Case (ii) is somewhat trickier, because a cgroup creation from the container may race AddCgroupTrackerPath() and we might miss cgroups. We leave making this more robust as a followup. One potential solution is to traverse more than one level towards the root in the BPF hook. Another is to periodically query the CRI and scan the cgroup paths for IDs we are missing. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Add a configuration option to enable the cgroup tracker id. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
This commit modifies the exec event to include a new field called cgroup tracker id (cgrp_tracker_id in bpf and CgrpTrackerID in go). The intention of this field is to track children cgroups. Specifically, in a container environment the cgroup of the container will be used to track all children cgroups within this container. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
This commit adds a test for cgtracker and the execve events, which checks that if we create a child cgroup, the tracker id is what we expect. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
When running with cri enabled, the criResolver goroutine needs to call cgtracker. The problem is that the cgtracker map is created when the base sensor is loaded, so it is not available when the criResolver first starts. This commit adds a retry loop to allow the base sensor to be loaded, so that the map becomes available to the criResolver goroutine. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
This is to avoid import cycles, because the cgtracker test imports from the exec sensor. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Commit 0227f5d introduced pod association using cgidmap. This commit extends that functionality to also support nested cgroups via cgtracker. Cgtracker allows pod association to work even if containers create their own cgroup hierarchy. Commit 0227f5d changed the GetPodInfo function to take the cgroup id and map it to a container id inside the function. This, however, will not work for the cgroup tracker. In the existing code, GetPodInfo() is only executed if the cgroup name we get from the bpf side matches the container name. This is fine for non-nested cgroups, but it will not work for nested cgroups because their name might be anything. We could remove the check, but this would mean that we would need to retry resolving pod information for every non-container process in the system. Instead, this commit moves the association earlier, into msgToExecveUnix. If we find a container id, then the process code will attempt pod association via GetPodInfo. Using runtime-hooks should ensure that we have the necessary (cgroup id -> container id) mapping before the container starts. For other cases, one possible solution would be to implement a retry, but we would need a way to distinguish which processes need retrying (e.g., by marking processes in bpf with cgroups under a certain hierarchy). Furthermore, when using cgtracker, there is no need to use GetCgroupIDFromSubCgroup introduced in b1e8c44. In fact, using it would produce an error, because we need the actual container cgroup id, since that is what is being tracked. Hence, this commit also changes criResolve accordingly. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Remove stale comments, since previous commits implemented cgroup tracking. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
In GH runners, we get: --- FAIL: TestCgTrackerMap (0.12s) cgtracker_test.go:73: '/sys/fs/cgroup' is cgroup v1 cgtracker_test.go:81: failed to create test dir: mkdir /sys/fs/cgroup/cgtracker-test-1715271346: read-only file system So it seems that we cannot execute the test there. Skip it for now. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Force-pushed from b3070ec to 5e924fb.
For now just a comment; I'll give it a review next week. It seems it will break in cgroupv1.
	cgid = get_cgroup_id(cgrp);
	if (cgid == 0)
		return 0;
	cgid_parent = cgroup_get_parent_id(cgrp);
Hmm, didn't have time to review it all, but want to point out that this is not cgroupv1 compatible: the mkdir and release delete will work on any cgroupv1 hierarchies, and if kernfs ids clash it could lead to corruption?
The old branch has this check: https://github.com/cilium/tetragon/blob/pr/tixxdz/cgroup-bpf-full/bpf/cgroup/bpf_cgroup_mkdir.c#L33
So this could break in cgroupv1. Will check closely later.
I think it's fine to have this only for cgroup v2 at the moment. That being said, it's unclear what the problem is for v1. What are the kernfs IDs that could clash?
you are following all hierarchies in cgroupv1! including the ones that we don't track, why?
The kernfs IDs are cgroup IDs; if they are re-used, you remove them from the tracking and it won't work anymore, see your cgrp release bpf program.
you are following all hierarchies in cgroupv1! including the ones that we don't track, why?
We can add a filter to track only the configured hierarchy for v1. My first goal was to support cgroup v2, so that's what the first implementation does. Any cgroup v1 support might or might not work, since we do not have tests for it.
The kernfs IDs are cgroup IDs; if they are re-used, you remove them from the tracking and it won't work anymore, see your cgrp release bpf program.
Not sure I understand. Are you saying that cgroup ids are unique only within a specific hierarchy?
you are following all hierarchies in cgroupv1! including the ones that we don't track, why?
We can add a filter to track only the configured hierarchy for v1. My first goal was to support cgroup v2, so that's what the first implementation does. Any cgroup v1 support might or might not work, since we do not have tests for it.
Exactly my point, and with that linked bpf cgroup helper, it transparently handles it.
The kernfs IDs are cgroup IDs; if they are re-used, you remove them from the tracking and it won't work anymore, see your cgrp release bpf program.
Not sure I understand. Are you saying that cgroup ids are unique only within a specific hierarchy?
Yes, cgroupv1 hierarchies are separate cgroup mounts backed by kernfs; each kernfs node that is part of a mount has its own unique ID, which is the inode number. The allocation is predictable using IDR, last time I checked.
To make sure I understand correctly, are you saying that it's possible for two cgroup nodes of different hierarchies to have the same kernfs id?
Thanks, merging this. I'm happy to address additional feedback as a followup.
I would at least specifically state this is only cgroupv2 compatible, since cgroupv1 was not tested.
See commits