cgroup tracking #3170

Merged
merged 20 commits into from
Dec 2, 2024

Conversation

kkourt
Contributor

@kkourt kkourt commented Nov 27, 2024

See commits

tetragon: add support for associating pod information when nested cgroups are used

@kkourt kkourt changed the title Pr/kkourt/cgroup tracking cgroup tracking Nov 27, 2024
@kkourt kkourt added the release-note/major This PR introduces major new functionality label Nov 27, 2024

netlify bot commented Nov 27, 2024

Deploy Preview for tetragon ready!

Latest commit: b3070ec
Latest deploy log: https://app.netlify.com/sites/tetragon/deploys/6749e8ff0080ff0008557cca
Deploy Preview: https://deploy-preview-3170--tetragon.netlify.app

@kkourt kkourt force-pushed the pr/kkourt/cgroup-tracking branch 8 times, most recently from 2194168 to 7c5baa3 Compare November 27, 2024 18:02
@kkourt
Contributor Author

kkourt commented Nov 27, 2024

Checkpatch failure cannot be fixed:

CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'member' may be better as '(member)' to avoid precedence issues
  #136: FILE: bpf/lib/bpf_helpers.h:111:
  +#define offsetof(type, member) ((unsigned long)&((type *)0)->member)

@kkourt kkourt marked this pull request as ready for review November 27, 2024 19:14
@kkourt kkourt requested review from mtardy and a team as code owners November 27, 2024 19:14
Member

@mtardy mtardy left a comment

Overall it looks good, it's a large PR though! Cgroup tracking looks cool as well! Thanks, I have a few comments.

pkg/testutils/progs/tester.go
Member

not a big fan of the name, could you use something like tester-shell or something a bit more explicit? Also not sure I understand why the code is in pkg/testutils/ and not here, to run the tests without changing everything else?

Contributor Author

@kkourt kkourt Nov 29, 2024

OK, will rename it to test-helper. I was also not very happy with "tester".

Also not sure I understand why the code is in pkg/testutils/ and not here, to run the tests without changing everything else?

It's easier to develop both the test and the code that uses it in the same file. For example, you can add both the commands and the helper for the tests that use it.

bpf/cgroup/bpf_cgtracker.c
}

func OpenMap(fname string) (Map, error) {
m, err := ebpf.LoadPinnedMap(fname, &ebpf.LoadPinOptions{})
Member

do we want to close this map at some point or should it just die with tetragon?

Contributor Author

Well it's on the caller to close the map. I don't see how we could close it from OpenMap.
We should be closing it for the non-global uses, but for global we keep it open until tetragon dies.

Member

yep yep, agree, I just pointed at this line to talk about it
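
For illustration only, a minimal sketch (not part of the PR) of the caller side: a non-global user opens the pinned map via the cilium/ebpf API and closes it when done, while a global user would keep the handle open for the lifetime of the agent. The pin path below is a made-up example.

package main

import (
	"log"

	"github.com/cilium/ebpf"
)

func main() {
	// The caller owns the handle returned by LoadPinnedMap and is
	// responsible for closing it.
	m, err := ebpf.LoadPinnedMap("/sys/fs/bpf/example/cgtracker_map", &ebpf.LoadPinOptions{})
	if err != nil {
		log.Fatalf("failed to open pinned map: %v", err)
	}
	defer m.Close()

	log.Printf("opened map with max entries %d", m.MaxEntries())
}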

pkg/cgidarg/cgidarg.go
pkg/cgidmap/cgidmap.go
pkg/cgidmap/rthooks.go
@@ -103,6 +103,7 @@ type config struct {

EnableCgIDmap bool
EnableCgIDmapDebug bool
EnableCgTrackerID bool
Member

nit: the naming is a bit off because you have ID but Cg, but you have other names with this pattern also.

Contributor Author

Not sure I understand 🤔

@@ -78,7 +78,7 @@ func cgfsMkdirTemp(t *testing.T, cgfsPath string) string {
// create tempdir
dir, err := os.MkdirTemp(cgfsPath, "cgtracker-test-*")
if err != nil {
t.Fatalf("failed to create test dir: %s", err)
t.Skipf("skipping test, failed to create test dir: %s", err)
Member

I understand the decision to skip; maybe what you would need for this case is to create a dir in a subsystem directory, like memory for v1, to make it work, no?

Contributor Author

maybe, would need to investigate more.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
GetTestSensorManager requires the observer, which means that if we try
to use pkg/testutils/sensors from something that the observer imports, we
will get an import cycle.

Move GetTestSensorManager to a new package (pkg/testutils/observer) to
avoid this. This is needed for a subsequent commit.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Future change will introduce an import cycle:

package github.com/cilium/tetragon/pkg/cgtracker
        imports github.com/cilium/tetragon/pkg/sensors
        imports github.com/cilium/tetragon/pkg/policyfilter
        imports github.com/cilium/tetragon/pkg/process
        imports github.com/cilium/tetragon/pkg/cgidmap
        imports github.com/cilium/tetragon/pkg/cgtracker: import cycle not allowed in test

Fix it by moving GetWorkloadMetaFromPod from the process package to the
podhelpers package. After this change, policyfilter no longer needs to
import process, which breaks the cycle.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
The method is to be used in a subsequent patch.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Add a simple tester program. The idea is that tester acts as a simple
shell that we can use to execute commands without having to exec again.
This allows us, for example, to add tester to a cgroup that we want and
have everything it executes run in the same cgroup.
Tests in subsequent commits will use it.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
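
As an aside, a minimal sketch (not the PR's actual tester code) of what such a helper shell could look like: it reads commands from stdin and runs them, so once the helper process is moved into a cgroup, every command it spawns inherits that cgroup.

package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

func main() {
	// Once this process is placed in a cgroup (e.g., by writing its PID to
	// cgroup.procs), the commands it runs below inherit that cgroup.
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 0 {
			continue
		}
		cmd := exec.Command(fields[0], fields[1:]...)
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		if err := cmd.Run(); err != nil {
			fmt.Fprintf(os.Stderr, "command failed: %v\n", err)
		}
	}
}
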
Add a utility function to copy an executable file to a temporary location.
This allows us to do reliable binary filtering.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
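
For illustration, a hedged sketch (not the PR's implementation; the function name is hypothetical) of such a copy-to-temp helper:

package sketch

import (
	"io"
	"os"
	"path/filepath"
)

// CopyExecToTemp copies the executable at src into a fresh temporary
// directory and returns the new path. Filtering on this unique path avoids
// matching unrelated processes that run the original binary.
func CopyExecToTemp(src string) (string, error) {
	in, err := os.Open(src)
	if err != nil {
		return "", err
	}
	defer in.Close()

	dir, err := os.MkdirTemp("", "exec-copy-*")
	if err != nil {
		return "", err
	}
	dst := filepath.Join(dir, filepath.Base(src))
	out, err := os.OpenFile(dst, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o755)
	if err != nil {
		return "", err
	}
	defer out.Close()
	if _, err := io.Copy(out, in); err != nil {
		return "", err
	}
	return dst, nil
}
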
@kkourt kkourt force-pushed the pr/kkourt/cgroup-tracking branch from 7c5baa3 to b3070ec Compare November 29, 2024 16:17
@kkourt
Contributor Author

kkourt commented Nov 29, 2024

Overall it looks good, it's a large PR though! Cgroup tracking looks cool as well! Thanks I have a few comments

Thanks! Pushed a new version.

Pod association with cgidmap is based on the cgroup id. One limitation
of the current implementation is that if a cgroup inside the container
is created, processes running in this nested cgroup will not be properly
associated with the container.

To address this, this commit introduces the cgtracker map that maps
these nested cgroup ids (tracked ids) to the container cgroup id
(tracker id).

There is already some older BPF code (see bpf/cgroup and
bpf/lib/bpf_cgroup.h) that is aimed at a similar idea, but it is only
used in tests. The idea behind the old code was to use the cgroup level
as a way to identify which cgroups to track. The current code follows a
simpler approach and relies on user-space to set up the initial values of
the map (e.g., from rthooks). This will be added in subsequent commits.
The plan is to migrate the tests to the new approach and remove this old
code, but this is left as a followup.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
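
As background for how user space can seed such a map, a small sketch (assumptions: cgroup v2, and that the cgroup id is exposed as the inode number of the cgroup directory, i.e. the kernfs node id):

package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// cgroupIDFromPath returns the cgroup id of a cgroup v2 directory by reading
// the directory's inode number.
func cgroupIDFromPath(path string) (uint64, error) {
	var st unix.Stat_t
	if err := unix.Stat(path, &st); err != nil {
		return 0, err
	}
	return st.Ino, nil
}

func main() {
	id, err := cgroupIDFromPath("/sys/fs/cgroup")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("cgroup id: %d\n", id)
}
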
This commit adds userspace code for cgtracker and a test.

The test loads the cgtracker program and maps and checks that the
programs operate as expected.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
This commit adds a cgtracker command to the CLI aimed for
troubleshooting.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Update the cgtracker from userspace in two paths from the cgidmap code:
i) in a runtime-hook, when we learn of a new container, and ii) in the
cri-resolver, as a result of an unmapped id from a pod event.

For case (i), cgtracker is updated before the container starts running,
so whatever cgroups are created once the container is created should
be tracked by the BPF hooks.

Case (ii) is somewhat trickier because a cgroup creation from the
container may race AddCgroupTrackerPath() and we might miss cgroups. We
leave making this more robust as a followup. One potential solution
is to traverse more than one level towards the root in the BPF
hook. Another is to periodically query the CRI and scan the cgroup paths
for IDs we are missing.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
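
A rough sketch of the seeding idea (hypothetical; not the PR's AddCgroupTrackerPath code, and the map handle is an assumption): resolve the container cgroup id as the tracker id, then walk any nested cgroup directories that already exist and map each of them to it. New nested cgroups are expected to be picked up by the BPF mkdir hook.

package sketch

import (
	"io/fs"
	"path/filepath"

	"github.com/cilium/ebpf"
	"golang.org/x/sys/unix"
)

func seedCgroupTracker(m *ebpf.Map, containerCgroupPath string) error {
	var st unix.Stat_t
	if err := unix.Stat(containerCgroupPath, &st); err != nil {
		return err
	}
	trackerID := st.Ino // container cgroup id (the "tracker" id)

	return filepath.WalkDir(containerCgroupPath, func(path string, d fs.DirEntry, err error) error {
		if err != nil || !d.IsDir() {
			return err
		}
		var cst unix.Stat_t
		if err := unix.Stat(path, &cst); err != nil {
			return err
		}
		trackedID := cst.Ino
		// key: tracked (nested) cgroup id, value: tracker (container) cgroup id
		return m.Update(&trackedID, &trackerID, ebpf.UpdateAny)
	})
}
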
Add a configuration option to enable the cgroup tracker id.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
This commit modifies the exec event to include a new field called cgroup
tracker id (cgrp_tracker_id in bpf and CgrpTrackerID in go).

The intention of this field is to track children cgroups. Specifically,
in a container environment the cgroup of the container will be used to
track all children cgroups within this container.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
This commit adds a test for cgtracker and the execve events, which checks
that if we create a child cgroup the tracker id is what we expect.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
When running with cri enabled, the criResolver goroutine needs to call
cgtracker. The problem is that the cgtracker map is created when the
base sensor is loaded, so it is not available when the criResolver first
starts.

This commit adds a retry loop to wait for the base sensor to be loaded so
that the map becomes available for the criResolver goroutine.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
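
A minimal sketch of such a retry loop (pin path, attempt count, and delay are assumptions, not the PR's actual values):

package sketch

import (
	"fmt"
	"time"

	"github.com/cilium/ebpf"
)

// openMapWithRetry waits for a pinned map to appear. The base sensor creates
// and pins the cgtracker map when it is loaded, so early callers such as the
// criResolver goroutine may need to retry.
func openMapWithRetry(pin string, attempts int, delay time.Duration) (*ebpf.Map, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		m, err := ebpf.LoadPinnedMap(pin, &ebpf.LoadPinOptions{})
		if err == nil {
			return m, nil
		}
		lastErr = err
		time.Sleep(delay)
	}
	return nil, fmt.Errorf("map %s not available after %d attempts: %w", pin, attempts, lastErr)
}
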
This is to avoid import cycles, because the cgtracker test imports from
the exec sensor.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Commit 0227f5d introduced pod
association using cgidmap. This commit extends functionality to also
support nested cgroups via cgtracker.

Cgtracker allows pod association to work even if containers create
their own cgroup hierarchy.

Commit 0227f5d changed the GetPodInfo
function to pass the cgroup id and map the cgroup id to a container id
inside this function.

This, however, will not work for the cgroup tracker. In the existing
code, GetPodInfo() is only executed if the cgroup name we get from the
bpf side matches the container name. This is fine for non-nested
cgroups, but it will not work for nested cgroups because their name
might be anything. We could remove the check, but this would mean that
we would need to retry to resolve pod information for every
non-container process in the system.

Instead, this commit moves the association earlier, in msgToExecveUnix.
If we find a container id, then process code will attempt pod
association via GetPodInfo. Using runtime-hooks should ensure that we
have the necessary information (cgroup id -> container id) mapping
before the container starts. For other cases, one possible solution
would be to implement retry, but we would need to have a way to
distinguish which processes we need to retry (e.g., by marking
processes in bpf with cgroups under a certain hierarchy).

Furthermore, when using cgtracker, there is no need to use
GetCgroupIDFromSubCgroup introduced in
b1e8c44. In fact, if we used it there would be an error, because we need
the actual container id, since this is what is being tracked. Hence,
this commit also changes criResolve accordingly.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
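
The association flow described above, as a hedged sketch (the helper signatures are hypothetical, not the actual msgToExecveUnix code):

package sketch

// associateContainer maps the exec event's cgroup tracker id (the container
// cgroup id) to a container id via a cgidmap-style lookup. If a container id
// is found, process code can later attempt pod association via GetPodInfo;
// with runtime hooks the mapping exists before the container starts running.
func associateContainer(cgrpTrackerID uint64, lookup func(uint64) (string, bool)) string {
	if containerID, ok := lookup(cgrpTrackerID); ok {
		return containerID
	}
	// No mapping: either not a container process, or we raced the mapping.
	return ""
}
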
Remove stale comments since previous commits implemented cgroup
tracking.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
In GH runners, we get:

--- FAIL: TestCgTrackerMap (0.12s)
    cgtracker_test.go:73: '/sys/fs/cgroup' is cgroup v1
    cgtracker_test.go:81: failed to create test dir: mkdir /sys/fs/cgroup/cgtracker-test-1715271346: read-only file system

So it seems that we cannot execute the test there. Skip it for now.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
@kkourt kkourt force-pushed the pr/kkourt/cgroup-tracking branch from b3070ec to 5e924fb Compare November 29, 2024 16:20
Member

@tixxdz tixxdz left a comment

For now just a comment; I'll give it a review next week. It seems it will break in cgroupv1.

cgid = get_cgroup_id(cgrp);
if (cgid == 0)
return 0;
cgid_parent = cgroup_get_parent_id(cgrp);
Member

Hmm, didn't have time to review it all, but I want to point out that this is not cgroupv1 compatible: the mkdir and release/delete hooks will work on any cgroupv1 hierarchy, and if kernfs ids clash it could lead to corruption?

The old branch has this check: https://github.com/cilium/tetragon/blob/pr/tixxdz/cgroup-bpf-full/bpf/cgroup/bpf_cgroup_mkdir.c#L33

So this could break in cgroupv1. Will check closely later.

Contributor Author

I think it's fine to have this only for cgroup v2 at the moment. That being said, it's unclear what the problem is for v1. What are the kernfs IDs that could clash?

Member

you are following all hierarchies in cgroupv1! including the ones that we don't track, why?

The kernfs IDs are cgroup IDs; if they are re-used, you remove them from the tracking and it won't work anymore, see your cgrp release bpf program.

Contributor Author

you are following all hierarchies in cgroupv1! including the ones that we don't track, why?

We can add a filter to track only the configured hierarchy for v1. My first goal was to support cgroup v2 so that's what the first implementation does. Any cgroup v1 support might or might not work since we do not have tests for it.

The kernfs IDs are cgroup IDs; if they are re-used, you remove them from the tracking and it won't work anymore, see your cgrp release bpf program.

Not sure I understand. Are you saying that cgroup ids are unique only within a specific hierarchy?

Member

you are following all hierarchies in cgroupv1! including the ones that we don't track, why?

We can add a filter to track only the configured hierarchy for v1. My first goal was to support cgroup v2 so that's what the first implementation does. Any cgroup v1 support might or might not work since we do not have tests for it.

Exactly my point, and with that linked bpf cgroup helper, it transparently handles it.

The kernfs IDs are cgroup IDs; if they are re-used, you remove them from the tracking and it won't work anymore, see your cgrp release bpf program.

Not sure I understand. Are you saying that cgroup ids are unique only within a specific hierarchy?

Yes, cgroupv1 hierarchies are separate cgroup mounts backed by kernfs; each kernfs node that is part of a mount has its own unique ID, which is the inode number. The allocation is predictable (using IDR) last time I checked.

Contributor Author

To make sure I understand correctly, are you saying that it's possible for two cgroup nodes of different hierarchies to have the same kernfs id?

@kkourt
Contributor Author

kkourt commented Dec 2, 2024

Thanks, merging this. I'm happy to address additional feedback as a followup.

@kkourt kkourt merged commit 882a020 into main Dec 2, 2024
44 of 45 checks passed
@kkourt kkourt deleted the pr/kkourt/cgroup-tracking branch December 2, 2024 09:20
@tixxdz
Member

tixxdz commented Dec 2, 2024

Thanks, merging this. I'm happy to address additional feedback as a followup.

I would at least specifically state this is only cgroupv2 compatible, since cgroupv1 was not tested.
