Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pod mapping via cgroup ids #2776

Merged
merged 34 commits into from
Aug 21, 2024
Merged

pod mapping via cgroup ids #2776

merged 34 commits into from
Aug 21, 2024

Conversation

kkourt
Copy link
Contributor

@kkourt kkourt commented Aug 7, 2024

This PR implements pod association with cgroup ids instead of cgroup names.

It achieves this by:

  • maintaining a mapping from cgroup ids to container ids (cgidmap)
  • using runtime-hooks to initialize the mapping before the container starts
  • using the k8s API watcher to update the state of the pods and containers
  • talking to the CRI to update missing mappings (this is useful for pods started before tetragon)

The immediate concern that this PR tries to address is fixing pod association on recent1 versions of crun (an OCI container used by cri-o) not working. See #2639.

In addition to that, it offers a more reliable way to associate pods since it is based on authoritative information provided by runtime hooks and the CRI.

Followups:

add support for pod association via cgroup id

Footnotes

  1. https://github.com/containers/crun/blob/main/NEWS#L565-L567

@kkourt kkourt added the release-note/bug This PR fixes an issue in a previous release of Tetragon. label Aug 7, 2024
Copy link

netlify bot commented Aug 7, 2024

Deploy Preview for tetragon ready!

Name Link
🔨 Latest commit 0a450d6
🔍 Latest deploy log https://app.netlify.com/sites/tetragon/deploys/66c336e68d2e73000891216a
😎 Deploy Preview https://deploy-preview-2776--tetragon.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@kkourt kkourt force-pushed the pr/kkourt/cgid-pod-mapping branch 6 times, most recently from 29bb91a to 937ed26 Compare August 7, 2024 12:51
@kkourt kkourt force-pushed the pr/kkourt/cgid-pod-mapping branch 4 times, most recently from ef46923 to 0152233 Compare August 9, 2024 14:06
@kkourt kkourt changed the title [WIP] pod mapping via cgroup ids pod mapping via cgroup ids Aug 9, 2024
@kkourt kkourt force-pushed the pr/kkourt/cgid-pod-mapping branch from 0152233 to 8f031aa Compare August 9, 2024 14:18
@kkourt kkourt marked this pull request as ready for review August 9, 2024 14:45
@kkourt kkourt requested review from a team, mtardy and willfindlay as code owners August 9, 2024 14:45
@mtardy mtardy linked an issue Aug 12, 2024 that may be closed by this pull request
Copy link
Member

@mtardy mtardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is a nice improvement. I tried my best to review it.

On the nit side, I spotted many places where you use ID or Id, maybe that would be could to unify and use Golang convention ID. I saw some typos in the comments but not sure you want to spend the time fixing those in all those commits and didn't want to be annoying pointing them since we totally get the point anyway.

pkg/server/server.go Show resolved Hide resolved
pkg/rthooks/rthooks.go Outdated Show resolved Hide resolved
.github/workflows/tetragon-rthook-pr.yaml Outdated Show resolved Hide resolved
pkg/rthooks/args_test.go Outdated Show resolved Hide resolved
pkg/rthooks/args.go Outdated Show resolved Hide resolved
cmd/tetra/cri/cri.go Show resolved Hide resolved
vendor/github.com/tidwall/gjson/logo.png Outdated Show resolved Hide resolved
pkg/cgidmap/cri.go Outdated Show resolved Hide resolved
pkg/cgidmap/cri.go Show resolved Hide resolved
pkg/api/processapi/processapi.go Show resolved Hide resolved
@kkourt kkourt force-pushed the pr/kkourt/cgid-pod-mapping branch from 8f031aa to 3268367 Compare August 15, 2024 09:20
@kkourt
Copy link
Contributor Author

kkourt commented Aug 15, 2024

Thanks for the review @mtardy. Pushed a new version.

@kkourt kkourt requested a review from mtardy August 15, 2024 09:20
@jrfastab
Copy link
Contributor

Generic comment it would be best to keep patch max patch count around 10 or so this is getting rather large. Just for future.

kkourt added 5 commits August 19, 2024 14:02
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Return an error if a callback in RuntimeHook fails. This might lead to a
container failing, but this is the desired behavior for many use-cases.
For other use-cases, the hook (client) can be contributed accordingly so
that it does not fail allowing the runtime to create the container.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
containerNameFromAnnotations uses specific annotations for containerd
and cri-o to get the container name.  These annotations are used for the
NRI internal implementation for containerd and cri-o so add some code
references as comments.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Add containerID, podName, podID, and podNamespace to CreateContainer
message.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
kkourt added 6 commits August 19, 2024 14:02
Add a function for the container id from the argument of the
CreateContainer grpc call. Improvements will follow. This function is
going to be used by a new hook that will be added in subsequent patches.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
crio paths add a prefix to the container path. Remove it when we return
the container id. Add a test.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Previous patches added a ContainerID field to the CreateContainerArg gRPC
request. Use it in ContainerID() if it exists.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Add a function for getting the pod from the argument of the
CreateContainer grpc call. This function is going to be used by a new
hook that will be added in subsequent patches.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Cgroup id and Pod require some non-trivial steps to compute. Subsequent
patches will have an additional hook that uses these methods. Cache the
results so that we only use them once.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Move some pod helpers code from policyfilter into a different package so
that they can be used in subsequent patches.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
@kkourt kkourt force-pushed the pr/kkourt/cgid-pod-mapping branch from 3268367 to 0a450d6 Compare August 19, 2024 12:13
Copy link
Contributor

@willfindlay willfindlay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. It's a little scary that we haven't tested this on cgroupv1 but I think it makes sense to mark as beta with a note in the documentation until we get the proper testing infra in place.

pkg/rthooks/rthooks.go Outdated Show resolved Hide resolved
Copy link
Member

@tixxdz tixxdz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick review as I didn't have time to get all of it.

pkg/cgroups/cgroups.go Show resolved Hide resolved
pkg/cgroups/cgroups.go Outdated Show resolved Hide resolved
pkg/cgroups/cgroups.go Outdated Show resolved Hide resolved
pkg/rthooks/args.go Show resolved Hide resolved
@tixxdz
Copy link
Member

tixxdz commented Aug 19, 2024

Kornilios what you mean by "talking to the CRI to update missing mappings (this is useful for pods started before tetragon)" ? and curious why we can't just do that from /proc?

@kkourt kkourt force-pushed the pr/kkourt/cgid-pod-mapping branch from 0a450d6 to eb2ae48 Compare August 20, 2024 09:17
@kkourt
Copy link
Contributor Author

kkourt commented Aug 20, 2024

Kornilios what you mean by "talking to the CRI to update missing mappings (this is useful for pods started before tetragon)" ? and curious why we can't just do that from /proc?

I mean that we are talking to the CRI (cri-o or containerd) socket to get information for existing containers. This information includes the cgroup path of the container, which we use to find the cgroup id. I think it should be possible to scan the filesystem and find the cgroup paths without talking to the CRI (indeed the policyfilter code does this), but the CRI is the source of truth for this and using it seems like the best approach to me.

kkourt added 13 commits August 20, 2024 13:51
Add a function to get the cgroup id from a PID. This is meant to be used
in /proc scanning, to retrieve the cgroup id of a process. The function
was tested for cgroup v2, but not for v1.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Create a new function findPod for the basic functionality.
No functional changes here, just moving code around to make the next
patch clearer.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Static pods are started and managed directly by kubelet and are mirrored
to the API server. Hence, the pod UID in the node is different
than the pod UID in the API server.

As a result, if one of the static pods is restarted while the hook is
running, we cannot find the corresponding pod in the API server by the
id. Instead, we need to look it up using the "kubernetes.io/config.hash"
and "kubernetes.io/config.mirror" annoations.

Add a FindMirrorPod function in the watcher and use it to find the
mirror pod UID for static pods in rthooks.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
policyfilter included code for enabling debugging only for policyfilter
code. Move this code in pkg/logger so that it can be used in other
packages as well.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Pod association is attaching pod information to events.

Previously, pod association happened using the cgroup name under the
assumption that each container is executed directly on a cgroup with a
name that contains the container id of the k8s pod. Hence, by having
events (bpf and proc) include the cgroup name, it was possible to
retrieve the container id and from that the pod.

This approach, however, has some issues. One example is that crun, an
OCI runner used by cri-o, may (and by default will) execute containers
in a cgroup where the name does include the container id.

For example:
kubepods-besteffort.slice/kubepods-besteffort-pod42b7841b_bb27_49e2_81bc_3d3a372dd231.slice/crio-6727930f5d4e9b799d58732af6882826591eb476f680b6499d8919d1ec515eee.scope/container/

In this case, the pod association scheme based on the cgroup name does
not work.

To address the above and other issues, this patch introduces cgidmap,
which allows doing pod association by cgroup id (rather than cgroup name).

The general idea is that:
 - events  (bpf and proc) will include the cgroup id, which can be used
   to query the map for the container id. Once we have the container id,
   pod association can proceed as previously.

 - mappings of cgroup ids to container ids will be added by the runtime
   hooks, before a container starts.

 - changes in pods (pod removal, container removal) will be handled via
   the K8s API server pod watcher.

One issue left for subsequent patches is dealing with cases where the
runtime hook is not running, which is required for pod association for
pods started before tetragon.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Add CRI client code. This will be used in subsequent patches.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Talking to the CRI from tetra is useful for development and
troubleshooting. Add a simple version command for now. More will follow.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
We use https://github.com/tidwall/gjson to get the field, and, also, add
some code to parse the cgroup path if it's in a weird systemd format.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Previous commit added code to fetch the cgroup path from the CRI.
This commit adds a correpsonding command to the tetra cli.

Example:
 ./tetra cri cgroup_path 17edc91323fec
 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podfe54825188b04dea25220ce286cc3e2f.slice/crio-17edc91323fec4e5eda212c4c5c6f7535dbf7ea8e88f2909c8938f51f4eb48a8.scope

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
This commit adds code to query the CRI to resolve cgroup ids when the
value is missing.

We use a simple bounded LIFO and a goroutine that performs the
connection to the CRI, decodes the result, and updates the cgidmap.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
This commit adds functionality (disabled by default) to resolve Pods
with cgroup ids, instead of cgroup names.

The motivation is to have more robust pod association and deal with
idiosyncrasies of certain runtimes (specifically, crun; See next patch).

To enable this feature users need to set:
 --enable-cgidmap
 --enable-cri

It is also highly recommended to use runtime hooks so that the cgroup id
mappings are set before the container starts.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
crun, an OCI container runtime used by cri-o breaks pod association for
tetragon by using placing processes in a cgroup below the cgroup specified by
the OCI spec:
https://github.com/containers/crun/blob/main/crun.1.md#runocisystemdsubgroupsubgroup.

With the introduction of cgidmap, this commit can finally deal with this issue
by scanning the cgroup directory for children directories and, if it finds one,
use the cgroup id of the child.

A better solution would be to allow for multiple cgroup ids for each container,
but this is left as a followup.

The commit includes a script for testing this issue using minikube. Becaues
minikube uses an older version of crun, we need to install it.

The steps for reproducing this are:
   minikube start --driver=kvm2 --container-runtime=crio --force-systemd=true
   ./scripts/minikube-install-crun.sh

Running tetragon without cgidmap, we observe events without pod association:
  🚀 process minikube /usr/bin/ls
  💥 exit    minikube /usr/bin/ls  0

By installing the runtime hooks:
 ./scripts/minikube-install-hook.sh

And runing tetragon with cgidmap (and nri) using --enable-cri --enable-cgidmap,
we observe pod association for both old and new pods:

🚀 process default/test /usr/bin/ls
💥 exit    default/test /usr/bin/ls  0

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Both findMirrorPod and findPod functions do the same retry loop.
Use a helper retry function instead.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
@kkourt kkourt force-pushed the pr/kkourt/cgid-pod-mapping branch from eb2ae48 to ebd29de Compare August 20, 2024 11:51
@kkourt
Copy link
Contributor Author

kkourt commented Aug 21, 2024

Merging this. Left a couple of followups in the PR description based on the comments. Thanks everyone!

@kkourt kkourt merged commit f284f72 into main Aug 21, 2024
48 checks passed
@kkourt kkourt deleted the pr/kkourt/cgid-pod-mapping branch August 21, 2024 05:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/bug This PR fixes an issue in a previous release of Tetragon.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cri-o 1.29 Tetragon not show namespace/pod information
6 participants