Tetragon with ociHookSetup.enabled cannot start in a namespace other than kube-system #2402

Closed
f1ko opened this issue May 2, 2024 · 0 comments · Fixed by #2404
Labels
kind/bug Something isn't working

Comments

Contributor

f1ko commented May 2, 2024

What happened?

Description

When the OCI hook feature is used, an exception is added for all Pods in the kube-system namespace so that Tetragon itself can start, since the hook depends on the Tetragon agent.
However, this results in a deadlock if Tetragon is deployed in any other namespace.
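The deadlock can be illustrated with a minimal sketch (this is not Tetragon's actual code; the function name and logic are simplified assumptions based on the behavior described above): the hook only lets Pods start without a reachable agent if they are in kube-system, so a Tetragon Pod in any other namespace blocks on the very hook that needs it.

```shell
# Simplified sketch of the exemption logic described above (an assumption,
# not Tetragon's real implementation): only kube-system Pods may start
# while the agent is unreachable.
allow_without_agent() {
  [ "$1" = "kube-system" ]
}

# A Tetragon agent Pod deployed in the "tetragon" namespace is not exempt,
# so its creation is blocked by the hook that depends on it: a deadlock.
allow_without_agent kube-system && echo "allowed"
allow_without_agent tetragon   || echo "blocked"
```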


Reproduction

Install Tetragon with the oci-hook feature enabled inside another namespace (i.e. not in kube-system):

kubectl create ns tetragon
helm install --namespace tetragon \
        --set tetragonOperator.image.override=localhost/cilium/tetragon-operator:latest \
        --set tetragon.image.override=localhost/cilium/tetragon:latest  \
        --set tetragon.grpc.address="unix:///var/run/cilium/tetragon/tetragon.sock" \
        --set tetragon.ociHookSetup.enabled=true \
        tetragon ./install/kubernetes/tetragon

The init container starts as expected and configures the oci-hook.
However, the agent itself can never start, as the oci-hook cannot reach the agent and the only exception is for Pods in the kube-system namespace:

$ kubectl describe pod -n tetragon tetragon-tctms
[...]
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  54s                default-scheduler  Successfully assigned tetragon/tetragon-tctms to minikube
  Normal   Pulling    53s                kubelet            Pulling image "localhost/cilium/tetragon:latest"
  Normal   Pulled     53s                kubelet            Successfully pulled image "localhost/cilium/tetragon:latest" in 11ms (11ms including waiting). Image size: 215976744 bytes.
  Normal   Created    53s                kubelet            Created container oci-hook-setup
  Normal   Started    53s                kubelet            Started container oci-hook-setup
  Warning  Failed     43s                kubelet            Error: container create failed: time="2024-04-29T11:36:18Z" level=error msg="runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: "
  Warning  Failed     33s                kubelet            Error: container create failed: time="2024-04-29T11:36:28Z" level=error msg="runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: "
  Normal   Pulled     21s (x2 over 53s)  kubelet            Container image "quay.io/cilium/hubble-export-stdout:v1.0.4" already present on machine
  Normal   Pulled     11s (x2 over 43s)  kubelet            Container image "localhost/cilium/tetragon:latest" already present on machine
  Warning  Failed     11s                kubelet            Error: container create failed: time="2024-04-29T11:36:50Z" level=error msg="runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: "
  Warning  Failed     1s                 kubelet            Error: container create failed: time="2024-04-29T11:37:00Z" level=error msg="runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: "

Consequence

This leaves the cluster in a bad state: no new Pods (other than those in kube-system) can be created, including Tetragon itself.
The only way to restore cluster functionality at this point is by removing the oci-hook configuration on the node.
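A hedged sketch of that manual cleanup step: deleting the Tetragon hook JSON from the runtime's OCI hooks directory stops the runtime from invoking the hook. The hooks directory and the filename pattern below are assumptions and vary by runtime and Helm values (for CRI-O, a common location is /etc/containers/oci/hooks.d); verify the path on your node, and restart the runtime afterwards.

```shell
# Remove the Tetragon OCI hook definition from a given hooks directory.
# The "tetragon*.json" filename pattern is an assumption; check your node.
remove_oci_hook() {
  hooks_dir="$1"
  rm -f "$hooks_dir"/tetragon*.json
}

# Example (paths are illustrative; run against your runtime's real hooks dir,
# then restart the runtime so it re-reads its hook configuration):
#   remove_oci_hook /etc/containers/oci/hooks.d
```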

Tetragon Version

$ tetra version
CLI version: v1.1.0-pre.0-794-gbdcb413f0

Kernel Version

$ uname -a
Linux minikube 6.4.16 #1 SMP Mon Sep 18 21:45:38 UTC 2023 aarch64 Linux

Kubernetes Version

No response

Bugtool

No response

Relevant log output

No response

Anything else?

No response
