What happened?
Description
When using the OCI hook feature, an exception for all Pods in the kube-system namespace is added so that Tetragon itself can start, since the hook has a dependency on the Tetragon agent.
However, this results in a deadlock if Tetragon is deployed in any other namespace.
Reproduction
Install Tetragon with the oci-hook feature enabled inside another namespace (i.e. not in kube-system):
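For reference, a minimal install sketch assuming the upstream Helm chart; the rthooks value names are assumptions and should be checked against the chart's documented values for enabling the OCI hook:
$ helm repo add cilium https://helm.cilium.io
# The rthooks.* value names below are assumptions; consult the chart's
# values for the exact flags that enable the OCI hook feature.
$ helm install tetragon cilium/tetragon \
    --namespace tetragon --create-namespace \
    --set rthooks.enabled=true \
    --set rthooks.interface=oci-hooks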
The init container starts as expected and configures the oci-hook.
However, this means the agent can never start: the oci-hook cannot reach the agent, and the only exception it makes is for Pods in the kube-system namespace:
$ kubectl describe pod -n tetragon tetragon-tctms
[...]
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 54s default-scheduler Successfully assigned tetragon/tetragon-tctms to minikube
Normal Pulling 53s kubelet Pulling image "localhost/cilium/tetragon:latest"
Normal Pulled 53s kubelet Successfully pulled image "localhost/cilium/tetragon:latest" in 11ms (11ms including waiting). Image size: 215976744 bytes.
Normal Created 53s kubelet Created container oci-hook-setup
Normal Started 53s kubelet Started container oci-hook-setup
Warning Failed 43s kubelet Error: container create failed: time="2024-04-29T11:36:18Z" level=error msg="runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: "
Warning Failed 33s kubelet Error: container create failed: time="2024-04-29T11:36:28Z" level=error msg="runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: "
Normal Pulled 21s (x2 over 53s) kubelet Container image "quay.io/cilium/hubble-export-stdout:v1.0.4" already present on machine
Normal Pulled 11s (x2 over 43s) kubelet Container image "localhost/cilium/tetragon:latest" already present on machine
Warning Failed 11s kubelet Error: container create failed: time="2024-04-29T11:36:50Z" level=error msg="runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: "
Warning Failed 1s kubelet Error: container create failed: time="2024-04-29T11:37:00Z" level=error msg="runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: "
Consequence
This leaves the cluster in a bad state: no Pods other than those in kube-system can be created, including Tetragon itself.
The only way to restore cluster functionality at this point is to remove the oci-hook configuration on the node, for example as sketched below.
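A hedged recovery sketch, assuming the hook was installed as an OCI hooks.d JSON entry on the node; the directory, file name, and runtime service are assumptions and depend on how the oci-hook-setup init container installed the hook:
# On the affected node (e.g. via `minikube ssh` in the reproduction above).
# Path and file name are assumed; adjust to the actual install location.
$ sudo rm /usr/share/containers/oci/hooks.d/tetragon-oci-hook.json
# Restart the container runtime so it stops injecting the hook
# (service name depends on the runtime in use).
$ sudo systemctl restart crio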
Tetragon Version
$ tetra version
CLI version: v1.1.0-pre.0-794-gbdcb413f0
Kernel Version
$ uname -a
Linux minikube 6.4.16 #1 SMP Mon Sep 18 21:45:38 UTC 2023 aarch64 Linux
Kubernetes Version
No response
Bugtool
No response
Relevant log output
No response
Anything else?
No response