
Remove remaining Cilium dependency from the Tetragon project #794

Closed
sharlns opened this issue Mar 11, 2023 · 31 comments
@sharlns
Contributor

sharlns commented Mar 11, 2023

Tetragon can run both with and without Cilium on the same node. Some functionality, however, still depends on the Cilium agent being present. Specifically, Tetragon uses Cilium to retrieve the pod information for destination IPs of pods which are not local to the node. The goal of this project is to introduce this functionality in Tetragon. One approach would be for the Tetragon agent to keep information about all pods in the cluster, but this approach does not scale well because the k8s API server would need to propagate all pod information to all nodes. Instead, the plan is to introduce a new custom resource (CR) which is maintained by the Tetragon operator and provides a mapping from IPs to the small subset of pod information that Tetragon needs. The Tetragon operator will monitor pod information and update the resource as needed. Tetragon agents will watch this CR to provide pod information for destination IPs.
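As a rough sketch of the agent-side piece described above, the watched CR could feed an in-memory cache keyed by pod IP. All names and types here are hypothetical illustrations, not Tetragon's actual API:

```go
package main

import (
	"fmt"
	"sync"
)

// PodInfo is a hypothetical subset of pod metadata that Tetragon needs.
type PodInfo struct {
	Name      string
	Namespace string
	Labels    map[string]string
}

// PodInfoCache maps destination IPs to pod info. In the proposed design
// it would be populated by watching the PodInfo custom resource.
type PodInfoCache struct {
	mu   sync.RWMutex
	byIP map[string]PodInfo
}

func NewPodInfoCache() *PodInfoCache {
	return &PodInfoCache{byIP: make(map[string]PodInfo)}
}

// Upsert records (or replaces) the pod info for an IP.
func (c *PodInfoCache) Upsert(ip string, info PodInfo) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byIP[ip] = info
}

// Delete removes the mapping for an IP, e.g. when a pod goes away.
func (c *PodInfoCache) Delete(ip string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.byIP, ip)
}

// Lookup resolves a destination IP to pod info, if known.
func (c *PodInfoCache) Lookup(ip string) (PodInfo, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	info, ok := c.byIP[ip]
	return info, ok
}

func main() {
	cache := NewPodInfoCache()
	cache.Upsert("10.0.1.5", PodInfo{Name: "web-0", Namespace: "default"})
	if info, ok := cache.Lookup("10.0.1.5"); ok {
		fmt.Printf("%s/%s\n", info.Namespace, info.Name) // prints "default/web-0"
	}
}
```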

please feel free to contact michi@isovalent.com (github id: @michi-covalent) if you'd like to get some feedback for your draft proposal before the application deadline.

@Lan-ce-lot
Contributor

Hi @sharlns, I am a student from the SEL laboratory of Zhejiang University, familiar with cloud native, Kubernetes, Docker, and Go. I'm learning eBPF and I have also participated in an eBPF project. Therefore, I think this project is a good fit for me. I plan to apply for GSoC 2023 with this project. Do you have more suggestions to help me get started?

@xmulligan
Member

Hi @Lan-ce-lot, thanks for your interest in the project. If you want to get started working with the project, I would suggest checking out the getting started guide and some of the good first issues. This project will be worked on by whoever is selected as the GSoC mentee.

@Lan-ce-lot
Contributor

Thanks for your advice @xmulligan.

@prateek041
Contributor

I came across this issue a few days ago, and I am interested in contributing to it. Because of this issue, I started learning about CRDs and operators, and as I learn more, I'm understanding the issue and the code better. I am also working on my proposal in parallel; @xmulligan, it would be a great help if you could review my proposal.

@xmulligan
Member

Thanks for your interest @prateek041. Unfortunately, we cannot review your proposal beforehand because it would be unfair to the other applicants.

@prateek041
Contributor

@xmulligan, sorry if I caused any confusion. I read in the official GSoC mentee guide that an applicant can submit their proposal as a draft as early as possible, so that mentors can review it, provide feedback, and suggest changes. This can be done before the submission deadline.

Here is the link: https://google.github.io/gsocguides/student/writing-a-proposal#submit-a-proposal-early

@michi-covalent
Contributor

thanks for the pointer @prateek041! please feel free to send me your draft if you need some feedback on your proposal before the application deadline. i'll add my contact info in the issue description.

@prateek041
Contributor

As this issue was not selected for GSoC 2023, is Tetragon planning to participate in the LFX June term?

@kkourt @michi-covalent

@michi-covalent
Contributor

hey @prateek041 👋 yeah that is a possibility. i'll discuss this with @kkourt next week and update this ticket 🙏

@prateek041
Contributor

That would be great, since I was really looking forward to working on it under a mentor. Just a gentle reminder that the last date for applications is Tue, May 9, 5:00 PM PDT, according to the official page.

@michi-covalent

@michi-covalent
Contributor

ok i opened a pull request here cncf/mentoring#957 let's see what happens.

@maheshkasabe

Hello @michi-covalent

My name is Mahesh and I'm really interested in working on this project during the LFX Summer term. Since I have been working with Kubernetes and Go for quite some time now, I think this project is a good fit for me. I would appreciate it if you could list some resources and the IRC channel!

Thanks !

@michi-covalent
Contributor

hello 👋 thank you all for your interest in this project. the application page is here: https://mentorship.lfx.linuxfoundation.org/project/659fe584-68e6-46bf-bd13-12653ef60268

if you have any questions, post a message in the tetragon slack channel: https://cilium.slack.com/archives/C03EV7KJPJ9

apologies we do not have capacity to reply to direct messages / emails 🙏

@Mo-Fatah

Mo-Fatah commented May 18, 2023

> post a message in tetragon slack channel: https://cilium.slack.com/archives/C03EV7KJPJ9
> apologies we do not have capacity to reply to direct messages / emails 🙏

Thanks for the response, I tried to join the slack channel but it requires an email with @linuxfoundation.org domain. I emailed you yesterday to see if someone has already submitted a proposal for this project, if not then I just wanted to show you the proposal I am working on before submitting it to the LFX website.

@michi-covalent
Contributor

michi-covalent commented May 18, 2023

hi @Mo-Fatah 👋

hmm that's strange, it should not require @linuxfoundation.org email to join tetragon slack channel. could you try https://cilium.herokuapp.com/ and see if it works?

there have been multiple proposals to this project. please see cncf/mentoring#937 for the application timeline 📆

@Mo-Fatah

it worked, thank you so much 😄

@YashPimple

Hello @sharlns and @michi-covalent, I am interested in learning about this project and want to work on it under the LFX Mentorship. This issue seems like a great starting point for contributing to Cilium. I landed here from the LFX Mentorship projects page.

Additionally, I was wondering if there is anything else I can do to get started, such as researching the project through the existing documentation.

@michi-covalent
Contributor

hi @YashPimple 👋

to learn more about tetragon, you can start with running through use cases in https://github.com/cilium/tetragon/blob/main/README.md. you can find more comprehensive documentation in https://tetragon.cilium.io/docs/.

@YashPimple

Hi, @michi-covalent I will definitely check out the use cases in the GitHub repository and explore the comprehensive documentation on the official Tetragon website. It seems like a great resource to dive deeper into understanding Tetragon. Thank you for your help!

@michi-covalent
Contributor

@prateek041 please post your high level plan here in terms of how you are approaching this project 🙏

@prateek041
Contributor

Sure @michi-covalent I am writing it. Just finishing it up. thanks for the heads up 😄

@prateek041
Contributor

High level overview of the plan

The entire project of building the operator is split into five phases:

  • Writing the PodInfo CRD
  • Writing the operator
  • Creating/updating Helm charts
  • Integrating it into Tetragon behind a feature flag
  • Performance testing of an alternative approach that queries the k8s API every time PodInfo is needed

@michi-covalent

@prateek041
Contributor

I read more about building operators and learned new things. I will keep adding details to the plan as I learn more; I am choosing this approach, rather than directly writing the "best approach" solution, to maximize my learning. Here is a little more detail about the implementation.

Create the CRD

The PodInfo CRD will contain only the pod information that is necessary for Tetragon. This can be done by replicating the information Cilium endpoints hold about pods and storing the same information in the PodInfo CRD.

Controller

The controller will use a PodInformer with three handlers: Add, Update, and Delete. Whenever a pod changes, the controller will run the logic to reflect the change in the custom resource, depending on whether the Add, Update, or Delete handler fires.

Question: how will I test if the operator works properly?
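Independent of client-go, the Add/Update/Delete dispatch described above can be sketched as a small reconciler that mirrors pod events into an IP-keyed store. The names here are illustrative stand-ins (a plain map plays the role of the PodInfo CR client), not the actual operator code:

```go
package main

import "fmt"

// Pod is a minimal stand-in for corev1.Pod.
type Pod struct {
	Name      string
	Namespace string
	IP        string
}

// EventType distinguishes the three informer handlers.
type EventType int

const (
	Added EventType = iota
	Updated
	Deleted
)

// Store stands in for the client that creates/updates/deletes PodInfo
// custom resources; here it is just a map keyed by pod IP.
type Store map[string]Pod

// Handle mirrors a pod event into the store, the way the operator's
// Add/Update/Delete handlers would mirror pods into PodInfo resources.
func (s Store) Handle(ev EventType, oldPod, newPod *Pod) {
	switch ev {
	case Added:
		s[newPod.IP] = *newPod
	case Updated:
		// A pod's IP can change (e.g. on sandbox restart), so drop the
		// stale key before inserting the new one.
		if oldPod.IP != newPod.IP {
			delete(s, oldPod.IP)
		}
		s[newPod.IP] = *newPod
	case Deleted:
		delete(s, oldPod.IP)
	}
}

func main() {
	s := Store{}
	p1 := Pod{Name: "web-0", Namespace: "default", IP: "10.0.1.5"}
	s.Handle(Added, nil, &p1)
	p2 := p1
	p2.IP = "10.0.1.9"
	s.Handle(Updated, &p1, &p2)
	s.Handle(Deleted, &p2, nil)
	fmt.Println(len(s)) // prints 0
}
```

Logic like this is also straightforward to unit-test by feeding it synthetic events and asserting on the store's contents, which partially answers the testing question above.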

Integration

Just replace the Cilium endpoint lookup with the PodInfo CRD, but provide a way to check whether the PodInfo CRD should be used (I will look into this more).

Perf testing the second approach

Create a simple client with the logic for fetching pod information, then replace the Cilium endpoint with this client. Tetragon pods will use this client to fetch pod information directly from the API server when needed.

Question: How do I create the load here?

please feel free to give feedback.

@michi-covalent

@prateek041
Contributor

Here is what I believe the custom resource should look like:
api/v1:

type PodInfoSpec struct {
    PodIP       string `json:"podIP"`
    PodMetaData string `json:"podInfo"`
}

type PodInfoMapper struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec              PodInfoSpec `json:"spec,omitempty"`
}

type PodInfoMapperList struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ListMeta `json:"metadata,omitempty"`
    Items           []PodInfoMapper `json:"items"`
}

Now the controller watches for pod-related events (create, update, delete), takes the info from the request object, and reflects the changes in the spec of PodInfoMapper (with custom logic).

The next task I am looking at is setting up the controller to watch for pod-related events. I believe I need to make some changes in the SetupWithManager function. The Kubebuilder book will help, but suggestions are still welcome.

Question: I am not sure if the status field is necessary here. If yes, what would it be used for?

Overall feedback is very much appreciated.
@michi-covalent

@michi-covalent
Contributor

thanks prateek, please go ahead and open a pull request that defines these types. it's easier to get feedback.

Question: I am not sure if the status field is necessary here ? if yes, what would it be used for ?

we don't need the status field for now. it's used to indicate the runtime state of a resource. for example for pods => https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#podstatus-v1-core

michi-covalent added a commit that referenced this issue Aug 8, 2023
Move readConfig{Dir,File} to the option package. I'd like to use these
functions to read tetragon operator configurations.

Ref: #794

Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
michi-covalent added a commit that referenced this issue Aug 8, 2023
- Add tetragon-operator-config ConfigMap.
- Add tetragonOperator.skipCRDCreation Helm value.
- Mount the ConfigMap to /etc/tetragon/operator.conf.d/ and load the
  config from the directory.
- Log the config at the startup.

Ref: #794

Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
michi-covalent added a commit that referenced this issue Aug 11, 2023
We'll introduce a proper operator deployment in #794. Once the operator
deployment is available, we can move the CRD registration logic there
instead of calling it from an init container in the Tetragon daemonset.
This commit moves the CRD registration logic to a separate package so
that it can be called from outside the main package.

Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
michi-covalent added a commit that referenced this issue Aug 16, 2023
Deprecate Pod.labels field which contains Cilium identity labels, and
document that it has been replaced by Pod.pod_labels field.

Ref: #794

Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
michi-covalent added a commit that referenced this issue Aug 25, 2023
- Remove ciliumState field from ProcessManager
- Delete GetProcessEndpoint()

Ref: #794

Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
jrfastab pushed a commit that referenced this issue Aug 25, 2023
- Remove ciliumState field from ProcessManager
- Delete GetProcessEndpoint()

Ref: #794

Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
@kkourt
Contributor

kkourt commented Nov 3, 2023

@michi-covalent can we close this issue? 🥺

@michi-covalent
Contributor

we need to delete these unused packages:

we haven't deleted them yet because there could be downstream projects that depend on these packages.

@sfc-gh-gshe
Contributor

I am trying to understand when the PodInfo CRD is used to "retrieve the pod information for destination IPs for pods which are not local to the node" in the current codebase.

I only see the FindPodInfoByIP function using it, and that function is only called from a test file.

@jrfastab
Contributor

ah it appears it never got fully fleshed out, probably for exactly the above question: when/where is the right place to use it. We could/should probably remove the dead code until it has a user. Feel free to push a PR if you want.

@lambdanis
Contributor

I used gomod to check how Tetragon depends on cilium/cilium Go packages. Here we go:

gomod graph --style cluster=full -p 'deps(github.com/cilium/tetragon/**, 1) inter rdeps(github.com/cilium/cilium/**, 1)' > tetragon-cilium.dot && dot -Tpng -o tetragon-cilium.png tetragon-cilium.dot

(dependency graph image: tetragon-cilium.png)

Currently the version of k8s libraries in Tetragon is tied to the Cilium version. It would be nice to decouple them. Here are Tetragon's transitive dependencies on k8s libraries via Cilium:

gomod graph --style cluster=full -p 'deps(github.com/cilium/tetragon/**, 1) inter rdeps(github.com/cilium/cilium/**, 1) inter (rdeps(k8s.io/**) + rdeps(sigs.k8s.io/**))' > tetragon-cilium-k8s.dot && dot -Tpng -o tetragon-cilium-k8s.png tetragon-cilium-k8s.dot

(dependency graph image: tetragon-cilium-k8s.png)

To make it clear - these are dependencies in Go code only, not runtime dependencies. We can try to remove some of them in a separate issue, for now I'm just dumping the pictures here.

@lambdanis
Contributor

Ok, I opened a separate issue for cleaning up remaining dependencies in Go packages: #2651.

As #2580 is merged, I'm closing this one.
