podnetwork: Support CNI plugins like PTP and GKE #1920

Merged
merged 4 commits into from
Aug 3, 2024

yoheiueda
Member

CNI plugins like PTP and GKE remove a route that is automatically added by the kernel for eth0, and then add another route for the same destination.

This patch changes the code that manipulates routes to support such CNI plugins.

Fixes #1909
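
For readers skimming the thread, the behaviour described above can be pictured with a small Go sketch. It is not the code from this PR: the `Route` and `Op` types and the `planRoutes` function are hypothetical, and the sketch assumes at most one existing route per destination. It only illustrates the idea that a route with the same destination but different attributes has to be deleted before the replacement is added.

```go
package main

import "fmt"

// Route is a simplified, hypothetical route record (not the project's netops.Route).
type Route struct {
	Dst string // destination prefix, e.g. "10.0.0.0/24"
	Gw  string // gateway address, empty for an on-link route
	Dev string // output interface
}

// Op is one step of the plan: Action is "del" or "add".
type Op struct {
	Action string
	Route  Route
}

// planRoutes compares the routes already present in a namespace with the
// desired routes and returns the steps needed to make them match.
// A route with the same destination but different attributes is deleted
// before the desired one is added. Assumes one existing route per destination.
func planRoutes(existing, desired []Route) []Op {
	byDst := make(map[string]Route, len(existing))
	for _, r := range existing {
		byDst[r.Dst] = r
	}
	var ops []Op
	for _, want := range desired {
		have, found := byDst[want.Dst]
		if found && have == want {
			continue // already in place, nothing to do
		}
		if found {
			ops = append(ops, Op{Action: "del", Route: have})
		}
		ops = append(ops, Op{Action: "add", Route: want})
	}
	return ops
}

func main() {
	// Route the kernel added when the pod IP was assigned to eth0.
	existing := []Route{{Dst: "10.0.0.0/24", Dev: "eth0"}}
	// Illustrative routes in the style a point-to-point plugin might set up.
	desired := []Route{
		{Dst: "10.0.0.1/32", Dev: "eth0"},
		{Dst: "10.0.0.0/24", Gw: "10.0.0.1", Dev: "eth0"},
	}
	for _, op := range planRoutes(existing, desired) {
		fmt.Printf("%s %+v\n", op.Action, op.Route)
	}
}
```

The order of `desired` is preserved so that an on-link route to the gateway is added before any route that goes through it.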

@yoheiueda yoheiueda requested a review from beraldoleal July 12, 2024 17:23
@yoheiueda
Member Author

I tested this PR with Flannel and Libvirt driver.

@yoheiueda yoheiueda force-pushed the cni-ptp branch 2 times, most recently from 3a63f59 to 33ca4a7 Compare July 13, 2024 02:09
@beraldoleal
Member

beraldoleal commented Jul 15, 2024

Hi @yoheiueda thanks a lot. I just tested on the GCP provider I'm cooking and it worked fine! While I'm not familiar with this part of the code, tomorrow I will do my best to review it.

Member

@beraldoleal beraldoleal left a comment

Hi @yoheiueda, this LGTM and, as I mentioned before, it's working with GKE.

Thanks a lot. I just left two minor comments for you.

Currently, the network interface name in the podns network
namespace in a peer pod VM is hardcoded, for example as
"vxlan0".

This patch changes the interface name to the one used by
the original interface in the network namespace on the
worker node.

Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
The code that sets up routes in the podns network namespace
in a peer pod VM is duplicated in the vxlan and routing tunnelers.

This patch moves the duplicated code into common code.

Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
Member

@beraldoleal beraldoleal left a comment

LGTM, thanks.

Member

@c3d c3d left a comment

I don't feel qualified to approve this code, but FWIW, looks good to me.

@bpradipt
Member

bpradipt commented Jul 17, 2024

@yoheiueda I tried with the docker provider on a kind cluster, and pod creation fails with:

/usr/local/bin/agent-protocol-forwarder: error running a service *forwarder.daemon: failed to set up pod network: failed to set up tunnel "vxlan": failed to add vxlan interface vxlan0: failed to create vxlan interface "vxlan0": /proc/1217/task/1220/ns/net:  file exists
agent-protocol-forwarder.service: Main process exited, code=exited, status=1/FAILURE

In case you want to try, here are the images:

  • quay.io/bpradipt/cloud-api-adaptor:latest
  • quay.io/bpradipt/podvm-docker-image

The CNI used is Calico.
The helper scripts to spin up kind cluster and related configs are available here - https://github.com/confidential-containers/cloud-api-adaptor/tree/main/src/cloud-api-adaptor/docker

@EmmEff
Contributor

EmmEff commented Jul 17, 2024

Would this resolve the issue using the EKS CNI driver as well?

@stevenhorsman stevenhorsman added the test_e2e_libvirt Run Libvirt e2e tests label Jul 17, 2024
This patch introduces protocol and scope attributes
in the netops.Route struct. These attributes are
represented as strings in JSON.

Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
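
As an illustration of the commit message above, here is a minimal Go sketch of route attributes that are numeric in memory but appear as strings in JSON. Everything in it is an assumption made for the example: the `Route` fields, the `Protocol` and `Scope` types, and the helper names are hypothetical and are not the actual `netops.Route` definition. The numeric values follow the Linux rtnetlink constants (`RTPROT_*`, `RT_SCOPE_*`) and the names follow iproute2.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Protocol and Scope are hypothetical stand-ins for the new route attributes.
type Protocol uint8
type Scope uint8

const (
	ProtocolKernel Protocol = 2 // RTPROT_KERNEL
	ProtocolBoot   Protocol = 3 // RTPROT_BOOT
	ProtocolStatic Protocol = 4 // RTPROT_STATIC

	ScopeGlobal Scope = 0   // RT_SCOPE_UNIVERSE
	ScopeLink   Scope = 253 // RT_SCOPE_LINK
	ScopeHost   Scope = 254 // RT_SCOPE_HOST
)

var protocolNames = map[Protocol]string{
	ProtocolKernel: "kernel", ProtocolBoot: "boot", ProtocolStatic: "static",
}

var scopeNames = map[Scope]string{
	ScopeGlobal: "global", ScopeLink: "link", ScopeHost: "host",
}

// lookup returns the key whose name matches, for the string-to-number direction.
func lookup[K comparable](names map[K]string, name string) (K, bool) {
	for k, n := range names {
		if n == name {
			return k, true
		}
	}
	var zero K
	return zero, false
}

// MarshalText makes encoding/json emit the protocol as a JSON string.
func (p Protocol) MarshalText() ([]byte, error) {
	name, ok := protocolNames[p]
	if !ok {
		return nil, fmt.Errorf("unknown route protocol %d", uint8(p))
	}
	return []byte(name), nil
}

// UnmarshalText parses the string form back into the numeric value.
func (p *Protocol) UnmarshalText(text []byte) error {
	v, ok := lookup(protocolNames, string(text))
	if !ok {
		return fmt.Errorf("unknown route protocol %q", text)
	}
	*p = v
	return nil
}

// MarshalText makes encoding/json emit the scope as a JSON string.
func (s Scope) MarshalText() ([]byte, error) {
	name, ok := scopeNames[s]
	if !ok {
		return nil, fmt.Errorf("unknown route scope %d", uint8(s))
	}
	return []byte(name), nil
}

// UnmarshalText parses the string form back into the numeric value.
func (s *Scope) UnmarshalText(text []byte) error {
	v, ok := lookup(scopeNames, string(text))
	if !ok {
		return fmt.Errorf("unknown route scope %q", text)
	}
	*s = v
	return nil
}

// Route is a simplified, hypothetical route record for this example only.
type Route struct {
	Dst      string   `json:"dst"`
	Dev      string   `json:"dev"`
	Protocol Protocol `json:"protocol"`
	Scope    Scope    `json:"scope"`
}

// encodeRoute and decodeRoute wrap encoding/json for the round trip.
func encodeRoute(r Route) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func decodeRoute(s string) (Route, error) {
	var r Route
	err := json.Unmarshal([]byte(s), &r)
	return r, err
}

func main() {
	out, err := encodeRoute(Route{Dst: "10.244.0.0/24", Dev: "eth0", Protocol: ProtocolKernel, Scope: ScopeLink})
	if err != nil {
		panic(err)
	}
	// prints {"dst":"10.244.0.0/24","dev":"eth0","protocol":"kernel","scope":"link"}
	fmt.Println(out)
}
```

Implementing `encoding.TextMarshaler` and `encoding.TextUnmarshaler` is one way to get string-valued JSON fields; the PR may well do it differently.
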
CNI plugins like PTP and GKE remove a route that
is automatically added by the kernel for eth0, and then
add another route for the same destination.

This patch changes the code that manipulates routes to
support such CNI plugins.

Fixes confidential-containers#1909

Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
@yoheiueda
Member Author

@bpradipt thank you for testing this PR.

The root cause of the issue is this error:

/usr/local/bin/agent-protocol-forwarder: error running a service *forwarder.daemon: failed to set up pod network: failed to remove route 172.16.246.53/32 dev eth0: failed to identify routes to be deleted: dest: 172.16.246.53/32, gw: invalid IP, dev eth0: %!w(<nil>)

Calico sets a Pod IP on an interface with a /32 prefix. In this case, the Linux kernel does not automatically create a route for it.

I added logic to handle this case, and tested it on docker with Calico as well as on libvirt with Flannel.
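
To make the /32 case concrete, here is a minimal Go sketch, not the logic added in this PR. The helper `connectedRouteFor` is hypothetical: given an interface address in CIDR form, it reports which connected route the kernel would be expected to have added, or that none is expected when the prefix covers the whole address. Treating an IPv6 /128 the same way as an IPv4 /32 is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"net/netip"
)

// connectedRouteFor returns the destination of the route that the kernel is
// expected to create automatically for an interface address such as
// "10.244.1.5/24". For a full-length prefix such as "172.16.246.53/32" it
// returns ok == false, because no such route is expected to exist.
func connectedRouteFor(cidr string) (dst string, ok bool, err error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return "", false, err
	}
	if prefix.Bits() == prefix.Addr().BitLen() {
		// Host-length prefix: nothing to look up or delete.
		return "", false, nil
	}
	// Masked() zeroes the host bits, giving the network the route points at.
	return prefix.Masked().String(), true, nil
}

func main() {
	for _, cidr := range []string{"172.16.246.53/32", "10.244.1.5/24"} {
		dst, ok, err := connectedRouteFor(cidr)
		if err != nil {
			panic(err)
		}
		if ok {
			fmt.Printf("%s: a kernel route to %s is expected\n", cidr, dst)
		} else {
			fmt.Printf("%s: no kernel route expected, skip the removal step\n", cidr)
		}
	}
}
```

A caller could use the `ok` result to skip the route removal that failed in the error message above.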

@yoheiueda
Member Author

@EmmEff

Would this resolve the issue using the EKS CNI driver as well?

What kind of problems did you encounter with EKS CNI?

If this patch does not help with EKS CNI, could you please create another issue? I'll take a look at it.

@bpradipt
Member

@yoheiueda I tested with ovn-kubernetes on OpenShift as well, and things look good.

$ RUN_TESTS=TestPodToServiceCommunicationAzure TEST_TRUSTEE_OPERATOR=yes TEST_PROVISION=no TEST_INSTALL_CAA=no make CLOUD_PROVIDER=azure TEST_PROVISION_FILE=$HOME/azure.properties test-e2e
go test -v -tags=azure -timeout 60m -count=1 -run TestPodToServiceCommunicationAzure ./test/e2e
time="2024-07-26T10:46:30+05:30" level=info msg="Do setup"
time="2024-07-26T10:46:30+05:30" level=info msg="Container runtime: crio"
time="2024-07-26T10:46:32+05:30" level=info msg="Creating namespace 'coco-pp-e2e-test-69bcc389'..."
time="2024-07-26T10:46:32+05:30" level=info msg="Wait for namespace 'coco-pp-e2e-test-69bcc389' be ready..."
time="2024-07-26T10:46:38+05:30" level=info msg="Wait for default serviceaccount in namespace 'coco-pp-e2e-test-69bcc389'..."
time="2024-07-26T10:46:38+05:30" level=info msg="default serviceAccount exists, namespace 'coco-pp-e2e-test-69bcc389' is ready for use"
=== RUN   TestPodToServiceCommunicationAzure
=== PAUSE TestPodToServiceCommunicationAzure
=== CONT  TestPodToServiceCommunicationAzure
=== RUN   TestPodToServiceCommunicationAzure/TestExtraPods_test
    assessment_runner.go:265: Waiting for containers in pod: nginx are ready
time="2024-07-26T10:48:24+05:30" level=info msg="webserver service is available on cluster IP: 172.30.135.42"
Provision extra pod busybox    assessment_helpers.go:425: Waiting for containers in pod: busybox are ready
=== RUN   TestPodToServiceCommunicationAzure/TestExtraPods_test/Failed_to_test_extra_pod.
time="2024-07-26T10:49:58+05:30" level=info msg="VM found in resource group"
time="2024-07-26T10:50:05+05:30" level=info msg="Success to access nginx service. <!DOCTYPE html>\n<html>\n<head>\n<title>Welcome to nginx!</title>\n<style>\nhtml { color-scheme: light dark; }\nbody { width: 35em; margin: 0 auto;\nfont-family: Tahoma, Verdana, Arial, sans-serif; }\n</style>\n</head>\n<body>\n<h1>Welcome to nginx!</h1>\n<p>If you see this page, the nginx web server is successfully installed and\nworking. Further configuration is required.</p>\n\n<p>For online documentation and support please refer to\n<a href=\"http://nginx.org/\">nginx.org</a>.<br/>\nCommercial support is available at\n<a href=\"http://nginx.com/\">nginx.com</a>.</p>\n\n<p><em>Thank you for using nginx.</em></p>\n</body>\n</html>\n"
    assessment_runner.go:517: Output when execute test commands:<!DOCTYPE html>
        <html>
        <head>
        <title>Welcome to nginx!</title>
        <style>
        html { color-scheme: light dark; }
        body { width: 35em; margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif; }
        </style>
        </head>
        <body>
        <h1>Welcome to nginx!</h1>
        <p>If you see this page, the nginx web server is successfully installed and
        working. Further configuration is required.</p>
        
        <p>For online documentation and support please refer to
        <a href="http://nginx.org/">nginx.org</a>.<br/>
        Commercial support is available at
        <a href="http://nginx.com/">nginx.com</a>.</p>
        
        <p><em>Thank you for using nginx.</em></p>
        </body>
        </html>
time="2024-07-26T10:50:06+05:30" level=info msg="VM found in resource group"
time="2024-07-26T10:50:06+05:30" level=info msg="Deleting pod nginx..."
time="2024-07-26T10:50:51+05:30" level=info msg="Pod nginx has been successfully deleted within 120s"
time="2024-07-26T10:50:52+05:30" level=info msg="Deleting pod busybox..."
time="2024-07-26T10:51:37+05:30" level=info msg="Pod busybox has been successfully deleted within 120s"
time="2024-07-26T10:51:37+05:30" level=info msg="Deleting Service... nginx"
--- PASS: TestPodToServiceCommunicationAzure (299.50s)
    --- PASS: TestPodToServiceCommunicationAzure/TestExtraPods_test (299.50s)
        --- PASS: TestPodToServiceCommunicationAzure/TestExtraPods_test/Failed_to_test_extra_pod. (11.06s)
PASS
time="2024-07-26T10:51:38+05:30" level=info msg="Deleting namespace 'coco-pp-e2e-test-69bcc389'..."
time="2024-07-26T10:51:48+05:30" level=info msg="Namespace 'coco-pp-e2e-test-69bcc389' has been successfully deleted within 60s"
ok      

$ RUN_TESTS=TestPodsMTLSCommunicationAzure TEST_TRUSTEE_OPERATOR=yes TEST_PROVISION=no TEST_INSTALL_CAA=no make CLOUD_PROVIDER=azure TEST_PROVISION_FILE=$HOME/azure.properties test-e2e 
go test -v -tags=azure -timeout 60m -count=1 -run TestPodsMTLSCommunicationAzure ./test/e2e
time="2024-07-26T10:53:40+05:30" level=info msg="Do setup"
time="2024-07-26T10:53:40+05:30" level=info msg="Container runtime: crio"
time="2024-07-26T10:53:42+05:30" level=info msg="Creating namespace 'coco-pp-e2e-test-633d86dd'..."
time="2024-07-26T10:53:42+05:30" level=info msg="Wait for namespace 'coco-pp-e2e-test-633d86dd' be ready..."
time="2024-07-26T10:53:47+05:30" level=info msg="Wait for default serviceaccount in namespace 'coco-pp-e2e-test-633d86dd'..."
time="2024-07-26T10:53:48+05:30" level=info msg="default serviceAccount exists, namespace 'coco-pp-e2e-test-633d86dd' is ready for use"
=== RUN   TestPodsMTLSCommunicationAzure
=== PAUSE TestPodsMTLSCommunicationAzure
=== CONT  TestPodsMTLSCommunicationAzure
=== RUN   TestPodsMTLSCommunicationAzure/TestPodsMTLSCommunication_test
    assessment_runner.go:265: Waiting for containers in pod: nginx are ready
time="2024-07-26T10:55:30+05:30" level=info msg="webserver service is available on cluster IP: 172.30.8.170"
Provision extra pod curl    assessment_helpers.go:425: Waiting for containers in pod: curl are ready
=== RUN   TestPodsMTLSCommunicationAzure/TestPodsMTLSCommunication_test/Pods_communication_with_mTLS
time="2024-07-26T10:56:59+05:30" level=info msg="VM found in resource group"
time="2024-07-26T10:57:07+05:30" level=info msg="Success to access nginx service. <!DOCTYPE html>\n<html>\n<head>\n<title>Welcome to nginx!</title>\n<style>\nhtml { color-scheme: light dark; }\nbody { width: 35em; margin: 0 auto;\nfont-family: Tahoma, Verdana, Arial, sans-serif; }\n</style>\n</head>\n<body>\n<h1>Welcome to nginx!</h1>\n<p>If you see this page, the nginx web server is successfully installed and\nworking. Further configuration is required.</p>\n\n<p>For online documentation and support please refer to\n<a href=\"http://nginx.org/\">nginx.org</a>.<br/>\nCommercial support is available at\n<a href=\"http://nginx.com/\">nginx.com</a>.</p>\n\n<p><em>Thank you for using nginx.</em></p>\n</body>\n</html>\n"
    assessment_runner.go:517: Output when execute test commands:<!DOCTYPE html>
        <html>
        <head>
        <title>Welcome to nginx!</title>
        <style>
        html { color-scheme: light dark; }
        body { width: 35em; margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif; }
        </style>
        </head>
        <body>
        <h1>Welcome to nginx!</h1>
        <p>If you see this page, the nginx web server is successfully installed and
        working. Further configuration is required.</p>
        
        <p>For online documentation and support please refer to
        <a href="http://nginx.org/">nginx.org</a>.<br/>
        Commercial support is available at
        <a href="http://nginx.com/">nginx.com</a>.</p>
        
        <p><em>Thank you for using nginx.</em></p>
        </body>
        </html>
time="2024-07-26T10:57:07+05:30" level=info msg="VM found in resource group"
time="2024-07-26T10:57:07+05:30" level=info msg="Deleting Configmap... nginx-conf"
time="2024-07-26T10:57:08+05:30" level=info msg="Deleting Secret... server-certs"
time="2024-07-26T10:57:08+05:30" level=info msg="Deleting extra Secret... curl-certs"
time="2024-07-26T10:57:08+05:30" level=info msg="Deleting pod nginx..."
time="2024-07-26T10:57:53+05:30" level=info msg="Pod nginx has been successfully deleted within 120s"
time="2024-07-26T10:57:53+05:30" level=info msg="Deleting pod curl..."
time="2024-07-26T10:58:39+05:30" level=info msg="Pod curl has been successfully deleted within 120s"
time="2024-07-26T10:58:39+05:30" level=info msg="Deleting Service... nginx"
--- PASS: TestPodsMTLSCommunicationAzure (291.26s)
    --- PASS: TestPodsMTLSCommunicationAzure/TestPodsMTLSCommunication_test (291.26s)
        --- PASS: TestPodsMTLSCommunicationAzure/TestPodsMTLSCommunication_test/Pods_communication_with_mTLS (11.16s)
PASS
time="2024-07-26T10:58:39+05:30" level=info msg="Deleting namespace 'coco-pp-e2e-test-633d86dd'..."
time="2024-07-26T10:58:50+05:30" level=info msg="Namespace 'coco-pp-e2e-test-633d86dd' has been successfully deleted within 60s"
ok      github.com/confidential-containers/cloud-api-adaptor/src/cloud-api-adaptor/test/e2e     311.573s


$ RUN_TESTS=TestPodToDownloadExternalFileAzure  TEST_TRUSTEE_OPERATOR=yes TEST_PROVISION=no TEST_INSTALL_CAA=no make CLOUD_PROVIDER=azure TEST_PROVISION_FILE=$HOME/azure.properties test-e2e 
go test -v -tags=azure -timeout 60m -count=1 -run TestPodToDownloadExternalFileAzure ./test/e2e
time="2024-07-26T11:10:54+05:30" level=info msg="Do setup"
time="2024-07-26T11:10:54+05:30" level=info msg="Container runtime: crio"
time="2024-07-26T11:10:56+05:30" level=info msg="Creating namespace 'coco-pp-e2e-test-0924345a'..."
time="2024-07-26T11:10:56+05:30" level=info msg="Wait for namespace 'coco-pp-e2e-test-0924345a' be ready..."
time="2024-07-26T11:11:01+05:30" level=info msg="Wait for default serviceaccount in namespace 'coco-pp-e2e-test-0924345a'..."
time="2024-07-26T11:11:02+05:30" level=info msg="default serviceAccount exists, namespace 'coco-pp-e2e-test-0924345a' is ready for use"
=== RUN   TestPodToDownloadExternalFileAzure
=== PAUSE TestPodToDownloadExternalFileAzure
=== CONT  TestPodToDownloadExternalFileAzure
=== RUN   TestPodToDownloadExternalFileAzure/PodWithSpecificCommands_test
    assessment_runner.go:265: Waiting for containers in pod: simple-test are ready
=== RUN   TestPodToDownloadExternalFileAzure/PodWithSpecificCommands_test/Pod_with_specific_commands
    assessment_runner.go:416: Output when execute test commands:
time="2024-07-26T11:12:42+05:30" level=info msg="VM found in resource group"
time="2024-07-26T11:12:42+05:30" level=info msg="Deleting pod simple-test..."
time="2024-07-26T11:13:28+05:30" level=info msg="Pod simple-test has been successfully deleted within 120s"
--- PASS: TestPodToDownloadExternalFileAzure (145.80s)
    --- PASS: TestPodToDownloadExternalFileAzure/PodWithSpecificCommands_test (145.80s)
        --- PASS: TestPodToDownloadExternalFileAzure/PodWithSpecificCommands_test/Pod_with_specific_commands (9.50s)
PASS
time="2024-07-26T11:13:28+05:30" level=info msg="Deleting namespace 'coco-pp-e2e-test-0924345a'..."
time="2024-07-26T11:13:38+05:30" level=info msg="Namespace 'coco-pp-e2e-test-0924345a' has been successfully deleted within 60s"
ok      github.com/confidential-containers/cloud-api-adaptor/src/cloud-api-adaptor/test/e2e     166.471s

Member

@bpradipt bpradipt left a comment

/lgtm

@bpradipt bpradipt removed the test_e2e_libvirt Run Libvirt e2e tests label Jul 26, 2024
@bpradipt
Member

@stevenhorsman based on the tests carried out so far, this PR should be OK to merge.
I'll leave it to you to make the final call :-)

@bpradipt
Member

@EmmEff

Would this resolve the issue using the EKS CNI driver as well?

What kind of problems did you encounter with EKS CNI?

If this patch does not help with EKS CNI, could you please create another issue? I'll take a look at it.

@EmmEff this patch doesn't solve the external network connectivity issue with the default EKS CNI. I tested this on my setup, and the external network connectivity problem remains with EKS CNI.

@EmmEff
Contributor

EmmEff commented Jul 30, 2024

@EmmEff

Would this resolve the issue using the EKS CNI driver as well?

What kind of problems did you encounter with EKS CNI?

If this patch does not help with EKS CNI, could you please create another issue? I'll take a look at it.

Admittedly I am not entirely sure of the cause of the connectivity issue with the EKS CNI. At a high level, there is no network connectivity from the pod running in the CVM to the outside. Maybe @bpradipt can better explain?

@yoheiueda
Member Author

@beraldoleal @EmmEff
I raised an issue for EKS. Let's continue the discussion there.
#1966

@bpradipt bpradipt merged commit e18ef16 into confidential-containers:main Aug 3, 2024
28 of 29 checks passed
@yoheiueda yoheiueda deleted the cni-ptp branch September 26, 2024 03:57
Successfully merging this pull request may close these issues.

agent-protocol-forwarder failing to bootstrap because of routes conflicting
6 participants