Multus with VPC-CNI as secondary: failed to add default route: file exists #1649

Closed
cryptk opened this issue Oct 1, 2021 · 14 comments

cryptk commented Oct 1, 2021

What happened:
Attempting to use multus with EKS. Primary CNI is Cilium and VPC-CNI as a secondary CNI on a pod. The pod fails to start with errors related to VPC-CNI failing to set the default route because Cilium has already set this route.

Attach logs
Multus Logs:

2021-09-30T23:58:36Z [error] [kube-system/metrics-server-679f88554f-54kzx:aws-cni]: error adding container to network "aws-cni": add command: failed to setup network: setupNS network: failed to setup veth pair.: setupVeth network: failed to setup veth network: setup NS network: failed to add default route: file exists
2021-09-30T23:58:38Z [error] [kube-system/metrics-server-679f88554f-54kzx:aws-cni]: error adding container to network "aws-cni": add command: failed to setup network: setupNS network: failed to setup veth pair.: setupVeth network: failed to setup veth network: setup NS network: failed to add default route: file exists
2021-09-30T23:58:40Z [error] [kube-system/metrics-server-679f88554f-54kzx:aws-cni]: error adding container to network "aws-cni": add command: failed to setup network: setupNS network: failed to setup veth pair.: setupVeth network: failed to setup veth network: setup NS network: failed to add default route: file exists
2021-09-30T23:58:42Z [error] [kube-system/metrics-server-679f88554f-54kzx:aws-cni]: error adding container to network "aws-cni": add command: failed to setup network: setupNS network: failed to setup veth pair.: setupVeth network: failed to setup veth network: setup NS network: failed to add default route: file exists
2021-09-30T23:58:44Z [error] [kube-system/metrics-server-679f88554f-54kzx:aws-cni]: error adding container to network "aws-cni": add command: failed to setup network: setupNS network: failed to setup veth pair.: setupVeth network: failed to setup veth network: setup NS network: failed to add default route: file exists
2021-09-30T23:58:46Z [error] [kube-system/metrics-server-679f88554f-54kzx:aws-cni]: error adding container to network "aws-cni": add command: failed to setup network: setupNS network: failed to setup veth pair.: setupVeth network: failed to setup veth network: setup NS network: failed to add default route: file exists
2021-09-30T23:58:48Z [error] [kube-system/metrics-server-679f88554f-54kzx:aws-cni]: error adding container to network "aws-cni": add command: failed to setup network: setupNS network: failed to setup veth pair.: setupVeth network: failed to setup veth network: setup NS network: failed to add default route: file exists

VPC-CNI plugin Logs:

{"level":"info","ts":"2021-09-30T23:59:45.130Z","caller":"routed-eni-cni-plugin/cni.go:111","msg":"Received CNI add request: ContainerID(1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e) Netns(/proc/909/ns/net) IfName(net1) Args(IgnoreUnknown=true;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=metrics-server-679f88554f-54kzx;K8S_POD_INFRA_CONTAINER_ID=1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e) Path(/opt/cni/bin:/opt/cni/bin) argsStdinData({\"cniVersion\":\"0.3.1\",\"mtu\":\"9001\",\"name\":\"aws-cni\",\"pluginLogFile\":\"/var/log/aws-routed-eni/plugin.log\",\"pluginLogLevel\":\"Debug\",\"type\":\"aws-cni\",\"vethPrefix\":\"eni\"})"}
{"level":"debug","ts":"2021-09-30T23:59:45.130Z","caller":"routed-eni-cni-plugin/cni.go:111","msg":"MTU value set is 9001:"}
{"level":"info","ts":"2021-09-30T23:59:45.133Z","caller":"routed-eni-cni-plugin/cni.go:111","msg":"Received add network response for container 1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e interface net1: Success:true IPv4Addr:\"10.128.8.86\" DeviceNumber:1 VPCcidrs:\"10.128.8.0/22\" "}
{"level":"debug","ts":"2021-09-30T23:59:45.133Z","caller":"routed-eni-cni-plugin/cni.go:188","msg":"SetupNS: hostVethName=eniacb1d4b899f, contVethName=net1, netnsPath=/proc/909/ns/net, deviceNumber=1, mtu=9001"}
{"level":"error","ts":"2021-09-30T23:59:45.134Z","caller":"driver/driver.go:185","msg":"Failed to setup veth network setup NS network: failed to add default route: file exists"}
{"level":"error","ts":"2021-09-30T23:59:45.135Z","caller":"routed-eni-cni-plugin/cni.go:111","msg":"Failed SetupPodNetwork for container 1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e: setupNS network: failed to setup veth pair.: setupVeth network: failed to setup veth network: setup NS network: failed to add default route: file exists"}
{"level":"info","ts":"2021-09-30T23:59:45.148Z","caller":"routed-eni-cni-plugin/cni.go:240","msg":"Received CNI del request: ContainerID(1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e) Netns(/proc/909/ns/net) IfName(net1) Args(IgnoreUnknown=true;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=metrics-server-679f88554f-54kzx;K8S_POD_INFRA_CONTAINER_ID=1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e) Path(/opt/cni/bin:/opt/cni/bin) argsStdinData({\"cniVersion\":\"0.3.1\",\"mtu\":\"9001\",\"name\":\"aws-cni\",\"pluginLogFile\":\"/var/log/aws-routed-eni/plugin.log\",\"pluginLogLevel\":\"Debug\",\"type\":\"aws-cni\",\"vethPrefix\":\"eni\"})"}
{"level":"info","ts":"2021-09-30T23:59:45.150Z","caller":"routed-eni-cni-plugin/cni.go:240","msg":"Container 1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e not found"}
{"level":"info","ts":"2021-09-30T23:59:45.279Z","caller":"routed-eni-cni-plugin/cni.go:240","msg":"Received CNI del request: ContainerID(1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e) Netns(/proc/909/ns/net) IfName(net1) Args(IgnoreUnknown=true;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=metrics-server-679f88554f-54kzx;K8S_POD_INFRA_CONTAINER_ID=1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e) Path(/opt/cni/bin:/opt/cni/bin) argsStdinData({\"cniVersion\":\"0.3.1\",\"mtu\":\"9001\",\"name\":\"aws-cni\",\"pluginLogFile\":\"/var/log/aws-routed-eni/plugin.log\",\"pluginLogLevel\":\"Debug\",\"type\":\"aws-cni\",\"vethPrefix\":\"eni\"})"}
{"level":"info","ts":"2021-09-30T23:59:45.281Z","caller":"routed-eni-cni-plugin/cni.go:240","msg":"Container 1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e not found"}
{"level":"info","ts":"2021-09-30T23:59:46.333Z","caller":"routed-eni-cni-plugin/cni.go:240","msg":"Received CNI del request: ContainerID(1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e) Netns() IfName(net1) Args(IgnoreUnknown=true;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=metrics-server-679f88554f-54kzx;K8S_POD_INFRA_CONTAINER_ID=1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e) Path(/opt/cni/bin:/opt/cni/bin) argsStdinData({\"cniVersion\":\"0.3.1\",\"mtu\":\"9001\",\"name\":\"aws-cni\",\"pluginLogFile\":\"/var/log/aws-routed-eni/plugin.log\",\"pluginLogLevel\":\"Debug\",\"type\":\"aws-cni\",\"vethPrefix\":\"eni\"})"}
{"level":"info","ts":"2021-09-30T23:59:46.336Z","caller":"routed-eni-cni-plugin/cni.go:240","msg":"Container 1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e not found"}

What you expected to happen:
VPC-CNI should be able to run as the secondary CNI in a Multus configuration, especially since Multus support is advertised as a feature.

How to reproduce it (as minimally and precisely as possible):
Install Cilium, Multus, and VPC-CNI. Multus should use the following args (a sketch of where they live in the DaemonSet manifest follows the list):

"--multus-conf-file=auto",
"--cni-version=0.3.1",
"--multus-master-cni-file-name=05-cilium.conflist",
"--multus-log-level=error",
"--multus-log-file=/var/log/aws-routed-eni/multus.log"

Add a NetworkAttachmentDefinition with the following spec:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: vpccni
  namespace: kube-system
spec:
  config: '{
            "cniVersion": "0.3.1",
            "name": "aws-cni",
            "plugins": [
              {
                "name": "aws-cni",
                "type": "aws-cni",
                "vethPrefix": "eni",
                "mtu": "9001",
                "pluginLogFile": "/var/log/aws-routed-eni/plugin.log",
                "pluginLogLevel": "Debug"
              },
              {
                "type": "portmap",
                "capabilities": {"portMappings": true},
                "snat": true
              }
            ]
          }'

This should result in pods running with Cilium as the default CNI, with vpccni available for additional interfaces.
Then add the following annotation to a pod (a minimal manifest example follows): k8s.v1.cni.cncf.io/networks: vpccni
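
For reference, a minimal pod manifest carrying that annotation; the pod name and image are placeholders, not part of the original report:

apiVersion: v1
kind: Pod
metadata:
  name: multus-test            # placeholder name
  namespace: kube-system
  annotations:
    k8s.v1.cni.cncf.io/networks: vpccni
spec:
  containers:
    - name: app
      image: busybox:1.36      # placeholder image
      command: ["sleep", "infinity"]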

Anything else we need to know?:
The problem appears to lie here: https://github.com/aws/amazon-vpc-cni-k8s/blob/v1.9.1/cmd/routed-eni-cni-plugin/driver/driver.go#L147
It appears that VPC-CNI cannot handle the case where a default route already exists. In general, a CNI plugin should handle this case; a sketch of one tolerant approach follows.
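
As a sketch only (not the actual driver code), here is a tolerant version of that route add, assuming the github.com/vishvananda/netlink package the plugin is built on; addDefaultRouteIfAbsent is a hypothetical helper name:

package driver

import (
	"errors"
	"net"
	"os"

	"github.com/vishvananda/netlink"
)

// addDefaultRouteIfAbsent installs a default route through gw on the
// given link, but treats an already-present default route as success,
// leaving the primary CNI's route intact instead of failing with
// "file exists" (EEXIST).
func addDefaultRouteIfAbsent(link netlink.Link, gw net.IP) error {
	route := &netlink.Route{
		LinkIndex: link.Attrs().Index,
		Dst:       &net.IPNet{IP: net.IPv4zero, Mask: net.CIDRMask(0, 32)},
		Gw:        gw,
	}
	err := netlink.RouteAdd(route)
	if err != nil && errors.Is(err, os.ErrExist) {
		// Another plugin already installed a default route; skip it.
		// netlink.RouteReplace(route) would be the alternative if this
		// plugin were meant to own the default route.
		return nil
	}
	return err
}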

Environment:
  • EKS Version: 1.21
  • Multus version: v3.8
  • Cilium Version: 1.10.4
  • VPC-CNI Version: v1.9.1-eksbuild.1 (installed via EKS Addons)

  • OS (e.g: cat /etc/os-release): Amazon Linux 2
  • Kernel (e.g. uname -a): 5.4.141-67.229.amzn2.x86_64
cryptk added the bug label Oct 1, 2021
cgchinmay self-assigned this Oct 1, 2021
@cgchinmay
Contributor

Hi @cryptk, thanks for reporting. Let me try to repro using the steps mentioned and I will get back to you.

@cryptk
Author

cryptk commented Oct 1, 2021

I was looking for guidance on how the default route should be handled when running a multi-net setup, and the official spec addresses this.

https://github.com/k8snetworkplumbingwg/multi-net-spec/tree/master/v1.2

Section 4.1.2.1.9

Typically, it’s assumed that the attachment for the default network will have the default route,
however, in some cases one may desire to specify which attachment will have the default route.
When “default-route” is set for an attachment other than the cluster-wide default network
attachment, it should be noted that the default route and gateway will be cleared from the
cluster-wide default network attachment.

So it looks like the VPC CNI should not replace the original default route, but rather only create it if it does not already exist. That spec has plenty of other good information on how a well-behaved CNI should operate in a multi-net configuration.
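
For example, the spec's JSON form of the network-selection annotation lets a pod say which attachment should carry the default route; the gateway address below is illustrative only:

  annotations:
    k8s.v1.cni.cncf.io/networks: '[
      { "name": "vpccni", "default-route": ["10.128.8.1"] }
    ]'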

@cryptk
Author

cryptk commented Oct 1, 2021

@cgchinmay thanks for picking the issue up so fast! My repro steps are pretty minimal, so if you need any clarification or expansion, please let me know!

@cgchinmay
Contributor

@cryptk There have been reports of this exact issue in the past:
#596
#203
and it was fixed by #367.
I will check why you are hitting the same issue with this CNI version.

@srini-ram
Contributor

@cryptk - It would be great if you could share the use case that drives the requirement for VPC CNI as a secondary plugin.

AFAIK, VPC CNI was qualified with Multus as the default delegate CNI.

https://docs.aws.amazon.com/eks/latest/userguide/pod-multiple-network-interfaces.html

Only the Amazon VPC CNI plugin is officially supported as the default delegate plugin. You need to modify the published Multus installation manifest to reconfigure the default delegate plugin to an alternate CNI if you choose not to use the Amazon VPC CNI plugin for primary networking.

The doc doesn't explicitly call out that VPC CNI as a secondary plugin is unsupported. I will confirm internally whether this was intended to be supported and respond back.

@jungy-aws

+1, interested in the use case for using VPC CNI for a secondary interface, regardless of the issue reported.

@cgchinmay
Contributor

Hi @cryptk, I was able to repro the issue as you described. However, fixing it is only one part of the problem: the current Multus support expects aws-vpc-cni to be used as the primary plugin. We will be updating our docs to call this out explicitly.

For now I will mark this as a feature request instead. It would help to know your use case for using aws-vpc-cni as the secondary plugin.

@cryptk
Author

cryptk commented Oct 19, 2021

@sramabad1 @cgchinmay @jungy-aws sorry for the late response; the GitHub notification never seemed to reach me.

The problem I am trying to solve is that when running Cilium as the CNI (to benefit from eBPF as well as all of the other Cilium features) and using the Cilium overlay network, the EKS control plane can no longer talk to any of the pods to handle things like validating and mutating webhooks. A resolution for this would be to place those pods on the VPC network via vpc-cni.

Ideally this would involve having the pods still be primarily on the Cilium overlay network and just having a second interface on the VPC network which can then be used for the EKS control plane communications.

@HenriWilliams

+1 for this functionality. We have the same use case as @cryptk.

@github-actions

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions bot added the stale label Jun 11, 2022
github-actions bot removed the stale label Jun 21, 2022
@jayanthvn
Contributor

/not stale

@marianobilli

@cryptk did you make any progress on this? I have the same use case.

@github-actions

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions bot added the stale label Dec 12, 2022
@github-actions

Issue closed due to inactivity.

github-actions bot closed this as not planned Dec 26, 2022