
Missing cniVersion in CNI plugin configuration #1173

Closed
mgoltzsche opened this issue Aug 30, 2019 · 13 comments · Fixed by #1174

mgoltzsche commented Aug 30, 2019

The CNI plugin configuration in the ConfigMap in ./Documentation/kube-flannel.yml (and others) does not specify its schema version (the cniVersion field). However, to ensure compatibility, the schema version should be specified.

With Flannel 0.11.0 on Kubernetes 1.15.3 and CRI-O 1.15.0, CRI-O logged the following error on pod deletion, which may have caused further errors:

Error while removing pod from CNI network "cbr0": invalid version "": the version is empty

Expected Behavior

cniVersion should be specified.

Current Behavior

cniVersion is not specified, which causes an error in CRI-O.

Possible Solution

Specify cniVersion in the flannel CNI plugin configuration, as sketched below.
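
For illustration, a minimal sketch of the patched ConfigMap entry; "cniVersion" is the added field, and the plugin list is approximated from the v0.11.0 kube-flannel.yml manifest (your copy may differ slightly):

    # Sketch only: "cniVersion" is the added field; the rest approximates
    # the cni-conf.json data from the v0.11.0 kube-flannel.yml ConfigMap.
    cni-conf.json: |
      {
        "cniVersion": "0.2.0",
        "name": "cbr0",
        "plugins": [
          {
            "type": "flannel",
            "delegate": {
              "hairpinMode": true,
              "isDefaultGateway": true
            }
          },
          {
            "type": "portmap",
            "capabilities": {
              "portMappings": true
            }
          }
        ]
      }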

Steps to Reproduce (for bugs)

  • Prepare a new Kubernetes 1.15.3 cluster using CRI-O 1.15.0
  • Install flannel: kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.11.0/Documentation/kube-flannel.yml
  • Install some pods, delete some pods, reload CRI-O
  • Inspect CRI-O log to see the error message mentioned above: journalctl -u crio -f

Context

I always get this error message in CRI-O after a while when flannel is installed. It didn't cause any obvious problems immediately. However, after some time some pods cannot be deleted, and the only error message I can find in the logs is the one mentioned above.

Your Environment

  • Flannel version: 0.11.0
  • Backend used (e.g. vxlan or udp): vxlan
  • Etcd version: 3.3.10
  • Kubernetes version (if used): 1.15.3
  • CRI-O version: 1.15.0
  • Operating System and version: the Kubernetes node runs inside a centos:7-based Docker container on an Ubuntu 18.04 host
  • Link to your project (optional): https://github.com/mgoltzsche/kubernetes-setup/tree/crio

There is a CRI-O issue that may be related.

mgoltzsche added a commit to mgoltzsche/flannel that referenced this issue Aug 30, 2019
The configuration schema version is required to ensure compatibility.
Some implementations (CRI-O) require the field to be set.

Closes flannel-io#1173.
rodericliu added a commit to rodericliu/kube-flannel-crio-yaml that referenced this issue Sep 16, 2019
To get the flannel network working with CRI-O, cniVersion is required in the flannel CNI configuration.

      "cniVersion": "0.2.0",

I changed the PR to use CNI version 0.2.0 instead of 0.3.1 to preserve backward compatibility (at least down to Kubernetes 1.11, which uses CNI 0.6.0 with schema version 0.2.0), since the older CNI schema is still supported by current implementations but not the other way around.

See issue: flannel-io/flannel#1173
mgoltzsche commented:

I have been using this patch (like #1174) successfully with CRI-O for a while now.
(It would be great if this repo served a kustomization as well, to simplify manifest modifications for users.)


benmoss commented Sep 17, 2019

+1 cniVersion is now required in CNI configs running with Kubernetes due to kubernetes/kubernetes#80482

xref coreos/flannel-cni#15

nodesocket commented:

I just ran into this issue today on AWS EKS. Any ideas how I can fix it?

See aws/amazon-vpc-cni-k8s#1412


mgoltzsche commented Mar 26, 2021

The fix from PR #1174 adds the cniVersion field to the CNI configuration file, and it should have been part of releases for some time now.


nodesocket commented Mar 26, 2021

@mgoltzsche so does this mean this is a problem with AWS EKS? I don't manage the master configuration with EKS.


mgoltzsche commented Mar 26, 2021

You can verify that the CNI configuration is correct within your cluster by running kubectl get configmap -n kube-system kube-flannel-cfg -o yaml and checking whether the cniVersion field is specified. If it is not specified, you have an older flannel version installed whose configuration is not supported by the CNI runtime running on your nodes.
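
For example, a quick check could look like this (kube-flannel-cfg and kube-system are the defaults from kube-flannel.yml; adjust them if your installation differs):

    # Print the cniVersion line from the flannel CNI config, if present
    # (ConfigMap name and namespace assume the default manifest).
    kubectl get configmap -n kube-system kube-flannel-cfg -o yaml | grep cniVersion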

I have no experience with EKS. If AWS installed flannel, the cniVersion field is missing, and a newer CNI runtime is used in their default k8s VM images, that would be their mistake, but I doubt EKS would set up a broken cluster for you.
What is more likely:

  • Did you install flannel yourself by any chance? (It is just a kubectl apply command after all; fix: upgrade the flannel version.)
  • Or did you install a newer CNI runtime on the nodes (custom VM image?) that is not compatible with the CNI configuration shipped with the older flannel version EKS installed? (Fix: downgrade the CNI runtime version on the nodes.)

nodesocket commented:

@mgoltzsche thanks so much for the help. Surprisingly, there is no flannel ConfigMap:

kubectl get configmap -n kube-system kube-flannel-cfg -o yaml
Error from server (NotFound): configmaps "kube-flannel-cfg" not found


mgoltzsche commented Mar 26, 2021

Oh, maybe I made a mistake (I don't have a cluster at hand), or it is a custom configuration and/or a different namespace. Maybe you can find it by searching all ConfigMaps within your cluster using kubectl get configmap --all-namespaces | grep flannel?


nodesocket commented Mar 26, 2021

No results. Maybe I'm not running Flannel?

kubectl get configmap --all-namespaces | grep flannel

mgoltzsche commented:

seems so ;)

nodesocket commented:

So then, any idea why I am getting:

Warning FailedCreatePodSandBox 3m38s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "bd85f8205cf2b59a5dc0230f82c24aba121487f802b17519528897839b2b8290" network for pod "traefik-6d4b5f9c9f-7sfg7": networkPlugin cni failed to set up pod "traefik-6d4b5f9c9f-7sfg7_default" network: add cmd: failed to assign an IP address to container, failed to clean up sandbox container "bd85f8205cf2b59a5dc0230f82c24aba121487f802b17519528897839b2b8290" network for pod "traefik-6d4b5f9c9f-7sfg7": networkPlugin cni failed to teardown pod "traefik-6d4b5f9c9f-7sfg7_default" network: invalid version "": the version is empty]


mgoltzsche commented Mar 26, 2021

So whatever networking component you're using on EKS provides a CNI configuration file that is no longer supported by the CNI runtime installed on the nodes.
(A networking component writes such a CNI configuration file to disk on the nodes, and the CNI runtime, which runs during pod/container creation and deletion as an OCI runtime hook (triggered by kubelet -> CRI), reads it when provisioning or deprovisioning the network.)
So maybe you used a newer VM image for the k8s nodes than the EKS master (or rather the network component that comes with it) supports.
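
If you have shell access to a node, you can also inspect the CNI configuration file the runtime actually reads. As a sketch, assuming the conventional default directory /etc/cni/net.d (the exact path and file names depend on the setup):

    # List and dump the CNI network configs present on the node
    # (flannel typically writes 10-flannel.conflist here).
    ls /etc/cni/net.d/
    sudo cat /etc/cni/net.d/*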


nodesocket commented Mar 26, 2021

Updating my Kubernetes worker nodes to the latest AWS EKS version 1.19 AMI and then rebuilding the worker nodes fixed the issue. So, yup, I'm gonna assume the problem was hardcoded into the EKS 1.19 AMI I was using previously.
