ChecksumOffloadBroken autodetection doesn't necessarily detect all cases #4727

janeczku · 2021-07-07T12:53:43Z

Expected Behavior

Pod-pod and pod-service communication across nodes should work.

Current Behavior

All traffic between pods across nodes is dropped (with the exception of ICMP).

Possible Solution

VMware recommends either:

Changing the VXLAN port to 8472 (when NSX is not used) or 4789 (when NSX is used)
Disabling the VXLAN hardware offload feature on the VMXNET3 NIC (which recent Linux driver versions enable by default)

A port change is not feasible for Calico for Windows (which requires 4789), so disabling the hardware offload feature is the only workable solution. Earlier Linux versions did not support this feature on that NIC at all, so disabling it has no performance impact compared with those versions.

Given that NIC firmware configuration is not something most users are used to managing, I suggest implementing a transparent solution in Calico that disables the offload feature whenever Calico configures VXLAN on host interfaces backed by a VMXNET3 device.
To that effect, it looks like Calico already configures NIC driver settings: https://github.com/projectcalico/felix/blob/master/ethtool/ethtool.go
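For anyone applying the workaround by hand, here is a minimal sketch of the `ethtool -K` call that switches the UDP tunnel offload features off. It is not what Felix does internally. The interface name `ens192` and the helper names are assumptions for illustration; the two feature names are the ones `ethtool -k` reports for UDP tunnel segmentation.

```python
import subprocess

# Feature names as reported by `ethtool -k` for UDP tunnel (VXLAN) offload.
VXLAN_OFFLOAD_FEATURES = (
    "tx-udp_tnl-segmentation",
    "tx-udp_tnl-csum-segmentation",
)


def build_disable_offload_cmd(iface):
    """Return the ethtool argv that switches the VXLAN offload features off."""
    cmd = ["ethtool", "-K", iface]
    for feature in VXLAN_OFFLOAD_FEATURES:
        cmd += [feature, "off"]
    return cmd


def disable_vxlan_offload(iface):
    """Run ethtool (needs root); raises CalledProcessError if it fails."""
    subprocess.run(build_disable_offload_cmd(iface), check=True)
```

Calling `build_disable_offload_cmd("ens192")` only builds the command, so it can be inspected before anything is changed on the host. Settings applied with `ethtool -K` are not persistent across reboots unless the distribution's network configuration reapplies them.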

Steps to Reproduce (for bugs)

Provision VMs on vSphere version 6.7u2 or later using one of the following operating systems: CentOS/RHEL/Oracle 8.3, SLES 15 SP2/SP3
Install Kubernetes cluster on the nodes
Install Calico with VXLAN overlay following official docs, e.g.:

Context

VXLAN packets are dropped in the Linux network stack because the inner packets carry incorrect checksums. These incorrect checksums occur when VXLAN hardware offload is enabled on the VMXNET3 interface (which recent Linux versions do by default) and a VXLAN overlay network is created in the guest OS on a port other than 8472 (when NSX is not used) or 4789 (when NSX is used).

References:

Your Environment

Calico version 3.19.1
Orchestrator version: Kubernetes 1.19.12 (RKE)
Operating System and version: CentOS/RHEL 8.3, SLES 15 SP2

champtar · 2021-07-07T14:20:48Z

VXLAN offload works with many 10G NICs, so disabling it by default will hurt performance for those. Each card can also have a different offload toggle: for the qede driver with IPIP, for example, you need to disable all offloads, not just tx-udp_tnl-csum-segmentation.

janeczku · 2021-07-07T14:35:02Z

Good point, but the issue at hand is limited to vSphere infrastructure, so the fix should apply only to the specific type of NIC used there (VMXNET3). The goal is not to solve all known issues with Calico IPIP or VXLAN, but to restore compatibility with what is undoubtedly a very mainstream and widespread infrastructure.

lmm · 2021-08-10T16:31:27Z

Thanks @janeczku. So IIUC there is a workaround to disable hardware offloading on those specific NICs that can be done prior to installing Calico for Windows.
Perhaps another way is to document this issue and workaround for Calico vSphere users on https://docs.projectcalico.org

cc @song-jiang

fasaxc · 2021-08-20T12:54:36Z

Is there a good way to detect these NICs? If so, we could arrange for ChecksumOffloadBroken to be set in that case: https://github.com/projectcalico/felix/blob/master/iptables/feature_detect.go#L116

Note: Calico feature detection can be overridden by setting an override in the FelixConfiguration resource:

featureDetectOverride: "ChecksumOffloadBroken=true"

janeczku · 2021-08-20T12:54:43Z

It should either be documented or the workaround should be applied automatically in Felix using the approach described by @fasaxc above.

janeczku · 2021-08-20T12:55:36Z

Yes, they can be detected by determining NIC model and hw revision via ethtool syscalls

janeczku · 2021-08-20T12:57:13Z

The bug is actually in the new Linux driver for vmxnet3. So instead of detecting the specific hardware revision (which I am not sure is exposed over ethtool), it would probably be enough to detect that the NIC uses the buggy driver version.
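A rough sketch of the driver-detection half of that idea, assuming the standard sysfs layout where `/sys/class/net/<iface>/device/driver` is a symlink to the bound driver. The function names are made up for illustration, and this only identifies the driver name; telling a buggy driver version from a fixed one is not covered here.

```python
import os


def nic_driver(iface, sysfs_root="/sys/class/net"):
    """Return the kernel driver name bound to iface, or None if unknown."""
    link = os.path.join(sysfs_root, iface, "device", "driver")
    try:
        # The 'driver' entry is a symlink to the driver's sysfs directory.
        return os.path.basename(os.readlink(link))
    except OSError:
        # Purely virtual interfaces (lo, veth, vxlan) have no device/driver.
        return None


def uses_vmxnet3(iface, sysfs_root="/sys/class/net"):
    """True if the interface is backed by the vmxnet3 driver."""
    return nic_driver(iface, sysfs_root) == "vmxnet3"
```

The `sysfs_root` parameter exists only so the lookup can be pointed at a fake directory tree in tests.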

champtar · 2021-08-20T13:30:59Z

Sometimes the bug is in the driver + firmware combination; it's endless.
The best thing would be for Calico to send packets using raw sockets, receive them on another node, and check whether the checksums are correct, i.e. really test that it's working.
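To make the "check whether the checksums are correct" step concrete, here is a small sketch of the Internet checksum arithmetic from RFC 1071, which is what a receiver-side test would have to recompute. It is only the core sum: a real UDP or TCP check also covers a pseudo-header, and capturing the packets with raw sockets is left out. The function names are illustrative.

```python
def internet_checksum(data):
    """One's-complement checksum over 16-bit big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def checksum_ok(data_with_checksum):
    """A block that already embeds a correct checksum sums to zero."""
    return internet_checksum(data_with_checksum) == 0
```

With the example words from RFC 1071 (`00 01 f2 03 f4 f5 f6 f7`), `internet_checksum` returns `0x220d`, and appending those two bytes makes `checksum_ok` return `True`.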

robodude666 · 2021-12-31T20:36:51Z

@fasaxc, et al.,

I have an issue where pods can't communicate with one another across nodes. I've concluded that it's related to this issue.

I was able to verify that, on a brand-new k3s cluster install, adding featureDetectOverride: "ChecksumOffloadBroken=true" to the FelixConfiguration fixes the issue. However, I'm unable to fix an existing install by applying the same change. What needs to be done for the change to take effect?

I have calico installed via the tigera operator v1.23.1 (calico v3.21.0) on k3s v1.21.5+k3s2. OS is Ubuntu 20.04.

-robodude666

CecileRobertMichon · 2022-08-25T23:55:12Z

I'm hitting this issue on Azure (requires VXLAN) Linux version 5.15.0-1014-azure, using Helm to install Calico in VXLAN mode via operator. Unfortunately, the autodetect doesn't work because my kernel version is > 5.7 (even though Ubuntu 20.04 doesn't appear to have the fix).

However, Calico does not allow configuring Felix directly when using the operator: https://projectcalico.docs.tigera.io/reference/felix/configuration

It would be great if we could either:

Improve the ChecksumOffloadBroken detection so it does not rely on a simple kernel version check (since not all distributions have the fix backported); this would be my preferred solution
Allow configuring Felix via operator / Helm chart values
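To illustrate why a plain version comparison misses this case, here is a hypothetical sketch of such a check. It is not Felix's actual detection code; the 5.7 threshold is taken from the description above and the function names are invented.

```python
import re


def parse_kernel_release(release):
    """Extract (major, minor) from a release string such as `uname -r` prints."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def version_check_says_broken(release, fixed_in=(5, 7)):
    """Assumed heuristic: treat offload as broken only below `fixed_in`."""
    return parse_kernel_release(release) < fixed_in
```

For `"5.15.0-1014-azure"` this returns `False`, so a check of this shape would leave offload enabled even if that distribution kernel does not actually carry the fix.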

caseydavenport · 2022-08-26T23:04:39Z

Hm, that's a bummer that the auto-detection isn't working on newer kernels.

If you have installed Calico using the operator, you cannot modify the environment provided to felix directly. To configure felix, see the FelixConfiguration resource instead.

If you're using the operator, you should look at https://projectcalico.docs.tigera.io/reference/resources/felixconfig to use REST API-based configuration instead of environment variables.

You should be able to modify the default FelixConfiguration resource to set:

spec.featureDetectOverride: "ChecksumOffloadBroken=true"
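As a sketch of how that setting could be applied to an existing cluster, the snippet below builds a JSON merge-patch body for the default FelixConfiguration. The helper name is made up, and it assumes the resource is named `default` and is patched with `kubectl patch ... --type merge`.

```python
import json


def felix_override_patch(feature, enabled):
    """Build a merge-patch body that sets spec.featureDetectOverride."""
    value = f"{feature}={'true' if enabled else 'false'}"
    return json.dumps({"spec": {"featureDetectOverride": value}})


patch = felix_override_patch("ChecksumOffloadBroken", True)
```

The resulting string can then be passed to something like `kubectl patch felixconfiguration default --type merge -p "$PATCH"`. Note that a merge patch replaces the whole `featureDetectOverride` string, so any overrides already set there would need to be included in the new value.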

CecileRobertMichon · 2022-08-26T23:16:34Z

> You should be able to modify the default FelixConfiguration resource

@caseydavenport that's what I'm doing for now and it seems to make the tests happy: https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/1a1fa22e8947ba7805e029a279c85af325c2e32b/templates/addons/calico/felix-override.yaml

Do you know if there is a way to do this directly via the Helm chart though? It'd be easier if I could set the featureDetectOverride in values.yaml instead of having to modify the default FelixConfigurations resource via kubectl apply after the helm install. Maybe I'm missing something?

After doing some research across many GitHub issues on this kernel bug, I found https://github.com/rancher/rke2-charts/blob/main-source/packages/rke2-calico/generated-changes/overlay/templates/felixconfig.yaml. It seems the Rancher folks are using an overlay to extend the upstream Calico template so that Felix can be configured in values.yaml. Would it be valuable to add something like it directly in the official Calico Helm chart?

Thanks so much for the answer and for all your work on the project btw, I've gone through a lot of Calico issues the past few days and your comments were very helpful!

caseydavenport · 2022-08-27T00:24:30Z

Thanks for the pointer to that overlay file! I didn't realize that.

However, this line . . . Looks like #6412 strikes again!

> Would it be valuable to add something like it directly in the official Calico Helm chart?

It definitely would, and were it not for the problems discussed in the above issue I'd probably just do that right now. To be honest I'm tempted to do it anyway since the default FelixConfiguration is a singleton and this would be a nice UX improvement and would actually be abstracted behind helm's values.yaml "API" anyway... I will mull on that :)

> Thanks so much...

You're very welcome! and I really appreciate the kind words 😸

CecileRobertMichon · 2022-10-12T01:08:29Z

Hey @caseydavenport, have you given this any more thought? Judging by the issue mentions, others are running into this as well.

caseydavenport · 2022-10-17T17:05:32Z

@fasaxc has a PR which will always disable the offload here: #6842

That's probably the best way for now.

fredkan · 2023-09-21T13:39:48Z

> Is there a good way to detect these NICs? If so, we could arrange for ChecksumOffloadBroken to be set in that case: https://github.com/projectcalico/felix/blob/master/iptables/feature_detect.go#L116
>
> Note: Calico feature detection can be overridden by setting an override in the FelixConfiguration resource:
> featureDetectOverride: "ChecksumOffloadBroken=true"

This only works for VXLAN, not for IPIP.

fasaxc · 2023-09-25T08:57:39Z

@fredkan see above, we decided to disable it by default in more recent versions.

janeczku mentioned this issue Jul 7, 2021

IPIP broken in EL 8.3 with multiple drivers #4384

Closed

lmm added the kind/enhancement label Aug 10, 2021

vadorovsky mentioned this issue Aug 20, 2021

RKE2 Cluster running Calico seemingly losing UDP traffic when transiting through service IP to remotely located pod rancher/rke2#1541

Closed

caseydavenport changed the title from "Calico VXLAN network broken on VMware vSphere with recent Linux versions" to "ChecksumOffloadBroken autodetection doesn't necessarily detect all cases" Jan 10, 2022

DomHoney mentioned this issue Jul 11, 2022

VXLAN: bad UDP checksums kubernetes-sigs/kubespray#8992

Closed

dkoshkin mentioned this issue Sep 13, 2022

DNS issues when using AzureIdentity kubernetes-sigs/cluster-api-provider-azure#1448

Closed

yankay mentioned this issue Sep 15, 2022

Change calico_feature_detect_override default value to ChecksumOffloadBroken=true kubernetes-sigs/kubespray#9261

Closed

fasaxc mentioned this issue Oct 13, 2022

Disable VXLAN checksum offload by default. #6842

Merged

3 tasks

CecileRobertMichon mentioned this issue Oct 14, 2022

Use Helm to install Calico CNI in e2e tests instead of ClusterResourceSets kubernetes-sigs/cluster-api-provider-azure#2495

Merged

3 tasks

fasaxc closed this as completed in #6842 Oct 19, 2022

floryut mentioned this issue Jan 4, 2023

Release Proposal v2.21 kubernetes-sigs/kubespray#9638

Closed

gioppoluca mentioned this issue May 24, 2023

cluster not working against Ubuntu 22.04 cilium/cilium#22234

Closed

2 tasks

lwr20 mentioned this issue Aug 30, 2023

The network speed of the pod cannot reach the speed of the baremetal, only half the speed of the baremetal server #7926

Closed

poblin-orange mentioned this issue Sep 20, 2023

add a property to disable tcp offload (issue with vmware vmxnet3 & ubuntu 22.0.4) orange-cloudfoundry/k3s-boshrelease#144

Closed

adamancini mentioned this issue Mar 8, 2024

update flannel v0.24.2 replicatedhq/kURL#5071

Merged

epelaic mentioned this issue Mar 15, 2024

[BUG] helm-operation failure - Waiting for Kubernetes API to be available rancher/rancher#41296

Open

zagg-bot bot mentioned this issue May 13, 2024

Link Checker Report zaggash/rancherkb-fuzz#2

Open

brandond mentioned this issue Jul 12, 2024

Networking issue between nodes rancher/rke2#6307

Closed
