
Delete stale UDP conntrack entries when adding new Portmaps to containers #553

Merged: 2 commits, Nov 25, 2020

Conversation

@aojea (Contributor) commented Nov 18, 2020

conntrack has no way to track the state of UDP connections, so
it relies on timers to expire entries.
The problem is that UDP is connectionless: a client will keep
sending traffic even though the server is gone, renewing the
conntrack entries.
Pods that use portmaps to expose UDP services need to flush the existing
conntrack entries on the exposed port when they are created;
otherwise conntrack will keep sending the traffic to the previous IP
until the entry ages out (i.e. the client stops sending traffic).

Fixes: #123

Signed-off-by: Antonio Ojea <aojea@redhat.com>
@aojea force-pushed the conntrack branch 3 times, most recently from 13407be to 3ae998e, on November 18, 2020 10:24
@aojea (Contributor, Author) commented Nov 18, 2020

WIP: it needs a test

@aojea (Contributor, Author) commented Nov 18, 2020

/assign @squeed

@aojea force-pushed the conntrack branch 2 times, most recently from 93e3f23 to 2346d32, on November 18, 2020 16:28
@bboreham (Contributor) left a review comment

LGTM if the error return is modified

pkg/utils/conntrack.go (review thread, outdated, resolved)
@mrueg commented Dec 2, 2020

@squeed would it be possible to get a new minor/patch release including this change?

@aojea (Contributor, Author) commented Dec 2, 2020

I think we should wait until we have confirmation that it solves the original issue kubernetes/kubernetes#95258, just to double-check.

@AnishShah commented

I was able to test it using the following YAML, which I originally used to reproduce the error, and it works.

apiVersion: v1
kind: Namespace
metadata:
  name: udp
---
apiVersion: v1
kind: Pod
metadata:
  name: udp-server
  namespace: udp
spec:
  containers:
    - name: udp-server
      image: aardvarkx1/udp-server
      imagePullPolicy: Always
      ports:
        - containerPort: 10001
          protocol: UDP
          hostPort: 10001
          name: udp-test
---
apiVersion: v1
kind: Pod
metadata:
  name: udp-client
  namespace: udp
spec:
  containers:
    - name: udp-client
      image: aardvarkx1/udp-client
      imagePullPolicy: Always
      env:
        - name: SERVER_ADDRESS
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP

@AnishShah commented Dec 3, 2020

Can we cut a minor release now that the kubernetes/kubernetes#95258 issue is closed?

@aojea (Contributor, Author) commented Dec 3, 2020

@bboreham @squeed what do you think? It would be nice to have a new release.

Compare: v0.8.7...master

@DanTulovsky commented Mar 18, 2021

I just hit this issue on GKE (version: 1.19.8-gke.1000). How do I check if this fix made it there or not?

My calico node has:

Image: gke.gcr.io/calico/cni:v3.8.8-1-gke.2-amd64


CNI_CONF_NAME: 10-calico.conflist
CNI_NETWORK_CONFIG:
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "mtu": 1460,
      "log_level": "warning",
      "datastore_type": "kubernetes",
      "nodename": "__KUBERNETES_NODE_NAME__",
      "ipam": { "type": "host-local", "subnet": "usePodCidr" },
      "policy": { "type": "k8s" },
      "kubernetes": { "kubeconfig": "__KUBECONFIG_FILEPATH__" }
    },
    { "type": "portmap", "capabilities": { "portMappings": true }, "snat": true },
    { "type": "bandwidth", "capabilities": { "bandwidth": true } }
  ]
}
KUBERNETES_NODE_NAME: fieldRef(v1:spec.nodeName)
SLEEP: false

Thank you
Dan

@aojea (Contributor, Author) commented Mar 18, 2021

Your CNI plugins version has to be v0.9.0 or newer.

@DanTulovsky commented

Yes, thank you. Given this is all installed by GKE automatically, do you know where I could see what the plugin version being used is?

@AnishShah commented

Check in /home/kubernetes/bin

@DanTulovsky commented

What should I be looking for in there?

gke-cluster0-pool0-6e5903aa-2vmq /home/kubernetes/bin # ls -al
total 431208
drwxr-xr-x  2 root    root       4096 Mar 18 13:37 .
drwxr-xr-x 11 root    root       4096 Mar 18 13:37 ..
-rwxr-xr-x  1 root    root    4559393 Mar 18 13:37 bandwidth
-rwxr-xr-x  1 root    root    4671350 Feb 18  2020 bridge
-rwxr-xr-x  1 root    root   34485216 Mar 18 13:37 calico
-rwxr-xr-x  1 root    root   33682272 Mar 18 13:37 calico-ipam
-r-xr--r--  1 root    root     123073 Feb 23 12:22 configure-helper.sh
-rwxr-xr-x  1 root    root      26051 Feb 23 12:22 configure-kubeapiserver.sh
-rwxr-xr-x  1 root    root      29988 Mar 18 13:37 configure.sh
-rwxr-xr-x  1 chronos users  30473598 Aug 28  2020 crictl
-rwxr-xr-x  1 root    root   12120140 Feb 18  2020 dhcp
-rwxr-xr-x  1 root    root    5945760 Feb 18  2020 firewall
-rwxr-xr-x  1 root    root    3069034 Mar 18 13:37 flannel
-rwxr-xr-x  1 root    root    4629262 Feb 18  2020 gke
-rwxr-xr-x  1 root    root   35364941 Feb 23 12:22 gke-exec-auth-plugin
-rwxr-xr-x  1 root    root     627321 Feb 23 12:22 gke-exec-auth-plugin-license
-rwxr-xr-x  1 root    root      22000 Feb 23 12:22 gke-internal-configure-helper.sh
-rwxr-xr-x  1 root    root       1762 Feb 23 12:22 gke-internal-configure.sh
-rwxr-xr-x  1  344930 89939   7508876 Nov 19 00:27 health-checker
-r-xr--r--  1 root    root       4140 Feb 23 12:22 health-monitor.sh
-rwxr-xr-x  1 root    root    4153025 Feb 18  2020 host-device
-rwxr-xr-x  1 root    root    3957620 Mar 18 13:37 host-local
-rwxr-xr-x  1 root    root    4314508 Feb 18  2020 ipvlan
-rwxr-xr-x  1 root    root   42176512 Feb 23 09:24 kubectl
-rwxr-xr-x  1 root    root  115725640 Feb 23 09:24 kubelet
-rwxr-xr-x  1  344930 89939  18257656 Nov 19 00:27 log-counter
-rwxr-xr-x  1 root    root    3650379 Mar 18 13:37 loopback
-rwxr-xr-x  1 root    root    4389532 Feb 18  2020 macvlan
-rwxr-xr-x  1  344930 89939  44245360 Nov 19 00:27 node-problem-detector
-rwxr-xr-x  1 root    root    4327403 Mar 18 13:37 portmap
-rwxr-xr-x  1 root    root    4590104 Feb 18  2020 ptp
-rwxr-xr-x  1 root    root    3392736 Feb 18  2020 sbr
-rwxr-xr-x  1 root    root    2885430 Feb 18  2020 static
-rwxr-xr-x  1 root    root    3736919 Mar 18 13:37 tuning
-rwxr-xr-x  1 root    root    4314356 Feb 18  2020 vlan

The only somewhat relevant thing I see is:

gke-internal-configure-helper.sh: "cniVersion": "0.3.1",

Is that the right thing to look at? That seems like a very old version.

@bboreham (Contributor) commented
If you just run the binary /home/kubernetes/bin/portmap it should print its version to stdout.

@DanTulovsky commented Mar 19, 2021 via email

@DanTulovsky commented

I see kubernetes/kubernetes#95258 (comment) ... thank you.


Successfully merging this pull request may close: portmap: delete UDP conntrack entries on teardown
6 participants