Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dual-stack support in meta/portmap #379

Conversation

xcelsion
Copy link
Contributor

@xcelsion xcelsion commented Aug 30, 2019

Fix for issue #378, where when in a dual-stack environment specifying a hostIP would cause an error during container creation.

@xcelsion xcelsion force-pushed the fix-host-container-address-family-mismatch branch from ceab11c to e19c3d9 Compare August 30, 2019 10:45
Copy link
Member

@jellonek jellonek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also rename PR title to something like "Fix dual-stack support in meta/portmap"?

You can also add to commit message "closes #378". Take a look on https://help.github.com/en/articles/closing-issues-using-keywords

plugins/meta/portmap/portmap.go Outdated Show resolved Hide resolved
@xcelsion xcelsion changed the title Fix #378 Fix dual-stack support in meta/portmap Aug 30, 2019
@xcelsion xcelsion force-pushed the fix-host-container-address-family-mismatch branch 3 times, most recently from eab0a5c to e64556a Compare August 30, 2019 11:19
Copy link
Member

@jellonek jellonek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@mars1024 mars1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the original consideration of this PR, but if we just ignore the error, the users may never know the rules they set are not working, IMO, I think this error should be fatal.
Another suggestion, maybe an admission validation webhook is more proper to do some pre-checks for hostports in dual-stack.

@xcelsion
Copy link
Contributor Author

xcelsion commented Sep 2, 2019

It was a fatal error, in the sense that the container failed to start and also left vestigial iptables rules behind on exit.

Given that a user has a choice of populating the HostIP field with either an IPv4 or IPv6 address, it doesn't make sense for portmap to try and map the given HostIP to a ContainerIP of another address family as that is obviously not what the user intended.

Edit:
What we could check for is if a user has provided a HostIP of an address family that the container doesn't support.

@squeed
Copy link
Member

squeed commented Sep 2, 2019

Ah nuts, this was an error in the design. Thanks for catching this.

Clearly we need to support existing users of the HostIP field. I think your fix is correct - for the incorrect-family case, we need to ignore HostIP.

Separately, we should add support for a HostIPs field, then pick the first one for each family. Can you add that? Once the PR is filed, either I or you can file a PR updating the conventions document. We can also update the dockershim and oci-cni drivers in time.

@xcelsion
Copy link
Contributor Author

xcelsion commented Sep 3, 2019

Do you wish to still apply the rule when address families mismatch by ignoring the HostIP instead of not applying the rule at all? My idea was to just specify two mapping rules, one for each address family.

Sadly this currently does not work with this pull request applied.

If I create a deployment with the following specification:

kubectl create -f - <<EOF
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ubuntu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ubuntu
  template:
    metadata:
      labels:
        app: ubuntu
    spec:
      nodeSelector:
        kubernetes.io/hostname: w-02
      containers:
        - name: ubuntu
          image: ubuntu:18.04
          command:
            - tail
            - -f
            - /dev/null
          ports:
            - containerPort: 80
              hostIP: 142.93.228.87
              hostPort: 80
            - containerPort: 80
              hostIP: 2a03:b0c0:2:f0::b1:2001
              hostPort: 80
EOF

The output of iptables -t nat -S on node shows:

-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-N CILIUM_OUTPUT_nat
-N CILIUM_POST_nat
-N CILIUM_PRE_nat
-N CNI-DN-99177cda6128d856b47fb
-N CNI-HOSTPORT-DNAT
-A PREROUTING -m comment --comment "cilium-feeder: CILIUM_PRE_nat" -j CILIUM_PRE_nat
-A PREROUTING -m addrtype --dst-type LOCAL -j CNI-HOSTPORT-DNAT
-A OUTPUT -m comment --comment "cilium-feeder: CILIUM_OUTPUT_nat" -j CILIUM_OUTPUT_nat
-A OUTPUT -m addrtype --dst-type LOCAL -j CNI-HOSTPORT-DNAT
-A POSTROUTING -m comment --comment "cilium-feeder: CILIUM_POST_nat" -j CILIUM_POST_nat
-A CNI-DN-99177cda6128d856b47fb -d 142.93.228.87/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination 10.32.3.45:80
-A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"portmap\" id: \"2ff7a7adea152fdcb63eff53d92eb16685de7155ea16946e3596a1ef58ac1656\"" -m multiport --dports 80 -j CNI-DN-99177cda6128d856b47fb

And the output of ip6tables -t nat -S shows:

-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-N CNI-DN-99177cda6128d856b47fb
-N CNI-HOSTPORT-DNAT
-A PREROUTING -m addrtype --dst-type LOCAL -j CNI-HOSTPORT-DNAT
-A OUTPUT -m addrtype --dst-type LOCAL -j CNI-HOSTPORT-DNAT
-A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"portmap\" id: \"2ff7a7adea152fdcb63eff53d92eb16685de7155ea16946e3596a1ef58ac1656\"" -m multiport --dports 80 -j CNI-DN-99177cda6128d856b47fb

Notice the missing dnat rule in the CNI-DN-99177cda6128d856b47fb chain.

It does work correctly when specifying only one rule of either address family. I'll have a look and see if I can fix this later.

@xcelsion
Copy link
Contributor Author

xcelsion commented Sep 3, 2019

I've just done a little digging, and it seems as though somewhere upstream the second containerPort: 80 gets filtered out. Below are decoded logs of the portmap inputs for various forwarding configurations.

Ports:

ports:
  - containerPort: 80
    hostIP: 142.93.228.87
    hostPort: 80

Portmap input:

{
   "capabilities":{
      "portMappings":true
   },
   "cniVersion":"0.3.1",
   "name":"portmap",
   "prevResult":{
      "cniVersion":"0.3.1",
      "dns":{

      },
      "interfaces":[
         {
            "mac":"c6:af:a1:b9:4e:80",
            "name":"eth0",
            "sandbox":"/proc//var/run/netns/cni-72b28575-c6e2-621b-2871-a20f03246a1e/ns/net"
         }
      ],
      "ips":[
         {
            "address":"f00d::a57:0:0:a908/128",
            "gateway":"f00d::a57:0:0:43a7",
            "version":"6"
         },
         {
            "address":"10.32.3.175/32",
            "gateway":"10.32.3.110",
            "version":"4"
         }
      ],
      "routes":[
         {
            "dst":"f00d::a57:0:0:43a7/128"
         },
         {
            "dst":"::/0",
            "gw":"f00d::a57:0:0:43a7"
         },
         {
            "dst":"10.32.3.110/32"
         },
         {
            "dst":"0.0.0.0/0",
            "gw":"10.32.3.110"
         }
      ]
   },
   "runtimeConfig":{
      "portMappings":[
         {
            "HostPort":80,
            "ContainerPort":80,
            "Protocol":"tcp",
            "HostIP":"142.93.228.87"
         }
      ]
   },
   "snat":false,
   "type":"portmap"
}

Ports:

ports:
  - containerPort: 80
    hostIP: 2a03:b0c0:2:f0::b1:2001
    hostPort: 80

Portmap input:

{
   "capabilities":{
      "portMappings":true
   },
   "cniVersion":"0.3.1",
   "name":"portmap",
   "prevResult":{
      "cniVersion":"0.3.1",
      "dns":{

      },
      "interfaces":[
         {
            "mac":"d6:4f:f8:4e:dd:f2",
            "name":"eth0",
            "sandbox":"/proc//var/run/netns/cni-da4b577c-8df0-6275-735c-f48227256a3c/ns/net"
         }
      ],
      "ips":[
         {
            "address":"f00d::a57:0:0:1a19/128",
            "gateway":"f00d::a57:0:0:43a7",
            "version":"6"
         },
         {
            "address":"10.32.3.68/32",
            "gateway":"10.32.3.110",
            "version":"4"
         }
      ],
      "routes":[
         {
            "dst":"f00d::a57:0:0:43a7/128"
         },
         {
            "dst":"::/0",
            "gw":"f00d::a57:0:0:43a7"
         },
         {
            "dst":"10.32.3.110/32"
         },
         {
            "dst":"0.0.0.0/0",
            "gw":"10.32.3.110"
         }
      ]
   },
   "runtimeConfig":{
      "portMappings":[
         {
            "HostPort":80,
            "ContainerPort":80,
            "Protocol":"tcp",
            "HostIP":"2a03:b0c0:2:f0::b1:2001"
         }
      ]
   },
   "snat":false,
   "type":"portmap"
}

Ports:

ports:
  - containerPort: 80
    hostIP: 142.93.228.87
    hostPort: 80
  - containerPort: 80
    hostIP: 2a03:b0c0:2:f0::b1:2001
    hostPort: 80

Portmap input:

{
   "capabilities":{
      "portMappings":true
   },
   "cniVersion":"0.3.1",
   "name":"portmap",
   "prevResult":{
      "cniVersion":"0.3.1",
      "dns":{

      },
      "interfaces":[
         {
            "mac":"46:c9:d1:f0:7b:09",
            "name":"eth0",
            "sandbox":"/proc//var/run/netns/cni-94a0d9cb-9b6e-3289-1e73-e1228d05caaf/ns/net"
         }
      ],
      "ips":[
         {
            "address":"f00d::a57:0:0:1562/128",
            "gateway":"f00d::a57:0:0:43a7",
            "version":"6"
         },
         {
            "address":"10.32.3.73/32",
            "gateway":"10.32.3.110",
            "version":"4"
         }
      ],
      "routes":[
         {
            "dst":"f00d::a57:0:0:43a7/128"
         },
         {
            "dst":"::/0",
            "gw":"f00d::a57:0:0:43a7"
         },
         {
            "dst":"10.32.3.110/32"
         },
         {
            "dst":"0.0.0.0/0",
            "gw":"10.32.3.110"
         }
      ]
   },
   "runtimeConfig":{
      "portMappings":[
         {
            "HostPort":80,
            "ContainerPort":80,
            "Protocol":"tcp",
            "HostIP":"142.93.228.87"
         }
      ]
   },
   "snat":false,
   "type":"portmap"
}

Notice the missing IPv6 entry in the last input. I've checked kubectl get pod $PODID -o yaml and there the ports show up as expected in every scenario.

@xcelsion
Copy link
Contributor Author

xcelsion commented Sep 3, 2019

I've done some more testing and can confirm the rule is filtered out when there is a duplicate containerPort. Duplicate hostPorts work.

@squeed
Copy link
Member

squeed commented Sep 4, 2019

Ah, good catch - that definitely seems like a dockershim bug. Separately, your change brings this all together: skip the port forwarding rule if the address families don't match.
/lgtm
(we'll discuss this today at the maintainers meeting - should be easy enough)

@xcelsion
Copy link
Contributor Author

xcelsion commented Sep 4, 2019

I doubt it's a docker-shim bug as I am using containerd/containerd-shim/runc

@xcelsion
Copy link
Contributor Author

xcelsion commented Sep 5, 2019

I've been digging some more and I'm pretty sure I've identified the upstream issue. Take a look at the MakePortMappings function from https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/container/helpers.go

// MakePortMappings creates internal port mapping from api port mapping.
func MakePortMappings(container *v1.Container) (ports []PortMapping) {
	names := make(map[string]struct{})
	for _, p := range container.Ports {
		pm := PortMapping{
			HostPort:      int(p.HostPort),
			ContainerPort: int(p.ContainerPort),
			Protocol:      p.Protocol,
			HostIP:        p.HostIP,
		}

		// We need to create some default port name if it's not specified, since
		// this is necessary for rkt.
		// http://issue.k8s.io/7710
		if p.Name == "" {
			pm.Name = fmt.Sprintf("%s-%s:%d", container.Name, p.Protocol, p.ContainerPort)
		} else {
			pm.Name = fmt.Sprintf("%s-%s", container.Name, p.Name)
		}

		// Protect against exposing the same protocol-port more than once in a container.
		if _, ok := names[pm.Name]; ok {
			klog.Warningf("Port name conflicted, %q is defined more than once", pm.Name)
			continue
		}
		ports = append(ports, pm)
		names[pm.Name] = struct{}{}
	}
	return
}

As you can see it generates a name for the portMapping entry, based on the container name, protocol, and port number. Given that these are identical between my two mapping entries, a duplicate name is generated. After this, a check is done to ensure the generated name is unique. If a duplicate name is found, a log entry is made and the entry is skipped.

I've run journalctl -u kubelet | grep 'Port name conflicted' on my host and can confirm the log entries are present.

I'll see if I can get a PR going for this at a later date.

@xcelsion
Copy link
Contributor Author

xcelsion commented Sep 5, 2019

Filed an issue and a PR to get the upstream issue fixed.

kubernetes/kubernetes#82373
kubernetes/kubernetes#82374

@xcelsion xcelsion force-pushed the fix-host-container-address-family-mismatch branch from e64556a to 7905ed2 Compare September 6, 2019 10:44
…ontainerIP address family. closes containernetworking#378

Signed-off-by: Niels van Oosterom <xcelsion@users.noreply.github.com>
@xcelsion xcelsion force-pushed the fix-host-container-address-family-mismatch branch from 7905ed2 to e8365e1 Compare September 6, 2019 13:31
@dcbw
Copy link
Member

dcbw commented Sep 11, 2019

/lgtm

@squeed squeed merged commit 4bb2881 into containernetworking:master Sep 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants