NodePort Connectivity Issue #231

Closed
nickdgriffin opened this issue Nov 14, 2018 · 13 comments
Labels
calico Calico integration issue

Comments


nickdgriffin commented Nov 14, 2018

Hello,

We are experiencing an issue on 1.2.1 that is functionally identical to #75: certain pods are not accessible via their NodePort from remote hosts, and a tcpdump shows a SYN/SYN-ACK exchange at the start followed by TCP retransmissions. As the mentioned ticket is about rp_filter, here are the values collected by the support bundler:

/proc/sys/net/ipv4/conf/all/rp_filter = 1
/proc/sys/net/ipv4/conf/default/rp_filter = 1
/proc/sys/net/ipv4/conf/eth0/rp_filter = 2
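
For reference, a quick way to collect these values on a node (a minimal sketch; interface names will vary per host):

# Print rp_filter for every interface (0 = off, 1 = strict, 2 = loose)
for f in /proc/sys/net/ipv4/conf/*/rp_filter; do
  echo "$f = $(cat "$f")"
done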

We have this popping up across different clusters (although they are all identical in terms of setup) after pods are created (Nginx in this case, but it may be happening with others), and the remedy is to delete the pods until the NodePort functions correctly.

I can send the support bundle and packet traces by email, and anything else that would help in identifying the cause of this and what we can do about it as it is quite problematic for us.

Thanks,
Nick


nickdgriffin commented Nov 15, 2018

Having disabled rp_filter across all interfaces (confirmed by enabling log_martians and seeing no messages in /var/log/messages), the issue still occurs. Before changing the rp_filter settings, enabling log_martians did produce messages for the secondary interfaces (which are set to loose by default), but that doesn't seem to be related to this problem.
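
Roughly what that looks like on a node (a sketch; eth0 stands in for each interface present on the host):

# Relax reverse-path filtering and log any martian packets
sysctl -w net.ipv4.conf.all.rp_filter=0
sysctl -w net.ipv4.conf.default.rp_filter=0
sysctl -w net.ipv4.conf.eth0.rp_filter=0
sysctl -w net.ipv4.conf.all.log_martians=1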

I can confirm that it only seems to occur with pods that have IPs on secondary interfaces. I can easily reproduce this by killing pods until they land on eth0 (where the NodePort works), then killing them again so they land on another interface (where it fails).

nickdgriffin (Author) commented

Cracked it.

So, the problem stems from the fact that the secondary interfaces that are added still have the source/destination check enabled, which must result in the ENI dropping the return packets from the pod. This can be proven by disabling the check on the ENI that the pod's IP is allocated on, after which connections succeed.

I will be submitting a PR to disable the check when ENIs are allocated.
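
For anyone hitting this in the meantime, the manual workaround is roughly the following (a sketch; the ENI ID is a placeholder, and it has to be repeated for every secondary ENI and again whenever ENIs are re-created):

# Disable the source/destination check on the ENI holding the pod's IP
aws ec2 modify-network-interface-attribute \
  --network-interface-id eni-0123456789abcdef0 \
  --no-source-dest-check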

nickdgriffin (Author) commented

Worth mentioning that this issue was brought out by setting WARM_ENI_TARGET to 20, as has been advised in numerous places to prevent scheduling issues on hosts that have run out of IPs. That meant all nodes had the maximum number of ENIs attached, which increased the chance that the CNI picked an IP associated with a secondary interface instead of the primary.
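
For context, that setting is an env var on the CNI DaemonSet, e.g. (a sketch; assumes the standard aws-node DaemonSet in kube-system):

# Pre-allocate warm ENIs; 20 effectively means "as many as the instance type allows"
kubectl -n kube-system set env daemonset/aws-node WARM_ENI_TARGET=20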


ikatson commented Nov 30, 2018

Have the same problem with 1.3.

The first packet hits eth0 (in my case 100.122.192.148) on the NodePort and gets DNAT'ed to the respective container on eth1 (in my case the container IP is 100.122.199.244, which is on eth1).
The container's reply gets routed back over eth1 because of a routing rule like "from 100.122.199.244 lookup 2".

I was trying to figure out why the reverse of the DNAT is not applied before the routing decision, but couldn't; that's probably just how Linux works. If the routing rule saw the un-DNAT'ed IP 100.122.192.148, the reply would be returned over eth0, the same way it came in.
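
To see the rule in question on the node (a sketch; the IP and table number mirror the example above and will differ per pod):

# The per-pod policy rule that sends replies out via the secondary ENI's table
ip rule show | grep 100.122.199.244
# e.g. "1536: from 100.122.199.244 lookup 2"
ip route show table 2
# table 2 routes via eth1, so the reply leaves eth1 instead of eth0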

Will try that PR to alter the ENIs, @nickdgriffin - thanks for it :)


ikatson commented Nov 30, 2018

Oh, actually, it sounds like it's more of a bug in how amazon-vpc-cni-k8s works with Calico.
I noticed that this specific problem was fixed in #75.

However, looking at the mangle table, it looks like the rules added by #75 are never reached, because Calico intercepts the packets and ACCEPTs them before the "CONNMARK restore" rule is triggered.

Here's what my rules in the mangle table look like:

Chain PREROUTING (policy ACCEPT 76 packets, 5405 bytes)
 pkts bytes target     prot opt in     out     source               destination
 6482   16M cali-PREROUTING  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:6gwbT8clXdHdC1b1 */
16982 6853K CONNMARK   all  --  eth0   *       0.0.0.0/0            0.0.0.0/0            /* AWS, primary ENI */ ADDRTYPE match dst-type LOCAL limit-in CONNMARK or 0x80
 7046   19M CONNMARK   all  --  eni+   *       0.0.0.0/0            0.0.0.0/0            /* AWS, primary ENI */ CONNMARK restore mask 0x80
Chain cali-PREROUTING (1 references)
 pkts bytes target     prot opt in     out     source               destination
7921K   21G ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:6BJqBjBC7crtA-7- */ ctstate RELATED,ESTABLISHED
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:KX7AGNd6rMcDUai6 */ mark match 0x10000/0x10000
 207K   18M ACCEPT     all  --  eni+   *       0.0.0.0/0            0.0.0.0/0            /* cali:CSdpoDToedBYIZRl */
94920 6900K cali-from-host-endpoint  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:grcMAjdqFPVoXgMC */
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:cEp_17JkV0nk977D */ /* Host endpoint policy accepted packet. */ mark match 0x10000/0x10000

I looked at the TRACE for the return packet, and it gets terminated at cali-PREROUTING:1, so it never reaches the AWS rules.
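
For reference, this is roughly how the return packet can be traced (a sketch; the pod IP mirrors the example earlier in the thread, and depending on the iptables backend the trace appears in the kernel log or via xtables-monitor --trace):

# Trace return packets from the pod as they enter from the eni+ veth interfaces
iptables -t raw -A PREROUTING -i eni+ -s 100.122.199.244 -j TRACE
# ...inspect the trace, then remove the rule again
iptables -t raw -D PREROUTING -i eni+ -s 100.122.199.244 -j TRACE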


nickdgriffin commented Nov 30, 2018

@ikatson you can change the adapters in the AWS console or via the CLI, but you have to do it each time they are replaced - that's all I've done to fix the issue I was having.


ikatson commented Nov 30, 2018

@nickdgriffin yeah, that's how I tried your fix without actually compiling it - I just changed the ENIs in place. It does work! However, I think it's probably better to fix the root cause, so that return packets are routed through the same interface they came in on. That specific problem was fixed in #75, but in my setup at least, Calico network policy prevents that fix from working.

I should note, by the way, that I have AWS_VPC_K8S_CNI_EXTERNALSNAT=true. I had problems related to martian packets and rp_filter like you described when I had AWS_VPC_K8S_CNI_EXTERNALSNAT=false.
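
For reference, that flag is also just an env var on the CNI DaemonSet (a sketch; assumes the standard aws-node DaemonSet in kube-system):

# Check and/or set external SNAT on the CNI
kubectl -n kube-system set env daemonset/aws-node --list | grep EXTERNALSNAT
kubectl -n kube-system set env daemonset/aws-node AWS_VPC_K8S_CNI_EXTERNALSNAT=true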

jwalters-gpsw commented

We seem to be experiencing the same problem (we also have Calico), but I don't understand the workaround and whether it works with Calico (and with the default setting of AWS_VPC_K8S_CNI_EXTERNALSNAT). Can someone summarize?


ikatson commented Dec 8, 2018

@jwalters-gpsw I was able to fix (not just work around!) the Calico issue by setting this value in the Calico environment variables (it needs to be set on both calico-node and typha):

- name: FELIX_IPTABLESMANGLEALLOWACTION
  value: Return

The default value for this is "Accept", so by default Calico accepts the established packets and they stop traversing the mangle table.

Need to file a PR for that.
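
In practice this can be applied without editing the manifests by hand, e.g. (a sketch; resource names assume the stock Calico install in kube-system):

# Felix reads FELIX_-prefixed env vars; apply to both calico-node and typha
kubectl -n kube-system set env daemonset/calico-node FELIX_IPTABLESMANGLEALLOWACTION=Return
kubectl -n kube-system set env deployment/calico-typha FELIX_IPTABLESMANGLEALLOWACTION=Return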


greenboxal commented Jan 9, 2019

I'm having a similar problem here. A TCP ELB (LoadBalancer service) is getting a lot of retransmissions, duplicated packets, etc. This causes the health check to fail randomly and actual connections to drop, even established ones.

We're running 1.2.1 and we're not using Calico or anything else on the networking stack. The cluster was created with kops.

Do you think we're talking about the same issue here? I tried disabling the src/dst check on the secondary ENI, but nothing changed.

Let me know if I can collect any data in order to help.

Update:
I was running some packet captures and found this out:

Given a node A, with primary private IP address ABC.
Given a node B, with primary private IP address DEF.
Given a pod X, with IP address XYZ (associated with a secondary ENI through amazon-vpc-cni-k8s), running on node A.
Given a LoadBalancer service, that points to pod X and similar pods.
Given a NodePort created automatically by K8S so the LoadBalancer works.
Given an ELB, created by K8S, pointing to all instances on the created NodePort.

  1. The ELB tries to reach any instance in the cluster.
  2. The ELB picks node B and sends traffic to the NodePort.
  3. The NodePort is implemented as a DNAT rule, which forwards traffic to IP XYZ (pod X, node A).
  4. Node A receives the TCP segment on eth1 and processes the packet.
  5. Node A sends a reply, which leaves through eth0 with source IP ABC.

I don't think this is right: if the packet came in on eth1, it should return through eth1, right?

Should I create a new issue? Is it related?
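
For anyone trying to reproduce this, the asymmetry should be visible with two captures on node A (a sketch; XYZ and ABC stand for the placeholder addresses in the list above):

# The DNAT'ed segment from node B should arrive on eth1 (pod IP XYZ lives on the secondary ENI)
tcpdump -ni eth1 host XYZ
# A reply showing up here with source IP ABC confirms it is leaving via the wrong interface
# (ABC is the node's primary IP, so narrow the filter by port if this is too noisy)
tcpdump -ni eth0 host ABC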

nickdgriffin (Author) commented

It sounds similar in terms of behaviour, but if you aren't using Calico for network policies I don't think it can be the same - plus my issue was specifically sorted out by changing the src/dest check, and once it's in a release I'll be testing the fix in #263 too.

tabern modified the milestone: v1.5 (Mar 5, 2019)
tabern added the calico Calico integration issue label (Mar 5, 2019)

tustvold commented Mar 7, 2019

Not sure whether to open this as a separate issue, but the rule added to the routing policy database is subtly wrong, in that it doesn't account for additional bits that might be set in the fwmark - for example by Calico.

The rule is currently

from all fwmark 0x80 lookup main

When it should probably be

from all fwmark 0x80/0x80 lookup main
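
On an affected node the theory can be checked in place (a sketch; add a pref matching the original rule's priority from `ip rule show` if ordering matters):

# Match only the 0x80 bit rather than requiring the whole mark to equal 0x80
ip rule del from all fwmark 0x80 lookup main
ip rule add from all fwmark 0x80/0x80 lookup main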

Update

In fact, on closer inspection v1.3.0 works correctly, but master is currently broken. I guess that's what I get for running the bleeding edge, but I needed the ENIConfig changes...

Current released version - https://github.com/aws/amazon-vpc-cni-k8s/blob/release-1.3/pkg/networkutils/network.go#L220

Master - https://github.com/aws/amazon-vpc-cni-k8s/blob/master/pkg/networkutils/network.go#L244

Happy to submit a PR to fix this.


tylux commented Apr 4, 2019

Cracked it.

So, the problem stems from the fact that the secondary interfaces that are added still have the source/destination check enabled which must result in the ENI dropping the return packets from the pod. This can be proven by disabling the check on the ENI that the pod has an IP allocated on, and connections succeed.

I will be submitting a PR to disable the check when ENIs are allocated.

This helped with the issue I was getting through my ELBs. I would get random high response times to my services when using load balancer protocol TCP/SSL (for websockets); as soon as I disabled the source/dest check on ALL ENIs, the problem went away.

I would switch to NLBs, but I'm waiting for K8s to support attaching certs to NLBs.

mogren closed this as completed Sep 11, 2019