NodePort Connectivity Issue #231
Comments
Having disabled rp_filter across all interfaces, such that no martian messages appear in /var/log/messages with log_martians enabled, the issue still occurs. Without changing the rp_filter settings, enabling log_martians did produce messages for the secondary interfaces (which are set to loose mode by default), but that doesn't appear to be related to this problem. I can confirm that it only occurs with pods whose IPs are on secondary interfaces: I can reproduce it by killing pods until they land on eth0, where the NodePort works, and then killing them again until they land on another interface, where it fails.
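For reference, a minimal sketch of the sysctl settings described above (the interface names are examples; adjust them to the ENIs on the node):

```sh
# Disable reverse-path filtering on all interfaces and log martian packets.
# The effective rp_filter value is the stricter of "all" and the per-interface
# setting, so both need to be lowered.
sysctl -w net.ipv4.conf.all.rp_filter=0
sysctl -w net.ipv4.conf.eth0.rp_filter=0
sysctl -w net.ipv4.conf.eth1.rp_filter=0
sysctl -w net.ipv4.conf.all.log_martians=1

# Inspect the current values across every interface.
sysctl -a 2>/dev/null | grep '\.rp_filter'
```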
Cracked it. The problem stems from the fact that the secondary ENIs that are attached still have the source/destination check enabled, which must result in the ENI dropping the return packets from the pod. This can be proven by disabling the check on the ENI that the pod's IP is allocated from, after which connections succeed. I will be submitting a PR to disable the check when ENIs are allocated.
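Confirming this manually on a single interface looks roughly like the following AWS CLI call (the ENI ID is a placeholder):

```sh
# Disable the source/destination check on the ENI that holds the pod's IP.
# eni-0123456789abcdef0 is a placeholder; substitute the real interface ID.
aws ec2 modify-network-interface-attribute \
  --network-interface-id eni-0123456789abcdef0 \
  --no-source-dest-check
```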
Worth mentioning that this issue was surfaced by setting WARM_ENI_TARGET to 20, as has been advised in numerous places to prevent scheduling problems on hosts that have run out of IPs. That meant every node had the maximum number of ENIs attached, which increased the chance that the CNI picked an IP associated with a secondary interface instead of the primary one.
I have the same problem with 1.3. The first packet for the NodePort hits eth0 (in my case 100.122.192.148) and gets DNAT'ed to the respective container on eth1 (in my case the container IP is 100.122.199.244, which is on eth1). I was trying to figure out why the reverse DNAT is not applied before the routing decision on the return path, but couldn't work it out; that is probably just how Linux works. If the routing rules saw the un-DNAT'ed IP 100.122.192.148, the packet would be returned over eth0, the way it came in. Will try that PR @nickdgriffin to alter the ENIs, thanks for it :)
Oh, actually, it sounds like it's more of a bug in how amazon-vpc-cni-k8s works with Calico. Looking at the mangle table, the rules added by #75 are not reached, because Calico intercepts the packets and ACCEPTs them before the "CONNMARK restore" rule is triggered. Here is how my rules in the mangle table look:
I looked at the TRACE for the return packet, and it gets terminated at cali-PREROUTING:1, so it never reaches the AWS rules.
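The original mangle output isn't reproduced above; a sketch of how the chain ordering and the packet path can be inspected (the pod IP is the example value from the earlier comment, and TRACE output typically lands in the kernel log, depending on the logging backend):

```sh
# List the mangle PREROUTING chain with rule numbers to see where
# cali-PREROUTING sits relative to the AWS CONNMARK rules.
iptables -t mangle -L PREROUTING -n -v --line-numbers

# Trace a return packet from the pod to see which chain terminates it.
iptables -t raw -A PREROUTING -s 100.122.199.244 -p tcp -j TRACE
dmesg | grep TRACE | tail

# Remove the trace rule once done.
iptables -t raw -D PREROUTING -s 100.122.199.244 -p tcp -j TRACE
```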
@ikatson you can change the adapters in the AWS console or via the CLI, but you have to do it each time they are replaced - that's all I've done to fix the issue I was having.
@nickdgriffin yeah, that's how I tried your fix without actually compiling it - I just changed the ENIs in place. It does work! However, I think it's better to fix the root cause, so that return packets are routed out through the same interface they came in on. That specific problem was fixed in #75, but in my setup at least Calico network policy breaks that fix. I should note, by the way, that I have AWS_VPC_K8S_CNI_EXTERNALSNAT=true; I had problems with martian packets and rp_filter like you described when I had AWS_VPC_K8S_CNI_EXTERNALSNAT=false.
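For anyone toggling that setting, it is an environment variable on the CNI DaemonSet; a minimal way to set it, assuming the standard aws-node deployment in kube-system:

```sh
# Enable external SNAT on the aws-node DaemonSet; changing the env var
# triggers a rolling restart of the CNI pods.
kubectl -n kube-system set env daemonset/aws-node AWS_VPC_K8S_CNI_EXTERNALSNAT=true
kubectl -n kube-system rollout status daemonset/aws-node
```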
We seem to be experiencing the same problem (we also have Calico). But I don't understand the workaround and whether it works with Calico (and with the default setting of AWS_VPC_K8S_CNI_EXTERNALSNAT). Can someone summarize?
@jwalters-gpsw I was able to fix (not even work around!) the Calico issue by setting this value in the Calico environment variables (it needs to be set on both the node and Typha components):
The default value for this is "Accept", so by default Calico accepts established packets and they stop traversing the mangle table. I need to file a PR for that.
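The exact variable isn't quoted in the comment above. Based on the described behaviour (a Felix setting whose default is "Accept" and which controls what happens to allowed packets in the mangle table), it is most likely Felix's IptablesMangleAllowAction set to "Return"; treat the names below as an assumption rather than a quote from the thread:

```sh
# Assumed setting: Felix's IptablesMangleAllowAction, exposed as an env var.
# With "Return", packets allowed by Calico in the mangle table return to the
# calling chain instead of being ACCEPTed, so the AWS CONNMARK-restore rule
# still runs. Set it on both calico-node and calico-typha.
kubectl -n kube-system set env daemonset/calico-node FELIX_IPTABLESMANGLEALLOWACTION=Return
kubectl -n kube-system set env deployment/calico-typha FELIX_IPTABLESMANGLEALLOWACTION=Return
```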
I'm having a similar problem here. A TCP ELB (LoadBalancer service) is getting a lot of retransmissions, duplicated packets, etc. This causes the health check to fail randomly and actual connections to drop, even established ones. We're running 1.2.1 and we're not using Calico or anything else on the networking stack. The cluster was created with kops. Do you think we're talking about the same issue here? I tried disabling the source/dest check on the secondary ENI, but nothing changed. Let me know if I can collect any data to help. Update: Given a node A, with primary private IP address ABC.
I don't think this is right - if the packet came in on eth1, it should return through eth1, right? Should I create a new issue? Is it related?
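One way to check for an asymmetric return path (interface names and the NodePort below are example values) is to capture on both interfaces and then look at the policy routing rules the CNI installs:

```sh
# Run these in separate shells: if SYN-ACKs leave on a different interface
# than the SYNs arrived on, the return path is asymmetric.
tcpdump -ni eth0 'tcp port 30080'
tcpdump -ni eth1 'tcp port 30080'

# Show the policy routing rules and the main table used for the primary ENI.
ip rule show
ip route show table main
```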
It sounds similar in terms of behaviour, but if you aren't using Calico for network policies I don't think it can be the same - plus my issue was specifically sorted out by changing the src/dest check, and once it's in a release I'll be testing out the fix in #263 too.
Not sure whether to open this as a separate issue, but the rule added to the routing policy database is subtly wrong, in that it doesn't account for additional bits that might be set in the fwmark - for example by Calico. The rule is currently:
When it should probably be:
Update: on closer inspection v1.3.0 works correctly, but master is currently broken. I guess that's what I get for running the bleeding edge, but I needed the ENIConfig changes...
Current released version: https://github.com/aws/amazon-vpc-cni-k8s/blob/release-1.3/pkg/networkutils/network.go#L220
Master: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/pkg/networkutils/network.go#L244
Happy to submit an MR to fix.
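The two rule forms referenced above aren't quoted in the thread; for illustration, the difference looks like this in `ip rule` terms. The 0x80 value is the CNI's connmark, but the rule priority shown here is an arbitrary example:

```sh
# Without a mask the rule matches only when the fwmark is exactly 0x80, so a
# packet that Calico has also marked (extra bits set) falls through:
#   ip rule add fwmark 0x80 lookup main pref 1024
#
# With an explicit mask the rule matches whenever the 0x80 bit is set,
# regardless of any other bits in the mark:
ip rule add fwmark 0x80/0x80 lookup main pref 1024
```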
This helped with the issue I was seeing through my ELBs: I would get random high response times to my services when using the TCP/SSL load balancer protocol (for websockets). As soon as I disabled the source/dest check on ALL ENIs, the problem went away. I would switch to NLBs, but I'm waiting for Kubernetes to support attaching certs to NLBs.
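A rough sketch of disabling the check across every ENI attached to an instance (the instance ID is a placeholder); as noted earlier in the thread, newly attached ENIs will need the same treatment until the CNI does this itself:

```sh
# List every ENI attached to the instance and disable its source/dest check.
# i-0123456789abcdef0 is a placeholder instance ID.
for eni in $(aws ec2 describe-network-interfaces \
    --filters Name=attachment.instance-id,Values=i-0123456789abcdef0 \
    --query 'NetworkInterfaces[].NetworkInterfaceId' --output text); do
  aws ec2 modify-network-interface-attribute \
    --network-interface-id "$eni" --no-source-dest-check
done
```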
Hello,
We are experiencing an issue that is functionally identical to #75, using 1.2.1, where certain pods are not accessible via their NodePort from remote hosts, and a tcpdump shows a SYN/SYN-ACK exchange at the start followed by TCP retransmissions. As the mentioned ticket is about rp_filter, here are the values collected by the support bundler:
We have this popping up across different clusters (although they are all identical in terms of setup) after pods are created (in this case Nginx, but it may be happening elsewhere) and the remedy is to delete the pods until the NodePort is functioning correctly.
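A quick way to check whether a failing pod's IP lives on a secondary ENI (the pod name, namespace, and ID values are placeholders, and device index 0 is the primary ENI):

```sh
# Grab the pod's IP and ask EC2 which ENI it is assigned to.
POD_IP=$(kubectl -n default get pod nginx-example -o jsonpath='{.status.podIP}')
aws ec2 describe-network-interfaces \
  --filters Name=addresses.private-ip-address,Values="$POD_IP" \
  --query 'NetworkInterfaces[].[NetworkInterfaceId,Attachment.DeviceIndex]' \
  --output text

# On the node itself, the CNI's per-pod policy rules also hint at which
# route table (and hence ENI) the pod uses.
ip rule show | grep "$POD_IP"
```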
I can send the support bundle and packet traces by email, along with anything else that would help in identifying the cause and what we can do about it, as it is quite problematic for us.
Thanks,
Nick