RP filter isn't updated to loose when using centos 7 #212

Closed
lnr0626 opened this issue Oct 30, 2018 · 10 comments

Comments

@lnr0626

lnr0626 commented Oct 30, 2018

I'm using a CentOS 7 based AMI for EKS, and created a new cluster to test out the external config that was merged in with #165. The pods I created in this cluster weren't able to communicate with any services within Kubernetes (e.g. the kubernetes service in the default namespace, kube-dns, etc.). After some investigation, I found that the reverse path filter (rp_filter) for the primary interface was still set to strict. After updating this to loose, the cluster worked as expected.

I see there's code to update this to loose when node port support is enabled; however, this does not seem to be working as expected.
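
For anyone checking their own nodes, here is a minimal sketch of how the rp_filter values could be read and decoded. It assumes the standard Linux sysctl layout under /proc/sys/net/ipv4/conf and relies on the kernel's documented behaviour of applying the larger of the conf/all and per-interface values. The function names are hypothetical and are not part of the CNI plugin or aws-cni-support.sh.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not taken from amazon-vpc-cni-k8s.

# Map a numeric rp_filter value to its name (0 = off, 1 = strict, 2 = loose).
rp_filter_mode() {
  case "$1" in
    0) echo "off" ;;
    1) echo "strict" ;;
    2) echo "loose" ;;
    *) echo "unknown" ;;
  esac
}

# The kernel validates against the larger of conf/all and conf/<iface>.
effective_rp_filter() {
  local all="$1" iface="$2"
  if [ "$all" -gt "$iface" ]; then echo "$all"; else echo "$iface"; fi
}

# Print the effective mode for one interface.
# The optional second argument overrides the sysctl root; that override is an
# assumption added here so the function can be exercised without a real /proc.
show_rp_filter() {
  local iface="$1" root="${2:-/proc/sys/net/ipv4/conf}"
  local all_val iface_val
  all_val=$(cat "$root/all/rp_filter")
  iface_val=$(cat "$root/$iface/rp_filter")
  echo "$iface: $(rp_filter_mode "$(effective_rp_filter "$all_val" "$iface_val")")"
}
```

After sourcing the file, something like show_rp_filter ens3 (substituting the node's actual primary interface name) prints the mode in effect for that interface.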

@ewbankkit
Contributor

@lnr0626 I'm hitting the same problem testing the same scenario.
I opened #213 as the aws-cni-support.sh script was reporting the rp_filter value for eth0 rather than the correct primary interface (ens3 in my case).
What did you do to fix the problem?

@liwenwu-amazon I can provide aws-cni-support.tar.gz.

@lnr0626
Author

lnr0626 commented Oct 30, 2018

My current fix is to update my userdata script to detect the primary interface and set its rp filter to loose on startup. I pretty much just added sysctl -w "net.ipv4.conf.`route | grep '^default' | grep -o '[^ ]*$'`.rp_filter=2" to the end.
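
A slightly more defensive variant of the same idea might look like the sketch below. It assumes iproute2's ip command is available on the node and parses the interface named after the dev keyword in the default route, instead of taking the last column of route output. The function names are hypothetical, and this is a userdata workaround, not the CNI plugin's own logic.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical.

# Read `ip -4 route show default` output on stdin and print the interface
# that follows the "dev" keyword on the first default route.
default_iface_from_routes() {
  awk '$1 == "default" {
         for (i = 2; i < NF; i++)
           if ($i == "dev") { print $(i + 1); exit }
       }'
}

# Build the sysctl key for an interface's rp_filter setting.
rp_filter_key() {
  echo "net.ipv4.conf.$1.rp_filter"
}

# Set rp_filter=2 (loose) on the default-route interface. Needs root.
set_primary_rp_filter_loose() {
  local iface
  iface=$(ip -4 route show default | default_iface_from_routes)
  if [ -z "$iface" ]; then
    echo "no default route found" >&2
    return 1
  fi
  sysctl -w "$(rp_filter_key "$iface")=2"
}
```

Calling set_primary_rp_filter_loose at the end of the userdata script would apply the setting at boot. Failing early when no default route exists avoids writing a malformed sysctl key.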

@ewbankkit
Contributor

Hmm, that doesn't resolve the problem for me (in fact, the value was already 2 in the ens3 file).
I also tried on an m3.large instance running without Enhanced Networking and still get the timeout communicating from pods to the kubernetes ClusterIP.

@perbly

perbly commented Nov 1, 2018

Hi.
We are having the same issue on Amazon Linux (CentOS 7) and have tried basically everything to get it to work.
The only solution right now is to run Debian (which isn't a solution, of course). If someone could explain why that works, I'm a happy listener. /P

@lnr0626
Author

lnr0626 commented Nov 1, 2018

@perbly my guess is that the version of Debian you used hasn't switched to predictable network interface names yet and still uses the old method of enumerating devices, so the primary interface name was still eth0.

@perbly

perbly commented Nov 1, 2018

@lnr0626 the interface name is eth0 on both Debian and Amazon Linux.

@lnr0626
Author

lnr0626 commented Nov 2, 2018

After some further testing, it appears that that doesn't actually fix the issue on CentOS 7.

@ewbankkit
Contributor

@lnr0626 What instance type are you using?

@lnr0626
Author

lnr0626 commented Nov 3, 2018

@ewbankkit we're using m5.xlarge instances currently

@tabern

tabern commented Mar 5, 2019

Fixed by #271

tabern closed this as completed Mar 5, 2019