/run/xtables.lock created as directory when installed with Helm #2840
Comments
What operating system or AMI are you on?
I have experienced this issue running the AL2023 1.26, 1.27, and 1.28 EKS optimized AMIs. I worked around this by creating the file in the user-data script when the node boots. Obviously this isn't a great solution. The kube-proxy manifest includes the
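The user-data workaround described above can be sketched roughly as follows (a hypothetical fragment, not the commenter's actual script; the path is the one discussed in this issue):

```shell
#!/bin/bash
# Hypothetical user-data fragment: pre-create /run/xtables.lock as an
# empty file at boot, before kubelet starts any pods, so the container
# bind mount finds a file rather than creating a directory in its place.
[ -e /run/xtables.lock ] || install -m 0600 /dev/null /run/xtables.lock
```

Because it runs before any pod is scheduled, this sidesteps the race regardless of whether kube-proxy or the CNI daemonset starts first.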
That's interesting. Thanks for adding this detail.
I think this may be the actual cause. I have this same behavior without running the helm chart and using the latest kube version: 1.28
We are using a custom AMI, based on Ubuntu 22.04.
Yes, exactly - though it's technically the kubelet (not the
Absolutely. Poor word choice on my part.
Hi, we have seen the same issue on our 22.04 nodes. We are testing out setting
That will be super helpful.
After updating to the March 7th AMI for EKS optimized AL2023, the network issues have resolved. We are still creating
It seems to have resolved the issue.
This issue is now closed. Comments on closed issues are hard for our team to see.
Hey @Preston-PLB, have you actually reproduced the particular case you described ^^? We've hit something similar but
@Kyslik Yes, I have been able to reproduce this. And you are correct, it has nothing to do with the launch order of

In short: if you are running AL2023, update to the latest version of the AMI. If you can tolerate switching to AL2, that also works. If you are not on AL2023, I would look for any service on the node that attempts to configure ENIs, kill it, and see if that helps.
To add a note for future reference: AL2023 uses iptables-nft and doesn't create
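On a suspect node, you can check what currently occupies the lock path (a diagnostic sketch, not taken from the thread; the path is the one this issue is about):

```shell
# Report what sits at the xtables lock path: healthy hosts show a
# regular file, affected hosts show "directory", and a host where
# nothing has created it yet shows "absent".
if [ -e /run/xtables.lock ]; then
    stat -c '%F' /run/xtables.lock
else
    echo "absent"
fi
```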
I'm using a custom AMI built from

The problem seems intermittent (some nodes join the cluster, and some don't), so it's consistent with the idea of a race condition. Updates to my nodegroup were failing with this error:
Looking in logs in
and
I added this to my userdata:
and it seems to have fixed the problem.
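For reference, another way to pre-create the lock file persistently is a systemd tmpfiles.d entry (a hypothetical alternative technique, not necessarily what this commenter put in their userdata):

```shell
# Hypothetical alternative to a raw touch in userdata: a tmpfiles.d
# entry so systemd recreates the lock file on every boot.
# "f" means: create the file if it is missing, with the given
# mode/owner/group; "-" means no age-based cleanup.
mkdir -p /etc/tmpfiles.d
cat > /etc/tmpfiles.d/xtables.conf <<'EOF'
# type path               mode user group age
f      /run/xtables.lock  0600 root root  -
EOF
```

The entry is applied automatically by `systemd-tmpfiles-setup.service` early in boot, before kubelet starts.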
What happened:
The file `/run/xtables.lock` is created as a directory on the host machine. This breaks things that expect it to be a file, including `kube-proxy`. `kube-proxy` being unavailable causes the CNI to be unable to reach the API server, and the node remains stuck in the `NotReady` state.

Attach logs:
Events from the `kube-proxy` pod:

What you expected to happen:
I expect `/run/xtables.lock` to be created as a file.

How to reproduce it (as minimally and precisely as possible):
This is a race condition that occurs fairly rarely; it depends on whether `kube-proxy` or the AWS CNI daemonset is started first. But it could probably be reproduced by:

- `/run/xtables.lock` on the host

Anything else we need to know?:
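The mechanism behind the race can be demonstrated without a cluster: container runtimes prepare a missing bind-mount source with the equivalent of `mkdir -p`, so if kube-proxy has not created the lock file yet, the path on the host becomes a directory (a simulation sketch using a temp directory in place of `/run`):

```shell
host_run=$(mktemp -d)            # stands in for /run on the host
src="$host_run/xtables.lock"     # kube-proxy expects a lock *file* here

# kube-proxy has not created the file yet; the runtime prepares the
# bind-mount source anyway, with the equivalent of mkdir -p:
mkdir -p "$src"

stat -c '%F' "$src"              # prints: directory
```

Once the directory exists, kube-proxy's attempt to open the lock file fails, which matches the `NotReady` behavior described above.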
I'll create a PR with a fix shortly.
Environment:
- Kubernetes version (use `kubectl version`): 1.28.7
- CNI Version: 1.13.0
- OS (e.g., `cat /etc/os-release`): Ubuntu 22.04
- Kernel (e.g., `uname -a`): 5.15.0-1055-aws