/run/xtables.lock created as directory when installed with Helm #2840

Closed
kwohlfahrt opened this issue Mar 12, 2024 · 15 comments · Fixed by #2841

@kwohlfahrt
Contributor

kwohlfahrt commented Mar 12, 2024

What happened:

The file /run/xtables.lock is created as a directory on the host machine. This breaks things that expect it to be a file, including kube-proxy. With kube-proxy unavailable, the CNI cannot reach the API server, and the node remains stuck in the NotReady state.

Attach logs

Events from kube-proxy pod:

Events:
  Type     Reason       Age                   From     Message
  ----     ------       ----                  ----     -------
  Warning  FailedMount  4m2s (x60 over 109m)  kubelet  MountVolume.SetUp failed for volume "xtables-lock" : hostPath type check failed: /run/xtables.lock is not a file

What you expected to happen:

I expect /run/xtables.lock to be created as a file.

How to reproduce it (as minimally and precisely as possible):

This is a race condition that occurs fairly rarely; it depends on whether kube-proxy or the AWS CNI DaemonSet is started first. It could probably be reproduced by:

  1. Installing the AWS CNI from the Helm chart
  2. Deleting /run/xtables.lock on the host
  3. Restarting the AWS CNI DaemonSet pod on the host

Anything else we need to know?:

I'll create a PR with a fix shortly.

Environment:

  • Kubernetes version (use kubectl version): 1.28.7
  • CNI Version: Helm Chart 1.13.0
  • OS (e.g: cat /etc/os-release): Ubuntu 22.04
  • Kernel (e.g. uname -a): 5.15.0-1055-aws
@kwohlfahrt kwohlfahrt added the bug label Mar 12, 2024
@kwohlfahrt kwohlfahrt changed the title /run/xtables.lock created as directory /run/xtables.lock created as directory when installed with Helm Mar 12, 2024
@orsenthil
Member

orsenthil commented Mar 14, 2024

I expect /run/xtables.lock to be created as a file

What operating system or AMI are you on?

@Preston-PLB

I have experienced this issue running the AL2023 EKS-optimized AMIs for 1.26, 1.27, and 1.28. I worked around it by creating the file in the user-data script when the node boots. Obviously this isn't a great solution.

The kube-proxy manifest includes the FileOrCreate directive when defining the /run/xtables.lock volume. kube-proxy is usually launched before the aws-node pod. However, if aws-node launches before kube-proxy, aws-node seems to create /run/xtables.lock as a directory. I have seen some odd behavior as a result of this.
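
For reference, the relevant part of the kube-proxy DaemonSet looks roughly like the following; this is a sketch from memory, and the exact manifest shipped with a given EKS version may differ:

# kube-proxy DaemonSet (excerpt, approximate)
spec:
  template:
    spec:
      containers:
        - name: kube-proxy
          volumeMounts:
            - name: xtables-lock
              mountPath: /run/xtables.lock
      volumes:
        - name: xtables-lock
          hostPath:
            path: /run/xtables.lock
            # FileOrCreate: creates an empty file if the path is missing,
            # and fails the mount if the path exists but is not a file
            type: FileOrCreate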

@orsenthil
Member

However, if aws-node launches before kube-proxy, aws-node seems to create /run/xtables.lock as a directory.

That's interesting. Thanks for adding this detail.

@Preston-PLB

Preston-PLB commented Mar 14, 2024

That's interesting. Thanks for adding this detail.

I think this may be the actual cause. I see this same behavior without the Helm chart, using the latest 1.16.4 release manifest. I am not getting any error logs in aws-node; however, pods that run on a node where aws-node launched before kube-proxy fail with TCP timeouts (unable to connect to resources outside the cluster, with security groups and IAM all configured properly).

kube version: 1.28
AMI: ami-0f4be968a0e634cd3 - AL2023 eks 1.28
Nodes are being launched via Karpenter.

@kwohlfahrt
Contributor Author

kwohlfahrt commented Mar 15, 2024

What operating system or AMI are you on?

We are using a custom AMI, based on Ubuntu 22.04.

However, if aws-node launches before kube-proxy, aws-node seems to create /run/xtables.lock as a directory.

Yes, exactly. Though technically it's the kubelet (not the aws-node process) that creates the directory: aws-node declares /run/xtables.lock as a hostPath volume without specifying a type, so if the path does not already exist, the kubelet creates it as a directory.
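
Concretely, the rendered chart output before the fix amounts to something like this (a sketch, not the exact template), and the natural fix, presumably what #2841 does, is to set an explicit type: FileOrCreate, matching kube-proxy:

# aws-node DaemonSet volume as rendered before the fix (approximate)
volumes:
  - name: xtables-lock
    hostPath:
      path: /run/xtables.lock
      # no "type" set: if the path is missing, the kubelet creates it as a
      # directory, which then breaks kube-proxy's FileOrCreate type check

# with an explicit type, the first pod to start creates an empty file instead:
volumes:
  - name: xtables-lock
    hostPath:
      path: /run/xtables.lock
      type: FileOrCreate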

@Preston-PLB

Yes, exactly. Though technically it's the kubelet (not the aws-node process) that creates the directory: aws-node declares /run/xtables.lock as a hostPath volume without specifying a type, so if the path does not already exist, the kubelet creates it as a directory.

Absolutely. Poor word choice on my part.

@alam0rt

alam0rt commented Mar 19, 2024

Hi, we have seen the same issue on our Ubuntu 22.04 nodes. We are testing out setting FileOrCreate and will let you know if it resolves the problem.

@orsenthil
Member

orsenthil commented Mar 19, 2024

We are testing out setting FileOrCreate and will let you know if it resolves the problem.

That will be super helpful.

@Preston-PLB

I am not getting any error logs in aws-node; however, pods that run on a node where aws-node launched before kube-proxy fail with TCP timeouts (unable to connect to resources outside the cluster, with security groups and IAM all configured properly).

After updating to the March 7th AMI for EKS-optimized AL2023, the network issues have been resolved. We are still creating /run/xtables.lock manually in the nodes' userdata script.

@alam0rt

alam0rt commented Mar 24, 2024

We are testing out setting FileOrCreate and will let you know if it resolves the problem.

That will be super helpful.

It seems to have resolved the issue.

@orsenthil orsenthil self-assigned this Apr 2, 2024

github-actions bot commented Apr 3, 2024

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.

@Kyslik

Kyslik commented May 4, 2024

That's interesting. Thanks for adding this detail.

I think this may be the actual cause. I see this same behavior without the Helm chart, using the latest 1.16.4 release manifest. I am not getting any error logs in aws-node; however, pods that run on a node where aws-node launched before kube-proxy fail with TCP timeouts (unable to connect to resources outside the cluster, with security groups and IAM all configured properly).

Hey @Preston-PLB, have you actually reproduced the particular case you described ^^? We've hit something similar, but kube-proxy started before aws-node (in the correct order); all traffic from the faulty node was timing out, including the kubelet reaching the Kube API, the kubelet hitting probes, and metrics-server scraping other nodes. Oddly, workloads could still be scheduled on the node normally. I tried to reproduce according to the OP's steps and did indeed end up with MountVolume.SetUp failed for volume "xtables-lock", but that is different from what you (and I) described/experienced.

@Preston-PLB

Hey @Preston-PLB, have you actually reproduced the particular case you described ^^? We've hit something similar, but kube-proxy started before aws-node (in the correct order); all traffic from the faulty node was timing out, including the kubelet reaching the Kube API, the kubelet hitting probes, and metrics-server scraping other nodes. Oddly, workloads could still be scheduled on the node normally. I tried to reproduce according to the OP's steps and did indeed end up with MountVolume.SetUp failed for volume "xtables-lock", but that is different from what you (and I) described/experienced.

@Kyslik Yes, I have been able to reproduce this. And you are correct, it has nothing to do with the launch order of kube-proxy and aws-node. When I was testing, I was running a March version of the AL2023 EKS AMI. It turns out that on some AL2023 AMIs there is a race condition between the daemon provided by the AMI to configure ENIs and aws-node. This is what leads to the network timeouts. I switched to AL2 before I learned this, and everything worked perfectly. I want to try the newer AL2023 AMIs, but I need the time and space to potentially break my dev environment.

In short, if you are running AL2023, update to the latest version of the AMI. If you can tolerate switching to AL2, that also works. If you are not on AL2023, I would look for any service on the node that attempts to configure ENIs, kill it, and see if that helps.

@orsenthil
Member

To add a note for future reference: AL2023 uses iptables-nft and doesn't create /run/xtables.lock, while AL2's legacy iptables creates the /run/xtables.lock file.

@jpriebe

jpriebe commented Aug 5, 2024

I'm using a custom AMI built from amazon-eks-node-al2023-x86_64-standard-1.29-v20240729, and this seems to be happening.

The problem seems intermittent (some nodes join the cluster, and some don't), so it's consistent with the idea of a race condition. Updates to my nodegroup were failing with this error:

NodeCreationFailure: Couldn't proceed with upgrade process as new nodes are not joining node group

Looking at the logs in /var/log/aws-routed-eni, I saw messages like

pod_workers.go:1298] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized" 

and

failed to getAPI group resources: unable to retrieve the complete list of server APIs: networking.k8s.aws/v1alpha1: Get \"https://172.20.0.1:443/apis/networking.k8s.aws/v1alpha1\": dial tcp 172.20.0.1:443: i/o timeout"

kube-proxy logs show this:

Warning  FailedMount  MountVolume.SetUp failed for volume "xtables-lock" : hostPath type check failed: /run/xtables.lock is not a file

I added this to my userdata:

touch /run/xtables.lock

and it seems to have fixed the problem.
