RKE2 installation sets fs.file-max to 65535 #7632

Closed
zchenyu opened this issue Jan 24, 2025 · 3 comments
@zchenyu commented Jan 24, 2025

Environmental Info:
RKE2 Version: v1.30.5+rke2r1

Node(s) CPU architecture, OS, and Version:

Linux 64-181-207-149 6.8.0-49-generic #49~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Nov  6 17:42:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
N/A

Describe the bug:

After installing RKE2 (see below), fs.file-max gets set to 65k, which is much too low, especially for control plane nodes.

Steps To Reproduce:

  • Installed RKE2:
$ cat /proc/sys/fs/file-max 
9223372036854775807

$ curl -sfL https://get.rke2.io | sh -
$ systemctl enable rke2-agent.service
$ systemctl start rke2-agent.service

$ cat /proc/sys/fs/file-max 
65535
$ cat /etc/rancher/rke2/config.yaml
tls-san:
  - XXX
node-label:
  - XXX
server: XXX
token: XXX

Expected behavior:
fs.file-max stays at its previous value, 9223372036854775807 (2^63 - 1)

Actual behavior:
fs.file-max is reduced to 65535 (2^16 - 1)

Additional context / logs:
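A possible interim workaround (not verified; the drop-in file name and target value below are just examples) is to raise fs.file-max back and persist it with a sysctl drop-in:

$ echo 'fs.file-max = 9223372036854775807' | sudo tee /etc/sysctl.d/99-file-max.conf
$ sudo sysctl --system
$ cat /proc/sys/fs/file-max

This restores the limit but does not identify whatever lowered it, so the value may be reduced again at runtime.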

@brandond (Member) commented Jan 24, 2025

much too low, especially for control plane nodes.

Can you explain what specific problem this is causing for you? We have yet to receive any other complaints about this.

I believe there must be a privileged pod (probably one of the CNI pods, or perhaps nginx) that is doing this, as there is nothing in RKE2, and nothing I am aware of in core Kubernetes, that sets this at a system level.

For example, RKE2's systemd unit sets a higher limit than what you're reporting:
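The relevant directive is the per-process open-file limit in the unit file, along the lines of the following excerpt (illustrative; check the installed rke2-agent.service for the exact value):

[Service]
LimitNOFILE=1048576

Note that LimitNOFILE is the per-process descriptor limit (RLIMIT_NOFILE), which is separate from the system-wide fs.file-max sysctl.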

@zchenyu (Author) commented Jan 24, 2025

After the node runs for a while, it basically becomes unusable, e.g.:

  • Cannot SSH into the node
  • If I do manage to SSH in, I can't do anything on the file system (e.g. vim, cat, ls)
  • Lots of k8s issues pop up, e.g. the node becomes NotReady, pods start failing, etc.

@brandond (Member) commented Jan 24, 2025

We've certainly never run into that. What makes you think that this is related to the max file descriptors? What are the resources (CPU/memory/disk) of the nodes in question?
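To confirm whether the system-wide handle table is actually being exhausted, you can compare current usage against the limit and check the kernel log for the corresponding warning (standard interfaces, nothing RKE2-specific):

$ cat /proc/sys/fs/file-nr    # three fields: allocated handles, free handles, fs.file-max
$ dmesg | grep -i 'file-max'  # the kernel logs "VFS: file-max limit ... reached" when the limit is hit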

@rancher rancher locked and limited conversation to collaborators Jan 24, 2025
@brandond brandond converted this issue into discussion #7633 Jan 24, 2025
