Max pids is set to 32768 #737
I don't have any concerns with increasing this value, because the current default is mostly arbitrary. However, we don't want to mask issues such as DataDog/datadog-agent#12997, so we shouldn't increase it too dramatically. @dza89 for your use case, what value has proven stable? Increasing by a factor of 2 seems like a good first step.
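As a minimal sketch of the "factor of 2" step, assuming a 64-bit kernel whose ceiling for `kernel.pid_max` is 4194304 (the AL2023 value mentioned in this thread), a proposed value can be derived by scaling the current one and clamping it to that ceiling. The function `propose_pid_max` and its parameters are hypothetical, for illustration only.

```python
# Hypothetical helper: scale the current kernel.pid_max by a factor and clamp
# the result to an assumed 64-bit kernel ceiling of 4194304.
PID_MAX_CEILING_64BIT = 4 * 1024 * 1024  # 4194304


def propose_pid_max(current: int, factor: int = 2,
                    ceiling: int = PID_MAX_CEILING_64BIT) -> int:
    """Return current * factor, never exceeding the ceiling."""
    if current <= 0 or factor < 1:
        raise ValueError("current must be positive and factor must be >= 1")
    return min(current * factor, ceiling)


if __name__ == "__main__":
    print(propose_pid_max(32768))       # 65536: doubling the AL2 default
    print(propose_pid_max(32768, 256))  # 4194304: clamped to the ceiling
```

Doubling the current default gives 65536, which is also the value the RHEL 9 STIG guidance quoted in this thread recommends.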
We actually hit the same issue as Datadog, in Kubernetes. We implemented a pod PID limit to make sure it doesn't happen again.
Oh interesting, it sounds like Arch is just using the maximum possible value. From
I think I've decided this is the right approach; if we "remove" this limit, users can manage PID usage entirely with Kubernetes via Pod PID limits.
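Kubernetes exposes per-pod PID limits through the kubelet (the `podPidsLimit` field of `KubeletConfiguration`, or the `--pod-max-pids` flag). Whether those limits actually protect a node depends on how they add up against the node-wide `kernel.pid_max`. A rough sketch of that arithmetic follows; the function `pid_headroom`, the `reserved` allowance for system daemons, and the example numbers are hypothetical, illustrative choices.

```python
# Hypothetical check: do per-pod PID limits fit inside the node's pid_max?
# "reserved" is an assumed allowance for system daemons and the kubelet itself.


def pid_headroom(pid_max: int, max_pods: int, pod_pids_limit: int,
                 reserved: int) -> int:
    """Return PIDs left over if every pod reaches its limit at once.

    A negative result means the per-pod limits cannot all be honored
    simultaneously, so the node-wide pid_max is still the binding limit.
    """
    worst_case = max_pods * pod_pids_limit + reserved
    return pid_max - worst_case


if __name__ == "__main__":
    # 110 is the kubelet's default maxPods; 1024 and 2048 are illustrative.
    print(pid_headroom(32768, 110, 1024, 2048))    # -81920
    print(pid_headroom(4194304, 110, 1024, 2048))  # 4079616
```

With the 32768 default, even a modest per-pod limit across 110 pods cannot be honored simultaneously, which is one argument for raising the node-wide value alongside setting pod limits.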
Nice to hear. One thing I just realized: is the EKS AMI build the right place to set this? Shouldn't this be fixed at the source, in amzn2-ami-minimal-hvm?
To offer a contrarian point of view: with no limit, I think you'll end up seeing machines hard-lock rather than fail with errors about being unable to fork/exec, and a hard lock is harder to diagnose. Looking for guidance elsewhere, the RHEL 9 STIG recommends setting it to 65536: https://static.open-scap.org/ssg-guides/ssg-rhel9-guide-index.html
I think it makes sense to configure it here, because the AL2 AMI does not assume Kubernetes usage (and it is late in its support cycle). @tzneal the failure mode would change, for sure. I think we'd see physical resource exhaustion (which users can obviously already run into and need to monitor for, control with resource limits, etc.). Looks like the value seen in an Arch install is likely a result of
Is there any update on this issue? Has it already been addressed, or is it planned to be? Thanks.
Some more guidance:

32-bit max limit: 32768

In case anyone runs into this later: setting this in the user data block gives a temporary fix until the AMI is patched.
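The exact user data snippet from that comment is not preserved here. As a hedged sketch, assuming the workaround is to persist a higher `kernel.pid_max` through a drop-in file under `/etc/sysctl.d/`, the logic could look like this. The file name `99-pid-max.conf`, the value 4194304, and both function names are illustrative assumptions; the directory and `/proc` path are parameters so the functions can be exercised without root.

```python
from pathlib import Path

# Assumed values for illustration; not taken from the original comment.
DROP_IN_NAME = "99-pid-max.conf"
PID_MAX_VALUE = 4194304


def write_pid_max_drop_in(sysctl_dir: str = "/etc/sysctl.d",
                          value: int = PID_MAX_VALUE) -> Path:
    """Write a sysctl drop-in that sets kernel.pid_max; return its path."""
    path = Path(sysctl_dir) / DROP_IN_NAME
    path.write_text(f"kernel.pid_max = {value}\n")
    return path


def read_pid_max(proc_path: str = "/proc/sys/kernel/pid_max") -> int:
    """Read the currently active kernel.pid_max."""
    return int(Path(proc_path).read_text().strip())
```

A drop-in like this is picked up at boot; to apply it immediately, root can run `sysctl --system`, after which `read_pid_max()` should report the new value.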
We (at Snowflake) are now including that patch/workaround in our userData block, but we'd very much appreciate it if this could be baked in by default for 64-bit AWS instances!
@sfc-gh-jpollard wanna drop a quick PR?
FWIW, on AL2023 max pids is set to 4194304, so this would only apply to AL2.
What happened:
Nodes crashed because we didn't set a max pids per pod.
What you expected to happen:
The question is why this is set to 32768, which seems to be the max value for 32-bit.
Can we raise it? I'll make the PR.
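For context on that question: 32768 is the kernel's built-in default (`PID_MAX_DEFAULT`, 0x8000), and on 32-bit kernels it is also the hard ceiling. On 64-bit kernels the ceiling (`PID_MAX_LIMIT`) is 4 * 1024 * 1024 = 4194304, matching the AL2023 value mentioned in the thread. The sketch below mirrors that logic from `include/linux/threads.h`, assuming `CONFIG_BASE_SMALL` is disabled; the function name `pid_max_limit` is hypothetical.

```python
# Mirrors the PID limit macros in include/linux/threads.h, assuming
# CONFIG_BASE_SMALL is disabled (the usual case for server kernels).
PID_MAX_DEFAULT = 0x8000  # 32768


def pid_max_limit(long_bits: int) -> int:
    """Return the highest value kernel.pid_max may be set to."""
    if long_bits > 32:          # sizeof(long) > 4, i.e. a 64-bit kernel
        return 4 * 1024 * 1024  # 4194304
    return PID_MAX_DEFAULT      # 32-bit: the ceiling equals the default


if __name__ == "__main__":
    print(pid_max_limit(32))  # 32768
    print(pid_max_limit(64))  # 4194304
```

So on a 64-bit AL2 node there is roughly a 128x gap between the default and what the kernel permits.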
How to reproduce it (as minimally and precisely as possible):
Run a lot of Java :')