Possible netlink leak on 3.29.1 #9603
Comments
Uh oh, likely to be one of my PRs to rework route programming:
Any interesting logs in calico-node from route_table.go? I'd expect netlink sockets to be re-opened after a failure, so it might help to know which failure you're hitting (if any).
Actually, I think it might be this one: #9135; some of the calls to
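For readers following along: the bug class being discussed here is a netlink handle that gets opened but never released on some code path, so its socket file descriptor is never closed. Below is a minimal sketch of that pattern and the usual `defer`-based fix, assuming the vishvananda/netlink library; the function names are illustrative only and are not Felix's actual route_table.go code.

```go
// Sketch of the netlink handle leak pattern, assuming github.com/vishvananda/netlink.
// Names here are illustrative; Felix wraps netlink in its own shims.
package main

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

// leaky shows the bug class: the handle (and its netlink socket FD) is never
// released, neither on the error return path nor on the success path.
func leaky() error {
	h, err := netlink.NewHandle()
	if err != nil {
		return err
	}
	routes, err := h.RouteList(nil, netlink.FAMILY_V4)
	if err != nil {
		return err // BUG: h is never released, so its socket FD leaks
	}
	fmt.Println("routes:", len(routes))
	return nil // BUG: leaked here too
}

// fixed releases the handle on every path with defer.
func fixed() error {
	h, err := netlink.NewHandle()
	if err != nil {
		return err
	}
	defer h.Delete() // always close the underlying netlink socket(s)

	routes, err := h.RouteList(nil, netlink.FAMILY_V4)
	if err != nil {
		return err
	}
	fmt.Println("routes:", len(routes))
	return nil
}

func main() {
	_ = leaky()
	_ = fixed()
}
```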
Well that was quick. Just responding to confirm I'm not seeing any logs from
@imbstack Is it possible for you to test an image with the fix? Image: docker.io/calico/node:v3.29.1-9-g95d2ad73b69e
Manifests here if that's more convenient; this is a nightly build of
I've only had it deployed for an hour or so, but it definitely looks better to me. At this point before the patch, a pod would have had over 400 open FDs, but this time around they are sitting at ~160 and it's looking flat! I'll report back here if I see something different in the next couple of days, but this looks fixed to me. Thanks for the quick turnaround!
Thanks for testing this.
@imbstack I hope you did not see any further FD leaks.
Just checked and it's still flat! Looks fixed to me.
Thank you.
We recently updated calico to 3.29.1 on one of our staging clusters and found that after a few hours there was a clear upward trend in the number of file descriptors held by calico-node pods.
Checking on a running instance after a couple of days, we found that the `calico-node -felix` process had nearly 6000 file descriptors according to `lsof`, nearly all of which were like the following:

Deleting that pod dropped the FDs, although the new pod is starting the trend all over again.
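If it helps anyone reproduce the measurement, here is a minimal sketch of counting a process's open descriptors by listing /proc/<pid>/fd on the node. It counts sockets generically rather than netlink sockets specifically (telling them apart needs lsof or /proc/net/netlink), and the program name and usage are illustrative assumptions, not from this report.

```go
// Count a process's open file descriptors, and how many of them are sockets,
// by reading /proc/<pid>/fd. Run on the node (or inside the calico-node pod)
// with the PID of the calico-node -felix process.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: fdcount <pid>")
		os.Exit(1)
	}
	pid := os.Args[1]

	fdDir := filepath.Join("/proc", pid, "fd")
	entries, err := os.ReadDir(fdDir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	total, sockets := 0, 0
	for _, e := range entries {
		total++
		// Each fd entry is a symlink; socket fds read as "socket:[inode]".
		target, err := os.Readlink(filepath.Join(fdDir, e.Name()))
		if err == nil && strings.HasPrefix(target, "socket:") {
			sockets++
		}
	}
	fmt.Printf("open fds: %d (sockets: %d)\n", total, sockets)
}
```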
Let me know if there is any other debugging data I can provide.
Expected Behavior
A relatively steady state of file descriptors for a calico-node pod.
Current Behavior
A steady increase in open file descriptors.
Possible Solution
Steps to Reproduce (for bugs)
Context
This is OK for now in our staging environment, but we are worried about going to production this way. It is entirely possible this is due to some weird config on our side, but nothing is jumping out at me so far.
Your Environment
Linux ip-10-213-23-129 6.8.0-1018-aws #19~22.04.1-Ubuntu SMP Wed Oct 9 16:48:22 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux