Admin server hangs and stops accepting all requests #16124
Comments
Backtrace (bt) from one case:
A trace log file:
cc @bianpengyuan. I'm not 100% sure, but this may actually be in Istio's SD plugin, in which case this is the wrong repo.
istio/istio#29334: various factors can cause a denial of service (DoS) on the main thread. #16124 (comment) is likely another case. I think it is likely a deadlock, as you discovered.
@lambdai I agree with your assessment. After looking at it a bit more, these seem to be two completely distinct issues.
Handler::ExportViewData seems to hold a lock while doing operations on a completion queue. Is that correct? See thread 17 in #16124 (comment).
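The hazard raised in the previous comment, holding a lock while blocking on a completion queue, can be modelled in a few lines. This is a hypothetical Python sketch, not the actual `Handler::ExportViewData` code: `lock` stands in for the handler's mutex, `cq` for the completion queue, and the producer is assumed to need the same lock before it can post the completion the exporter waits for. Timeouts are added so the deadlock shows up as a result instead of a hang.

```python
import queue
import threading


def run_scenario(hold_lock_while_waiting: bool, timeout: float = 1.0):
    """Return (exporter_result, producer_got_lock) for one run.

    Hypothetical model: `lock` stands in for a handler mutex and `cq` for a
    completion queue. The producer must take `lock` before it can post the
    completion that the exporter is waiting for.
    """
    lock = threading.Lock()
    cq = queue.Queue()
    lock_taken = threading.Event()
    outcome = {}

    def exporter():
        lock.acquire()
        lock_taken.set()
        if not hold_lock_while_waiting:
            # Safer pattern: leave the critical section before blocking.
            lock.release()
        try:
            outcome["exporter"] = cq.get(timeout=timeout)
        except queue.Empty:
            # Nothing was ever posted: the deadlock shape, cut short by the timeout.
            outcome["exporter"] = None
        finally:
            if hold_lock_while_waiting:
                lock.release()

    def producer():
        lock_taken.wait()
        # Give up well before the exporter does, so the outcome is stable.
        got = lock.acquire(timeout=timeout / 2)
        if got:
            cq.put("completion")
            lock.release()
        outcome["producer"] = got

    threads = [threading.Thread(target=exporter), threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome["exporter"], outcome["producer"]
```

With `run_scenario(True)` the exporter never receives its completion (result `None`), because the only thread that could post it is stuck on the lock the exporter holds. With `run_scenario(False)` the lock is released before the blocking wait, so the producer gets through and the exporter receives `"completion"`.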
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.
Title: Admin server hangs and stops accepting all requests
Description:
On rare occasions, the admin server hangs and stops accepting all requests.
Requests to another listener that forwards to the admin interface still get a response: a 504 timeout, or an immediate 404 if the path does not match one of the routes. So it appears that only the main thread is locked up.
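The diagnosis above (the admin listener stalls while a worker-served listener still answers) can be checked from inside the pod with a small probe. This is a hypothetical sketch using only the Python standard library; the function name `probe`, its return labels, and the URLs in the usage note are assumptions, not an Istio or Envoy tool. Envoy's `/ready` admin endpoint is used as an example path.

```python
import socket
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 2.0) -> str:
    """Classify one GET as 'ok', 'http-<code>', 'timeout' or 'error'."""
    # Bypass any http_proxy settings: the targets are local listeners.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return "ok" if resp.status == 200 else f"http-{resp.status}"
    except urllib.error.HTTPError as err:
        # Non-2xx replies, such as the 404 or 504 from a forwarding listener.
        return f"http-{err.code}"
    except urllib.error.URLError as err:
        # Timeouts while connecting or sending arrive wrapped in URLError.
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return "error"
    except (socket.timeout, TimeoutError):
        # Timeouts while waiting for the response are raised directly.
        return "timeout"
    except OSError:
        # Connection resets and similar socket errors.
        return "error"
```

For example, one might compare `probe("http://127.0.0.1:15000/ready")` (15000 is assumed here as the sidecar's admin port) with the same path on the forwarding listener's address. In the state described in this issue, the first would be expected to return `timeout` and the second `http-504` or `http-404`.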
A large set of strace, gdb, and other diagnostic output is attached in istio/istio#29334.
In one case, an xDS disconnect (triggered by the control plane) seemed to unstick things. In another case, a pod stayed in this state for more than 2 days.
Repro steps:
Unknown. There have been three reports of this in Istio from different users: istio/istio#29334
Admin and Stats Output:
Not accessible 🙂
Config:
Not accessible 🙂
Logs:
There were no logs, and I cannot raise the log level because the admin interface is not accessible.
Call Stack:
istio/istio#29334 has a variety of strace, ss, gdb, and other output.