Admin server hangs and stops accepting all requests #16124

Closed
howardjohn opened this issue Apr 22, 2021 · 8 comments
Labels: area/admin, bug, stale

Comments

@howardjohn
Contributor

If you are reporting any crash or any potential security issue, do not
open an issue in this repo. Please report the issue via emailing
envoy-security@googlegroups.com where the issue will be triaged appropriately.

Title: Admin server hangs and stops accepting all requests

Description:
On rare occasions, the admin server hangs and stops accepting all requests.

A call to another listener that forwards to the admin interface still gets a response: a 504 timeout if the request matches one of the routes, or an immediate 404 if it does not. So it appears that only the main thread is locked up.
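
For anyone trying to confirm the same state, here is a minimal sketch (not part of the original report) of a probe that separates a hung admin endpoint from a port with nothing listening. It assumes the admin interface is reachable at a host and port you supply, and it requests Envoy's `/ready` admin path. The function name `probe_admin` and the returned labels are hypothetical, chosen only for illustration.

```python
import socket


def probe_admin(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify an HTTP endpoint as 'ok', 'closed', 'hung', or 'refused'.

    'hung' means the TCP connection was established (or at least not
    rejected) but no bytes came back before the timeout, which is the
    symptom described in this issue.
    """
    request = (
        b"GET /ready HTTP/1.1\r\n"
        b"Host: " + host.encode("ascii") + b"\r\n"
        b"Connection: close\r\n\r\n"
    )
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(request)
            # Any response bytes at all mean the server is still processing.
            data = sock.recv(1024)
            return "ok" if data else "closed"
    except socket.timeout:
        # Connected or connecting, but nothing answered within the timeout.
        return "hung"
    except ConnectionRefusedError:
        # Nothing is listening on the port.
        return "refused"
```

Calling `probe_admin("127.0.0.1", 15000)` inside the pod is one way to use it, assuming the admin interface is on the port Istio sidecars conventionally use. A `hung` result with the other listeners still serving traffic would be consistent with the main thread being blocked while the workers keep running.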

There is a large set of strace, gdb, etc attached in istio/istio#29334.

In one case, an XDS disconnect (triggered by the control plane) seemed to unstick things. In another case, I had a pod in this state for 2+ days.

Repro steps:
Unknown. There have been three reports of this in Istio from different users: istio/istio#29334

Admin and Stats Output:
Not accessible 🙂

Config:
Not accessible 🙂

Logs:
There were no logs, and I cannot turn on a higher log level because the admin interface is not accessible.

Call Stack:
istio/istio#29334 has output from a variety of strace, ss, gdb, and other calls.

@howardjohn added the bug and triage labels on Apr 22, 2021
@howardjohn
Contributor Author

gdb backtrace (bt) from one case:

Thread 21 (LWP 287460):
#0  0x00007f3112d5f639 in syscall () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x000000000368a979 in absl::synchronization_internal::Waiter::Wait(absl::synchronization_internal::KernelTimeout) ()
No symbol table info available.
#2  0x000000000368a660 in AbslInternalPerThreadSemWait ()
No symbol table info available.
#3  0x000000000368a082 in absl::CondVar::WaitCommon(absl::Mutex*, absl::synchronization_internal::KernelTimeout) ()
No symbol table info available.
#4  0x0000000002b45a73 in gpr_cv_wait ()
No symbol table info available.
#5  0x0000000002b121d1 in grpc_core::Executor::ThreadMain(void*) ()
No symbol table info available.
#6  0x0000000002b4769c in grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::{lambda(void*)#1}::__invoke(void*) ()
No symbol table info available.
#7  0x00007f311303c6db in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#8  0x00007f3112d6571f in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

Thread 20 (LWP 20146):
#0  0x00007f3112d5f639 in syscall () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x000000000368a951 in absl::synchronization_internal::Waiter::Wait(absl::synchronization_internal::KernelTimeout) ()
No symbol table info available.
#2  0x000000000368a660 in AbslInternalPerThreadSemWait ()
No symbol table info available.
#3  0x000000000368a082 in absl::CondVar::WaitCommon(absl::Mutex*, absl::synchronization_internal::KernelTimeout) ()
No symbol table info available.
#4  0x0000000002b45a63 in gpr_cv_wait ()
No symbol table info available.
#5  0x0000000002b23316 in timer_thread(void*) ()
No symbol table info available.
#6  0x0000000002b4769c in grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::{lambda(void*)#1}::__invoke(void*) ()
No symbol table info available.
#7  0x00007f311303c6db in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#8  0x00007f3112d6571f in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

Thread 19 (LWP 20019):
#0  0x00007f3112d5f639 in syscall () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x000000000368a979 in absl::synchronization_internal::Waiter::Wait(absl::synchronization_internal::KernelTimeout) ()
No symbol table info available.
#2  0x000000000368a660 in AbslInternalPerThreadSemWait ()
No symbol table info available.
#3  0x000000000368a082 in absl::CondVar::WaitCommon(absl::Mutex*, absl::synchronization_internal::KernelTimeout) ()
No symbol table info available.
#4  0x000000000297c1d8 in Envoy::AccessLog::AccessLogFileImpl::flushThreadFunc() ()
No symbol table info available.
#5  0x0000000003655c53 in Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function<void ()>, std::__1::optional<Envoy::Thread::Options> const&)::{lambda(void*)#1}::__invoke(void*) ()
No symbol table info available.
#6  0x00007f311303c6db in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#7  0x00007f3112d6571f in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

Thread 18 (LWP 20014):
#0  0x00007f3113046d50 in nanosleep () from target:/lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x000000000369463b in AbslInternalSleepFor ()
No symbol table info available.
#2  0x00000000010efe30 in opencensus::stats::DeltaProducer::RunHarvesterLoop() ()
No symbol table info available.
#3  0x00000000010f18fd in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (opencensus::stats::DeltaProducer::*)(), opencensus::stats::DeltaProducer*> >(void*) ()
No symbol table info available.
#4  0x00007f311303c6db in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#5  0x00007f3112d6571f in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

Thread 17 (LWP 20013):
#0  0x00007f3112d65a47 in epoll_wait () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x0000000002b0eb94 in pollset_work ()
No symbol table info available.
#2  0x0000000002b35c29 in cq_next(grpc_completion_queue*, gpr_timespec, void*) ()
No symbol table info available.
#3  0x0000000002a10c10 in grpc::CompletionQueue::AsyncNextInternal(void**, bool*, gpr_timespec) ()
No symbol table info available.
#4  0x00000000010ea170 in opencensus::exporters::stats::(anonymous namespace)::Handler::ExportViewData(std::__1::vector<std::__1::pair<opencensus::stats::ViewDescriptor, opencensus::stats::ViewData>, std::__1::allocator<std::__1::pair<opencensus::stats::ViewDescriptor, opencensus::stats::ViewData> > > const&) ()
No symbol table info available.
#5  0x00000000010f475c in opencensus::stats::StatsExporterImpl::Export() ()
No symbol table info available.
#6  0x00000000010f48c7 in opencensus::stats::StatsExporterImpl::RunWorkerLoop() ()
No symbol table info available.
#7  0x00000000010f5b0d in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (opencensus::stats::StatsExporterImpl::*)(), opencensus::stats::StatsExporterImpl*> >(void*) ()
No symbol table info available.
#8  0x00007f311303c6db in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#9  0x00007f3112d6571f in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

Thread 16 (LWP 18784):
#0  0x00007f3112d5f639 in syscall () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x000000000368a979 in absl::synchronization_internal::Waiter::Wait(absl::synchronization_internal::KernelTimeout) ()
No symbol table info available.
#2  0x000000000368a660 in AbslInternalPerThreadSemWait ()
No symbol table info available.
#3  0x000000000368a082 in absl::CondVar::WaitCommon(absl::Mutex*, absl::synchronization_internal::KernelTimeout) ()
No symbol table info available.
#4  0x000000000297c1d8 in Envoy::AccessLog::AccessLogFileImpl::flushThreadFunc() ()
No symbol table info available.
#5  0x0000000003655c53 in Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function<void ()>, std::__1::optional<Envoy::Thread::Options> const&)::{lambda(void*)#1}::__invoke(void*) ()
No symbol table info available.
#6  0x00007f311303c6db in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#7  0x00007f3112d6571f in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

Thread 15 (LWP 18779):
#0  0x00007f3112d65a47 in epoll_wait () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x0000000002b0eb94 in pollset_work ()
No symbol table info available.
#2  0x0000000002b35c29 in cq_next(grpc_completion_queue*, gpr_timespec, void*) ()
No symbol table info available.
#3  0x0000000002a10c10 in grpc::CompletionQueue::AsyncNextInternal(void**, bool*, gpr_timespec) ()
No symbol table info available.
#4  0x00000000029fee74 in Envoy::Grpc::GoogleAsyncClientThreadLocal::completionThread() ()
No symbol table info available.
#5  0x0000000003655c53 in Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function<void ()>, std::__1::optional<Envoy::Thread::Options> const&)::{lambda(void*)#1}::__invoke(void*) ()
No symbol table info available.
#6  0x00007f311303c6db in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#7  0x00007f3112d6571f in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

Thread 14 (LWP 18778):
#0  0x00007f3112d65a47 in epoll_wait () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x0000000002b0eb94 in pollset_work ()
No symbol table info available.
#2  0x0000000002b35c29 in cq_next(grpc_completion_queue*, gpr_timespec, void*) ()
No symbol table info available.
#3  0x0000000002a10c10 in grpc::CompletionQueue::AsyncNextInternal(void**, bool*, gpr_timespec) ()
No symbol table info available.
#4  0x00000000029fee74 in Envoy::Grpc::GoogleAsyncClientThreadLocal::completionThread() ()
No symbol table info available.
#5  0x0000000003655c53 in Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function<void ()>, std::__1::optional<Envoy::Thread::Options> const&)::{lambda(void*)#1}::__invoke(void*) ()
No symbol table info available.
#6  0x00007f311303c6db in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#7  0x00007f3112d6571f in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

Thread 13 (LWP 18777):
#0  0x00007f3112d65a47 in epoll_wait () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x0000000002b0eb94 in pollset_work ()
No symbol table info available.
#2  0x0000000002b35c29 in cq_next(grpc_completion_queue*, gpr_timespec, void*) ()
No symbol table info available.
#3  0x0000000002a10c10 in grpc::CompletionQueue::AsyncNextInternal(void**, bool*, gpr_timespec) ()
No symbol table info available.
#4  0x00000000029fee74 in Envoy::Grpc::GoogleAsyncClientThreadLocal::completionThread() ()
No symbol table info available.
#5  0x0000000003655c53 in Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function<void ()>, std::__1::optional<Envoy::Thread::Options> const&)::{lambda(void*)#1}::__invoke(void*) ()
No symbol table info available.
#6  0x00007f311303c6db in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#7  0x00007f3112d6571f in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

Thread 12 (LWP 18776):
#0  0x00007f3112d65a47 in epoll_wait () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x0000000002e399e0 in epoll_dispatch ()
No symbol table info available.
#2  0x0000000002e3462b in event_base_loop ()
No symbol table info available.
#3  0x00000000029796a3 in Envoy::Server::WorkerImpl::threadRoutine(Envoy::Server::GuardDog&) ()
No symbol table info available.
#4  0x0000000003655c53 in Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function<void ()>, std::__1::optional<Envoy::Thread::Options> const&)::{lambda(void*)#1}::__invoke(void*) ()
No symbol table info available.
#5  0x00007f311303c6db in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#6  0x00007f3112d6571f in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

Thread 11 (LWP 18775):
#0  0x00007f3112d65a47 in epoll_wait () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x0000000002e399e0 in epoll_dispatch ()
No symbol table info available.
#2  0x0000000002e3462b in event_base_loop ()
No symbol table info available.
#3  0x00000000029796a3 in Envoy::Server::WorkerImpl::threadRoutine(Envoy::Server::GuardDog&) ()
No symbol table info available.
#4  0x0000000003655c53 in Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function<void ()>, std::__1::optional<Envoy::Thread::Options> const&)::{lambda(void*)#1}::__invoke(void*) ()
No symbol table info available.
#5  0x00007f311303c6db in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#6  0x00007f3112d6571f in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

Thread 10 (LWP 18774):
#0  0x00007f3112d65a47 in epoll_wait () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x0000000002b0eb94 in pollset_work ()
No symbol table info available.
#2  0x0000000002b35c29 in cq_next(grpc_completion_queue*, gpr_timespec, void*) ()
No symbol table info available.
#3  0x0000000002a10c10 in grpc::CompletionQueue::AsyncNextInternal(void**, bool*, gpr_timespec) ()
No symbol table info available.
#4  0x00000000029fee74 in Envoy::Grpc::GoogleAsyncClientThreadLocal::completionThread() ()
No symbol table info available.
#5  0x0000000003655c53 in Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function<void ()>, std::__1::optional<Envoy::Thread::Options> const&)::{lambda(void*)#1}::__invoke(void*) ()
No symbol table info available.
#6  0x00007f311303c6db in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#7  0x00007f3112d6571f in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

Thread 9 (LWP 18773):
#0  0x00007f3112d65a47 in epoll_wait () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x0000000002e399e0 in epoll_dispatch ()
No symbol table info available.
#2  0x0000000002e3462b in event_base_loop ()
No symbol table info available.
#3  0x00000000029796a3 in Envoy::Server::WorkerImpl::threadRoutine(Envoy::Server::GuardDog&) ()
No symbol table info available.
#4  0x0000000003655c53 in Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function<void ()>, std::__1::optional<Envoy::Thread::Options> const&)::{lambda(void*)#1}::__invoke(void*) ()
No symbol table info available.
#5  0x00007f311303c6db in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#6  0x00007f3112d6571f in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

Thread 8 (LWP 18772):
#0  0x00007f3112d65a47 in epoll_wait () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x0000000002e399e0 in epoll_dispatch ()
No symbol table info available.
#2  0x0000000002e3462b in event_base_loop ()
No symbol table info available.
#3  0x00000000029796a3 in Envoy::Server::WorkerImpl::threadRoutine(Envoy::Server::GuardDog&) ()
No symbol table info available.
#4  0x0000000003655c53 in Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function<void ()>, std::__1::optional<Envoy::Thread::Options> const&)::{lambda(void*)#1}::__invoke(void*) ()
No symbol table info available.
#5  0x00007f311303c6db in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#6  0x00007f3112d6571f in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

Thread 7 (LWP 18721):
#0  0x00007f3112d65a47 in epoll_wait () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x0000000002e399e0 in epoll_dispatch ()
No symbol table info available.
#2  0x0000000002e3462b in event_base_loop ()
No symbol table info available.
#3  0x0000000003655c53 in Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function<void ()>, std::__1::optional<Envoy::Thread::Options> const&)::{lambda(void*)#1}::__invoke(void*) ()
No symbol table info available.
#4  0x00007f311303c6db in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#5  0x00007f3112d6571f in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

Thread 6 (LWP 18720):
#0  0x00007f3112d65a47 in epoll_wait () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x0000000002e399e0 in epoll_dispatch ()
No symbol table info available.
#2  0x0000000002e3462b in event_base_loop ()
No symbol table info available.
#3  0x0000000003655c53 in Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function<void ()>, std::__1::optional<Envoy::Thread::Options> const&)::{lambda(void*)#1}::__invoke(void*) ()
No symbol table info available.
#4  0x00007f311303c6db in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#5  0x00007f3112d6571f in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

Thread 5 (LWP 18717):
#0  0x00007f3112d65a47 in epoll_wait () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x0000000002b0eb94 in pollset_work ()
No symbol table info available.
#2  0x0000000002b35c29 in cq_next(grpc_completion_queue*, gpr_timespec, void*) ()
No symbol table info available.
#3  0x0000000002a10c10 in grpc::CompletionQueue::AsyncNextInternal(void**, bool*, gpr_timespec) ()
No symbol table info available.
#4  0x00000000029febfe in Envoy::Grpc::GoogleAsyncClientThreadLocal::completionThread() ()
No symbol table info available.
#5  0x0000000003655c53 in Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function<void ()>, std::__1::optional<Envoy::Thread::Options> const&)::{lambda(void*)#1}::__invoke(void*) ()
No symbol table info available.
#6  0x00007f311303c6db in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#7  0x00007f3112d6571f in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

Thread 4 (LWP 18716):
#0  0x00007f3112d5f639 in syscall () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x000000000368a979 in absl::synchronization_internal::Waiter::Wait(absl::synchronization_internal::KernelTimeout) ()
No symbol table info available.
#2  0x000000000368a660 in AbslInternalPerThreadSemWait ()
No symbol table info available.
#3  0x000000000368a082 in absl::CondVar::WaitCommon(absl::Mutex*, absl::synchronization_internal::KernelTimeout) ()
No symbol table info available.
#4  0x0000000002b45a73 in gpr_cv_wait ()
No symbol table info available.
#5  0x0000000002b23316 in timer_thread(void*) ()
No symbol table info available.
#6  0x0000000002b4769c in grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::{lambda(void*)#1}::__invoke(void*) ()
No symbol table info available.
#7  0x00007f311303c6db in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#8  0x00007f3112d6571f in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

Thread 3 (LWP 18715):
#0  0x00007f3112d5f639 in syscall () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x000000000368a979 in absl::synchronization_internal::Waiter::Wait(absl::synchronization_internal::KernelTimeout) ()
No symbol table info available.
#2  0x000000000368a660 in AbslInternalPerThreadSemWait ()
No symbol table info available.
#3  0x000000000368a082 in absl::CondVar::WaitCommon(absl::Mutex*, absl::synchronization_internal::KernelTimeout) ()
No symbol table info available.
#4  0x0000000002b45a73 in gpr_cv_wait ()
No symbol table info available.
#5  0x0000000002b121d1 in grpc_core::Executor::ThreadMain(void*) ()
No symbol table info available.
#6  0x0000000002b4769c in grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::{lambda(void*)#1}::__invoke(void*) ()
No symbol table info available.
#7  0x00007f311303c6db in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#8  0x00007f3112d6571f in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

Thread 2 (LWP 18714):
#0  0x00007f3112d5f639 in syscall () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x000000000368a979 in absl::synchronization_internal::Waiter::Wait(absl::synchronization_internal::KernelTimeout) ()
No symbol table info available.
#2  0x000000000368a660 in AbslInternalPerThreadSemWait ()
No symbol table info available.
#3  0x000000000368a082 in absl::CondVar::WaitCommon(absl::Mutex*, absl::synchronization_internal::KernelTimeout) ()
No symbol table info available.
#4  0x0000000002b45a73 in gpr_cv_wait ()
No symbol table info available.
#5  0x0000000002b121d1 in grpc_core::Executor::ThreadMain(void*) ()
No symbol table info available.
#6  0x0000000002b4769c in grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::{lambda(void*)#1}::__invoke(void*) ()
No symbol table info available.
#7  0x00007f311303c6db in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#8  0x00007f3112d6571f in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

Thread 1 (LWP 18711):
#0  0x00007f3112d5f639 in syscall () from target:/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x000000000368a979 in absl::synchronization_internal::Waiter::Wait(absl::synchronization_internal::KernelTimeout) ()
No symbol table info available.
#2  0x000000000368a660 in AbslInternalPerThreadSemWait ()
No symbol table info available.
#3  0x0000000003687aa2 in absl::Mutex::Block(absl::base_internal::PerThreadSynch*) ()
No symbol table info available.
#4  0x0000000003688cc1 in absl::Mutex::LockSlowLoop(absl::SynchWaitParams*, int) ()
No symbol table info available.
#5  0x0000000003687d6c in absl::Mutex::LockSlowWithDeadline(absl::MuHowS const*, absl::Condition const*, absl::synchronization_internal::KernelTimeout, int) ()
No symbol table info available.
#6  0x0000000003687c0e in absl::Mutex::LockSlow(absl::MuHowS const*, absl::Condition const*, int) ()
No symbol table info available.
#7  0x00000000010f4486 in opencensus::stats::StatsExporterImpl::RegisterPushHandler(std::__1::unique_ptr<opencensus::stats::StatsExporter::Handler, std::__1::default_delete<opencensus::stats::StatsExporter::Handler> >) ()
No symbol table info available.
#8  0x00000000010f4ac1 in opencensus::stats::StatsExporter::RegisterPushHandler(std::__1::unique_ptr<opencensus::stats::StatsExporter::Handler, std::__1::default_delete<opencensus::stats::StatsExporter::Handler> >) ()
No symbol table info available.
#9  0x00000000010e90e3 in opencensus::exporters::stats::StackdriverExporter::Register(opencensus::exporters::stats::StackdriverOptions&&) ()
No symbol table info available.
#10 0x00000000010824a1 in proxy_wasm::null_plugin::Stackdriver::StackdriverRootContext::configure(unsigned long) ()
No symbol table info available.
#11 0x000000000108026e in proxy_wasm::null_plugin::Stackdriver::StackdriverRootContext::onConfigure(unsigned long) ()
No symbol table info available.
#12 0x0000000001d60bb4 in std::__1::__function::__func<proxy_wasm::NullPlugin::getFunction(std::__1::basic_string_view<char, std::__1::char_traits<char> >, std::__1::function<proxy_wasm::Word (proxy_wasm::ContextBase*, proxy_wasm::Word, proxy_wasm::Word)>*)::$_17, std::__1::allocator<proxy_wasm::NullPlugin::getFunction(std::__1::basic_string_view<char, std::__1::char_traits<char> >, std::__1::function<proxy_wasm::Word (proxy_wasm::ContextBase*, proxy_wasm::Word, proxy_wasm::Word)>*)::$_17>, proxy_wasm::Word (proxy_wasm::ContextBase*, proxy_wasm::Word, proxy_wasm::Word)>::operator()(proxy_wasm::ContextBase*&&, proxy_wasm::Word&&, proxy_wasm::Word&&) ()
No symbol table info available.
#13 0x0000000001d9262d in proxy_wasm::ContextBase::onConfigure(std::__1::shared_ptr<proxy_wasm::PluginBase>) ()
No symbol table info available.
#14 0x0000000001dad338 in proxy_wasm::createWasm(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::shared_ptr<proxy_wasm::PluginBase>, std::__1::function<std::__1::shared_ptr<proxy_wasm::WasmHandleBase> (std::__1::basic_string_view<char, std::__1::char_traits<char> >)>, std::__1::function<std::__1::shared_ptr<proxy_wasm::WasmHandleBase> (std::__1::shared_ptr<proxy_wasm::WasmHandleBase>)>, bool) ()
No symbol table info available.
#15 0x0000000001b70b32 in Envoy::Extensions::Common::Wasm::createWasm(std::__1::shared_ptr<Envoy::Extensions::Common::Wasm::Plugin> const&, std::__1::shared_ptr<Envoy::Stats::Scope> const&, Envoy::Upstream::ClusterManager&, Envoy::Init::Manager&, Envoy::Event::Dispatcher&, Envoy::Api::Api&, Envoy::Server::ServerLifecycleNotifier&, std::__1::unique_ptr<Envoy::Config::DataSource::RemoteAsyncDataProvider, std::__1::default_delete<Envoy::Config::DataSource::RemoteAsyncDataProvider> >&, std::__1::function<void (std::__1::shared_ptr<Envoy::Extensions::Common::Wasm::WasmHandle>)>&&, std::__1::function<proxy_wasm::ContextBase* (Envoy::Extensions::Common::Wasm::Wasm*, std::__1::shared_ptr<Envoy::Extensions::Common::Wasm::Plugin> const&)>)::$_5::operator()(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) const ()
No symbol table info available.
#16 0x0000000001b6f19e in Envoy::Extensions::Common::Wasm::createWasm(std::__1::shared_ptr<Envoy::Extensions::Common::Wasm::Plugin> const&, std::__1::shared_ptr<Envoy::Stats::Scope> const&, Envoy::Upstream::ClusterManager&, Envoy::Init::Manager&, Envoy::Event::Dispatcher&, Envoy::Api::Api&, Envoy::Server::ServerLifecycleNotifier&, std::__1::unique_ptr<Envoy::Config::DataSource::RemoteAsyncDataProvider, std::__1::default_delete<Envoy::Config::DataSource::RemoteAsyncDataProvider> >&, std::__1::function<void (std::__1::shared_ptr<Envoy::Extensions::Common::Wasm::WasmHandle>)>&&, std::__1::function<proxy_wasm::ContextBase* (Envoy::Extensions::Common::Wasm::Wasm*, std::__1::shared_ptr<Envoy::Extensions::Common::Wasm::Plugin> const&)>) ()
No symbol table info available.
#17 0x000000000159b8c9 in Envoy::Extensions::HttpFilters::Wasm::FilterConfig::FilterConfig(envoy::extensions::filters::http::wasm::v3::Wasm const&, Envoy::Server::Configuration::FactoryContext&) ()
No symbol table info available.
#18 0x000000000159a7ae in Envoy::Extensions::HttpFilters::Wasm::WasmFilterConfig::createFilterFactoryFromProtoTyped(envoy::extensions::filters::http::wasm::v3::Wasm const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, Envoy::Server::Configuration::FactoryContext&) ()
No symbol table info available.
#19 0x000000000159aac8 in Envoy::Extensions::HttpFilters::Common::FactoryBase<envoy::extensions::filters::http::wasm::v3::Wasm, envoy::extensions::filters::http::wasm::v3::Wasm>::createFilterFactoryFromProto(google::protobuf::Message const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, Envoy::Server::Configuration::FactoryContext&) ()
No symbol table info available.
#20 0x0000000002c7026d in Envoy::Extensions::NetworkFilters::HttpConnectionManager::HttpConnectionManagerConfig::processFilter(envoy::extensions::filters::network::http_connection_manager::v3::HttpFilter const&, int, std::__1::basic_string_view<char, std::__1::char_traits<char> >, std::__1::list<std::__1::unique_ptr<Envoy::Config::ExtensionConfigProvider<Envoy::Server::Configuration::NamedHttpFilterConfigFactory, std::__1::function<void (Envoy::Http::FilterChainFactoryCallbacks&)> >, std::__1::default_delete<Envoy::Config::ExtensionConfigProvider<Envoy::Server::Configuration::NamedHttpFilterConfigFactory, std::__1::function<void (Envoy::Http::FilterChainFactoryCallbacks&)> > > >, std::__1::allocator<std::__1::unique_ptr<Envoy::Config::ExtensionConfigProvider<Envoy::Server::Configuration::NamedHttpFilterConfigFactory, std::__1::function<void (Envoy::Http::FilterChainFactoryCallbacks&)> >, std::__1::default_delete<Envoy::Config::ExtensionConfigProvider<Envoy::Server::Configuration::NamedHttpFilterConfigFactory, std::__1::function<void (Envoy::Http::FilterChainFactoryCallbacks&)> > > > > >&, char const*, bool) ()
No symbol table info available.
#21 0x0000000002c6e39a in Envoy::Extensions::NetworkFilters::HttpConnectionManager::HttpConnectionManagerConfig::HttpConnectionManagerConfig(envoy::extensions::filters::network::http_connection_manager::v3::HttpConnectionManager const&, Envoy::Server::Configuration::FactoryContext&, Envoy::Http::DateProvider&, Envoy::Router::RouteConfigProviderManager&, Envoy::Config::ConfigProviderManager&, Envoy::Tracing::HttpTracerManager&, Envoy::Filter::Http::FilterConfigProviderManager&) ()
No symbol table info available.
#22 0x0000000002c6bf82 in Envoy::Extensions::NetworkFilters::HttpConnectionManager::HttpConnectionManagerFilterConfigFactory::createFilterFactoryFromProtoTyped(envoy::extensions::filters::network::http_connection_manager::v3::HttpConnectionManager const&, Envoy::Server::Configuration::FactoryContext&) ()
No symbol table info available.
#23 0x0000000002c72a3f in Envoy::Extensions::NetworkFilters::Common::FactoryBase<envoy::extensions::filters::network::http_connection_manager::v3::HttpConnectionManager, envoy::extensions::filters::network::http_connection_manager::v3::HttpConnectionManager>::createFilterFactoryFromProto(google::protobuf::Message const&, Envoy::Server::Configuration::FactoryContext&) ()
No symbol table info available.
#24 0x0000000002c15b07 in Envoy::Server::ProdListenerComponentFactory::createNetworkFilterFactoryList_(google::protobuf::RepeatedPtrField<envoy::config::listener::v3::Filter> const&, Envoy::Server::Configuration::FilterChainFactoryContext&) ()
No symbol table info available.
#25 0x0000000002c21534 in Envoy::Server::ProdListenerComponentFactory::createNetworkFilterFactoryList(google::protobuf::RepeatedPtrField<envoy::config::listener::v3::Filter> const&, Envoy::Server::Configuration::FilterChainFactoryContext&) ()
No symbol table info available.
#26 0x0000000002c2114b in Envoy::Server::ListenerFilterChainFactoryBuilder::buildFilterChainInternal(envoy::config::listener::v3::FilterChain const&, std::__1::unique_ptr<Envoy::Server::Configuration::FilterChainFactoryContext, std::__1::default_delete<Envoy::Server::Configuration::FilterChainFactoryContext> >&&) const ()
No symbol table info available.
#27 0x0000000002c20f29 in Envoy::Server::ListenerFilterChainFactoryBuilder::buildFilterChain(envoy::config::listener::v3::FilterChain const&, Envoy::Server::FilterChainFactoryContextCreator&) const ()
No symbol table info available.
#28 0x0000000002c4115c in Envoy::Server::FilterChainManagerImpl::addFilterChains(absl::Span<envoy::config::listener::v3::FilterChain const* const>, envoy::config::listener::v3::FilterChain const*, Envoy::Server::FilterChainFactoryBuilder&, Envoy::Server::FilterChainFactoryContextCreator&) ()
No symbol table info available.
#29 0x0000000002c0fb19 in Envoy::Server::ListenerImpl::buildFilterChains() ()
No symbol table info available.
#30 0x0000000002c0d838 in Envoy::Server::ListenerImpl::ListenerImpl(envoy::config::listener::v3::Listener const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, Envoy::Server::ListenerManagerImpl&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, bool, unsigned long, unsigned int) ()
No symbol table info available.
#31 0x0000000002c1c086 in Envoy::Server::ListenerManagerImpl::addOrUpdateListenerInternal(envoy::config::listener::v3::Listener const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) ()
No symbol table info available.
#32 0x0000000002c1ad98 in Envoy::Server::ListenerManagerImpl::addOrUpdateListener(envoy::config::listener::v3::Listener const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool) ()
No symbol table info available.
#33 0x0000000002c6272f in Envoy::Server::LdsApiImpl::onConfigUpdate(std::__1::vector<std::__1::reference_wrapper<Envoy::Config::DecodedResource>, std::__1::allocator<std::__1::reference_wrapper<Envoy::Config::DecodedResource> > > const&, google::protobuf::RepeatedPtrField<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) ()
No symbol table info available.
#34 0x0000000002c63940 in Envoy::Server::LdsApiImpl::onConfigUpdate(std::__1::vector<std::__1::reference_wrapper<Envoy::Config::DecodedResource>, std::__1::allocator<std::__1::reference_wrapper<Envoy::Config::DecodedResource> > > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) ()
No symbol table info available.
#35 0x0000000002cfd139 in Envoy::Config::GrpcSubscriptionImpl::onConfigUpdate(std::__1::vector<std::__1::reference_wrapper<Envoy::Config::DecodedResource>, std::__1::allocator<std::__1::reference_wrapper<Envoy::Config::DecodedResource> > > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) ()
No symbol table info available.
#36 0x0000000002d02b1c in Envoy::Config::GrpcMuxImpl::onDiscoveryResponse(std::__1::unique_ptr<envoy::service::discovery::v3::DiscoveryResponse, std::__1::default_delete<envoy::service::discovery::v3::DiscoveryResponse> >&&, Envoy::Config::ControlPlaneStats&) ()
No symbol table info available.
#37 0x00000000029c5d11 in Envoy::Grpc::AsyncStreamCallbacks<envoy::service::discovery::v3::DiscoveryResponse>::onReceiveMessageRaw(std::__1::unique_ptr<Envoy::Buffer::Instance, std::__1::default_delete<Envoy::Buffer::Instance> >&&) ()
No symbol table info available.
#38 0x0000000002d17127 in Envoy::Grpc::AsyncStreamImpl::onData(Envoy::Buffer::Instance&, bool) ()
No symbol table info available.
#39 0x0000000002d1c682 in Envoy::Http::AsyncStreamImpl::encodeData(Envoy::Buffer::Instance&, bool) ()
No symbol table info available.
#40 0x0000000002da09f1 in Envoy::Router::UpstreamRequest::decodeData(Envoy::Buffer::Instance&, bool) ()
No symbol table info available.
#41 0x0000000002b4e6ab in Envoy::Http::ResponseDecoderWrapper::decodeData(Envoy::Buffer::Instance&, bool) ()
No symbol table info available.
#42 0x0000000002ccfb2e in Envoy::Http::Http2::ConnectionImpl::onFrameReceived(nghttp2_frame const*) ()
No symbol table info available.
#43 0x0000000002cd7c08 in Envoy::Http::Http2::ConnectionImpl::Http2Callbacks::Http2Callbacks()::$_17::__invoke(nghttp2_session*, nghttp2_frame const*, void*) ()
No symbol table info available.
#44 0x0000000002e43a8c in nghttp2_session_on_data_received ()
No symbol table info available.
#45 0x0000000002e459f6 in nghttp2_session_mem_recv ()
No symbol table info available.
#46 0x0000000002cce49d in Envoy::Http::Http2::ConnectionImpl::dispatch(Envoy::Buffer::Instance&) ()
No symbol table info available.
#47 0x0000000002ccf115 in virtual thunk to Envoy::Http::Http2::ConnectionImpl::dispatch(Envoy::Buffer::Instance&) ()
No symbol table info available.
#48 0x0000000002bac9e0 in Envoy::Http::CodecClient::onData(Envoy::Buffer::Instance&) ()
No symbol table info available.
#49 0x0000000002badc45 in Envoy::Http::CodecClient::CodecReadFilter::onData(Envoy::Buffer::Instance&, bool) ()
No symbol table info available.
#50 0x0000000002c3e44f in Envoy::Network::FilterManagerImpl::onContinueReading(Envoy::Network::FilterManagerImpl::ActiveReadFilter*, Envoy::Network::ReadBufferSource&) ()
No symbol table info available.
#51 0x0000000002c38209 in Envoy::Network::ConnectionImpl::onReadReady() ()
No symbol table info available.
#52 0x0000000002c35dbf in Envoy::Network::ConnectionImpl::onFileEvent(unsigned int) ()
No symbol table info available.
#53 0x00000000029823f1 in std::__1::__function::__func<Envoy::Event::DispatcherImpl::createFileEvent(int, std::__1::function<void (unsigned int)>, Envoy::Event::FileTriggerType, unsigned int)::$_5, std::__1::allocator<Envoy::Event::DispatcherImpl::createFileEvent(int, std::__1::function<void (unsigned int)>, Envoy::Event::FileTriggerType, unsigned int)::$_5>, void (unsigned int)>::operator()(unsigned int&&) ()
No symbol table info available.
#54 0x000000000298369c in Envoy::Event::FileEventImpl::assignEvents(unsigned int, event_base*)::$_1::__invoke(int, short, void*) ()
No symbol table info available.
#55 0x0000000002e36018 in event_process_active_single_queue ()
No symbol table info available.
#56 0x0000000002e34a11 in event_base_loop ()
No symbol table info available.
#57 0x0000000002969b2e in Envoy::Server::InstanceImpl::run() ()
No symbol table info available.
#58 0x00000000011ccee4 in Envoy::MainCommonBase::run() ()
No symbol table info available.
#59 0x00000000011cd704 in Envoy::MainCommon::main(int, char**, std::__1::function<void (Envoy::Server::Instance&)>) ()
No symbol table info available.
#60 0x00000000011cb71c in main ()
No symbol table info available.
quit
Detaching from program: target:/usr/local/bin/envoy, process 18711

A trace log file: ingress.txt. We froze at Thu Apr 22 18:03:10 UTC 2021. Most of the log entries are health check probes that don't hit the admin interface.

@howardjohn
Contributor Author

cc @bianpengyuan. Not 100% sure, but this may actually be in Istio's SD plugin, so this may be the wrong repo.

@asraa asraa added area/admin and removed triage Issue requires triage labels Apr 23, 2021
@lambdai
Contributor

lambdai commented Apr 23, 2021

istio/istio#29334: Various factors can cause a DoS on the main thread.
A deadlock and an epoll/read loop should have quite different symptoms, though: the former shows almost 0 CPU utilization, while the latter saturates one CPU core.
#14954 should have fixed one of the causes of the latter case. The very first report in istio/istio#29334 might have been resolved.

#16124 (comment) is likely another case, and likely a deadlock as you discovered, I think.
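The symptom difference described above (near-zero CPU for a deadlock, one saturated core for a spinning loop) can be checked by sampling a thread's CPU time twice. The following is only a rough sketch, not Envoy code: the names `StallKind`, `cpuTicksFromStatLine`, and `classifyStall` are hypothetical, and the 5% / 90% thresholds are arbitrary assumptions. It assumes a Linux `/proc/<pid>/task/<tid>/stat` line, where `utime` and `stime` are fields 14 and 15, and requires C++17.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical classification of a stuck thread (not an Envoy type).
enum class StallKind { LikelyBlocked, LikelyBusyLoop, Inconclusive };

// Returns utime + stime (in clock ticks) from one line of
// /proc/<pid>/task/<tid>/stat. The comm field (field 2) is wrapped in
// parentheses and may itself contain spaces or ')', so parsing resumes
// after the last ')'.
std::optional<std::uint64_t> cpuTicksFromStatLine(const std::string& line) {
  const std::size_t close = line.rfind(')');
  if (close == std::string::npos) {
    return std::nullopt;
  }
  std::istringstream rest(line.substr(close + 1));
  std::string skipped;
  // Skip fields 3 (state) through 13 (cmajflt).
  for (int field = 3; field <= 13; ++field) {
    if (!(rest >> skipped)) {
      return std::nullopt;
    }
  }
  std::uint64_t utime = 0;
  std::uint64_t stime = 0;
  if (!(rest >> utime >> stime)) {
    return std::nullopt;
  }
  return utime + stime;
}

// Compares two samples taken interval_seconds apart. ticks_per_second would
// normally come from sysconf(_SC_CLK_TCK). Thresholds are assumptions.
StallKind classifyStall(std::uint64_t ticks_before, std::uint64_t ticks_after,
                        double interval_seconds, long ticks_per_second) {
  if (ticks_after < ticks_before || interval_seconds <= 0.0 ||
      ticks_per_second <= 0) {
    return StallKind::Inconclusive;
  }
  const double cpu_seconds = static_cast<double>(ticks_after - ticks_before) /
                             static_cast<double>(ticks_per_second);
  const double utilization = cpu_seconds / interval_seconds;
  if (utilization < 0.05) {
    return StallKind::LikelyBlocked;  // consistent with a deadlock
  }
  if (utilization > 0.90) {
    return StallKind::LikelyBusyLoop;  // consistent with a spinning loop
  }
  return StallKind::Inconclusive;
}
```

A low reading only says the thread is not burning CPU; it does not prove a deadlock, so a backtrace is still needed to confirm.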

@howardjohn
Contributor Author

@lambdai I agree with your assessment. After looking at it a bit more, these seem to be two completely distinct issues.

@antoniovicente
Contributor

Handler::ExportViewData seems to be holding a lock while doing operations on a completion queue. Is that correct?

See thread 17 in #16124 (comment)
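To illustrate the general concern behind that question (this is not the actual `Handler::ExportViewData` implementation, and whether it really holds a lock there is exactly what is being asked), here is a minimal sketch of the pattern. The class `PendingViews` and its methods are hypothetical. If an export callback can block or re-enter while a mutex is held, every other thread that needs that mutex stalls behind it; swapping the data out under the lock and calling the callback afterwards avoids that.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a stats registry; not OpenCensus or Envoy code.
class PendingViews {
 public:
  using Sink = std::function<void(const std::vector<int>&)>;

  void record(int value) {
    std::lock_guard<std::mutex> lock(mu_);
    pending_.push_back(value);
  }

  // Hazard: the sink runs while mu_ is held. If the sink blocks (for example
  // on a queue), or calls record() again, other callers of record() wait
  // indefinitely.
  void exportHoldingLock(const Sink& sink) {
    std::lock_guard<std::mutex> lock(mu_);
    sink(pending_);
    pending_.clear();
  }

  // Safer variant: take the batch under the lock, release the lock, then
  // hand the batch to the sink.
  void exportAfterUnlock(const Sink& sink) {
    std::vector<int> batch;
    {
      std::lock_guard<std::mutex> lock(mu_);
      batch.swap(pending_);
    }
    sink(batch);
  }

  std::size_t pendingCount() {
    std::lock_guard<std::mutex> lock(mu_);
    return pending_.size();
  }

 private:
  std::mutex mu_;
  std::vector<int> pending_;
};
```

With `exportAfterUnlock`, a sink that calls `record()` or blocks for a while does not prevent other threads from recording; the same sink passed to `exportHoldingLock` would stall them.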

@github-actions

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Jun 11, 2021
@github-actions

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.
