Cannot stop the container: stop timeout #3125
Comments
I don't know about the failed event publish, but Q2 has at least one fairly easy answer: if we're talking about Linux (which I assume we are), then uninterruptible sleep (e.g. in-kernel I/O) is one reason for a container process to refuse to be killed, even after the 30s delay followed by SIGKILL. Can you look at the PID mentioned on the host and check what state it is in? It looks like it is pid 28058.
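For anyone checking this on their own host, here is a minimal sketch of reading a process's state from `/proc/<pid>/stat` on Linux. The function names are made up for illustration; the only assumption about the system is the documented `/proc` stat layout, where the state letter is the first field after the parenthesised command name and `D` means uninterruptible sleep.

```python
def parse_proc_state(stat_line: str) -> str:
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The second field (comm) is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the LAST ')'.
    _, sep, rest = stat_line.rpartition(")")
    fields = rest.split()
    if not sep or not fields:
        raise ValueError("malformed stat line")
    return fields[0]


def read_proc_state(pid: int) -> str:
    """Read the state of a live process on Linux (raises if the PID is gone)."""
    with open(f"/proc/{pid}/stat", encoding="utf-8", errors="replace") as f:
        return parse_proc_state(f.read())


def is_uninterruptible(state: str) -> bool:
    """True for state 'D' (uninterruptible sleep, usually waiting on I/O)."""
    return state == "D"
```

Calling `read_proc_state(28058)` on the affected host would show whether that PID is in `D` state or simply sleeping (`S`).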
@estesp the container process has been killed and is gone, but the containerd-shim for this container is still there; 28058 is the pid of the containerd-shim. From the
We need to figure this out. Based on the code, this happens because the event publishing process exits with a non-zero exit code: https://github.com/containerd/containerd/blob/master/cmd/containerd-shim/main_unix.go#L286 We should:
If containerd doesn't receive the
@Random-Liu We reproduced this issue. The error message from stderr is
stdout is empty.
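As a generic aid for reproducing this kind of failure, the sketch below runs a helper command and, if it exits non-zero, raises an error that carries both the exit code and whatever the helper wrote to stderr and stdout. It is not the shim's code; `run_and_report` is a hypothetical name and the command passed in is whatever helper you are debugging.

```python
import subprocess
from typing import Sequence


def run_and_report(cmd: Sequence[str], timeout: float = 10.0) -> None:
    """Run a helper command; on failure, raise with its exit code and output."""
    result = subprocess.run(
        list(cmd), capture_output=True, text=True, timeout=timeout
    )
    if result.returncode != 0:
        # Keep stderr in the error so the reason for the failure is not lost.
        raise RuntimeError(
            f"{cmd[0]} exited with {result.returncode}: "
            f"stderr={result.stderr.strip()!r} stdout={result.stdout.strip()!r}"
        )
```

Surfacing stderr this way is what makes a report like "stderr says X, stdout is empty" possible in the first place.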
@sofat1989 IIRC, if containerd is down at that time, it is fine, because when containerd restarts, it recovers the correct container state by querying the latest state from the shims. However, if containerd is running but the event publish fails, containerd will never know that the container exited. In that case we need to do something.
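To illustrate the "recover state by querying the shims" idea in general terms, here is a small sketch of a reconciliation pass that compares recorded state with live state, so that a lost exit event is eventually noticed. This is a generic pattern, not the actual fix; `reconcile`, `recorded`, and `query_state` are hypothetical names.

```python
from typing import Callable, Dict, List


def reconcile(
    recorded: Dict[str, str], query_state: Callable[[str], str]
) -> List[str]:
    """Update `recorded` in place from live state; return IDs that changed."""
    changed = []
    for container_id, old_state in recorded.items():
        # query_state stands in for asking the shim for the current status.
        new_state = query_state(container_id)
        if new_state != old_state:
            recorded[container_id] = new_state
            changed.append(container_id)
    return changed
```

Running such a pass periodically, rather than only at restart, is one way a running daemon could catch an exit it was never told about.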
This should be fixed by containerd/cri#1133 and containerd/cri#1136. Closing for now. Feel free to reopen if you encounter this issue again after the next containerd upgrade.
From the release notes: https://github.com/containerd/containerd/releases/tag/v1.2.7

> Welcome to the v1.2.7 release of containerd!
>
> The seventh patch release for containerd 1.2 introduces OCI image descriptor annotation support and contains fixes for containerd shim logs, container stop/deletion, cri plugin and selinux.
>
> It also contains several important bug fixes for goroutine and file descriptor leakage in containerd and containerd shims.
>
> Notable Updates
>
> - Support annotations in the OCI image descriptor, and filtering image by annotations. containerd/containerd#3254
> - Support context timeout in ttrpc which can help avoid containerd hangs when a shim is unresponsive. containerd/ttrpc#31
> - Fix a bug that containerd shim leaks goroutine and file descriptor after containerd restarts. containerd/ttrpc#37
> - Fix a bug that a container can't be deleted if first deletion attempt is canceled or timeout. containerd/containerd#3264
> - Fix a bug that containerd leaks file descriptor when using v2 containerd shims, e.g. containerd-shim-runc-v1. containerd/containerd#3273
> - Fix a bug that a container with lingering processes can't terminate when it shares pid namespace with another container. moby/moby#38978
> - Fix a bug that containerd can't read shim logs after restart. containerd/containerd#3282
> - Fix a bug that shim_debug option is not honored for existing containerd shims after containerd restarts. containerd/containerd#3283
> - cri: Fix a bug that a container can't be stopped when the exit event is not successfully published by the containerd shim. containerd/containerd#3125, containerd/containerd#3177
> - cri: Fix a bug that exec process is not cleaned up if grpc context is canceled or timeout. containerd/cri#1159
> - Fix a selinux keyring labeling issue by updating runc to v1.0.0-rc.8 and selinux library to v1.2.2. opencontainers/selinux#50
> - Update ttrpc to f82148331ad2181edea8f3f649a1f7add6c3f9c2. containerd/containerd#3316
> - Update cri to 49ca74043390bc2eeea7a45a46005fbec58a3f88. containerd/containerd#3330

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
containerd 1.5.11 has the same problem.
Containerd: v1.2.4
Error Message:
The container is OOMKilled soon after it starts. The command
crictl ps
shows the container as running, but the container has actually exited. I checked the containerd logs; here is an error message:
Here is another error:
I think there should be a TaskExit event here. We cannot tell why the event failed to publish. We suspect that the container cannot be stopped because containerd never received the TaskExit event.
After containerd is restarted, containerd shows
Q1: Under which conditions will publishing events fail?
Q2: Why can't the container be stopped?
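To make the suspicion behind Q2 concrete, here is a toy model of a stop call that waits for an exit event and times out if the event is never delivered. It only illustrates the hypothesis in this report; `ExitWaiter` and its methods are hypothetical and are not containerd APIs.

```python
import threading


class ExitWaiter:
    """Toy model: stop() succeeds only once an exit event has been delivered."""

    def __init__(self) -> None:
        self._exited = threading.Event()

    def on_task_exit(self) -> None:
        # Called when the exit event arrives (in this model, never if the
        # publish failed).
        self._exited.set()

    def stop(self, timeout: float) -> None:
        # Wait for the exit event; give up after `timeout` seconds.
        if not self._exited.wait(timeout):
            raise TimeoutError("stop timeout: no exit event received")
```

In this model the process may already be dead, yet `stop()` still times out, because the only thing it observes is the missing event.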