Add Retry for restart kube-controller-manager #10013

@nothingcompare2u nothingcompare2u commented Apr 23, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

  • The following is the error message:
4:07AM: Sunday 23 April 2023  08:07:10 +0000 (0:00:00.059)       0:15:16.738 ********** 
4:07AM: 
4:07AM: RUNNING HANDLER [kubernetes/preinstall : Preinstall | restart kube-controller-manager crio/containerd] ***
4:07AM: fatal: [gate-5-all]: FAILED! => {"changed": true, "cmd": "/usr/local/bin/crictl pods --name kube-controller-manager* -q | xargs -I% --no-run-if-empty bash -c '/usr/local/bin/crictl stopp % && /usr/local/bin/crictl rmp %'", "delta": "0:00:00.396997", "end": "2023-04-23 16:07:10.974322", "msg": "non-zero return code", "rc": 123, "start": "2023-04-23 16:07:10.577325", "stderr": "E0423 16:07:10.970938  108138 remote_runtime.go:295] \"RemovePodSandbox from runtime service failed\" err=\"rpc error: code = Unknown desc = failed to remove container \\\"1532049c5220deac260c5d75d54c97002693c42d371f16c0ede4ec4078f43634\\\": failed to set removing state for container \\\"1532049c5220deac260c5d75d54c97002693c42d371f16c0ede4ec4078f43634\\\": container is in starting state, can't be removed\" podSandboxID=\"2f443da62416698edf333e95bb44d267915e88931ce654d1b0e45a03f0e5ace2\"\nremoving the pod sandbox \"2f443da62416698edf333e95bb44d267915e88931ce654d1b0e45a03f0e5ace2\": rpc error: code = Unknown desc = failed to remove container \"1532049c5220deac260c5d75d54c97002693c42d371f16c0ede4ec4078f43634\": failed to set removing state for container \"1532049c5220deac260c5d75d54c97002693c42d371f16c0ede4ec4078f43634\": container is in starting state, can't be removed", "stderr_lines": ["E0423 16:07:10.970938  108138 remote_runtime.go:295] \"RemovePodSandbox from runtime service failed\" err=\"rpc error: code = Unknown desc = failed to remove container \\\"1532049c5220deac260c5d75d54c97002693c42d371f16c0ede4ec4078f43634\\\": failed to set removing state for container \\\"1532049c5220deac260c5d75d54c97002693c42d371f16c0ede4ec4078f43634\\\": container is in starting state, can't be removed\" podSandboxID=\"2f443da62416698edf333e95bb44d267915e88931ce654d1b0e45a03f0e5ace2\"", "removing the pod sandbox \"2f443da62416698edf333e95bb44d267915e88931ce654d1b0e45a03f0e5ace2\": rpc error: code = Unknown desc = failed to remove container 
\"1532049c5220deac260c5d75d54c97002693c42d371f16c0ede4ec4078f43634\": failed to set removing state for container \"1532049c5220deac260c5d75d54c97002693c42d371f16c0ede4ec4078f43634\": container is in starting state, can't be removed"], "stdout": "Stopped sandbox 2f443da62416698edf333e95bb44d267915e88931ce654d1b0e45a03f0e5ace2", "stdout_lines": ["Stopped sandbox 2f443da62416698edf333e95bb44d267915e88931ce654d1b0e45a03f0e5ace2"]}
4:07AM: Sunday 23 April 2023  08:07:11 +0000 (0:00:00.987)       0:15:17.726 ********** 
4:07AM: Sunday 23 April 2023  08:07:11 +0000 (0:00:00.000)       0:15:17.727 ********** 
  • When I retried the command manually a few seconds later, it succeeded.

  • This PR adds a retry to the "restart kube-controller-manager crio/containerd" handler, just like the existing retry on the "restart kube-apiserver crio/containerd" handler that follows it in the same file.
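
For illustration, the retry described above could look roughly like the handler below. This is a sketch under stated assumptions, not the merged diff: the command is copied from the error log, while the `restart_kcm_result` variable name and the `retries` and `delay` values are hypothetical choices made for the example.

```yaml
# Sketch of a handler that retries the sandbox removal instead of failing
# on the first non-zero exit code.
- name: Preinstall | restart kube-controller-manager crio/containerd
  ansible.builtin.shell: >-
    /usr/local/bin/crictl pods --name kube-controller-manager* -q
    | xargs -I% --no-run-if-empty bash -c
    '/usr/local/bin/crictl stopp % && /usr/local/bin/crictl rmp %'
  register: restart_kcm_result        # hypothetical variable name
  retries: 10                         # assumed number of attempts
  delay: 1                            # assumed pause between attempts, in seconds
  until: restart_kcm_result.rc == 0   # stop retrying once the command succeeds
```

With `until` set, Ansible reruns the task until the condition holds or the retries are exhausted, so a container that is still in the starting state on the first attempt gets another chance a moment later.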

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Add Retry for restart kube-controller-manager

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 23, 2023
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 23, 2023
@k8s-ci-robot
Contributor

Hi @hangscer8. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

@oomichi oomichi left a comment

/cc @oomichi

roles/kubernetes/preinstall/handlers/main.yml (resolved)
@k8s-ci-robot k8s-ci-robot requested a review from oomichi April 24, 2023 23:45
@nothingcompare2u nothingcompare2u force-pushed the add_retry_for_restart_crio_containerd branch from d2faf27 to 4297d33 Compare April 25, 2023 01:47
@yankay
Member

yankay commented Apr 25, 2023

Thanks @hangscer8
/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 25, 2023
Signed-off-by: hang.jiang <hang.jiang@daocloud.io>
@nothingcompare2u nothingcompare2u force-pushed the add_retry_for_restart_crio_containerd branch from 4297d33 to cdb82a8 Compare April 25, 2023 02:05
Contributor

@oomichi oomichi left a comment

Thanks for updating.
Looks good to me now.

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 25, 2023
Member

@floryut floryut left a comment

@hangscer8 looks good 👍
/lgtm

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: floryut, hangscer8, oomichi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 25, 2023
@k8s-ci-robot k8s-ci-robot merged commit 4ddbd2b into kubernetes-sigs:master Apr 25, 2023
@yankay yankay mentioned this pull request May 15, 2023
pedro-peter pushed a commit to pedro-peter/kubespray that referenced this pull request May 8, 2024
Signed-off-by: hang.jiang <hang.jiang@daocloud.io>
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
5 participants