failure during uninstall #238
Comments
I did some more digging, and it looks like the "cmk uninstall" command only works on one node: it tries to delete the webhook, and that can only succeed once, since on the other nodes the webhook has already been deleted. When the deletion fails on those nodes, the uninstall aborts and leaves the node in a partially installed state.
@cbf123 thanks for raising this issue. Are you using the cmk-uninstall pod or the cmk-uninstall daemonset?
In the |
I created a draft pull request, linked above, which changes the uninstall module's behaviour as described. Feel free to give it a try and report your feedback. I marked it as a "Draft" because most of the uninstall-related tests now fail; I'll be happy to fix them once we agree that this is the right way to perform the uninstall process.
lmdaly: I tried both options. With the cmk-uninstall pod you need to run it on each node manually, and it'll only run successfully on the first node. With the daemonset, the first pod to run will be successful, and the other pods fail and keep restarting.
przemeklal: That's one way to do it, and I think it makes sense to clean up as much as we can on an uninstall. I took an alternate approach, as attached, which is a bit more narrow, but I think I like yours better.
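To illustrate the idea discussed above (an uninstall step that does not abort when a shared resource such as the webhook has already been deleted by another node), here is a minimal sketch. It is not the code from the linked pull request. `ApiError`, `delete_if_present`, `fake_delete` and the resource name `cmk-webhook` are all hypothetical names introduced for illustration; a real implementation would catch whatever exception its Kubernetes client raises for a 404 response.

```python
class ApiError(Exception):
    """Hypothetical stand-in for an API client's HTTP error (not a real library class)."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def delete_if_present(delete_fn, name):
    """Call delete_fn(name) and treat a 404 as "already deleted".

    Returns True if this call deleted the resource, False if it was
    already gone. Any other error is re-raised unchanged.
    """
    try:
        delete_fn(name)
    except ApiError as err:
        if err.status == 404:
            return False
        raise
    return True


# Simulated cluster-wide state shared by every node (assumption for the demo).
webhooks = {"cmk-webhook"}


def fake_delete(name):
    """Simulated delete call: fails with 404 when the resource is missing."""
    if name not in webhooks:
        raise ApiError(404)
    webhooks.remove(name)


# Two nodes run the same uninstall step one after the other.
results = [delete_if_present(fake_delete, "cmk-webhook") for _ in ("node-a", "node-b")]
print(results)
```

With this pattern the second node sees the missing webhook as a no-op and can continue with its node-local cleanup instead of aborting.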
When running "kubectl apply -f cmk-uninstall-all-daemonset.yaml", the uninstall worked on one of my two nodes but failed on the other. I'm now left with pod/cmk-uninstall-all-nodes-bm8lp in CrashLoopBackOff, daemonset.apps/cmk-uninstall-all-nodes still running, and /etc/cmk still present.
The final logs for the failed pod were as follows:
WARNING:root:"cmk-nodereport" for node "controller-0" does not exist.
INFO:root:"cmk-nodereport" for node "controller-0" removed.
INFO:root:Removing "cmk-reconcilereport" from Kubernetes API server for node "controller-0".
INFO:root:Converted "controller-0" to "controller-0" for TPR/CRD name
WARNING:root:"cmk-reconcilereport" for node "controller-0" does not exist.
INFO:root:"cmk-reconcilereport" for node "controller-0" removed.
INFO:root:Removing node taint.
INFO:root:Patching node controller-0:
[
{
"op": "replace",
"path": "/spec/taints",
"value": []
}
]
INFO:root:Removed node taint with key"cmk".
INFO:root:Removing node ERs
INFO:root:Patching node status controller-0:
[
{
"op": "remove",
"path": "/status/capacity/cmk.intel.com~1exclusive-cores"
}
]
ERROR:root:Aborting uninstall: Exception when removing ER: (422)
Reason: Unprocessable Entity
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'Content-Length': '187', 'Date': 'Wed, 15 May 2019 17:44:02 GMT'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"the server rejected our request due to an error in our request","reason":"Invalid","details":{},"code":422}
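A note on the final patch in the log: JSON Patch (RFC 6902) requires the target of a "remove" operation to exist, and `~1` in the path is the JSON Pointer (RFC 6901) escape for `/`. One possible explanation for the 422, which this thread does not confirm, is that the extended resource was no longer present in the node status when the patch was sent. The sketch below shows one way to build the patch only for resources that are still listed. The helper names and the sample capacity dictionaries are assumptions for illustration, not part of cmk.

```python
def escape_pointer_token(token):
    """Escape a key for use in a JSON Pointer (RFC 6901): '~' -> '~0', '/' -> '~1'."""
    return token.replace("~", "~0").replace("/", "~1")


def build_er_removal_patch(capacity, er_names):
    """Return JSON Patch 'remove' operations only for ERs present in capacity.

    capacity: the node's status.capacity mapping (resource name -> quantity).
    er_names: extended resource names the uninstall wants to remove.
    """
    return [
        {"op": "remove", "path": "/status/capacity/" + escape_pointer_token(name)}
        for name in er_names
        if name in capacity
    ]


ER_NAME = "cmk.intel.com/exclusive-cores"

# Hypothetical node where the ER is still advertised.
patch_present = build_er_removal_patch({"cpu": "8", ER_NAME: "4"}, [ER_NAME])

# Hypothetical node where the ER is already gone: nothing to send.
patch_absent = build_er_removal_patch({"cpu": "8"}, [ER_NAME])

print(patch_present)
print(patch_absent)
```

An empty patch list means the caller can skip the API request entirely, so a repeated or partially completed uninstall would not hit the "remove" error on a missing path.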