draining but never gets to reboot #27

Closed
davidkarlsen opened this issue Aug 15, 2018 · 3 comments

Comments

@davidkarlsen
Collaborator

Kured kicked in as expected and disabled scheduling for my node:

app03.lan.davidkarlsen.com   Ready,SchedulingDisabled   <none>    62d       v1.11.2

I can see a number of pods being killed:

root@app03:/var/log/containers# tail kured-vkwnk_kube-system_kured-83287b3a6ba5d8a4dfd8a22822932a1655b71cc2ca2bfbd5007f5d389992100c.log 
{"log":"time=\"2018-08-15T09:47:13Z\" level=info msg=\"pod \\\"kube-system-kubernetes-dashboard-proxy-55c7756d46-dsqzq\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-15T09:47:13.066034203Z"}
{"log":"time=\"2018-08-15T09:47:13Z\" level=info msg=\"pod \\\"coredns-78fcdf6894-k87gl\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-15T09:47:13.066110928Z"}
{"log":"time=\"2018-08-15T09:47:13Z\" level=info msg=\"pod \\\"monitoring-grafana-788f47b84-bkggz\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-15T09:47:13.066134368Z"}
{"log":"time=\"2018-08-15T09:47:13Z\" level=info msg=\"pod \\\"monitoring-prometheus-alertmanager-cbcc46d55-gwkqz\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-15T09:47:13.179296096Z"}
{"log":"time=\"2018-08-15T09:47:13Z\" level=info msg=\"pod \\\"logging-cerebro-6794fc6bc6-t26v9\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-15T09:47:13.179378188Z"}
{"log":"time=\"2018-08-15T09:47:13Z\" level=info msg=\"pod \\\"monocular-monocular-mongodb-5644f785b9-24tmz\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-15T09:47:13.202395762Z"}
{"log":"time=\"2018-08-15T09:47:13Z\" level=info msg=\"pod \\\"monitoring-prometheus-blackbox-exporter-7775df5698-86s67\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-15T09:47:13.202444091Z"}
{"log":"time=\"2018-08-15T09:47:13Z\" level=info msg=\"pod \\\"monitoring-prometheus-server-75bfb9f66-xm9vp\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-15T09:47:13.249172642Z"}
{"log":"time=\"2018-08-15T09:47:13Z\" level=info msg=\"pod \\\"kube-ops-view-kube-ops-view-kube-ops-view-6db67848c4-krmx8\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-15T09:47:13.449382938Z"}
{"log":"time=\"2018-08-15T09:47:13Z\" level=info msg=\"pod \\\"logging-elasticsearch-client-5978d8f465-t9kkm\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-15T09:47:13.649729383Z"}
root@app03:/var/log/containers# 

but then nothing more happens. I guess it fails at something, but the logs should say why.

These pods are left (mainly DaemonSet pods, except for the nginx-ingress ones):

Non-terminated Pods:         (9 in total)
  Namespace                  Name                                                          CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                                                          ------------  ----------  ---------------  -------------
  auditbeat                  auditbeat-auditbeat-q4k65                                     0 (0%)        0 (0%)      0 (0%)           0 (0%)
  datadog                    datadog-datadog-agent-datadog-pr9w5                           200m (2%)     200m (2%)   256Mi (1%)       256Mi (1%)
  kube-system                calico-node-qlmcc                                             250m (3%)     0 (0%)      0 (0%)           0 (0%)
  kube-system                kube-proxy-5cf42                                              0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                kube-system-nginx-ingress-controller-84f76b76cb-jp8dr         0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                kube-system-nginx-ingress-default-backend-6b557bb97c-vlfqc    0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                kured-vkwnk                                                   0 (0%)        0 (0%)      0 (0%)           0 (0%)
  logging                    fluent-bit-2djgl                                              100m (1%)     0 (0%)      100Mi (0%)       100Mi (0%)
  monitoring                 monitoring-prometheus-node-exporter-6jl6k                     0 (0%)        0 (0%)      0 (0%)           0 (0%)

any hints?
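
To see what the drain is actually stuck on, the same drain can be re-run by hand; this is only a sketch, where the node name is the one above and the flags mirror what kured passes to kubectl around v1.11 (newer kubectl renames --delete-local-data to --delete-emptydir-data):

# List what is still scheduled on the node
kubectl get pods --all-namespaces --field-selector spec.nodeName=app03.lan.davidkarlsen.com

# Re-run the drain interactively; a blocked eviction prints an error such as
# "Cannot evict pod as it would violate the pod's disruption budget" and keeps retrying
kubectl drain app03.lan.davidkarlsen.com --ignore-daemonsets --delete-local-data --force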

@davidkarlsen
Collaborator Author

Update:
I deleted the nginx pods with kubectl, and the node then actually rebooted:

{"log":"time=\"2018-08-15T13:33:53Z\" level=info msg=\"node \\\"app03.lan.davidkarlsen.com\\\" drained\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-15T13:33:53.526347753Z"}
{"log":"time=\"2018-08-15T13:33:53Z\" level=info msg=\"Commanding reboot\"\n","stream":"stderr","time":"2018-08-15T13:33:53.530629662Z"}
{"log":"time=\"2018-08-15T13:34:23Z\" level=warning msg=\"Failed to set wall message, ignoring: Connection reset by peer\" cmd=/bin/systemctl std=err\n","stream":"stderr","time":"2018-08-15T13:34:23.90611995Z"}
{"log":"time=\"2018-08-15T13:34:23Z\" level=warning msg=\"Failed to reboot system via logind: Transport endpoint is not connected\" cmd=/bin/systemctl std=err\n","stream":"stderr","time":"2018-08-15T13:34:23.906204151Z"}

But shouldn't this be reported in some way?

Also, what can cause the failure messages at the end?
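
For context, the cmd=/bin/systemctl lines suggest kured triggers the reboot roughly as below, from inside its own container, relying on the host's D-Bus socket being bind-mounted into the pod. This is a sketch only; the exact invocation and mounts depend on the kured version and manifest.

# Roughly what kured runs when it logs "Commanding reboot" (sketch)
/bin/systemctl reboot
# The "Failed to set wall message" / "Transport endpoint is not connected"
# warnings are systemctl losing its D-Bus connection to logind, most likely
# because the reboot was already in progress and D-Bus went away mid-call;
# the node came back up here, so they appear to be benign.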

@brantb

brantb commented Oct 22, 2018

I was hit by this too. It turns out kubectl drain will hang if a PodDisruptionBudget prevents a pod from being evicted, and the nginx-ingress Helm chart creates a couple of these by default. A newer version of the chart fixes this: helm/charts#7127

kubectl drain would eventually have returned if the nginx-ingress pods had been removed from the node for some other reason, such as being manually deleted.
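
A quick way to confirm a PDB is what's blocking the drain; this is a sketch, and the PDB/deployment names below are guesses based on the pod names in the table above:

# PDBs with ALLOWED DISRUPTIONS = 0 will block kubectl drain indefinitely
kubectl get pdb --all-namespaces

# Inspect the suspect budget; with a single replica and minAvailable: 1,
# no disruption is ever allowed
kubectl describe pdb -n kube-system kube-system-nginx-ingress-controller

# Possible workarounds until the chart is upgraded: delete the PDB, or scale
# the controller so at least one disruption becomes allowed
kubectl -n kube-system scale deployment kube-system-nginx-ingress-controller --replicas=2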

@davidkarlsen
Collaborator Author

I think we can close this, as it was actually a PDB problem rather than a kured issue. I fixed the chart.
