EM: Add restart on-failure for metadata service #1362
Seeing this PR, I think we should slowly start changing Packet -> Equinix Metal
We have #1060 :)
The other services which have Type=oneshot are the following: wait-for-dns, bootkube, delete-node, create-etcd-config, persist-data-raid.
Most of them are services that we want to fail early, so that the user knows about the failure, rather than having them retry endlessly until something else times out on them.
If there is a DNS outage, I think we should retry wait-for-dns.service, as it will block starting kubelet.
We should also retry delete-node.service IMO, so that the node does not go away without being unregistered. Otherwise the pods will stay assigned to this node for a long time and won't be rescheduled.
The two points above also apply to other platforms, not only Packet.
assets/terraform-modules/packet/flatcar-linux/kubernetes/cl/controller.yaml.tmpl
This commit adds `Restart=on-failure` and `RestartSec=10s` to the metadata service. Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
's | Flatcar | Flatcar Container Linux | g' Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
This commit adds `Restart=on-failure` and `RestartSec=5s` to the wait-for-dns service on all platforms. Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
This commit adds `Restart=on-failure` and `RestartSec=5s` to the delete-node service on all platforms. Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
I don't know what the unforeseen consequences will be for adding retries for |
f0813be to 08d342f
I can wait until #1368 is merged.
One more thing to consider: The |
Creating a new issue for what Kai has suggested.
This PR adds `Restart=on-failure` and `RestartSec=5s` to the metadata service.
Fixes #1298
The other services which have `Type=oneshot` are the following: `wait-for-dns`, `bootkube`, `delete-node`, `create-etcd-config`, `persist-data-raid`.
Most of them are services that we want to fail early, so that the user knows about the failure, rather than having them retry endlessly until something else times out on them.
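For readers unfamiliar with the two directives, here is a minimal sketch of what a `Type=oneshot` unit with these settings could look like. The unit name, description, and `ExecStart` command are hypothetical placeholders and are not taken from this PR; only `Type=oneshot`, `Restart=on-failure` and `RestartSec=5s` come from the description above.

```ini
# example-metadata.service (hypothetical unit name, for illustration only)
[Unit]
Description=Fetch instance metadata (illustrative)
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
# Hypothetical command; the real unit's ExecStart is not shown here.
ExecStart=/usr/bin/example-fetch-metadata
# Restart the service only after an unclean exit (non-zero exit status,
# termination by a signal, or a timeout), not after a successful run.
Restart=on-failure
# Wait 5 seconds before each restart attempt.
RestartSec=5s
```

Restarts remain subject to systemd's start rate limiting (`StartLimitIntervalSec=` and `StartLimitBurst=`), so whether a unit like this retries indefinitely depends on how those limits interact with the chosen `RestartSec=`.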