This repository has been archived by the owner on Jun 29, 2022. It is now read-only.
Add Restart=on-failure for maintained systemd units to make them more robust #1298
Labels: kind/enhancement (new feature or request), size/m (issues which likely require up to a couple of work days)
Right now, if for example the Packet metadata service is not reachable, `coreos-metadata` will never converge, so if a node reboots during this time, it will never come back.

Even when the metadata service comes back, the node will still be stuck until either the node is rebooted or the service is manually restarted.
We should be able to automate that recovery and make it more robust by adding `Restart=on-failure`.

Following https://unix.stackexchange.com/a/272650, we should probably add `Restart=on-failure` and `RestartSec=5s` (or some other value) to all units with `Type=oneshot`, so that they eventually converge instead of giving up after the first failure.
@pothos also suggests adding `RemainAfterExit=yes`.
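
As a rough sketch of what this could look like for `coreos-metadata.service`, the settings discussed above might go into a drop-in like the one below. The drop-in file name is hypothetical, the 5s delay is just the value floated above rather than a tuned number, and this assumes the systemd version in use accepts `Restart=on-failure` on `Type=oneshot` units (older releases reject it).

```ini
# Hypothetical drop-in path (file name is an assumption):
# /etc/systemd/system/coreos-metadata.service.d/10-restart.conf
[Service]
# Retry when the unit exits with a failure instead of giving up
# after the first attempt, e.g. while the metadata service is unreachable.
Restart=on-failure
# Delay between attempts; 5s is the value proposed in this issue.
RestartSec=5s
# Suggested by @pothos: keep the unit in the "active" state after a
# successful run, so it is not considered dead once the process exits.
RemainAfterExit=yes
```

The same three lines would apply to the other maintained `Type=oneshot` units. On a running node a `systemctl daemon-reload` is needed before a new drop-in takes effect.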