Add Restart=on-failure for maintained systemd units to make them more robust #1298

invidian · 2021-01-05T10:26:38Z

Right now, if, for example, the Packet metadata service is not reachable, coreos-metadata will never converge, so if a node reboots during this time, it will never come back.

Even when the metadata service comes back, the node will still be stuck until either the node is rebooted or the service is manually restarted.

We should be able to automate that to make it more robust by adding Restart=on-failure.

Following https://unix.stackexchange.com/a/272650, we should probably add Restart=on-failure and RestartSec=5s (or some other value) to all units with Type=oneshot to make sure they eventually converge and not just give up.
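A minimal sketch of what this could look like as a drop-in for coreos-metadata.service. The drop-in path and file name are hypothetical, and I'm assuming the unit is Type=oneshot; as far as I know, Restart=on-failure is only accepted for oneshot units from systemd 244 onwards, so older hosts would reject it.

```ini
# Hypothetical drop-in: /etc/systemd/system/coreos-metadata.service.d/10-restart.conf
[Service]
# Retry when the oneshot command exits non-zero (e.g. metadata service unreachable).
Restart=on-failure
# Wait between attempts; 5s is the value suggested above, not a tuned one.
RestartSec=5s
```

After adding a drop-in like this, `systemctl daemon-reload` is needed for it to take effect.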

@pothos also suggested adding RemainAfterExit=yes:

RemainAfterExit=yes is also missing which means that currently the service is executed multiple times, each time when pulled in as wanted/required. It could mean that an existing file gets lost when the server is unavailable later (but that also depends on how afterburn writes this file out, so it may not be a problem right now).
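For illustration, a sketch of the relevant settings, again assuming a Type=oneshot unit. With RemainAfterExit=yes the unit stays active after a successful run, so being pulled in again later via Wants= or Requires= should not re-execute it.

```ini
[Service]
Type=oneshot
# Keep the unit in the "active" state after the command exits successfully,
# so later activations of dependent units do not run it again.
RemainAfterExit=yes
```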

invidian added the kind/enhancement and size/m labels Jan 5, 2021

iaguis mentioned this issue Jan 14, 2021

packet: Remove coreos-metadata service config #1315

Closed

surajssd added the proposed/next-sprint label Jan 20, 2021

iaguis removed the proposed/next-sprint label Jan 25, 2021

surajssd self-assigned this Feb 4, 2021

surajssd mentioned this issue Feb 4, 2021

EM: Add restart on-failure for metadata service #1362

Merged

surajssd closed this as completed in #1362 Feb 12, 2021

surajssd mentioned this issue Feb 12, 2021

Add RemainAfterExit=yes for oneshot services #1371

Closed
