libct/cg/sd: reset-failed and retry startUnit on UnitExists
In case a systemd unit fails (for example, timed out or OOM-killed),
systemd keeps the unit. This prevents starting a new container with
the same systemd unit name.

The fix is to call reset-failed when a UnitExists error is returned,
and retry once.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin committed Mar 23, 2023
1 parent 37083ac commit e6cc9a6
Showing 1 changed file with 11 additions and 0 deletions.
11 changes: 11 additions & 0 deletions libcontainer/cgroups/systemd/common.go
@@ -126,6 +126,9 @@ func isUnitExists(err error) bool {

func startUnit(cm *dbusConnManager, unitName string, properties []systemdDbus.Property, ignoreExist bool) error {
	statusChan := make(chan string, 1)
	retry := true

retry:
	err := cm.retryOnDisconnect(func(c *systemdDbus.Conn) error {
		_, err := c.StartTransientUnitContext(context.TODO(), unitName, "replace", properties, statusChan)
		return err
@@ -140,6 +143,14 @@ func startUnit(cm *dbusConnManager, unitName string, properties []systemdDbus.Pr
			// https://github.com/opencontainers/runc/pull/1124).
			return nil
		}
		if retry {
			// In case a unit with the same name exists, this may
			// be a leftover failed unit. Reset it, so systemd can
			// remove it, and retry once.
			resetFailedUnit(cm, unitName)
			retry = false
			goto retry
		}
		return err
	}
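
The reset-and-retry-once flow in this change can be sketched in isolation, without D-Bus. This is a minimal sketch, not runc's code: `fakeManager`, `errUnitExists`, and `startWithRetry` are hypothetical names, and the in-memory map only models the assumption that systemd keeps a failed unit until it is reset. The sketch uses a loop instead of `goto` and leaves out the `ignoreExist` branch.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnitExists is a hypothetical stand-in for the "unit already exists"
// error that systemd reports over D-Bus.
var errUnitExists = errors.New("unit already exists")

// fakeManager is a hypothetical in-memory model of systemd's unit table.
// The map value is true when the unit is in the failed state.
type fakeManager struct {
	units map[string]bool
}

// start registers a new unit, or fails if the name is already taken.
func (m *fakeManager) start(name string) error {
	if _, ok := m.units[name]; ok {
		return errUnitExists
	}
	m.units[name] = false
	return nil
}

// resetFailed removes the unit only if it is in the failed state,
// modelling the assumed effect of reset-failed on a leftover unit.
func (m *fakeManager) resetFailed(name string) {
	if failed, ok := m.units[name]; ok && failed {
		delete(m.units, name)
	}
}

// startWithRetry tries to start the unit. On errUnitExists it resets the
// unit and retries exactly once; any other outcome is returned as is.
func startWithRetry(m *fakeManager, name string) error {
	retry := true
	for {
		err := m.start(name)
		if err == nil || !errors.Is(err, errUnitExists) || !retry {
			return err
		}
		m.resetFailed(name)
		retry = false
	}
}

func main() {
	m := &fakeManager{units: map[string]bool{
		"failed.scope":  true,  // leftover failed unit
		"running.scope": false, // unit that is still active
	}}
	fmt.Println(startWithRetry(m, "failed.scope"))
	fmt.Println(startWithRetry(m, "running.scope"))
}
```

A unit that still exists after the reset (for example, one that is running rather than failed) makes the second attempt return the same error, so the caller still sees the UnitExists error in that case.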
