Avoid returning early on agent join failures
When a gossip join failure happens, do not return early in the call chain,
because a join failure is most likely transient and the retry logic built
into networkdb will retry and succeed. Returning early prevents the ingress
network/sandbox from being initialized, which remains a problem even after
the gossip join succeeds on retry.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
mrjana committed Sep 27, 2016
1 parent bf3d9cc commit 23a782b
Showing 2 changed files with 5 additions and 2 deletions.
3 changes: 1 addition & 2 deletions agent.go
@@ -191,8 +191,7 @@ func (c *controller) agentSetup() error {
 
 	if remoteAddr != "" {
 		if err := c.agentJoin(remoteAddr); err != nil {
-			logrus.Errorf("Error in agentJoin : %v", err)
-			return nil
+			logrus.Errorf("Error in joining gossip cluster : %v(join will be retried in background)", err)
 		}
 	}
 
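The effect of the agent.go change can be illustrated with a simplified, self-contained model. This is a sketch, not libnetwork code: `agent`, `setup`, `retryJoin`, `demo` and the `join` hook are hypothetical names, the standard `log` package stands in for logrus, and, unlike the real code (where the background retry lives inside networkdb), the sketch starts the retry goroutine itself. With a join that fails once, the ingress step still runs during setup, and the background retry completes the join afterwards.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"sync"
	"time"
)

// agent is a hypothetical stand-in for the controller. It records whether
// the ingress step ran and whether the gossip join eventually succeeded.
type agent struct {
	mu           sync.Mutex
	ingressReady bool
	joined       bool
	join         func(addr string) error // hypothetical join hook
}

func (a *agent) state() (ingress, joined bool) {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.ingressReady, a.joined
}

func (a *agent) markJoined() {
	a.mu.Lock()
	a.joined = true
	a.mu.Unlock()
}

// retryJoin retries the join on every tick until it succeeds or stop closes.
func (a *agent) retryJoin(addr string, interval time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			if err := a.join(addr); err != nil {
				log.Printf("join retry failed: %v", err)
				continue
			}
			a.markJoined()
			return
		case <-stop:
			return
		}
	}
}

// setup mirrors the fixed agentSetup: a join error is logged and retried in
// the background, and initialization continues instead of returning early.
func (a *agent) setup(remoteAddr string, interval time.Duration, stop <-chan struct{}) error {
	if remoteAddr != "" {
		if err := a.join(remoteAddr); err != nil {
			log.Printf("Error in joining gossip cluster: %v (join will be retried in background)", err)
			go a.retryJoin(remoteAddr, interval, stop)
		} else {
			a.markJoined()
		}
	}
	// In the pre-fix code, returning early on a join error skipped steps like this one.
	a.mu.Lock()
	a.ingressReady = true
	a.mu.Unlock()
	return nil
}

// demo runs setup with a join hook that fails once, then reports whether the
// ingress step ran during setup and whether the join succeeded within 2s.
func demo() (ingressAfterSetup, joinedEventually bool) {
	calls := 0 // only touched by the join hook, whose calls never overlap
	a := &agent{join: func(string) error {
		calls++
		if calls == 1 {
			return errors.New("transient gossip join failure")
		}
		return nil
	}}
	stop := make(chan struct{})
	defer close(stop)
	if err := a.setup("192.0.2.10:7946", 10*time.Millisecond, stop); err != nil {
		return false, false
	}
	ingressAfterSetup, _ = a.state()
	for deadline := time.Now().Add(2 * time.Second); time.Now().Before(deadline); {
		if _, joined := a.state(); joined {
			return ingressAfterSetup, true
		}
		time.Sleep(5 * time.Millisecond)
	}
	return ingressAfterSetup, false
}

func main() {
	ingress, joined := demo()
	fmt.Println("ingress ready after setup:", ingress, "joined eventually:", joined)
}
```

The key design point is that the join error is treated as a logged, recoverable condition rather than a reason to abort setup, so later initialization does not depend on the first join attempt.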
4 changes: 4 additions & 0 deletions networkdb/cluster.go
@@ -161,6 +161,10 @@ func (nDB *NetworkDB) retryJoin(members []string, stop <-chan struct{}) {
 				logrus.Errorf("Failed to join memberlist %s on retry: %v", members, err)
 				continue
 			}
+			if err := nDB.sendNodeEvent(NodeEventTypeJoin); err != nil {
+				logrus.Errorf("failed to send node join on retry: %v", err)
+				continue
+			}
 			return
 		case <-stop:
 			return
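The networkdb change can likewise be modeled as a ticker-driven loop that exits only after both the join and the node-join event succeed. This is a sketch under stated assumptions: the `join` and `announce` callbacks are hypothetical stand-ins for `nDB.memberlist.Join(members)` and `nDB.sendNodeEvent(NodeEventTypeJoin)`, and `failFirst` is a hypothetical helper for injecting transient failures.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"time"
)

// retryJoin is a simplified model of the patched loop. On every tick it
// retries the join and, once that succeeds, the node-join announcement.
// It reports whether both succeeded before stop was closed.
func retryJoin(join, announce func() error, interval time.Duration, stop <-chan struct{}) bool {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			if err := join(); err != nil {
				log.Printf("Failed to join memberlist on retry: %v", err)
				continue
			}
			// New in this commit: the node-join event must also be sent
			// before the loop exits; a failure here triggers another retry.
			if err := announce(); err != nil {
				log.Printf("failed to send node join on retry: %v", err)
				continue
			}
			return true
		case <-stop:
			return false
		}
	}
}

// failFirst returns a func that fails its first n calls and counts every call.
func failFirst(n int, calls *int) func() error {
	return func() error {
		*calls++
		if *calls <= n {
			return errors.New("transient failure")
		}
		return nil
	}
}

func main() {
	var joins, announces int
	stop := make(chan struct{})
	defer close(stop)
	ok := retryJoin(failFirst(1, &joins), failFirst(1, &announces), 10*time.Millisecond, stop)
	fmt.Println(ok, joins, announces) // prints "true 3 2"
}
```

With one failure injected into each step, the loop needs three ticks: the first join fails, the second join succeeds but the announcement fails, and the third attempt completes both. As in the diff, a failed announcement sends the loop back to the top, so the join itself is attempted again on the next tick.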
