Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allocator: Avoid allocating tasks that are no longer running #2017

Merged
merged 1 commit into from
Mar 10, 2017
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 5 additions & 17 deletions manager/allocator/network.go
Original file line number Diff line number Diff line change
Expand Up @@ -281,7 +281,7 @@ func (a *Allocator) doNetworkInit(ctx context.Context) (err error) {
}

for _, t := range tasks {
if taskDead(t) {
if t.Status.State > api.TaskStateRunning {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine, because if the task's state is beyond running, we should not bother attempting an allocation for it, regardless of the desired state. Once a task moves beyond running, it cannot be brought back in the pipeline right ?
A new task will be created in place of it, I think.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once a task moves beyond running, it cannot be brought back in the pipeline right ?

That's correct.

continue
}

Expand Down Expand Up @@ -486,17 +486,6 @@ func (a *Allocator) doNodeAlloc(ctx context.Context, ev events.Event) {
}
}

// taskRunning checks whether a task is either actively running, or in the
// process of starting up.
func taskRunning(t *api.Task) bool {
return t.DesiredState <= api.TaskStateRunning && t.Status.State <= api.TaskStateRunning
}

// taskDead checks whether a task is not actively running as far as allocator purposes are concerned.
func taskDead(t *api.Task) bool {
return t.DesiredState > api.TaskStateRunning && t.Status.State > api.TaskStateRunning
}

// taskReadyForNetworkVote checks if the task is ready for a network
// vote to move it to PENDING state.
func taskReadyForNetworkVote(t *api.Task, s *api.Service, nc *networkContext) bool {
Expand Down Expand Up @@ -599,10 +588,9 @@ func (a *Allocator) doTaskAlloc(ctx context.Context, ev events.Event) {

nc := a.netCtx

// If the task has stopped running or it's being deleted then
// we should free the network resources associated with the
// task right away.
if taskDead(t) || isDelete {
// If the task has stopped running then we should free the network
// resources associated with the task right away.
if t.Status.State > api.TaskStateRunning || isDelete {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only concern here is, as you guys already discussed here and also in the pr which introduced the current check, is that we release the resources associated to this task, while the task may still be running on an unresponsive node, and a new task coming up will be given those resources.
So, we could end up with two containers in the cluster having same IP address.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I don't think this is a problem. This is looking at observed state, not desired state. So we err on the side of keeping the resources tied up if the node is unresponsive. There is a mechanism that marks tasks as orphaned after 24 hours of the node being down, so the allocator can finally free these resources.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks for the clarification.

if nc.nwkAllocator.IsTaskAllocated(t) {
if err := nc.nwkAllocator.DeallocateTask(t); err != nil {
log.G(ctx).WithError(err).Errorf("Failed freeing network resources for task %s", t.ID)
Expand Down Expand Up @@ -637,7 +625,7 @@ func (a *Allocator) doTaskAlloc(ctx context.Context, ev events.Event) {
// available in store. But we still need to
// cleanup network resources associated with
// the task.
if taskRunning(t) && !isDelete {
if t.Status.State <= api.TaskStateRunning && !isDelete {
log.G(ctx).Errorf("Event %T: Failed to get service %s for task %s state %s: could not find service %s", ev, t.ServiceID, t.ID, t.Status.State, t.ServiceID)
return
}
Expand Down