-
Notifications
You must be signed in to change notification settings - Fork 617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allocator: Avoid allocating tasks that are no longer running #2017
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -281,7 +281,7 @@ func (a *Allocator) doNetworkInit(ctx context.Context) (err error) { | |
} | ||
|
||
for _, t := range tasks { | ||
if taskDead(t) { | ||
if t.Status.State > api.TaskStateRunning { | ||
continue | ||
} | ||
|
||
|
@@ -486,17 +486,6 @@ func (a *Allocator) doNodeAlloc(ctx context.Context, ev events.Event) { | |
} | ||
} | ||
|
||
// taskRunning checks whether a task is either actively running, or in the | ||
// process of starting up. | ||
func taskRunning(t *api.Task) bool { | ||
return t.DesiredState <= api.TaskStateRunning && t.Status.State <= api.TaskStateRunning | ||
} | ||
|
||
// taskDead checks whether a task is not actively running as far as allocator purposes are concerned. | ||
func taskDead(t *api.Task) bool { | ||
return t.DesiredState > api.TaskStateRunning && t.Status.State > api.TaskStateRunning | ||
} | ||
|
||
// taskReadyForNetworkVote checks if the task is ready for a network | ||
// vote to move it to PENDING state. | ||
func taskReadyForNetworkVote(t *api.Task, s *api.Service, nc *networkContext) bool { | ||
|
@@ -599,10 +588,9 @@ func (a *Allocator) doTaskAlloc(ctx context.Context, ev events.Event) { | |
|
||
nc := a.netCtx | ||
|
||
// If the task has stopped running or it's being deleted then | ||
// we should free the network resources associated with the | ||
// task right away. | ||
if taskDead(t) || isDelete { | ||
// If the task has stopped running then we should free the network | ||
// resources associated with the task right away. | ||
if t.Status.State > api.TaskStateRunning || isDelete { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Only concern here is, as you guys already discussed here and also in the pr which introduced the current check, is that we release the resources associated to this task, while the task may still be running on an unresponsive node, and a new task coming up will be given those resources. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually I don't think this is a problem. This is looking at observed state, not desired state. So we err on the side of keeping the resources tied up if the node is unresponsive. There is a mechanism that marks tasks as orphaned after 24 hours of the node being down, so the allocator can finally free these resources. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Cool, thanks for the clarification. |
||
if nc.nwkAllocator.IsTaskAllocated(t) { | ||
if err := nc.nwkAllocator.DeallocateTask(t); err != nil { | ||
log.G(ctx).WithError(err).Errorf("Failed freeing network resources for task %s", t.ID) | ||
|
@@ -637,7 +625,7 @@ func (a *Allocator) doTaskAlloc(ctx context.Context, ev events.Event) { | |
// available in store. But we still need to | ||
// cleanup network resources associated with | ||
// the task. | ||
if taskRunning(t) && !isDelete { | ||
if t.Status.State <= api.TaskStateRunning && !isDelete { | ||
log.G(ctx).Errorf("Event %T: Failed to get service %s for task %s state %s: could not find service %s", ev, t.ServiceID, t.ID, t.Status.State, t.ServiceID) | ||
return | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine, because if the task's state is beyond running, we should not bother attempting an allocation for it, regardless of the desired state. Once a task moves beyond running, it cannot be brought back in the pipeline right ?
A new task will be created in place of it, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct.