controlapi: Allow removal of network when only dead tasks reference it #2018
RemoveNetwork currently won't allow a network to be removed if any task in the store references it. This is a change relative to Docker 1.12, where only services were checked.

This might be a little heavy-handed. Some tasks in the store may be long dead. Resource attachment tasks, in particular, must be detached by the API user, and this might be neglected after a failure.

Change RemoveNetwork to only fail the network deletion if a task has actual and desired state of "running" or below. This means that neither a failed resource attachment task (whose desired state won't be adjusted) nor a down node (where actual state won't reflect desired state) can block the network deletion.

There may be some corner cases where deleting the network while a task is in the process of shutting down prevents the allocator from deallocating that task, but on balance this seems better than allowing many situations to block network deletion indefinitely.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
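The rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `TaskState` enum, the `Task` struct, and the helpers `attachedTo`, `taskBlocksRemoval`, and `canRemoveNetwork` are hypothetical stand-ins, assuming only that task states are ordered so that "running or below" is a simple comparison.

```go
package main

// TaskState is a hypothetical, simplified stand-in for the real task state
// enum. The only assumption is that states are ordered by lifecycle progress.
type TaskState int

const (
	TaskStateNew TaskState = iota
	TaskStatePending
	TaskStateRunning
	TaskStateShutdown
	TaskStateFailed
)

// Task is a hypothetical, trimmed-down task record.
type Task struct {
	ID           string
	DesiredState TaskState
	ActualState  TaskState
	Networks     []string // IDs of the networks the task is attached to
}

// attachedTo reports whether the task references the given network.
func attachedTo(t Task, networkID string) bool {
	for _, id := range t.Networks {
		if id == networkID {
			return true
		}
	}
	return false
}

// taskBlocksRemoval reports whether a task should block removal of the
// network: it must reference the network, and BOTH its desired and actual
// state must be "running" or below.
func taskBlocksRemoval(t Task, networkID string) bool {
	return attachedTo(t, networkID) &&
		t.DesiredState <= TaskStateRunning &&
		t.ActualState <= TaskStateRunning
}

// canRemoveNetwork returns true when no task in the list blocks removal.
func canRemoveNetwork(tasks []Task, networkID string) bool {
	for _, t := range tasks {
		if taskBlocksRemoval(t, networkID) {
			return false
		}
	}
	return true
}
```

Under these assumptions, a failed attachment task (desired "running", actual "failed") and a task on a down node (desired "shutdown", actual still "running") both fail the conjunction, so neither blocks the deletion.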
Codecov Report
@@            Coverage Diff             @@
##           master    #2018      +/-   ##
==========================================
+ Coverage   53.66%   53.73%   +0.07%
==========================================
  Files         109      109
  Lines       18991    18978      -13
==========================================
+ Hits        10191    10198       +7
+ Misses       7578     7549      -29
- Partials     1222     1231       +9
Looks good to me
Just to make sure,
This logic allows the network removal if either:
I did it this way to lean towards being permissive about deletions. If I only relied on [...] If the possibility of removing a network that's being used by a container which is about to shut down is a problem, we can revisit this. My thinking was that allowing some corner cases like this would be better than the alternative of cases where it's impossible to remove the network, say because it's blocked by a task on an unresponsive node.
Agree, seems reasonable. Also, proving the above theory about a corner case may be harder than actually working on a fix in a different part of the stack, should the issue present itself.
Still looks good to me
And we assume a race where a task is being rescheduled is not possible, since we're checking for services first, correct?
Yeah.
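The ordering discussed in this exchange (services first, then tasks) might look roughly like the sketch below. All names here (`Service`, `Task`, `checkRemovable`, and the two error values) are hypothetical and chosen for illustration; this is not the actual RemoveNetwork code.

```go
package main

import "errors"

// TaskState is a hypothetical ordered task state enum.
type TaskState int

const (
	TaskStateNew TaskState = iota
	TaskStateRunning
	TaskStateShutdown
	TaskStateFailed
)

// Service and Task are hypothetical, trimmed-down store objects.
type Service struct {
	ID       string
	Networks []string
}

type Task struct {
	ID           string
	DesiredState TaskState
	ActualState  TaskState
	Networks     []string
}

var (
	errInUseByService = errors.New("network is in use by a service")
	errInUseByTask    = errors.New("network is in use by a task")
)

// contains reports whether id appears in ids.
func contains(ids []string, id string) bool {
	for _, candidate := range ids {
		if candidate == id {
			return true
		}
	}
	return false
}

// checkRemovable checks services first, then tasks. Any service that still
// references the network rejects the removal, regardless of task state.
// Only after that do live tasks (desired and actual state both "running"
// or below) get a chance to block it.
func checkRemovable(services []Service, tasks []Task, networkID string) error {
	for _, s := range services {
		if contains(s.Networks, networkID) {
			return errInUseByService
		}
	}
	for _, t := range tasks {
		if contains(t.Networks, networkID) &&
			t.DesiredState <= TaskStateRunning &&
			t.ActualState <= TaskStateRunning {
			return errInUseByTask
		}
	}
	return nil
}
```

In this sketch, a task that is about to be rescheduled still belongs to a service referencing the network, so the service check rejects the removal before task states are even consulted.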
LGTM
I agree with this change. However, I think we need to treat "node down" as a common case, not a corner case like this. We should have a way to cleanly evict tasks from a node and handle state properly in the manager. Other components shouldn't need to make assumptions about it.
cc @alexmavr @dongluochen @aboch