Only create global service tasks on nodes that satisfy the constraints #1570
Conversation
Current coverage is 54.13% (diff: 56.28%)

@@            master   #1570    diff @@
=======================================
  Files           83      82      -1
  Lines        13621   13658     +37
  Methods          0       0
  Messages         0       0
  Branches        0       0
=======================================
+ Hits          7360    7394     +34
  Misses        5265    5265
- Partials       996     999      +3
I wrote a Docker integration test: aaronlehmann/docker@d6bc367
And a test covering draining and pausing for global services: aaronlehmann/docker@dd5f3cd
g.addTask(ctx, batch, service.Service, nodeID)
} else {
	// If task is out of date, update it. This can happen
	// on node reconciliation if, for example, we drain a
If the node is drained, global service tasks are removed. This should be covered by if len(tasks) == 0. Do you mean the node was paused?
Yes, this was meant to say paused. I'll fix it.
Fixed.
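For context, here is how the corrected branch might read after that fix; the structure follows the snippet quoted above, while the exact merged comment wording and the updateTask helper are assumptions:

if len(tasks) == 0 {
	g.addTask(ctx, batch, service.Service, nodeID)
} else {
	// If task is out of date, update it. This can happen on node
	// reconciliation if, for example, we pause a node, update the
	// service, and then activate the node again later.
	updateTask(ctx, batch, service.Service, tasks[0]) // hypothetical helper
}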
}

if node.Spec.Availability == api.NodeAvailabilityPause {
	// the node is paused, so we won't add or update
Should this move before if _, exists := nodeCompleted[nodeID]; exists || !meetsConstraints? I think paused is a state that should block reconciliation.
Paused means it doesn't accept new tasks, so I think a paused node should still kill the task if the service is deleted. That's how it works for replicated services.
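To make that ordering concrete, here is a minimal sketch: removal of orphaned tasks happens unconditionally, and only adding or updating is gated on pause. The removeTask, addTask, and updateOutOfDateTasks helpers (and the reconcileNode wrapper) are hypothetical stand-ins, not the actual swarmkit functions:

package sketch

import "github.com/docker/swarmkit/api"

// Hypothetical helpers standing in for the orchestrator's real plumbing.
func removeTask(t *api.Task)              {}
func addTask(n *api.Node)                 {}
func updateOutOfDateTasks(ts []*api.Task) {}

// reconcileNode sketches the ordering discussed above: tasks whose service
// was deleted or whose node no longer meets the constraints are removed
// even on paused nodes; pausing only blocks new or updated tasks.
func reconcileNode(node *api.Node, tasks []*api.Task, serviceExists, meetsConstraints bool) {
	if !serviceExists || !meetsConstraints {
		for _, t := range tasks {
			removeTask(t)
		}
		return
	}

	if node.Spec.Availability == api.NodeAvailabilityPause {
		// The node doesn't accept new tasks, so skip creating or
		// updating tasks here.
		return
	}

	if len(tasks) == 0 {
		addTask(node)
	} else {
		updateOutOfDateTasks(tasks)
	}
}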
This code was formerly part of the scheduler package. It needs to be moved to an independent package so the orchestrator can use it when considering where to create tasks for global services. While moving it, made a few minor cleanups:
- Renamed Expr to Constraint
- Unexported the Key field
- Renamed ParseExprs to Parse

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
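As a rough illustration of how both the scheduler and the global orchestrator can now call into this code, here is a sketch; constraint.Parse is named in the commit above, but the import path, the NodeMatches helper, and the exact signatures are assumptions that may differ from the real package:

package sketch

import (
	"fmt"

	"github.com/docker/swarmkit/api"
	"github.com/docker/swarmkit/manager/constraint"
)

// nodeSatisfies reports whether a node meets a service's placement
// constraints, the check the global orchestrator needs before creating a
// task on that node.
func nodeSatisfies(spec *api.ServiceSpec, node *api.Node) (bool, error) {
	if spec.Task.Placement == nil || len(spec.Task.Placement.Constraints) == 0 {
		return true, nil // no constraints: every node is eligible
	}

	constraints, err := constraint.Parse(spec.Task.Placement.Constraints)
	if err != nil {
		return false, fmt.Errorf("invalid constraints: %v", err)
	}

	return constraint.NodeMatches(constraints, node), nil
}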
Only create global service tasks on nodes that satisfy the constraints

Previously, the global orchestrator would always create a task for every node, and the scheduler would figure out which ones could proceed into a running state. This was inefficient, confusing to users, and caused problems with rolling updates (updates would get stuck if they encountered tasks that couldn't start due to constraints). Change the global orchestrator to only create tasks on nodes that meet the constraints. The scheduler still handles resource reservations.

This change also fixes a few problems I found in the global orchestrator:
- Global service updates shouldn't apply to paused nodes. Since the node can't accept new tasks, trying to update the service just prevents it from running on that node.
- As a consequence, node reconciliation needs to make sure existing tasks match the current service spec. For example, you could pause a node, update the service, and then later activate the node.
- Drained nodes weren't being reconciled on orchestrator startup, so a leftover task on one of these nodes would never be shut down.
- The Batch callback in service reconciliation would wrongly terminate early if it encountered a task that had completed.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
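The last bullet may be easier to see in code. A minimal sketch of that fix, with hypothetical task, isCompleted, and ensureUpToDate stand-ins rather than the real swarmkit reconciliation loop:

package sketch

type task struct{ completed bool }

func isCompleted(t *task) bool { return t.completed }
func ensureUpToDate(t *task)   {} // hypothetical: align the task with the current service spec

// reconcileTasks keeps iterating past completed tasks. The buggy version
// effectively returned from the Batch callback when it hit a completed
// task, so the remaining tasks were never reconciled.
func reconcileTasks(tasks []*task) {
	for _, t := range tasks {
		if isCompleted(t) {
			continue
		}
		ensureUpToDate(t)
	}
}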
This can avoid doing extra raft writes and reading extra data from the store.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Force-pushed from a3bff52 to bc7b79e
LGTM
@aluzzardi: Do you want to review this one? Should I go ahead and merge it?
LGTM
Minor nit
@@ -0,0 +1,164 @@
package constraint
nit: Shouldn't this be a subpackage of scheduler?
Would that make sense, since now orchestrator uses it as well?
Yeah, it's a moot point. To me it felt like it was part of the scheduler (which at the end of the day is the component responsible for placing tasks onto nodes), and it turns out another component (the global orchestrator) wants to use a piece of the scheduler. I don't think the scheduler and orchestrator are sharing a common piece; it's more like the orchestrator wants to use the scheduler for some stuff.
But it's a minor personal preference, no objective arguments.
Great work. However, I have a question. Here is the description:
@aaronlehmann
@allencloud: Currently, this isn't handled. There is an issue covering it: #1009. When we fix #1009, we will have to make sure that the orchestrator handles these cases (for global services), not just the scheduler.
Actually, I believe the particular case you're talking about should be handled correctly because of this PR. When a node is updated (like having its role changed to
But #1009 is still a problem for replicated services.
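For illustration, here is a sketch of how a node update can feed back into global-service reconciliation, consistent with the behavior described in this PR. The event types and helpers below are hypothetical stand-ins, not the actual swarmkit watch API:

package sketch

// Hypothetical node events and helpers used only to illustrate the flow.
type nodeUpdated struct{ nodeID string }
type nodeRemoved struct{ nodeID string }

func reconcileServicesOneNode(nodeID string) {} // re-check constraints, add or shut down tasks
func removeTasksFromNode(nodeID string)      {}

// handleNodeEvent: when a node changes (role, labels, availability), the
// orchestrator re-evaluates global services against it, so a task is
// created once the node starts satisfying the constraints, or shut down
// once it stops.
func handleNodeEvent(ev interface{}) {
	switch e := ev.(type) {
	case nodeUpdated:
		reconcileServicesOneNode(e.nodeID)
	case nodeRemoved:
		removeTasksFromNode(e.nodeID)
	}
}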
Did this not make it into 1.12.2? I tested like so:
So if I understand correctly, this service should only be allocated to test2, right?
This is for 1.13, sorry.
Thanks for your detailed explanation. @aaronlehmann
Previously, the global orchestrator would always create a task for every node, and the scheduler would figure out which ones could proceed into a running state. This was inefficient, confusing to users, and caused problems with rolling updates (updates would get stuck if they encountered tasks that couldn't start due to constraints). Change the global orchestrator to only create tasks on nodes that meet the constraints. The scheduler still handles resource reservations.
This change also fixes a few problems I found in the global orchestrator:
- Global service updates shouldn't apply to paused nodes. Since the node can't accept new tasks, trying to update the service just prevents it from running on that node.
- As a consequence, node reconciliation needs to make sure existing tasks match the current service spec. For example, you could pause a node, update the service, and then later activate the node.
- Drained nodes weren't being reconciled on orchestrator startup, so a leftover task on one of these nodes would never be shut down.
- The Batch callback in service reconciliation would wrongly terminate early if it encountered a task that had completed.
The first commit moves constraint parsing and evaluation to its own package so the orchestrator can use it as well. It makes a few cleanups to the method naming in this code.
The second commit does the actual change to the global orchestrator.
The third commit is an optimization that changes reconcileOneService and reconcileServiceOneNode to handle more than one service at a time. This avoids having a raft round trip for each service, and also avoids some redundant queries to the memory store. This should not cause a behavior difference, and could be split into a separate PR. I think it's easier to review/test all of this at once, though.

I've run the Docker integration tests against this and I've tested this a bit with the Docker CLI. I'm working on an integration test to cover global services that use constraints.
Addresses moby/moby#26325
cc @dongluochen @aluzzardi