
Only create global service tasks on nodes that satisfy the constraints #1570

Merged: 3 commits merged into moby:master from global-service-orchestration, Oct 6, 2016

Conversation

@aaronlehmann (Collaborator) commented Sep 26, 2016

Previously, the global orchestrator would always create a task for every node, and the scheduler would figure out which ones could proceed into a running state. This was inefficient, confusing to users, and caused problems with rolling updates (updates would get stuck if they encountered tasks that couldn't start due to constraints). Change the global orchestrator to only create tasks on nodes that meet the constraints. The scheduler still handles resource reservations.

This change also fixes a few problems I found in the global orchestrator:

  • Global service updates shouldn't apply to paused nodes. Since the node can't accept new tasks, trying to update the service just prevents it from running on that node.
  • As a consequence, node reconciliation needs to make sure existing tasks match the current service spec. For example, you could pause a node, update the service, and then later activate the node.
  • Drained nodes weren't being reconciled on orchestrator startup, so a leftover task on one of these nodes would never be shut down.
  • The Batch callback in service reconciliation would wrongly terminate early if it encountered a task that had completed.

The first commit moves constraint parsing and evaluation to its own package so the orchestrator can use it as well. It makes a few cleanups to the method naming in this code.
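
As a rough illustration, this is how an orchestrator-side caller might use the relocated package. Parse (renamed from ParseExprs) and the Constraint type are named in this PR; the import paths, the node-matching helper (called NodeMatches below), and the exact signatures are assumptions made only for this sketch.

package main

import (
	"fmt"

	"github.com/docker/swarmkit/api"
	"github.com/docker/swarmkit/manager/constraint"
)

// nodeSatisfiesConstraints reports whether a node is eligible for a global
// service's tasks, based on the service's placement constraints.
func nodeSatisfiesConstraints(service *api.Service, node *api.Node) (bool, error) {
	placement := service.Spec.Task.Placement
	if placement == nil || len(placement.Constraints) == 0 {
		// No constraints: every node is eligible.
		return true, nil
	}

	// Parse was renamed from ParseExprs as part of this PR.
	constraints, err := constraint.Parse(placement.Constraints)
	if err != nil {
		return false, err
	}

	// NodeMatches is assumed here as the evaluation helper.
	return constraint.NodeMatches(constraints, node), nil
}

func main() {
	fmt.Println("sketch only; the global orchestrator would run this check before creating a task on a node")
}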

The second commit does the actual change to the global orchestrator.

The third commit is an optimization that changes reconcileOneService and reconcileServiceOneNode to handle more than one service at a time. This avoids a raft round trip for each service, and also avoids some redundant queries to the memory store. It should not cause any behavior difference and could be split into a separate PR, but I think it's easier to review and test all of this at once.
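
To make the batching idea concrete, here is a minimal sketch under stated assumptions: the Batch and Tx types below are hypothetical stand-ins, not swarmkit's actual store API. The point is only that per-node reconciliation queues writes for many services on one shared batch, which is committed once, instead of paying one raft round trip per service.

package main

import "fmt"

// Hypothetical stand-ins for the memory store's transaction/batch primitives;
// the real swarmkit types differ.
type Tx struct{}

// Batch queues task writes so a single commit covers all of them, instead of
// one commit per service.
type Batch struct{ queued []func(Tx) error }

func (b *Batch) Update(fn func(Tx) error) { b.queued = append(b.queued, fn) }

func (b *Batch) Commit() error {
	tx := Tx{}
	for _, fn := range b.queued {
		if err := fn(tx); err != nil {
			return err
		}
	}
	fmt.Printf("committed %d queued writes in one round trip\n", len(b.queued))
	return nil
}

// reconcileServicesOneNode reconciles every listed service against a single
// node, queuing all resulting task writes on the shared batch.
func reconcileServicesOneNode(batch *Batch, serviceIDs []string, nodeID string) {
	for _, serviceID := range serviceIDs {
		sid := serviceID
		batch.Update(func(tx Tx) error {
			// Create, update, or remove the (service, node) task here.
			fmt.Printf("reconcile service %s on node %s\n", sid, nodeID)
			return nil
		})
	}
}

func main() {
	b := &Batch{}
	reconcileServicesOneNode(b, []string{"svc1", "svc2", "svc3"}, "node1")
	_ = b.Commit()
}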

I've run the Docker integration tests against this and I've tested this a bit with the Docker CLI. I'm working on an integration test to cover global services that use constraints.

Addresses moby/moby#26325

cc @dongluochen @aluzzardi

@codecov-io commented Sep 26, 2016

Current coverage is 54.13% (diff: 56.28%)

Merging #1570 into master will increase coverage by 0.10%

@@             master      #1570   diff @@
==========================================
  Files            83         82     -1   
  Lines         13621      13658    +37   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           7360       7394    +34   
  Misses         5265       5265          
- Partials        996        999     +3   


@aaronlehmann (Collaborator, Author)

I wrote a Docker integration test: aaronlehmann/docker@d6bc367

@aaronlehmann (Collaborator, Author)

And a test covering draining and pausing for global services: aaronlehmann/docker@dd5f3cd

g.addTask(ctx, batch, service.Service, nodeID)
} else {
// If task is out of date, update it. This can happen
// on node reconciliation if, for example, we drain a
@dongluochen (Contributor) commented Sep 27, 2016

If the node is drained, global service tasks are removed. This should be covered by if len(tasks) == 0. Do you mean the node was paused?

@aaronlehmann (Collaborator, Author)

Yes, this was meant to say paused. I'll fix it.

@aaronlehmann (Collaborator, Author)

Fixed.

}

if node.Spec.Availability == api.NodeAvailabilityPause {
// the node is paused, so we won't add or update
@dongluochen (Contributor) commented Sep 27, 2016

Should this move before if _, exists := nodeCompleted[nodeID]; exists || !meetsConstraints {? I think paused is a state that should block reconciliation.

@aaronlehmann (Collaborator, Author)

Paused means it doesn't accept new tasks, so I think a paused node should still kill the task if the service is deleted. That's how it works for replicated services.
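
To make the availability handling discussed in this thread concrete, here is a hedged sketch of the per-node decision; the types and helper names are illustrative only, not swarmkit's actual code. A deleted service or a drained node always gets its tasks removed, a paused node keeps its existing tasks but receives no new or updated ones, and an active node that satisfies the constraints gets a task created or brought up to date.

package main

import "fmt"

// Illustrative availability values; the real API uses api.NodeAvailability*
// constants.
type availability int

const (
	availabilityActive availability = iota
	availabilityPause
	availabilityDrain
)

type action int

const (
	actionNone action = iota
	actionRemoveTasks
	actionCreateTask
	actionUpdateTask
)

// decide sketches the per-node reconciliation choice discussed above.
func decide(avail availability, serviceDeleted, meetsConstraints, hasTask, taskUpToDate bool) action {
	switch {
	case serviceDeleted:
		// Even a paused node should shut down tasks of a deleted service,
		// matching how replicated services behave.
		return actionRemoveTasks
	case avail == availabilityDrain:
		// Drained nodes run no tasks at all.
		return actionRemoveTasks
	case avail == availabilityPause:
		// Paused nodes accept no new or updated tasks; leave existing ones.
		return actionNone
	case !meetsConstraints:
		// With this PR, tasks exist only on nodes that satisfy the
		// service's constraints.
		if hasTask {
			return actionRemoveTasks
		}
		return actionNone
	case !hasTask:
		return actionCreateTask
	case !taskUpToDate:
		// Covers the pause-then-update-then-activate case: the existing
		// task must be brought up to the current service spec.
		return actionUpdateTask
	default:
		return actionNone
	}
}

func main() {
	fmt.Println(decide(availabilityPause, false, true, true, false) == actionNone) // prints true
}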

This code was formerly part of the scheduler package. It needs to be
moved to an independent package so the orchestrator can use it when
considering where to create tasks for global services.

While moving it, made a few minor cleanups:

- Renamed Expr to Constraint
- Unexport Key field
- Renamed ParseExprs to Parse

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
…the constraints

Previously, the global orchestrator would always create a task for every
node, and the scheduler would figure out which ones could proceed into a
running state. This was inefficient, confusing to users, and caused
problems with rolling updates (updates would get stuck if they
encountered tasks that couldn't start due to constraints). Change the
global orchestrator to only create tasks on nodes that meet the
constraints. The scheduler still handles resource reservations.

This change also fixes a few problems I found in the global
orchestrator:

- Global service updates shouldn't apply to paused nodes. Since the node
  can't accept new tasks, trying to update the service just prevents it
  from running on that node.
- As a consequence, node reconciliation needs to make sure existing tasks
  match the current service spec. For example, you could pause a node,
  update the service, and then later activate the node.
- Drained nodes weren't being reconciled on orchestrator startup, so a
  leftover task on one of these nodes would never be shut down.
- The Batch callback in service reconciliation would wrongly terminate
  early if it encountered a task that had completed.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
This avoids extra raft writes and extra reads from the store.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@aaronlehmann aaronlehmann force-pushed the global-service-orchestration branch from a3bff52 to bc7b79e on September 27, 2016 at 21:08
@dongluochen (Contributor)

LGTM

@aaronlehmann (Collaborator, Author)

@aluzzardi: Do you want to review this one? Should I go ahead and merge it?

@aluzzardi (Member) left a comment

LGTM

Minor nit

@@ -0,0 +1,164 @@
package constraint
@aluzzardi (Member)

nit: Shouldn't this be a subpackage of scheduler?

@aaronlehmann (Collaborator, Author)

Would that make sense, now that the orchestrator uses it as well?

@aluzzardi (Member)

Yeah, it's a moot point. To me it felt like it was part of the scheduler (which, at the end of the day, is the component responsible for placing tasks onto nodes), and it turns out another component (the global orchestrator) wants to use a piece of the scheduler. I don't think the scheduler and orchestrator are sharing a common piece; it's more that the orchestrator wants to use the scheduler for some things.

But it's a minor personal preference; there are no objective arguments here.

@aaronlehmann aaronlehmann merged commit 0ff1041 into moby:master Oct 6, 2016
@aaronlehmann aaronlehmann deleted the global-service-orchestration branch October 6, 2016 14:33
@allencloud (Contributor)

Great work. However, I have a question. Here is the scenario:

  1. a cluster with 5 nodes: 2 managers and 3 workers;
  2. create a global service constrained to run only on managers: docker service create --mode global --constraint node.role==manager ubuntu:14.04 sleep 100000; this creates a service with two tasks;
  3. if I promote a worker node to a manager node, will a new task be created on the promoted node?

@aaronlehmann
Thanks

@aaronlehmann (Collaborator, Author)

@allencloud: Currently, this isn't handled. There is an issue covering it: #1009

When we fix #1009, we will have to make sure that the orchestrator handles these cases (for global services), not just the scheduler.

@aaronlehmann (Collaborator, Author)

Actually, I believe the particular case you're talking about should be handled correctly because of this PR. When a node is updated (for example, when its role is changed to manager), the orchestrator will reconcile the global services running on that node. If it now meets a constraint that it didn't meet before, a task will be created for that node. If it no longer meets a constraint that it met before the update, its task will be removed.

But #1009 is still a problem for replicated services.
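
As a rough sketch of the flow described above, with hypothetical event and orchestrator types (not swarmkit's actual code): a node update such as a role change triggers reconciliation of every global service against that node, so a constraint that is newly satisfied or newly violated results in a task being created or removed.

package main

import "fmt"

// Hypothetical event and orchestrator types used only for this sketch.
type NodeUpdated struct{ NodeID string }

type globalOrchestrator struct {
	globalServiceIDs []string
}

// handleNodeUpdate re-evaluates every global service against the updated
// node, so a node that newly satisfies (or newly violates) a constraint gets
// a task created (or removed).
func (g *globalOrchestrator) handleNodeUpdate(ev NodeUpdated) {
	for _, serviceID := range g.globalServiceIDs {
		// In the real orchestrator this would call into the per-node
		// reconciliation that checks constraints and availability.
		fmt.Printf("reconcile global service %s on node %s\n", serviceID, ev.NodeID)
	}
}

func main() {
	g := &globalOrchestrator{globalServiceIDs: []string{"global-svc"}}
	// For example, a worker promoted to manager shows up as a node update.
	g.handleNodeUpdate(NodeUpdated{NodeID: "promoted-node"})
}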

@colinmollenhour

Did this not make it into 1.12.2? I tested like so:

# docker service create --mode global --constraint "node.foo==bar" --name foo busybox sleep 100000
# docker service ps foo
ID                         NAME     IMAGE    NODE   DESIRED STATE  CURRENT STATE             ERROR
9e2w5ymlj6s6758n6fo4kgmn5  foo      busybox  test1  Running        Allocated 13 seconds ago
2xg2w0ewde3jtlwrmgjl75oyv   \_ foo  busybox  test4  Running        Allocated 13 seconds ago
9cy73sxyb2ps9aqp2pedjlc1l   \_ foo  busybox  test2  Running        Allocated 13 seconds ago
agyleclrfwigvz1wib17s1el2   \_ foo  busybox  test5  Running        Allocated 13 seconds ago
cn7fixwz6mzwelhgdxp4fr6no   \_ foo  busybox  test3  Running        Allocated 13 seconds ago
# docker node inspect --format {{.Spec.Labels}} test1 test2 test3 test4 test5
map[]
map[foo:bar]
map[role:cache]
map[]
map[]

So if I understand correctly, this service should only be allocated on test2, right?

@aaronlehmann (Collaborator, Author)

This is for 1.13, sorry.

@allencloud (Contributor)

Thanks for your detailed explanation. @aaronlehmann
🐼
