Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No automatic manager shutdown on demotion/removal #1829

Merged
merged 5 commits into from
Jan 7, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
191 changes: 116 additions & 75 deletions api/objects.pb.go

Large diffs are not rendered by default.

11 changes: 11 additions & 0 deletions api/objects.proto
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,17 @@ message Node {

// Certificate is the TLS certificate issued for the node, if any.
Certificate certificate = 8 [(gogoproto.nullable) = false];

// Role is the *observed* role for this node. It differs from the
// desired role set in Node.Spec.Role because the role here is only
// updated after the Raft member list has been reconciled with the
// desired role from the spec.
//
// This field represents the current reconciled state. If an action is
// to be performed, first verify the role in the cert. This field only
// shows the privilege level that the CA would currently grant when
// issuing or renewing the node's certificate.
NodeRole role = 9;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that your suggestion of reconciled_role would be more clear here, since you're right that "observed" implies (to me at least) the role of the certificate the node is currently using.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My hesitation about this is that the worker is going to act on this field to trigger certificate renewals. "Reconciled role" has meaning to the manager, but not the worker. From the worker's perspective, "Get a new certificate if your role changes" makes sense, but "Get a new certificate if your reconciled role changes" doesn't make as much sense.

What about changing the field under Spec to DesiredRole and leaving this one as Role? I didn't think of that before, but as long as the field number doesn't change, it's fine to rename protobuf fields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd make sense too. :) Thanks

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed the Spec field to DesiredRole, thanks.

}

message Service {
Expand Down
232 changes: 116 additions & 116 deletions api/specs.pb.go

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions api/specs.proto
Original file line number Diff line number Diff line change
Expand Up @@ -41,8 +41,8 @@ message NodeSpec {
DRAIN = 2 [(gogoproto.enumvalue_customname) = "NodeAvailabilityDrain"];
}

// Role defines the role the node should have.
NodeRole role = 2;
// DesiredRole defines the role the node should have.
NodeRole desired_role = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this breaking api change?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in gRPC. If we rename the field in the JSON API, it would be. Not sure what we should do about that. We could either keep the current name in JSON, or back out this part of the change and keep Role for gRPC as well. Not sure what's best.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, field number and type stay the same, so this won't break anything.


// Membership controls the admission of the node into the cluster.
Membership membership = 3;
Expand Down
7 changes: 4 additions & 3 deletions ca/server.go
Original file line number Diff line number Diff line change
Expand Up @@ -250,7 +250,8 @@ func (s *Server) IssueNodeCertificate(ctx context.Context, request *api.IssueNod
// Create a new node
err := s.store.Update(func(tx store.Tx) error {
node := &api.Node{
ID: nodeID,
Role: role,
ID: nodeID,
Certificate: api.Certificate{
CSR: request.CSR,
CN: nodeID,
Expand All @@ -260,7 +261,7 @@ func (s *Server) IssueNodeCertificate(ctx context.Context, request *api.IssueNod
},
},
Spec: api.NodeSpec{
Role: role,
DesiredRole: role,
Membership: api.NodeMembershipAccepted,
Availability: request.Availability,
},
Expand Down Expand Up @@ -318,7 +319,7 @@ func (s *Server) issueRenewCertificate(ctx context.Context, nodeID string, csr [
cert = api.Certificate{
CSR: csr,
CN: node.ID,
Role: node.Spec.Role,
Role: node.Role,
Status: api.IssuanceStatus{
State: api.IssuanceStateRenew,
},
Expand Down
5 changes: 3 additions & 2 deletions ca/testutils/cautils.go
Original file line number Diff line number Diff line change
Expand Up @@ -220,9 +220,10 @@ func createNode(s *store.MemoryStore, nodeID, role string, csr, cert []byte) err
Certificate: cert,
},
Spec: api.NodeSpec{
Role: apiRole,
Membership: api.NodeMembershipAccepted,
DesiredRole: apiRole,
Membership: api.NodeMembershipAccepted,
},
Role: apiRole,
}

return store.CreateNode(tx, node)
Expand Down
4 changes: 2 additions & 2 deletions cmd/swarmctl/node/common.go
Original file line number Diff line number Diff line change
Expand Up @@ -113,11 +113,11 @@ func changeNodeRole(cmd *cobra.Command, args []string, role api.NodeRole) error
}
spec := &node.Spec

if spec.Role == role {
if spec.DesiredRole == role {
return errNoChange
}

spec.Role = role
spec.DesiredRole = role

_, err = c.UpdateNode(common.Context(cmd), &api.UpdateNodeRequest{
NodeID: node.ID,
Expand Down
2 changes: 1 addition & 1 deletion cmd/swarmctl/node/list.go
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,7 @@ var (
reachability = reachability + " *"
}
}
if reachability == "" && spec.Role == api.NodeRoleManager {
if reachability == "" && spec.DesiredRole == api.NodeRoleManager {
reachability = "UNKNOWN"
}

Expand Down
2 changes: 1 addition & 1 deletion integration/cluster.go
Original file line number Diff line number Diff line change
Expand Up @@ -270,7 +270,7 @@ func (c *testCluster) SetNodeRole(id string, role api.NodeRole) error {
return err
}
spec := resp.Node.Spec.Copy()
spec.Role = role
spec.DesiredRole = role
if _, err := c.api.UpdateNode(context.Background(), &api.UpdateNodeRequest{
NodeID: id,
Spec: spec,
Expand Down
14 changes: 7 additions & 7 deletions integration/integration_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -60,18 +60,18 @@ func pollClusterReady(t *testing.T, c *testCluster, numWorker, numManager int) {
var leaderFound bool
for _, n := range res.Nodes {
if n.Status.State != api.NodeStatus_READY {
return fmt.Errorf("node %s with role %s isn't ready, status %s, message %s", n.ID, n.Spec.Role, n.Status.State, n.Status.Message)
return fmt.Errorf("node %s with desired role %s isn't ready, status %s, message %s", n.ID, n.Spec.DesiredRole, n.Status.State, n.Status.Message)
}
if n.Spec.Membership != api.NodeMembershipAccepted {
return fmt.Errorf("node %s with role %s isn't accepted to cluster, membership %s", n.ID, n.Spec.Role, n.Spec.Membership)
return fmt.Errorf("node %s with desired role %s isn't accepted to cluster, membership %s", n.ID, n.Spec.DesiredRole, n.Spec.Membership)
}
if n.Certificate.Role != n.Spec.Role {
return fmt.Errorf("node %s had different roles in spec and certificate, %s and %s respectively", n.ID, n.Spec.Role, n.Certificate.Role)
if n.Certificate.Role != n.Spec.DesiredRole {
return fmt.Errorf("node %s had different roles in spec and certificate, %s and %s respectively", n.ID, n.Spec.DesiredRole, n.Certificate.Role)
}
if n.Certificate.Status.State != api.IssuanceStateIssued {
return fmt.Errorf("node %s with role %s has no issued certificate, issuance state %s", n.ID, n.Spec.Role, n.Certificate.Status.State)
return fmt.Errorf("node %s with desired role %s has no issued certificate, issuance state %s", n.ID, n.Spec.DesiredRole, n.Certificate.Status.State)
}
if n.Spec.Role == api.NodeRoleManager {
if n.Role == api.NodeRoleManager {
if n.ManagerStatus == nil {
return fmt.Errorf("manager node %s has no ManagerStatus field", n.ID)
}
Expand Down Expand Up @@ -339,7 +339,7 @@ func TestDemoteDownedManager(t *testing.T) {
resp, err := cl.api.GetNode(context.Background(), &api.GetNodeRequest{NodeID: nodeID})
require.NoError(t, err)
spec := resp.Node.Spec.Copy()
spec.Role = api.NodeRoleWorker
spec.DesiredRole = api.NodeRoleWorker

// stop the node, then demote it, and start it back up again so when it comes back up it has to realize
// it's not running anymore
Expand Down
2 changes: 1 addition & 1 deletion manager/constraint/constraint.go
Original file line number Diff line number Diff line change
Expand Up @@ -122,7 +122,7 @@ func NodeMatches(constraints []Constraint, n *api.Node) bool {
return false
}
case strings.EqualFold(constraint.key, "node.role"):
if !constraint.Match(n.Spec.Role.String()) {
if !constraint.Match(n.Role.String()) {
return false
}
case strings.EqualFold(constraint.key, "node.platform.os"):
Expand Down
19 changes: 3 additions & 16 deletions manager/controlapi/node.go
Original file line number Diff line number Diff line change
Expand Up @@ -146,7 +146,7 @@ func (s *Server) ListNodes(ctx context.Context, request *api.ListNodesRequest) (
return true
}
for _, c := range request.Filters.Roles {
if c == e.Spec.Role {
if c == e.Role {
return true
}
}
Expand Down Expand Up @@ -205,7 +205,6 @@ func (s *Server) UpdateNode(ctx context.Context, request *api.UpdateNodeRequest)
var (
node *api.Node
member *membership.Member
demote bool
)

err := s.store.Update(func(tx store.Tx) error {
Expand All @@ -215,9 +214,7 @@ func (s *Server) UpdateNode(ctx context.Context, request *api.UpdateNodeRequest)
}

// Demotion sanity checks.
if node.Spec.Role == api.NodeRoleManager && request.Spec.Role == api.NodeRoleWorker {
demote = true

if node.Spec.DesiredRole == api.NodeRoleManager && request.Spec.DesiredRole == api.NodeRoleWorker {
// Check for manager entries in Store.
managers, err := store.FindNodes(tx, store.ByRole(api.NodeRoleManager))
if err != nil {
Expand Down Expand Up @@ -246,16 +243,6 @@ func (s *Server) UpdateNode(ctx context.Context, request *api.UpdateNodeRequest)
return nil, err
}

if demote && s.raft != nil {
// TODO(abronan): the remove can potentially fail and leave the node with
// an incorrect role (worker rather than manager), we need to reconcile the
// memberlist with the desired state rather than attempting to remove the
// member once.
if err := s.raft.RemoveMember(ctx, member.RaftID); err != nil {
return nil, grpc.Errorf(codes.Internal, "cannot demote manager to worker: %v", err)
}
}

return &api.UpdateNodeResponse{
Node: node,
}, nil
Expand All @@ -276,7 +263,7 @@ func (s *Server) RemoveNode(ctx context.Context, request *api.RemoveNodeRequest)
if node == nil {
return grpc.Errorf(codes.NotFound, "node %s not found", request.NodeID)
}
if node.Spec.Role == api.NodeRoleManager {
if node.Spec.DesiredRole == api.NodeRoleManager {
if s.raft == nil {
return grpc.Errorf(codes.FailedPrecondition, "node %s is a manager but cannot access node information from the raft memberlist", request.NodeID)
}
Expand Down
38 changes: 25 additions & 13 deletions manager/controlapi/node_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,12 +24,12 @@ func createNode(t *testing.T, ts *testServer, id string, role api.NodeRole, memb
node := &api.Node{
ID: id,
Spec: api.NodeSpec{
Role: role,
Membership: membership,
},
Status: api.NodeStatus{
State: state,
},
Role: role,
}
err := ts.Store.Update(func(tx store.Tx) error {
return store.CreateNode(tx, node)
Expand Down Expand Up @@ -465,9 +465,9 @@ func TestUpdateNode(t *testing.T) {
assert.NoError(t, store.CreateNode(tx, &api.Node{
ID: nodes[1].SecurityConfig.ClientTLSCreds.NodeID(),
Spec: api.NodeSpec{
Role: api.NodeRoleManager,
Membership: api.NodeMembershipAccepted,
},
Role: api.NodeRoleManager,
}))
return nil
}))
Expand Down Expand Up @@ -543,23 +543,26 @@ func testUpdateNodeDemote(leader bool, t *testing.T) {
assert.NoError(t, store.CreateNode(tx, &api.Node{
ID: nodes[1].SecurityConfig.ClientTLSCreds.NodeID(),
Spec: api.NodeSpec{
Role: api.NodeRoleManager,
Membership: api.NodeMembershipAccepted,
DesiredRole: api.NodeRoleManager,
Membership: api.NodeMembershipAccepted,
},
Role: api.NodeRoleManager,
}))
assert.NoError(t, store.CreateNode(tx, &api.Node{
ID: nodes[2].SecurityConfig.ClientTLSCreds.NodeID(),
Spec: api.NodeSpec{
Role: api.NodeRoleManager,
Membership: api.NodeMembershipAccepted,
DesiredRole: api.NodeRoleManager,
Membership: api.NodeMembershipAccepted,
},
Role: api.NodeRoleManager,
}))
assert.NoError(t, store.CreateNode(tx, &api.Node{
ID: nodes[3].SecurityConfig.ClientTLSCreds.NodeID(),
Spec: api.NodeSpec{
Role: api.NodeRoleManager,
Membership: api.NodeMembershipAccepted,
DesiredRole: api.NodeRoleManager,
Membership: api.NodeMembershipAccepted,
},
Role: api.NodeRoleManager,
}))
return nil
}))
Expand All @@ -584,7 +587,7 @@ func testUpdateNodeDemote(leader bool, t *testing.T) {
r, err := ts.Client.GetNode(context.Background(), &api.GetNodeRequest{NodeID: nodes[2].SecurityConfig.ClientTLSCreds.NodeID()})
assert.NoError(t, err)
spec := r.Node.Spec.Copy()
spec.Role = api.NodeRoleWorker
spec.DesiredRole = api.NodeRoleWorker
version := &r.Node.Meta.Version
_, err = ts.Client.UpdateNode(context.Background(), &api.UpdateNodeRequest{
NodeID: nodes[2].SecurityConfig.ClientTLSCreds.NodeID(),
Expand All @@ -610,11 +613,14 @@ func testUpdateNodeDemote(leader bool, t *testing.T) {
return nil
}))

raftMember := ts.Server.raft.GetMemberByNodeID(nodes[3].SecurityConfig.ClientTLSCreds.NodeID())
assert.NotNil(t, raftMember)

// Try to demote Node 3, this should succeed
r, err = ts.Client.GetNode(context.Background(), &api.GetNodeRequest{NodeID: nodes[3].SecurityConfig.ClientTLSCreds.NodeID()})
assert.NoError(t, err)
spec = r.Node.Spec.Copy()
spec.Role = api.NodeRoleWorker
spec.DesiredRole = api.NodeRoleWorker
version = &r.Node.Meta.Version
_, err = ts.Client.UpdateNode(context.Background(), &api.UpdateNodeRequest{
NodeID: nodes[3].SecurityConfig.ClientTLSCreds.NodeID(),
Expand All @@ -628,6 +634,8 @@ func testUpdateNodeDemote(leader bool, t *testing.T) {
2: nodes[2],
}

ts.Server.raft.RemoveMember(context.Background(), raftMember.RaftID)

raftutils.WaitForCluster(t, clockSource, newCluster)

// Server should list 2 members
Expand All @@ -648,11 +656,14 @@ func testUpdateNodeDemote(leader bool, t *testing.T) {
lastNode = nodes[1]
}

raftMember = ts.Server.raft.GetMemberByNodeID(demoteNode.SecurityConfig.ClientTLSCreds.NodeID())
assert.NotNil(t, raftMember)

// Try to demote a Node and scale down to 1
r, err = ts.Client.GetNode(context.Background(), &api.GetNodeRequest{NodeID: demoteNode.SecurityConfig.ClientTLSCreds.NodeID()})
assert.NoError(t, err)
spec = r.Node.Spec.Copy()
spec.Role = api.NodeRoleWorker
spec.DesiredRole = api.NodeRoleWorker
version = &r.Node.Meta.Version
_, err = ts.Client.UpdateNode(context.Background(), &api.UpdateNodeRequest{
NodeID: demoteNode.SecurityConfig.ClientTLSCreds.NodeID(),
Expand All @@ -661,6 +672,8 @@ func testUpdateNodeDemote(leader bool, t *testing.T) {
})
assert.NoError(t, err)

ts.Server.raft.RemoveMember(context.Background(), raftMember.RaftID)

// Update the server
ts.Server.raft = lastNode.Node
ts.Server.store = lastNode.MemoryStore()
Expand All @@ -683,7 +696,7 @@ func testUpdateNodeDemote(leader bool, t *testing.T) {
r, err = ts.Client.GetNode(context.Background(), &api.GetNodeRequest{NodeID: lastNode.SecurityConfig.ClientTLSCreds.NodeID()})
assert.NoError(t, err)
spec = r.Node.Spec.Copy()
spec.Role = api.NodeRoleWorker
spec.DesiredRole = api.NodeRoleWorker
version = &r.Node.Meta.Version
_, err = ts.Client.UpdateNode(context.Background(), &api.UpdateNodeRequest{
NodeID: lastNode.SecurityConfig.ClientTLSCreds.NodeID(),
Expand All @@ -710,7 +723,6 @@ func testUpdateNodeDemote(leader bool, t *testing.T) {
r, err = ts.Client.GetNode(context.Background(), &api.GetNodeRequest{NodeID: lastNode.SecurityConfig.ClientTLSCreds.NodeID()})
assert.NoError(t, err)
assert.Equal(t, r.Node.Spec.Availability, api.NodeAvailabilityDrain)

}

func TestUpdateNodeDemote(t *testing.T) {
Expand Down
Loading