
Unable to start secondary etcd nodes if initial cluster member is offline #4746

Closed
brandond opened this issue Dec 14, 2021 · 2 comments
brandond commented Dec 14, 2021

Environmental Info:
K3s Version:
v1.21.7-rc2+k3s2

Node(s) CPU architecture, OS, and Version:
N/A

Cluster Configuration:
3 servers with managed (embedded) etcd

Describe the bug:
Can't restart cluster nodes without the initial node being up

Steps To Reproduce:

  1. Start 3 k3s servers
  2. Wait 1-2 minutes for servers to finish initializing, then stop all nodes.
  3. Start nodes 2 and 3 - note that they fail because the first node is not available:
Dec 15 00:29:51 ubuntu02.lan.khaus k3s[4614]: time="2021-12-15T00:29:51.457216141Z" level=info msg="Starting k3s v1.21.7-rc2+k3s2 (5260e4a6)"
Dec 15 00:29:51 ubuntu02.lan.khaus k3s[4614]: time="2021-12-15T00:29:51.460691299Z" level=info msg="Joining etcd cluster already initialized, forcing reconciliation"
Dec 15 00:29:51 ubuntu02.lan.khaus k3s[4614]: time="2021-12-15T00:29:51.463292745Z" level=fatal msg="starting kubernetes: preparing server: failed to get CA certs: Get \"https://ubuntu01.lan.khaus:6443/cacerts\": dial tcp 10.0.1.147:6443: connect: connection refused"
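
For reference, step 1 roughly corresponds to the following commands (a minimal sketch assuming k3s's embedded-etcd flags; the hostname and token value are placeholders based on this report's environment):

    # On the first server (ubuntu01): initialize the embedded etcd cluster.
    k3s server --cluster-init --token SECRET

    # On servers 2 and 3: join through the first server.
    k3s server --server https://ubuntu01.lan.khaus:6443 --token SECRET

Note that the join commands point at the first server, which is exactly the dependency the log above shows persisting across restarts.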

Expected behavior:
Once bootstrapped, additional nodes are able to start without needing to talk to the first server.

Actual behavior:
Servers depend on the first server being online and available in order to start.

Additional context / logs:

Backporting

  • Needs backporting to older releases

brandond commented Dec 15, 2021

Bisected this to the changes at 1055837#diff-4dcc46ade478259007245d87cf4eedb63ec9ebf0945746803e724cf1e8ee21a2R197-R198

This change forces the node to attempt to re-bootstrap from the 1st server, even if the node is already initialized. We need to ensure that already bootstrapped nodes have NO dependencies on other nodes when starting up.
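
As a rough illustration of the invariant being asked for (a minimal Go sketch; the function, data-dir layout, and call site are hypothetical, not the actual k3s bootstrap code):

    // Hypothetical sketch: a node that already holds local bootstrap
    // state should never reach out to the join server on startup.
    package main

    import (
        "fmt"
        "os"
        "path/filepath"
    )

    // shouldBootstrapFromRemote reports whether this node still needs to
    // fetch CA certs/bootstrap data from the join server. It checks for
    // the local etcd member directory (default k3s layout assumed).
    func shouldBootstrapFromRemote(dataDir string) bool {
        etcdDir := filepath.Join(dataDir, "server", "db", "etcd")
        if _, err := os.Stat(etcdDir); err == nil {
            // Already bootstrapped: start from local state only,
            // with no dependency on any other server being reachable.
            return false
        }
        return true
    }

    func main() {
        fmt.Println(shouldBootstrapFromRemote("/var/lib/rancher/k3s"))
    }

The point is that the reconciliation introduced in the bisected commit should be gated behind a check like this, rather than unconditionally fetching /cacerts from the first server.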

@mdrahman-suse

Verified in k3s on release 1.23 using v1.23.1-rc1+k3s2 by running the steps above; observed the expected behavior.
