
[Release-1.21] Unable to start secondary etcd nodes if initial cluster member is offline #4752

Closed
dereknola opened this issue Dec 15, 2021 · 2 comments

@dereknola (Member)

Backport #4746 to release-1.21

@mdrahman-suse

Validated in k3s with RC v1.21.8-rc1+k3s1: in a 3-node cluster, server 2 and server 3 restarted successfully after all servers were stopped, even though server 1 was started last.

Steps:

  • Install k3s on a 3-node cluster
$ kubectl get nodes,pods -A -o wide
NAME                    STATUS   ROLES                       AGE   VERSION            INTERNAL-IP     EXTERNAL-IP     OS-IMAGE           KERNEL-VERSION   CONTAINER-RUNTIME
node/server2   Ready    control-plane,etcd,master   39m   v1.21.8-rc1+k3s1   <REDACTED>   <REDACTED>   Ubuntu 20.04 LTS   5.4.0-1009-aws   containerd://1.4.12-k3s1
node/agent    Ready    <none>                      38m   v1.21.8-rc1+k3s1   <REDACTED>    <REDACTED>     Ubuntu 20.04 LTS   5.4.0-1009-aws   containerd://1.4.12-k3s1
node/server1    Ready    control-plane,etcd,master   42m   v1.21.8-rc1+k3s1   <REDACTED>    <REDACTED>   Ubuntu 20.04 LTS   5.4.0-1009-aws   containerd://1.4.12-k3s1
node/server3     Ready    control-plane,etcd,master   40m   v1.21.8-rc1+k3s1   <REDACTED>     <REDACTED>     Ubuntu 20.04 LTS   5.4.0-1009-aws   containerd://1.4.12-k3s1
  • Stop all the servers once initialized
  • Start server 2 and server 3
  • Verify they are up and running and no major errors are displayed in the logs
$ kubectl get nodes,pods -A -o wide
NAME                    STATUS     ROLES                       AGE   VERSION            INTERNAL-IP     EXTERNAL-IP     OS-IMAGE           KERNEL-VERSION   CONTAINER-RUNTIME
node/server2   Ready      control-plane,etcd,master   62m   v1.21.8-rc1+k3s1   172.31.12.106   3.138.122.137   Ubuntu 20.04 LTS   5.4.0-1009-aws   containerd://1.4.12-k3s1
node/agent    Ready      <none>                      61m   v1.21.8-rc1+k3s1   172.31.15.46    18.117.9.16     Ubuntu 20.04 LTS   5.4.0-1009-aws   containerd://1.4.12-k3s1
node/server1    NotReady   control-plane,etcd,master   65m   v1.21.8-rc1+k3s1   172.31.2.134    18.221.237.67   Ubuntu 20.04 LTS   5.4.0-1009-aws   containerd://1.4.12-k3s1
node/server3     Ready      control-plane,etcd,master   63m   v1.21.8-rc1+k3s1   172.31.7.22     3.144.13.41     Ubuntu 20.04 LTS   5.4.0-1009-aws   containerd://1.4.12-k3s1
  • Start server 1
  • Verify cluster is up and running
  • Deploy workload and validate
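The stop/start sequence above can be sketched as a small script. The host names and SSH access are hypothetical (the validation was run by hand on each node); with DRY_RUN=1 each command is only printed, so the ordering can be inspected without a cluster.

```shell
#!/usr/bin/env sh
# Sketch of the validation sequence, assuming systemd-managed k3s and
# passwordless SSH to hypothetical hosts server1..server3.
DRY_RUN=${DRY_RUN:-1}
run() {
  # In dry-run mode, print the command instead of executing it.
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# 1. Stop k3s on every server once the cluster is initialized.
for host in server1 server2 server3; do
  run ssh "$host" sudo systemctl stop k3s
done

# 2. Start the secondary servers first, then the initial server last.
for host in server2 server3 server1; do
  run ssh "$host" sudo systemctl start k3s
done
```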

NOTE: When tested with RC v1.21.8-rc1+k3s1, it was observed that restarting only server 2 while server 1 and server 3 remain stopped leaves server 2 showing the error

$ kubectl get nodes,pods -A -o wide
Error from server (InternalError): an error on the server ("apiserver not ready") has prevented the request from succeeding (get nodes)
Error from server (InternalError): an error on the server ("apiserver not ready") has prevented the request from succeeding (get pods)

Although I can see that k3s is running on server 2:

$ sudo systemctl status k3s
k3s.service - Lightweight Kubernetes
     Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2021-12-17 07:34:44 UTC; 22s ago
       Docs: https://k3s.io

Please advise whether this is expected @dereknola
CC: @rancher-max @ShylajaDevadiga

@rancher-max (Contributor)

Awesome! That is correct behavior. The error seen above when starting just one node is due to quorum loss in etcd: a single restarted member is not enough to restore quorum, so the apiserver cannot come up.
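The quorum arithmetic behind this can be sketched as follows (a minimal illustration, not part of the original thread):

```shell
# Majority quorum for an N-member etcd cluster is floor(N/2) + 1,
# so a 3-server k3s cluster tolerates the loss of only one etcd member.
N=3
QUORUM=$(( N / 2 + 1 ))

DOWN=2                 # servers 1 and 3 stopped, as in the scenario above
UP=$(( N - DOWN ))
if [ "$UP" -lt "$QUORUM" ]; then
  echo "quorum lost: $UP of $N members up, need $QUORUM"
fi
```

This is why starting server 2 alone leaves the apiserver unavailable, while starting any two of the three servers restores the cluster.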
