Environmental Info:
K3s Version:
All commit ids:
9a4ca5978b1cf6ab27f983097edbcd17df73bbf8 on release-1.22
d413f971463a34d0191d1e67f6913e161a037589 on release-1.21
ab3d25a2c5f479f77c9579af719506438f5d4fe2 on master
Node(s) CPU architecture, OS, and Version:
Ubuntu 20.04 LTS
Cluster Configuration:
3 servers, all joining at the same time
Describe the bug:
Joining 3 servers with etcd backend at the same time causes one to fail with no recovery method other than uninstalling and reinstalling.
Steps To Reproduce:
config.yamls:
# server1:
cluster-init: true
token: test
# server2 & 3:
token: test
server: https://<server1 ip>:6443
Supply the config.yamls shown above, and then, at as close to the same time as possible, run the following on all 3 servers:
curl -sfL https://get.k3s.io | INSTALL_K3S_TYPE=server INSTALL_K3S_COMMIT=ab3d25a2c5f479f77c9579af719506438f5d4fe2 sh -
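For anyone repeating the reproduction, the two config variants can be generated with a small helper so that all 3 servers are prepared identically before the install command is launched. This is only a sketch: `write_k3s_config` is a hypothetical function name, and the output path is whatever you pass in (k3s reads `/etc/rancher/k3s/config.yaml` by default).

```shell
#!/bin/sh
# Hypothetical helper (not part of k3s): write the config.yaml for one server.
# Usage: write_k3s_config <init|join> <output-file> [server1-ip]
write_k3s_config() {
    role="$1"
    out="$2"
    server_ip="${3:-}"
    case "$role" in
        init)
            # server1: bootstraps the embedded etcd cluster
            printf 'cluster-init: true\ntoken: test\n' > "$out"
            ;;
        join)
            # server2 & 3: join through server1's API endpoint
            if [ -z "$server_ip" ]; then
                echo "join role needs the server1 IP" >&2
                return 1
            fi
            printf 'token: test\nserver: https://%s:6443\n' "$server_ip" > "$out"
            ;;
        *)
            echo "unknown role: $role" >&2
            return 1
            ;;
    esac
}
```

On server1 this would be called as `write_k3s_config init /etc/rancher/k3s/config.yaml`, and on the other two as `write_k3s_config join /etc/rancher/k3s/config.yaml <server1 ip>`.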
Expected behavior:
These should all join successfully after some time, maybe after a few looping logs in one of the servers like:
level=fatal msg="ETCD join failed: etcdserver: too many learner members in cluster"
Actual behavior:
These fail to join, with the following looping logs that never stop:
Oct 27 01:43:24 ip-172-31-27-228 systemd[1]: k3s.service: Scheduled restart job, restart counter is at 28.
Oct 27 01:43:24 ip-172-31-27-228 systemd[1]: Stopped Lightweight Kubernetes.
Oct 27 01:43:25 ip-172-31-27-228 systemd[1]: Starting Lightweight Kubernetes...
Oct 27 01:43:25 ip-172-31-27-228 sh[6517]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Oct 27 01:43:25 ip-172-31-27-228 sh[6522]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Oct 27 01:43:25 ip-172-31-27-228 systemd[1]: Started Lightweight Kubernetes.
Oct 27 01:43:25 ip-172-31-27-228 k3s[6541]: time="2021-10-27T01:43:25Z" level=info msg="Starting k3s v1.22.2+k3s-ab3d25a2 (ab3d25a2)"
Oct 27 01:43:25 ip-172-31-27-228 k3s[6541]: time="2021-10-27T01:43:25Z" level=warning msg="Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash. Use the full token from the server's node-token file to enable Cluster CA validation."
Oct 27 01:43:25 ip-172-31-27-228 k3s[6541]: time="2021-10-27T01:43:25Z" level=info msg="Managed etcd cluster not yet initialized"
Oct 27 01:43:25 ip-172-31-27-228 k3s[6541]: time="2021-10-27T01:43:25Z" level=info msg="Reconciling bootstrap data between datastore and disk"
Oct 27 01:43:25 ip-172-31-27-228 k3s[6541]: time="2021-10-27T01:43:25Z" level=fatal msg="starting kubernetes: preparing server: etcdclient: no available endpoints"
Oct 27 01:43:25 ip-172-31-27-228 systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
Oct 27 01:43:25 ip-172-31-27-228 systemd[1]: k3s.service: Failed with result 'exit-code'.
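To check whether a node is stuck in this loop rather than in the expected transient "too many learner members" retries, the journal output can be summarised with two small filters. This is a rough sketch: both function names are hypothetical, and the patterns are taken directly from the log lines above.

```shell
#!/bin/sh
# Hypothetical helpers; both read journal text on stdin.

# Count occurrences of the fatal error seen in the loop above.
count_no_endpoint_failures() {
    # grep -c exits non-zero when the count is 0, so mask the status.
    grep -c 'etcdclient: no available endpoints' || true
}

# Print the most recent systemd restart counter, if any line reports one.
last_restart_counter() {
    sed -n 's/.*restart counter is at \([0-9][0-9]*\)\..*/\1/p' | tail -n 1
}
```

For example, `journalctl -u k3s --no-pager | count_no_endpoint_failures` gives the number of failed starts recorded for the unit; a count that keeps growing along with the restart counter suggests the node is in the state described here.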
Additional context / logs:
The only workaround I can find is to uninstall k3s on the affected node and reinstall it.
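The workaround can be sketched as follows, assuming the node was installed with the get.k3s.io script, which places `k3s-uninstall.sh` in `/usr/local/bin` on servers. `reinstall_k3s_server` and the `DRY_RUN` variable are hypothetical names for illustration; with `DRY_RUN=1` (the default) the commands are only printed. The uninstall script may also remove the k3s config directory, so the config.yaml may need to be supplied again before reinstalling.

```shell
#!/bin/sh
# Hypothetical helper: uninstall k3s on the affected node, then reinstall
# the same commit. Prints the commands unless DRY_RUN=0 is set.
# Usage: reinstall_k3s_server <commit-id>
reinstall_k3s_server() {
    commit="$1"
    run() {
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "+ $*"
        else
            "$@"
        fi
    }
    run /usr/local/bin/k3s-uninstall.sh
    run sh -c "curl -sfL https://get.k3s.io | INSTALL_K3S_TYPE=server INSTALL_K3S_COMMIT=$commit sh -"
}
```

Running `reinstall_k3s_server ab3d25a2c5f479f77c9579af719506438f5d4fe2` first shows what would be executed; setting `DRY_RUN=0` performs the uninstall and reinstall.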
Backporting
Needs backporting to older releases
This has been validated on the master branch using commit ID 8271d98a766b060463bc73ef66c5085b5797b4cc, following the same steps as described in the issue. I still need to validate the backports on the release-1.21 and release-1.22 branches before closing.