
Kubernetes Master Failed : FAILED - RETRYING: Create kubeadm token for joining nodes with 24h expiration #5227

Closed
shrutishete opened this issue Sep 30, 2019 · 11 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.


@shrutishete

Environment:

  • Cloud provider or hardware configuration:

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"): Ubuntu 16.04.3 LTS

  • Version of Ansible (ansible --version): ansible 2.7.12

  • Kubespray version (commit) (git rev-parse --short HEAD): 8712bdd

Network plugin used: calico

Copy of your inventory file:

[all]
master-1 ansible_host=161.92.248.32 ip=161.92.248.32 ansible_user=philips ansible_sudo=yes
worker-1 ansible_host=161.92.248.33 ip=161.92.248.33 ansible_user=philips ansible_sudo=yes

[kube-master]
master-1

[kube-node]
worker-1

[etcd]
master-1

[k8s-cluster:children]
kube-master
kube-node

Command used to invoke ansible:
ansible-playbook -b --ask-become-pass --become-user=root -i inventory/mycluster/inventory.ini cluster.yml

Output of ansible run:

TASK [kubernetes/master : Create kubeadm token for joining nodes with 24h expiration (default)] **********************************************************************************************
Monday 30 September 2019 16:39:05 +0530 (0:00:00.100) 0:03:49.003 ******
FAILED - RETRYING: Create kubeadm token for joining nodes with 24h expiration (default) (5 retries left).
FAILED - RETRYING: Create kubeadm token for joining nodes with 24h expiration (default) (4 retries left).
FAILED - RETRYING: Create kubeadm token for joining nodes with 24h expiration (default) (3 retries left).
FAILED - RETRYING: Create kubeadm token for joining nodes with 24h expiration (default) (2 retries left).
FAILED - RETRYING: Create kubeadm token for joining nodes with 24h expiration (default) (1 retries left).
fatal: [master-1 -> 161.92.248.32]: FAILED! => {"attempts": 5, "changed": true, "cmd": ["/opt/bin/kubeadm", "--kubeconfig", "/etc/kubernetes/admin.conf", "token", "create"], "delta": "0:01:15.204831", "end": "2019-09-30 16:47:09.210040", "msg": "non-zero return code", "rc": 1, "start": "2019-09-30 16:45:54.005209", "stderr": "timed out waiting for the condition", "stderr_lines": ["timed out waiting for the condition"], "stdout": "", "stdout_lines": []}

@shrutishete shrutishete added the kind/bug Categorizes issue or PR as related to a bug. label Sep 30, 2019
@shrutishete

On running this /opt/bin/kubeadm --kubeconfig /etc/kubernetes/admin.conf token create
getting the following output :

I0930 18:00:52.823950 13595 token.go:115] [token] validating mixed arguments
I0930 18:00:52.823994 13595 token.go:122] [token] getting Clientsets from kubeconfig file
I0930 18:00:52.824900 13595 loader.go:359] Config loaded from file: /etc/kubernetes/admin.conf
I0930 18:00:52.825433 13595 token.go:221] [token] loading configurations
I0930 18:00:52.825648 13595 interface.go:384] Looking for default routes with IPv4 addresses
I0930 18:00:52.825658 13595 interface.go:389] Default route transits interface "ens160"
I0930 18:00:52.825850 13595 interface.go:196] Interface ens160 is up
I0930 18:00:52.825895 13595 interface.go:244] Interface "ens160" has 2 addresses :[161.92.248.32/24 fe80::955e:4706:d886:670f/64].
I0930 18:00:52.825913 13595 interface.go:211] Checking addr 161.92.248.32/24.
I0930 18:00:52.825920 13595 interface.go:218] IP found 161.92.248.32
I0930 18:00:52.825926 13595 interface.go:250] Found valid IPv4 address 161.92.248.32 for interface "ens160".
I0930 18:00:52.825931 13595 interface.go:395] Found active IP 161.92.248.32
I0930 18:00:52.826111 13595 feature_gate.go:216] feature gates: &{map[]}
I0930 18:00:52.826129 13595 token.go:233] [token] creating token
I0930 18:00:52.826187 13595 round_trippers.go:419] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.15.3 (linux/amd64) kubernetes/2d3c76f" 'https://lb-apiserver.kubernetes.local:6443/api/v1/namespaces/kube-system/secrets/bootstrap-token-w27knb'
I0930 18:00:52.843460 13595 round_trippers.go:438] GET https://lb-apiserver.kubernetes.local:6443/api/v1/namespaces/kube-system/secrets/bootstrap-token-w27knb in 17 milliseconds
I0930 18:00:52.843493 13595 round_trippers.go:444] Response Headers:
I0930 18:00:52.843916 13595 request.go:947] Request Body: {"kind":"Secret","apiVersion":"v1","metadata":{"name":"bootstrap-token-w27knb","namespace":"kube-system","creationTimestamp":null},"data":{"auth-extra-groups":"c3lzdGVtOmJvb3RzdHJhcHBlcnM6a3ViZWFkbTpkZWZhdWx0LW5vZGUtdG9rZW4=","expiration":"MjAxOS0xMC0wMVQxODowMDo1MiswNTozMA==","token-id":"dzI3a25i","token-secret":"b2c1eWdtaHAxbHh0aDlzaQ==","usage-bootstrap-authentication":"dHJ1ZQ==","usage-bootstrap-signing":"dHJ1ZQ=="},"type":"bootstrap.kubernetes.io/token"}
I0930 18:00:52.843977 13595 round_trippers.go:419] curl -k -v -XPOST -H "Accept: application/json, */*" -H "Content-Type: application/json" -H "User-Agent: kubeadm/v1.15.3 (linux/amd64) kubernetes/2d3c76f" 'https://lb-apiserver.kubernetes.local:6443/api/v1/namespaces/kube-system/secrets'
I0930 18:00:52.867204 13595 round_trippers.go:438] POST https://lb-apiserver.kubernetes.local:6443/api/v1/namespaces/kube-system/secrets in 23 milliseconds
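For reference, the base64 fields in the request body above decode to the bootstrap token components kubeadm is writing (values copied straight from the log):

```shell
# Decode the base64-encoded fields of the bootstrap-token Secret shown
# in the request body above (values taken from the log output).
printf 'dzI3a25i' | base64 -d && echo                              # token-id: w27knb
printf 'b2c1eWdtaHAxbHh0aDlzaQ==' | base64 -d && echo              # token-secret: og5ygmhp1lxth9si
printf 'MjAxOS0xMC0wMVQxODowMDo1MiswNTozMA==' | base64 -d && echo  # expiration: 2019-10-01T18:00:52+05:30
```

So the POST is creating a secret named bootstrap-token-w27knb with a 24h expiration, matching the failing task's name; the command itself then hangs waiting on the apiserver.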

@atooki

atooki commented Nov 4, 2019

I'm having this problem as well, on Ubuntu version 18.04 LTS.

my host.ini file:
[all]
node01 ansible_host=10.44.16.3 ip=10.44.16.3
node02 ansible_host=10.44.16.1 ip=10.44.16.1
node03 ansible_host=10.44.16.2 ip=10.44.16.2
node04 ansible_host=10.44.16.4 ip=10.44.16.4

[kube-master]
node01

[etcd]
node01
node02
node03

[kube-node]
node02
node03
node04

[calico-rr]

[k8s-cluster:children]
kube-master
kube-node

@RiaanLab

RiaanLab commented Dec 8, 2019

+1

@samchal

samchal commented Feb 11, 2020

I've also seen this problem with the kubeadm token occur after upgrading the underlying OS to Ubuntu 18.04.4 and adding a new node using the scale.yml playbook (using kubespray 2.11.0)

Looking at the logs with journalctl -u kubelet on any of the nodes (including the master), it appears there was a mismatch between the cgroup driver used by docker and the one used by kubelet, preventing kubelet from starting correctly:

Feb 10 22:23:04 lnx-node1 kubelet[4244]: F0210 22:23:04.280596 4244 server.go:273] failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"

I fixed this by setting cgroupDriver: cgroupfs in /etc/kubernetes/kubelet-config.yaml on each cluster node and rebooting; all nodes, including the master, then started up OK. Re-running the scale playbook then works, but you need to re-apply the cgroupDriver: cgroupfs change in /etc/kubernetes/kubelet-config.yaml after running the playbook. An ansible task for this is as follows:

    - name: Modify cgroupdriver
      become: true
      lineinfile:
        dest: /etc/kubernetes/kubelet-config.yaml
        regexp: '^cgroupDriver:'
        line: 'cgroupDriver: cgroupfs'
        state: present

Possible root cause: It appears that the docker info on Ubuntu 18.04.4 appends the message WARNING: no swap limit support, which I suspect may not be handled by kubespray when detecting the cgroup driver (see playbook roles/kubernetes/node/tasks/facts.yml).
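As a hypothetical illustration (this is not kubespray's actual detection code), a parser that assumes the Cgroup Driver value sits on a fixed line of docker info output breaks once the warning is appended, while matching the labelled line explicitly does not:

```shell
# Illustrative only: simulated `docker info` output with the warning
# appended on Ubuntu 18.04.4.
docker_info='Server Version: 18.09.7
Cgroup Driver: cgroupfs
WARNING: no swap limit support'

# Fragile: assuming the value of interest is on the last line picks up
# the warning instead of the driver.
last=$(printf '%s\n' "$docker_info" | tail -n 1)
echo "$last"

# Robust: select the labelled "Cgroup Driver" line explicitly.
driver=$(printf '%s\n' "$docker_info" | awk -F': ' '/Cgroup Driver/ {print $2}')
echo "$driver"
```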

The warning message can be resolved by updating the kernel command-line boot options and updating grub, as described at the end of the article at https://docs.docker.com/install/linux/linux-postinstall/ under the section "Your kernel does not support cgroup swap limit capabilities".

I haven't re-run the playbook to confirm whether this fixes the issue, since manually setting cgroupDriver in /etc/kubernetes/kubelet-config.yaml as described above worked for me.

@KouriR

KouriR commented Feb 25, 2020

We also ran into the kubelet cgroup driver issue on Ubuntu 18 and Kubespray 2.11.0, and the fix samchal mentioned did work for us, but that was unrelated to the kubeadm token issue.

In our case, we were trying to reconfigure the cluster with an apiserver_loadbalancer_domain_name, and kubeadm --kubeconfig /etc/kubernetes/admin.conf was timing out because admin.conf was trying to use the new apiserver_loadbalancer_domain_name value, which was not yet in the apiserver cert's SAN list, so the validation failed.
The fix for this issue is documented here: kubernetes/kubeadm#1447 (comment)
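One way to confirm this kind of SAN mismatch is to inspect the certificate's Subject Alternative Name list with openssl. The sketch below generates a throwaway self-signed certificate so the commands are self-contained (it assumes OpenSSL 1.1.1+ for -addext/-ext); on a real cluster you would point openssl x509 at the actual apiserver certificate instead, whose path varies by setup.

```shell
# Generate a throwaway self-signed cert with an illustrative SAN entry,
# then inspect its SAN list the same way you would for the apiserver cert.
dir=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout "$dir/key.pem" -out "$dir/cert.pem" \
  -subj '/CN=kube-apiserver' \
  -addext 'subjectAltName=DNS:lb-apiserver.kubernetes.local,DNS:localhost'

# Print the SAN extension; the load balancer domain must appear here,
# or TLS validation against that name will fail.
sans=$(openssl x509 -in "$dir/cert.pem" -noout -ext subjectAltName)
echo "$sans"
```

If the domain configured as apiserver_loadbalancer_domain_name is missing from this output on the real apiserver cert, any kubeconfig pointing at that name will fail validation exactly as described.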

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 25, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 24, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@GSalah

GSalah commented Oct 15, 2020

Don't run ansible-playbook from a master node; use a separate VM to manage your cluster.

@ognjen-it

I resolved it when I did:

ansible-playbook -i inventory/mycluster/inventory.ini --become --user=root --become-user=root reset.yml -e ansible_python_interpreter=/usr/bin/python3
and then:
ansible-playbook -i inventory/mycluster/inventory.ini --become --user=root --become-user=root cluster.yml -e ansible_python_interpreter=/usr/bin/python3
