
[BUG] Edgenode Turns to NotReady, kubelet on It Restart Rapidly #367

Closed
Windrow opened this issue Jun 23, 2021 · 19 comments · Fixed by #370
Labels
kind/bug

Comments

@Windrow
Contributor

Windrow commented Jun 23, 2021

What happened:

After executing _output/local/bin/linux/amd64/yurtctl convert --provider kubeadm --cloud-nodes master on the master node of an established two-node Kubernetes cluster, all pods are pulled and started as expected, but the status of the edge node turns to NotReady. Checking kubelet's log with journalctl -xeu kubelet on that device, we can see kubelet keeps restarting every few minutes.

What you expected to happen:

After conversion, the edge node rejoins the cluster and its status is Ready.

How to reproduce it (as minimally and precisely as possible): (updated due to further investigation)

  • Set up a Kubernetes cluster (I did it with kubeadm)
  • Execute _output/local/bin/linux/amd64/yurtctl convert --provider kubeadm --cloud-nodes master on the master node (called master)
  • Revert yurt with _output/local/bin/linux/amd64/yurtctl revert
  • Reset the cluster (with kubeadm reset)
  • Set up the Kubernetes cluster again with new credentials
  • Execute yurtctl convert again
  • The issue happens

Anything else we need to know?:

Environment:

  • OpenYurt version: V0.4.0
  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:58:59Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
  • OS (e.g: cat /etc/os-release):
    Cloud node:
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

Edge node:

NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
  • Kernel (e.g. uname -a):
    Cloud node:
Linux node-3 5.4.0-74-generic #83~18.04.1-Ubuntu SMP Tue May 11 16:01:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Edge node:

Linux ubuntu 5.4.0-1036-raspi #39-Ubuntu SMP PREEMPT Wed May 12 17:37:51 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
  • Install tools: kubeadm
  • Others:
    #360 (comment)

/kind bug

@Windrow
Contributor Author

Windrow commented Jun 23, 2021

yurthub_yaml

@rambohe-ch
Member

@Windrow Thank you for filing the issue.
Would you be able to upload the /etc/kubernetes/manifests/yurt-hub.yaml created by openyurt/yurtctl-servant?

@Windrow
Contributor Author

Windrow commented Jun 23, 2021

@rambohe-ch

root@ubuntu:/etc/kubernetes# cat manifests/yurt-hub.yaml 

apiVersion: v1
kind: Pod
metadata:
  labels:
    k8s-app: yurt-hub
  name: yurt-hub
  namespace: kube-system
spec:
  volumes:
  - name: hub-dir
    hostPath:
      path: /var/lib/yurthub
      type: DirectoryOrCreate
  - name: kubernetes
    hostPath:
      path: /etc/kubernetes
      type: Directory
  containers:
  - name: yurt-hub
    image: openyurt/yurthub:latest
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: hub-dir
      mountPath: /var/lib/yurthub
    - name: kubernetes
      mountPath: /etc/kubernetes
    command:
    - yurthub
    - --v=2
    - --server-addr=https://192.168.0.97:6443
    - --node-name=$(NODE_NAME)
    - --join-token=ghijkl.0123456789101112
    livenessProbe:
      httpGet:
        host: 127.0.0.1
        path: /v1/healthz
        port: 10267
      initialDelaySeconds: 300
      periodSeconds: 5
      failureThreshold: 3
    resources:
      requests:
        cpu: 150m
        memory: 150Mi
      limits:
        memory: 300Mi
    securityContext:
      capabilities:
        add: ["NET_ADMIN", "NET_RAW"]
    env:
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
  hostNetwork: true
  priorityClassName: system-node-critical
  priority: 2000001000

I believe it is created based on common.go.

And here is a comparison between it and the template:
#367 (comment)

@Windrow
Contributor Author

Windrow commented Jun 23, 2021

Jun 23 19:33:48 ubuntu kubelet[1110584]: E0623 19:33:48.415434 1110584 kubelet.go:2267] node "ubuntu" not found
Jun 23 19:33:48 ubuntu kubelet[1110584]: E0623 19:33:48.515749 1110584 kubelet.go:2267] node "ubuntu" not found
Jun 23 19:33:48 ubuntu kubelet[1110584]: E0623 19:33:48.578768 1110584 eviction_manager.go:255] eviction manager: failed to get summary stats: failed to get node info: node "ubuntu" not found
Jun 23 19:33:48 ubuntu kubelet[1110584]: E0623 19:33:48.616055 1110584 kubelet.go:2267] node "ubuntu" not found
Jun 23 19:33:48 ubuntu kubelet[1110584]: E0623 19:33:48.684114 1110584 csi_plugin.go:271] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused >
Jun 23 19:33:48 ubuntu kubelet[1110584]: F0623 19:33:48.684219 1110584 csi_plugin.go:285] Failed to initialize CSINodeInfo after retrying
Jun 23 19:33:48 ubuntu systemd[1]: kubelet.service: Main process exited, code=exited, status=255/EXCEPTION
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- An ExecStart= process belonging to unit kubelet.service has exited.
-- 
-- The process' exit code is 'exited' and its exit status is 255.
Jun 23 19:33:48 ubuntu systemd[1]: kubelet.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- The unit kubelet.service has entered the 'failed' state with result 'exit-code'.
Jun 23 19:33:58 ubuntu systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 7.
-- Subject: Automatic restarting of a unit has been scheduled
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- Automatic restarting of the unit kubelet.service has been scheduled, as the result for
-- the configured Restart= setting for the unit.
Jun 23 19:33:58 ubuntu systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
-- Subject: A stop job for unit kubelet.service has finished
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- A stop job for unit kubelet.service has finished.
-- 
-- The job identifier is 41893 and the job result is done.
Jun 23 19:33:58 ubuntu systemd[1]: Started kubelet: The Kubernetes Node Agent.
-- Subject: A start job for unit kubelet.service has finished successfully
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- A start job for unit kubelet.service has finished successfully.
-- 
-- The job identifier is 41893.
Jun 23 19:33:58 ubuntu kubelet[1111058]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/doc>
Jun 23 19:33:58 ubuntu kubelet[1111058]: Flag --resolv-conf has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/>
Jun 23 19:33:58 ubuntu kubelet[1111058]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/doc>
Jun 23 19:33:58 ubuntu kubelet[1111058]: Flag --resolv-conf has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/>
Jun 23 19:33:58 ubuntu kubelet[1111058]: I0623 19:33:58.988122 1111058 server.go:417] Version: v1.18.0
Jun 23 19:33:58 ubuntu kubelet[1111058]: I0623 19:33:58.989211 1111058 plugins.go:100] No cloud provider specified.
Jun 23 19:33:58 ubuntu kubelet[1111058]: I0623 19:33:58.989334 1111058 server.go:837] Client rotation is on, will bootstrap in background
Jun 23 19:33:58 ubuntu kubelet[1111058]: I0623 19:33:58.996455 1111058 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
Jun 23 19:33:59 ubuntu kubelet[1111058]: E0623 19:33:59.273719 1111058 machine.go:331] failed to get cache information for node 0: open /sys/devices/system/cpu/cpu0/cache: no such file or directory
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.343830 1111058 server.go:646] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.346402 1111058 container_manager_linux.go:266] container manager verified user specified cgroup-root exists: []
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.346543 1111058 container_manager_linux.go:271] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: Kube>
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.350017 1111058 topology_manager.go:126] [topologymanager] Creating topology manager with none policy
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.350094 1111058 container_manager_linux.go:301] [topologymanager] Initializing Topology Manager with none policy
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.350116 1111058 container_manager_linux.go:306] Creating device plugin manager: true
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.350484 1111058 client.go:75] Connecting to docker on unix:///var/run/docker.sock
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.350546 1111058 client.go:92] Start docker client with request timeout=2m0s
Jun 23 19:33:59 ubuntu kubelet[1111058]: W0623 19:33:59.386559 1111058 docker_service.go:561] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth"
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.386637 1111058 docker_service.go:238] Hairpin mode set to "hairpin-veth"
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.459763 1111058 docker_service.go:253] Docker cri networking managed by cni
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.508183 1111058 docker_service.go:258] Docker Info: &{ID:5IF2:AWZA:R3O7:7DM4:W4YO:YB2Z:7AM5:GG7E:YCWK:OMI3:A7GX:MTS2 Containers:10 ContainersRunning>
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.508578 1111058 docker_service.go:271] Setting cgroupDriver to cgroupfs
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.584316 1111058 remote_runtime.go:59] parsed scheme: ""
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.584451 1111058 remote_runtime.go:59] scheme "" not registered, fallback to default scheme
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.584667 1111058 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/run/dockershim.sock  <nil> 0 <nil>}] <nil> <nil>}
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.584726 1111058 clientconn.go:933] ClientConn switching balancer to "pick_first"
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.585065 1111058 remote_image.go:50] parsed scheme: ""
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.585127 1111058 remote_image.go:50] scheme "" not registered, fallback to default scheme
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.585193 1111058 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/run/dockershim.sock  <nil> 0 <nil>}] <nil> <nil>}
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.585227 1111058 clientconn.go:933] ClientConn switching balancer to "pick_first"
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.585507 1111058 kubelet.go:292] Adding pod path: /etc/kubernetes/manifests
Jun 23 19:33:59 ubuntu kubelet[1111058]: I0623 19:33:59.585761 1111058 kubelet.go:317] Watching apiserver
Jun 23 19:33:59 ubuntu kubelet[1111058]: E0623 19:33:59.597273 1111058 reflector.go:178] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: pods "" not found
Jun 23 19:33:59 ubuntu kubelet[1111058]: E0623 19:33:59.598659 1111058 reflector.go:178] k8s.io/kubernetes/pkg/kubelet/kubelet.go:517: Failed to list *v1.Service: services "" not found
Jun 23 19:33:59 ubuntu kubelet[1111058]: E0623 19:33:59.601505 1111058 reflector.go:178] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: pods "" not found
Jun 23 19:33:59 ubuntu kubelet[1111058]: E0623 19:33:59.601717 1111058 reflector.go:178] k8s.io/kubernetes/pkg/kubelet/kubelet.go:517: Failed to list *v1.Service: services "" not found
Jun 23 19:33:59 ubuntu kubelet[1111058]: E0623 19:33:59.603696 1111058 reflector.go:178] k8s.io/kubernetes/pkg/kubelet/kubelet.go:526: Failed to list *v1.Node: nodes "ubuntu" not found
Jun 23 19:33:59 ubuntu kubelet[1111058]: E0623 19:33:59.607457 1111058 reflector.go:178] k8s.io/kubernetes/pkg/kubelet/kubelet.go:526: Failed to list *v1.Node: nodes "ubuntu" not found
Jun 23 19:34:01 ubuntu kubelet[1111058]: E0623 19:34:01.413235 1111058 reflector.go:178] k8s.io/kubernetes/pkg/kubelet/kubelet.go:526: Failed to list *v1.Node: nodes "ubuntu" not found
Jun 23 19:34:01 ubuntu kubelet[1111058]: E0623 19:34:01.644814 1111058 reflector.go:178] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: pods "" not found
Jun 23 19:34:02 ubuntu kubelet[1111058]: E0623 19:34:02.110632 1111058 reflector.go:178] k8s.io/kubernetes/pkg/kubelet/kubelet.go:517: Failed to list *v1.Service: services "" not found
Jun 23 19:34:05 ubuntu kubelet[1111058]: E0623 19:34:05.916940 1111058 aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
Jun 23 19:34:05 ubuntu kubelet[1111058]:         For verbose messaging see aws.Config.CredentialsChainVerboseErrors
Jun 23 19:34:05 ubuntu kubelet[1111058]: I0623 19:34:05.958278 1111058 kuberuntime_manager.go:211] Container runtime docker initialized, version: 20.10.2, apiVersion: 1.41.0
Jun 23 19:34:05 ubuntu kubelet[1111058]: I0623 19:34:05.960283 1111058 server.go:1125] Started kubelet
Jun 23 19:34:05 ubuntu kubelet[1111058]: E0623 19:34:05.960489 1111058 kubelet.go:1305] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: >
Jun 23 19:34:05 ubuntu kubelet[1111058]: I0623 19:34:05.961820 1111058 server.go:145] Starting to listen on 0.0.0.0:10250
Jun 23 19:34:05 ubuntu kubelet[1111058]: I0623 19:34:05.963955 1111058 fs_resource_analyzer.go:64] Starting FS ResourceAnalyzer
Jun 23 19:34:05 ubuntu kubelet[1111058]: I0623 19:34:05.964484 1111058 server.go:393] Adding debug handlers to kubelet server.
Jun 23 19:34:05 ubuntu kubelet[1111058]: I0623 19:34:05.976342 1111058 volume_manager.go:265] Starting Kubelet Volume Manager
Jun 23 19:34:06 ubuntu kubelet[1111058]: I0623 19:34:05.976676 1111058 desired_state_of_world_populator.go:139] Desired state populator starts to run
Jun 23 19:34:06 ubuntu kubelet[1111058]: E0623 19:34:06.006810 1111058 reflector.go:178] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.CSIDriver: csidrivers.storage.k8s.io "" not found
Jun 23 19:34:06 ubuntu kubelet[1111058]: E0623 19:34:06.013489 1111058 controller.go:228] failed to get node "ubuntu" when trying to set owner ref to the node lease: nodes "ubuntu" not found
Jun 23 19:34:06 ubuntu kubelet[1111058]: E0623 19:34:06.016365 1111058 reflector.go:178] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.CSIDriver: csidrivers.storage.k8s.io "" not found
Jun 23 19:34:06 ubuntu kubelet[1111058]: E0623 19:34:06.020387 1111058 controller.go:228] failed to get node "ubuntu" when trying to set owner ref to the node lease: nodes "ubuntu" not found
Jun 23 19:34:06 ubuntu kubelet[1111058]: E0623 19:34:06.077136 1111058 kubelet.go:2267] node "ubuntu" not found
Jun 23 19:34:06 ubuntu kubelet[1111058]: I0623 19:34:06.077283 1111058 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Jun 23 19:34:06 ubuntu kubelet[1111058]: I0623 19:34:06.104423 1111058 clientconn.go:106] parsed scheme: "unix"
Jun 23 19:34:06 ubuntu kubelet[1111058]: I0623 19:34:06.104573 1111058 clientconn.go:106] scheme "unix" not registered, fallback to default scheme
Jun 23 19:34:06 ubuntu kubelet[1111058]: I0623 19:34:06.104943 1111058 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}
Jun 23 19:34:06 ubuntu kubelet[1111058]: I0623 19:34:06.105015 1111058 clientconn.go:933] ClientConn switching balancer to "pick_first"
Jun 23 19:34:06 ubuntu kubelet[1111058]: E0623 19:34:06.146699 1111058 reflector.go:178] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: pods "" not found
Jun 23 19:34:06 ubuntu kubelet[1111058]: E0623 19:34:06.177533 1111058 kubelet.go:2267] node "ubuntu" not found
Jun 23 19:34:06 ubuntu kubelet[1111058]: I0623 19:34:06.182233 1111058 status_manager.go:158] Starting to sync pod status with apiserver
Jun 23 19:34:06 ubuntu kubelet[1111058]: I0623 19:34:06.182585 1111058 kubelet.go:1821] Starting kubelet main sync loop.
Jun 23 19:34:06 ubuntu kubelet[1111058]: E0623 19:34:06.182886 1111058 kubelet.go:1845] skipping pod synchronization - [container runtime status check may not have completed yet, PLEG is not healthy: ple>
Jun 23 19:34:06 ubuntu kubelet[1111058]: E0623 19:34:06.196300 1111058 reflector.go:178] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.RuntimeClass: runtimeclasses.node.k8s.io "" not>
Jun 23 19:34:06 ubuntu kubelet[1111058]: E0623 19:34:06.199414 1111058 reflector.go:178] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.RuntimeClass: runtimeclasses.node.k8s.io "" not>
Jun 23 19:34:06 ubuntu kubelet[1111058]: E0623 19:34:06.279448 1111058 kubelet.go:2267] node "ubuntu" not found
Jun 23 19:34:06 ubuntu kubelet[1111058]: E0623 19:34:06.287428 1111058 kubelet.go:1845] skipping pod synchronization - container runtime status check may not have completed yet
Jun 23 19:34:06 ubuntu kubelet[1111058]: I0623 19:34:06.301744 1111058 kubelet_node_status.go:70] Attempting to register node ubuntu
Jun 23 19:34:06 ubuntu kubelet[1111058]: I0623 19:34:06.361593 1111058 kubelet_node_status.go:73] Successfully registered node ubuntu
Jun 23 19:34:06 ubuntu kubelet[1111058]: E0623 19:34:06.363454 1111058 kubelet_node_status.go:402] Error updating node status, will retry: error getting node "ubuntu": nodes "ubuntu" not found
Jun 23 19:34:06 ubuntu kubelet[1111058]: E0623 19:34:06.365906 1111058 kubelet_node_status.go:402] Error updating node status, will retry: error getting node "ubuntu": nodes "ubuntu" not found

@rambohe-ch
Member

@Windrow Thank you for uploading the detailed logs.
Would you check whether the yurt-hub pod is running or not?

@Windrow
Contributor Author

Windrow commented Jun 23, 2021

@rambohe-ch

The pod is running, but has many errors.

E0623 14:44:43.160917       1 local.go:79] could not proxy local for kubelet get nodes: /api/v1/nodes/ubuntu, nodes "ubuntu" not found
I0623 14:44:43.161391       1 util.go:215] kubelet get nodes: /api/v1/nodes/ubuntu with status code 404, spent 721µs
I0623 14:44:43.258513       1 connrotation.go:145] create a connection from 192.168.0.82:59810 to 192.168.0.97:6443, total 1 connections in transport manager dialer
I0623 14:44:43.269964       1 connrotation.go:48] close connection from 192.168.0.82:59810 to 192.168.0.97:6443 for transport manager dialer, remain 0 connections
I0623 14:44:43.421098       1 util.go:232] start proxying: get /apis/storage.k8s.io/v1/csinodes/ubuntu, in flight requests: 1
E0623 14:44:43.421404       1 local.go:208] object not found for kubelet get csinodes: /apis/storage.k8s.io/v1/csinodes/ubuntu
E0623 14:44:43.421484       1 local.go:79] could not proxy local for kubelet get csinodes: /apis/storage.k8s.io/v1/csinodes/ubuntu, csinodes.storage.k8s.io "ubuntu" not found
I0623 14:44:43.421639       1 util.go:215] kubelet get csinodes: /apis/storage.k8s.io/v1/csinodes/ubuntu with status code 404, spent 379.204µs
I0623 14:44:43.560535       1 util.go:232] start proxying: get /api/v1/nodes/ubuntu, in flight requests: 1
E0623 14:44:43.560885       1 local.go:208] object not found for kubelet get nodes: /api/v1/nodes/ubuntu
E0623 14:44:43.560977       1 local.go:79] could not proxy local for kubelet get nodes: /api/v1/nodes/ubuntu, nodes "ubuntu" not found
I0623 14:44:43.561434       1 util.go:215] kubelet get nodes: /api/v1/nodes/ubuntu with status code 404, spent 697.5µs
I0623 14:44:43.760656       1 util.go:232] start proxying: get /apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0, in flight requests: 1
E0623 14:44:43.761097       1 storage_wrapper.go:158] could not list objects for kubelet/csidrivers, specified key is not found
E0623 14:44:43.761181       1 local.go:208] object not found for kubelet list csidrivers: /apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0
E0623 14:44:43.761252       1 local.go:79] could not proxy local for kubelet list csidrivers: /apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0, csidrivers.storage.k8s.io "" not found
I0623 14:44:43.761419       1 util.go:215] kubelet list csidrivers: /apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0 with status code 404, spent 471.518µs
I0623 14:44:43.960566       1 util.go:232] start proxying: get /apis/node.k8s.io/v1beta1/runtimeclasses?limit=500&resourceVersion=0, in flight requests: 1
E0623 14:44:43.961034       1 storage_wrapper.go:158] could not list objects for kubelet/runtimeclasses, specified key is not found
E0623 14:44:43.961132       1 local.go:208] object not found for kubelet list runtimeclasses: /apis/node.k8s.io/v1beta1/runtimeclasses?limit=500&resourceVersion=0
E0623 14:44:43.961238       1 local.go:79] could not proxy local for kubelet list runtimeclasses: /apis/node.k8s.io/v1beta1/runtimeclasses?limit=500&resourceVersion=0, runtimeclasses.node.k8s.io "" not found
I0623 14:44:43.961410       1 util.go:215] kubelet list runtimeclasses: /apis/node.k8s.io/v1beta1/runtimeclasses?limit=500&resourceVersion=0 with status code 404, spent 524.277µs
I0623 14:44:44.072549       1 connrotation.go:145] create a connection from 192.168.0.82:59814 to 192.168.0.97:6443, total 1 connections in transport manager dialer
I0623 14:44:44.085755       1 connrotation.go:48] close connection from 192.168.0.82:59814 to 192.168.0.97:6443 for transport manager dialer, remain 0 connections
I0623 14:44:44.086183       1 health_checker.go:215] failed to update lease: backoff ensure lease error: Get https://192.168.0.97:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ubuntu?timeout=2s: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes"), remote server https://192.168.0.97:6443
I0623 14:44:45.625031       1 util.go:232] start proxying: get /api/v1/services?limit=500&resourceVersion=0, in flight requests: 1
E0623 14:44:45.625456       1 storage_wrapper.go:158] could not list objects for kubelet/services, specified key is not found
E0623 14:44:45.625522       1 local.go:208] object not found for kubelet list services: /api/v1/services?limit=500&resourceVersion=0
E0623 14:44:45.625562       1 local.go:79] could not proxy local for kubelet list services: /api/v1/services?limit=500&resourceVersion=0, services "" not found
I0623 14:44:45.625931       1 util.go:215] kubelet list services: /api/v1/services?limit=500&resourceVersion=0 with status code 404, spent 588.351µs
I0623 14:44:46.312702       1 util.go:232] start proxying: get /api/v1/pods?fieldSelector=spec.nodeName%3Dubuntu&limit=500&resourceVersion=0, in flight requests: 1
E0623 14:44:46.313178       1 storage_wrapper.go:158] could not list objects for kubelet/pods, specified key is not found
E0623 14:44:46.313249       1 local.go:208] object not found for kubelet list pods: /api/v1/pods?fieldSelector=spec.nodeName%3Dubuntu&limit=500&resourceVersion=0
E0623 14:44:46.313316       1 local.go:79] could not proxy local for kubelet list pods: /api/v1/pods?fieldSelector=spec.nodeName%3Dubuntu&limit=500&resourceVersion=0, pods "" not found
I0623 14:44:46.313771       1 util.go:215] kubelet list pods: /api/v1/pods?fieldSelector=spec.nodeName%3Dubuntu&limit=500&resourceVersion=0 with status code 404, spent 731.315µs
I0623 14:44:47.014608       1 util.go:232] start proxying: get /apis/storage.k8s.io/v1/csinodes/ubuntu, in flight requests: 1
E0623 14:44:47.014892       1 local.go:208] object not found for kubelet get csinodes: /apis/storage.k8s.io/v1/csinodes/ubuntu
E0623 14:44:47.014993       1 local.go:79] could not proxy local for kubelet get csinodes: /apis/storage.k8s.io/v1/csinodes/ubuntu, csinodes.storage.k8s.io "ubuntu" not found
I0623 14:44:47.015181       1 util.go:215] kubelet get csinodes: /apis/storage.k8s.io/v1/csinodes/ubuntu with status code 404, spent 419.703µs
I0623 14:44:47.015702       1 util.go:232] start proxying: get /apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ubuntu?timeout=10s, in flight requests: 1
E0623 14:44:47.016105       1 local.go:208] object not found for kubelet get leases: /apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ubuntu?timeout=10s
E0623 14:44:47.016185       1 local.go:79] could not proxy local for kubelet get leases: /apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ubuntu?timeout=10s, leases.coordination.k8s.io "ubuntu" not found
I0623 14:44:47.016366       1 util.go:215] kubelet get leases: /apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ubuntu?timeout=10s with status code 404, spent 416.352µs
I0623 14:44:47.016823       1 util.go:232] start proxying: get /api/v1/nodes/ubuntu, in flight requests: 1

@rambohe-ch

I did some further investigation today and found that the files in /var/lib/yurthub/ are not updated; they are still the files created a few weeks ago, so we have a certificate problem.

I guess my earlier proposal to align common.go with the template file was kind of a workaround: it uses /etc/kubernetes/pki instead of what we have in /var/lib/yurthub/. I'll keep checking when /var/lib/yurthub/ gets updated, which seems more likely to be the root cause.

@Windrow
Contributor Author

Windrow commented Jun 24, 2021

func (ycm *yurtHubCertManager) initCaCert() error {
        caFile := ycm.getCaFile()
        ycm.caFile = caFile

        if exists, err := util.FileExists(caFile); exists {
                klog.Infof("%s file already exists, so skip to create ca file", caFile)
                return nil
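
The snippet above skips CA creation whenever the file already exists, which is why stale files survive a cluster rebuild. As an illustration only (the helper below is hypothetical, not OpenYurt code), a validity check on the cached CA could look roughly like the following; note that an expiry check alone would not have caught this case, since the stale CA here was unexpired but simply belonged to a previous cluster.

// Hypothetical sketch, not OpenYurt code: refuse to reuse a cached CA file
// that is missing, not PEM-encoded, not parseable as a certificate, or
// already expired.
package main

import (
        "crypto/x509"
        "encoding/pem"
        "fmt"
        "os"
        "time"
)

func cachedCAStillUsable(caFile string) (bool, error) {
        data, err := os.ReadFile(caFile)
        if err != nil {
                return false, err
        }
        block, _ := pem.Decode(data)
        if block == nil || block.Type != "CERTIFICATE" {
                return false, fmt.Errorf("%s does not contain a PEM certificate", caFile)
        }
        cert, err := x509.ParseCertificate(block.Bytes)
        if err != nil {
                return false, err
        }
        if time.Now().After(cert.NotAfter) {
                return false, fmt.Errorf("cached CA in %s expired at %v", caFile, cert.NotAfter)
        }
        return true, nil
}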

@Windrow
Contributor Author

Windrow commented Jun 24, 2021

Hi, @rambohe-ch

It is clear to me now: /var/lib/yurthub/ was not updated because the old files already exist.

Two questions to discuss now.

  • Can we overwrite the CA files when they already exist? Or is there some concern that requires keeping the old ones?

  • Is there a way to reset an offline node locally? The yurtctl revert command works only on online nodes. And even with yurtctl revert, /var/lib/yurthub/ is not cleared, which would cause problems the next time we use this device.

Windrow pushed a commit to Windrow/openyurt that referenced this issue Jun 25, 2021
…xecuted issue. See detailed description of the issue at openyurtio#367.
@Windrow
Contributor Author

Windrow commented Jun 25, 2021

I submitted a pull request that aligns common.go with the template file. It covers this issue anyway.

What should we do about /var/lib/yurthub/? Does anyone have any ideas?

@rambohe-ch
Member

@Windrow I'm very sorry for the late feedback.

Two questions to discuss now.

  • Can we overwrite the CA files when they already exist? Or is there some concern that requires keeping the old ones?

The files under /var/lib/yurthub are created by yurt-hub when the node is converted for the first time,
and they are reused by yurt-hub to avoid duplicated creation and creation failures when the cloud-edge network is disconnected,
so it is not recommended to overwrite the files under /var/lib/yurthub.

If old files under the /var/lib/yurthub dir have affected the yurt-hub startup, please delete the /var/lib/yurthub dir manually first.

  • Is there a way to reset an offline node locally? The yurtctl revert command works only on online nodes. And even with yurtctl revert, /var/lib/yurthub/ is not cleared, which would cause problems the next time we use this device.

For now, you need to delete /var/lib/yurthub manually to reset an offline node locally. We will add a local reset feature to yurtctl soon. If you are interested in this feature, please feel free to contribute.
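
For illustration, here is a rough sketch of such a local cleanup, using the paths seen earlier in this issue. The function name and exact steps are hypothetical; the merged fix (the commits referenced at the bottom of this issue) is the authoritative version.

// Hypothetical sketch of a local edge-node cleanup: remove the yurt-hub
// static pod manifest so kubelet stops the pod, then delete the cached
// yurt-hub data and certificates under /var/lib/yurthub.
package main

import "os"

const (
        yurtHubManifest = "/etc/kubernetes/manifests/yurt-hub.yaml"
        yurtHubRootDir  = "/var/lib/yurthub"
)

func cleanupYurtHubLocalData() error {
        if err := os.Remove(yurtHubManifest); err != nil && !os.IsNotExist(err) {
                return err
        }
        return os.RemoveAll(yurtHubRootDir)
}

Restoring whatever kubelet configuration yurtctl convert rewrote would presumably also be needed before rejoining; that part is out of scope for this sketch.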

@rambohe-ch
Member

I submitted a pull request that aligns common.go with the template file. It covers this issue anyway.

What should we do about /var/lib/yurthub/? Does anyone have any ideas?

For now, we need to delete /var/lib/yurthub manually.

By the way, maybe we need to check the validity of the old files under /var/lib/yurthub before reusing them.

@Windrow
Contributor Author

Windrow commented Jun 28, 2021

to avoid duplicated creation and creation failures when the cloud-edge network is disconnected.

Sorry, but the creation of what here?

By the way, maybe we need to check the validity of the old files under /var/lib/yurthub before reusing them.

Yes. Maybe a timeout or a retry limit will do?

@Windrow
Contributor Author

Windrow commented Jun 29, 2021

@Windrow
Contributor Author

Windrow commented Jun 30, 2021

I0630 14:51:11.958029       1 config.go:124] yurthub would connect remote servers: https://192.168.0.128:6443
I0630 14:51:11.962502       1 start.go:67] yurthub cfg: &config.YurtHubConfiguration{LBMode:"rr", RemoteServers:[]*url.URL{(*url.URL)(0x4000568100)}, YurtHubServerAddr:"127.0.0.1:10267", YurtHubProxyServerAddr:"127.0.0.1:10261", YurtHubProxyServerDummyAddr:"169.254.2.1:10261", GCFrequency:120, CertMgrMode:"hubself", NodeName:"ubuntu", HeartbeatFailedRetry:3, HeartbeatHealthyThreshold:2, HeartbeatTimeoutSeconds:2, MaxRequestInFlight:250, JoinToken:"ghijkl.0123456789101112", RootDir:"/var/lib/yurthub", EnableProfiling:true, EnableDummyIf:true, EnableIptables:true, HubAgentDummyIfName:"yurthub-dummy0", StorageWrapper:(*cachemanager.storageWrapper)(0x4000187dc0), SerializerManager:(*serializer.SerializerManager)(0x4000187e40)}
I0630 14:51:11.962655       1 start.go:82] 1. register cert managers
I0630 14:51:11.962697       1 certificate.go:60] Registered certificate manager kubelet
I0630 14:51:11.962747       1 certificate.go:60] Registered certificate manager hubself
I0630 14:51:11.962771       1 start.go:88] 2. create cert manager with hubself mode
I0630 14:51:11.962847       1 cert_mgr.go:239] /var/lib/yurthub/pki/ca.crt file already exists, so skip to create ca file
I0630 14:51:11.983053       1 cert_mgr.go:124] use /var/lib/yurthub/pki/ca.crt ca file to bootstrap yurthub
I0630 14:51:11.984461       1 cert_mgr.go:314] yurthub bootstrap conf file already exists, skip init bootstrap
I0630 14:51:11.993306       1 certificate_store.go:130] Loading cert/key pair from "/var/lib/yurthub/pki/yurthub-current.pem".
I0630 14:51:12.032592       1 certificate_manager.go:282] Certificate rotation is enabled.
I0630 14:51:12.032750       1 cert_mgr.go:438] yurthub config file already exists, skip init config file
I0630 14:51:12.032885       1 certificate_manager.go:553] Certificate expiration is 2022-06-30 10:34:35 +0000 UTC, rotation deadline is 2022-05-09 17:39:34.602331878 +0000 UTC
I0630 14:51:12.032984       1 certificate_manager.go:288] Waiting 7514h48m22.569356549s for next certificate rotation
I0630 14:51:12.035584       1 start.go:96] 3. new transport manager
I0630 14:51:12.035629       1 transport.go:60] use /var/lib/yurthub/pki/ca.crt ca cert file to access remote server
I0630 14:51:12.036040       1 start.go:104] 4. create health checker for remote servers 
I0630 14:51:12.037587       1 connrotation.go:145] create a connection from 192.168.0.127:46282 to 192.168.0.128:6443, total 1 connections in transport manager dialer
I0630 14:51:13.454485       1 health_checker.go:215] failed to update lease: backoff ensure lease error: Unauthorized, remote server https://192.168.0.128:6443
W0630 14:51:13.454548       1 health_checker.go:96] cluster remote server https://192.168.0.128:6443 is unhealthy.
I0630 14:51:13.454640       1 start.go:113] 5. new cache manager with storage wrapper and serializer manager
I0630 14:51:13.455023       1 cache_agent.go:68] reset cache agents to [kubelet kube-proxy flanneld coredns yurttunnel-agent]
I0630 14:51:13.464078       1 start.go:121] 6. new gc manager for node ubuntu, and gc frequency is a random time between 120 min and 360 min
I0630 14:51:13.464813       1 gc.go:97] list pod keys from storage, total: 5
I0630 14:51:13.466963       1 cert_mgr.go:213] re-fix hub rest config host successfully with server https://192.168.0.128:6443
E0630 14:51:13.494417       1 gc.go:113] could not list pods for node(ubuntu), Unauthorized
I0630 14:51:13.494622       1 start.go:130] 7. new reverse proxy handler for remote servers
I0630 14:51:13.494706       1 start.go:139] 8. create dummy network interface yurthub-dummy0 and init iptables manager
I0630 14:51:13.494806       1 gc.go:74] start gc events after waiting 248.241µs from previous gc
I0630 14:51:13.499718       1 cert_mgr.go:213] re-fix hub rest config host successfully with server https://192.168.0.128:6443
I0630 14:51:13.508918       1 gc.go:163] list kubelet event keys from storage, total: 42
E0630 14:51:13.512477       1 gc.go:177] could not get kubelet kubelet/events/default/ubuntu.168bce0200b9c401 event for node(ubuntu), Unauthorized
I0630 14:51:13.512580       1 gc.go:160] no kube-proxy events in local storage, skip kube-proxy events gc
I0630 14:51:13.589586       1 start.go:147] 9. new yurthub server and begin to serve, dummy proxy server: 169.254.2.1:10261
I0630 14:51:13.589659       1 start.go:150] 9. new yurthub server and begin to serve, proxy server: 127.0.0.1:10261, hub server: 127.0.0.1:10267
I0630 14:51:19.827438       1 util.go:232] start proxying: get /api/v1/pods?fieldSelector=spec.nodeName%3Dubuntu&limit=500&resourceVersion=0, in flight requests: 1
I0630 14:51:19.836093       1 util.go:232] start proxying: get /api/v1/services?limit=500&resourceVersion=0, in flight requests: 2
I0630 14:51:19.837175       1 util.go:232] start proxying: get /api/v1/nodes?fieldSelector=metadata.name%3Dubuntu&limit=500&resourceVersion=0, in flight requests: 3
I0630 14:51:19.847186       1 util.go:215] kubelet list nodes: /api/v1/nodes?fieldSelector=metadata.name%3Dubuntu&limit=500&resourceVersion=0 with status code 200, spent 9.727074ms
I0630 14:51:19.848767       1 util.go:232] start proxying: get /api/v1/nodes?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dubuntu&resourceVersion=1076673&timeoutSeconds=551&watch=true, in flight requests: 3
I0630 14:51:19.856468       1 util.go:215] kubelet list services: /api/v1/services?limit=500&resourceVersion=0 with status code 200, spent 20.065722ms
I0630 14:51:19.858253       1 util.go:232] start proxying: get /api/v1/services?allowWatchBookmarks=true&resourceVersion=192&timeoutSeconds=483&watch=true, in flight requests: 3
I0630 14:51:19.865159       1 util.go:215] kubelet list pods: /api/v1/pods?fieldSelector=spec.nodeName%3Dubuntu&limit=500&resourceVersion=0 with status code 200, spent 37.322593ms
I0630 14:51:19.867585       1 util.go:232] start proxying: get /api/v1/pods?allowWatchBookmarks=true&fieldSelector=spec.nodeName%3Dubuntu&resourceVersion=1786&timeoutSeconds=415&watch=true, in flight requests: 3
I0630 14:51:24.866911       1 health_checker.go:215] failed to update lease: backoff ensure lease error: Unauthorized, remote server https://192.168.0.128:6443
I0630 14:51:26.157822       1 util.go:232] start proxying: get /apis/storage.k8s.io/v1/csinodes/ubuntu, in flight requests: 4
I0630 14:51:26.160928       1 util.go:215] kubelet get csinodes: /apis/storage.k8s.io/v1/csinodes/ubuntu with status code 200, spent 2.796241ms
I0630 14:51:26.161695       1 util.go:232] start proxying: post /api/v1/namespaces/default/events, in flight requests: 4
I0630 14:51:26.161997       1 util.go:215] kubelet create events: /api/v1/namespaces/default/events with status code 201, spent 144.314µs
I0630 14:51:26.173481       1 util.go:232] start proxying: get /apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0, in flight requests: 4

@Windrow
Contributor Author

Windrow commented Jul 2, 2021

@rambohe-ch

Sorry that I forgot to ping you XD

#367 (comment)

What did you mean there by "duplicated creation"?

@rambohe-ch
Member

to avoid duplicated creation and creation failures when the cloud-edge network is disconnected.

Sorry, but the creation of what here?

@Windrow Very sorry for the late feedback.
By "duplicated creation" I meant overwriting the files under the /var/lib/yurthub dir.

By the way, maybe we need to check the validity of the old files under /var/lib/yurthub before reusing them.

Yes. Maybe a timeout or a retry limit will do?

I think that if the server address in the /var/lib/yurthub/yurthub.conf file has changed, we can overwrite the files under the /var/lib/yurthub dir, because the node may have been connected to another kube-apiserver before and the files under /var/lib/yurthub were not cleaned up.
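
As a sketch of that idea (assuming, without having verified, that /var/lib/yurthub/yurthub.conf is a kubeconfig-style file; the function and its parameters are hypothetical, not OpenYurt code):

// Hypothetical sketch: drop the cached yurt-hub data when the apiserver
// address recorded in the cached config no longer matches the address
// passed to yurthub via --server-addr.
package main

import (
        "os"

        "k8s.io/client-go/tools/clientcmd"
)

func resetCacheIfServerChanged(cachedConf, rootDir, serverAddr string) error {
        cfg, err := clientcmd.LoadFromFile(cachedConf)
        if err != nil {
                // No readable cached config; nothing to compare against.
                return nil
        }
        for _, cluster := range cfg.Clusters {
                if cluster.Server == serverAddr {
                        // Same apiserver as before, keep the cached files.
                        return nil
                }
        }
        // Address changed: remove cached certificates and data so they are
        // re-created against the new cluster.
        return os.RemoveAll(rootDir)
}

For example, resetCacheIfServerChanged("/var/lib/yurthub/yurthub.conf", "/var/lib/yurthub", "https://192.168.0.128:6443") would wipe the cache only if the recorded server differs from the current one.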

@rambohe-ch
Member

rambohe-ch commented Jul 9, 2021

I0630 14:51:11.958029       1 config.go:124] yurthub would connect remote servers: https://192.168.0.128:6443
I0630 14:51:11.962502       1 start.go:67] yurthub cfg: &config.YurtHubConfiguration{LBMode:"rr", RemoteServers:[]*url.URL{(*url.URL)(0x4000568100)}, YurtHubServerAddr:"127.0.0.1:10267", YurtHubProxyServerAddr:"127.0.0.1:10261", YurtHubProxyServerDummyAddr:"169.254.2.1:10261", GCFrequency:120, CertMgrMode:"hubself", NodeName:"ubuntu", HeartbeatFailedRetry:3, HeartbeatHealthyThreshold:2, HeartbeatTimeoutSeconds:2, MaxRequestInFlight:250, JoinToken:"ghijkl.0123456789101112", RootDir:"/var/lib/yurthub", EnableProfiling:true, EnableDummyIf:true, EnableIptables:true, HubAgentDummyIfName:"yurthub-dummy0", StorageWrapper:(*cachemanager.storageWrapper)(0x4000187dc0), SerializerManager:(*serializer.SerializerManager)(0x4000187e40)}
I0630 14:51:11.962655       1 start.go:82] 1. register cert managers
I0630 14:51:11.962697       1 certificate.go:60] Registered certificate manager kubelet
I0630 14:51:11.962747       1 certificate.go:60] Registered certificate manager hubself
I0630 14:51:11.962771       1 start.go:88] 2. create cert manager with hubself mode
I0630 14:51:11.962847       1 cert_mgr.go:239] /var/lib/yurthub/pki/ca.crt file already exists, so skip to create ca file

If the /var/lib/yurthub/pki/ca.crt file is ready, we only reuse it instead of getting it from the kube-apiserver again and overwriting it,
because we may not be able to get it from the kube-apiserver when the cloud-edge network is disconnected.

I0630 14:51:11.983053 1 cert_mgr.go:124] use /var/lib/yurthub/pki/ca.crt ca file to bootstrap yurthub
I0630 14:51:11.984461 1 cert_mgr.go:314] yurthub bootstrap conf file already exists, skip init bootstrap
I0630 14:51:11.993306 1 certificate_store.go:130] Loading cert/key pair from "/var/lib/yurthub/pki/yurthub-current.pem".
I0630 14:51:12.032592 1 certificate_manager.go:282] Certificate rotation is enabled.
I0630 14:51:12.032750 1 cert_mgr.go:438] yurthub config file already exists, skip init config file
I0630 14:51:12.032885 1 certificate_manager.go:553] Certificate expiration is 2022-06-30 10:34:35 +0000 UTC, rotation deadline is 2022-05-09 17:39:34.602331878 +0000 UTC
I0630 14:51:12.032984 1 certificate_manager.go:288] Waiting 7514h48m22.569356549s for next certificate rotation

@Windrow
Contributor Author

Windrow commented Jul 9, 2021

@rambohe-ch No problem.

What is the expected user behavior to reset a yurt cluster? yurtctl revert, then kubeadm reset?

I was considering removing /var/lib/yurthub during yurtctl revert. If users go beyond the expected flow, then they should take responsibility and remove the files manually themselves.

I think that if the server address in the /var/lib/yurthub/yurthub.conf file has changed, we can overwrite the files under the /var/lib/yurthub dir

ca.crt can be different even when the address stays the same. If you reset the cluster and then establish it again with the master node unchanged, you get a cluster with the same server address but a new public key.
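
So comparing the CA material itself looks safer than comparing the address. A hedged sketch of such a comparison, using the paths mentioned in this thread (the helpers are hypothetical and assume a kubeadm-joined node where /etc/kubernetes/pki/ca.crt holds the current cluster CA):

// Hypothetical sketch: compare the SHA-256 fingerprint of the CA cached by
// yurt-hub with the CA kubeadm wrote for the current cluster; a mismatch
// means the cached files belong to a different (or rebuilt) cluster and
// should not be reused.
package main

import (
        "bytes"
        "crypto/sha256"
        "encoding/pem"
        "fmt"
        "os"
)

func caFingerprint(path string) ([]byte, error) {
        data, err := os.ReadFile(path)
        if err != nil {
                return nil, err
        }
        block, _ := pem.Decode(data)
        if block == nil {
                return nil, fmt.Errorf("%s is not a PEM file", path)
        }
        sum := sha256.Sum256(block.Bytes)
        return sum[:], nil
}

func cachedCAMatchesCluster() (bool, error) {
        cached, err := caFingerprint("/var/lib/yurthub/pki/ca.crt")
        if err != nil {
                return false, err
        }
        current, err := caFingerprint("/etc/kubernetes/pki/ca.crt")
        if err != nil {
                return false, err
        }
        return bytes.Equal(cached, current), nil
}

A startup check could then refuse to reuse /var/lib/yurthub when cachedCAMatchesCluster() returns false.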

@rambohe-ch
Member

ca.crt can be different even when the address stays the same. If you reset the cluster and then establish it again with the master node unchanged, you get a cluster with the same server address but a new public key.

We will add a yurtctl reset command to clean up edge nodes soon. You can reference the proposal here: #341

openyurt-bot pushed a commit that referenced this issue Aug 12, 2021
* Bugfix: kubelet on edge node keeps restarting after yurtctl convert executed issue. See detailed description of the issue at #367.

* Revert ec98fef

* Remove yurt-hub config directory and certificates in it when revert edgenode.

* Fix compilation issue.

* Fix compilation issue.

* Fix compilation issue.

* Fix compilation issue.

* Fix compilation issue.

* Fix gofmt warning.

Co-authored-by: Yinzhe.Wu <Yinzhe.Wu@sony.com>
MrGirl pushed a commit to MrGirl/openyurt that referenced this issue Mar 29, 2022
* Bugfix: kubelet on edge node keeps restarting after yurtctl convert executed issue. See detailed description of the issue at openyurtio#367.

* Revert ec98fef

* Remove yurt-hub config directory and certificates in it when revert edgenode.

* Fix compilation issue.

* Fix compilation issue.

* Fix compilation issue.

* Fix compilation issue.

* Fix compilation issue.

* Fix gofmt warning.

Co-authored-by: Yinzhe.Wu <Yinzhe.Wu@sony.com>