
k3s: All pods crashing with latest version #181790

Closed
collinarnett opened this issue Jul 17, 2022 · 16 comments
Labels
0.kind: bug Something is broken 6.topic: k3s Kubernates distribution (https://k3s.io/)

Comments

@collinarnett

Describe the bug

This is going to be a long description since I'm not entirely sure where the bounds of k3s are when it comes to statefulness. I'm currently running the latest version of k3s in nixpkgs and I am unable to stand up the cluster without all pods failing after helm deploys traefik. I have searched upstream's issues and there doesn't seem to be anything relevant there. I have spent quite a lot of time scouring the logs of both the pods and k3s itself. I also tried reverting the package to a previous commit, and that did not work either.

I can't see anything obvious in the k3s logs; there are quite a few warnings and errors, but I'm not sure which are genuine errors and which are just the kubelet complaining about the state not being ready. The pods don't seem to indicate anything explicit either.

Steps To Reproduce

Here is the current config I have:

{ lib, pkgs, ... }:
let
  # https://github.com/NixOS/nixpkgs/pull/176520
  k3s = pkgs.k3s.overrideAttrs
    (old: rec { buildInputs = old.buildInputs ++ [ pkgs.ipset ]; });
in {
  networking.firewall.allowedTCPPorts = [ 6443 80 443 10250 ];
  networking.firewall.allowedUDPPorts = [ 8472 ];
  services.k3s = {
    enable = true;
    role = "server";
    package = k3s;
  };
  environment.systemPackages = [
    (pkgs.writeShellScriptBin "k3s-reset-node"
      (builtins.readFile ./k3s-reset-node))
  ];
}

Expected behavior

Pods come up in the kube-system namespace.

Screenshots

journalctl
k3s_logs.txt

pods

$ k get pods -A                                                                                                   
[sudo] password for collin: 
NAMESPACE     NAME                                      READY   STATUS             RESTARTS         AGE
kube-system   helm-install-traefik-crd-ntl24            0/1     Completed          0                124m
kube-system   helm-install-traefik-xg6tn                0/1     Completed          1                124m
kube-system   traefik-7cd4fcff68-8s2tg                  0/1     CrashLoopBackOff   28 (3m38s ago)   124m
kube-system   local-path-provisioner-7b7dc8d6f5-rqd5j   0/1     CrashLoopBackOff   23 (2m46s ago)   124m
kube-system   coredns-b96499967-4vhc8                   0/1     CrashLoopBackOff   26 (116s ago)    124m
kube-system   svclb-traefik-1628ae6b-vc7j8              0/2     CrashLoopBackOff   50 (41s ago)     124m
kube-system   metrics-server-668d979685-m5mwm           1/1     Running            34 (5m54s ago)   124m
$ k logs metrics-server-668d979685-m5mwm -p -n kube-system
I0717 00:24:27.948191       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0717 00:24:27.948198       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0717 00:24:27.948240       1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0717 00:24:27.948243       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0717 00:24:27.948204       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0717 00:24:27.948258       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0717 00:24:27.948381       1 dynamic_serving_content.go:130] Starting serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
I0717 00:24:27.948519       1 secure_serving.go:202] Serving securely on [::]:4443
I0717 00:24:27.948563       1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0717 00:24:28.049036       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file 
I0717 00:24:28.049050       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file 
I0717 00:24:28.049088       1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController 
I0717 00:24:28.170843       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:29.171668       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:29.362833       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:31.363846       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:33.363524       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:35.363869       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:37.362718       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:39.363017       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:41.362617       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:26:05.487063       1 requestheader_controller.go:183] Shutting down RequestHeaderAuthRequestController
I0717 00:26:05.487084       1 configmap_cafile_content.go:223] Shutting down client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0717 00:26:05.487091       1 configmap_cafile_content.go:223] Shutting down client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0717 00:26:05.487177       1 tlsconfig.go:255] Shutting down DynamicServingCertificateController
I0717 00:26:05.487229       1 secure_serving.go:246] Stopped listening on [::]:4443
I0717 00:26:05.487237       1 dynamic_serving_content.go:145] Shutting down serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
$ k logs local-path-provisioner-7b7dc8d6f5-rqd5j -p -n kube-system                               
I0717 00:26:57.595741       1 controller.go:773] Starting provisioner controller rancher.io/local-path_local-path-provisioner-7b7dc8d6f5-rqd5j_0677ce3a-05da-446b-9492-3a9eb140c921!
I0717 00:26:57.696339       1 controller.go:822] Started provisioner controller rancher.io/local-path_local-path-provisioner-7b7dc8d6f5-rqd5j_0677ce3a-05da-446b-9492-3a9eb140c921!
time="2022-07-17T00:29:50Z" level=info msg="Receive terminated to exit" 
time="2022-07-17T00:29:50Z" level=info msg="stop watching config file" 
$ k logs coredns-b96499967-4vhc8 -p -n kube-system
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
.:53
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
[INFO] plugin/reload: Running configuration SHA512 = b941b080e5322f6519009bb49349462c7ddb6317425b0f6a83e5451175b720703949e3f3b454a24e77f3ffe57fd5e9c6130e528a5a1dd00d9000e4afd6c1108d
CoreDNS-1.9.1
linux/amd64, go1.17.8, 4b597f8
[INFO] SIGTERM: Shutting down servers then terminating
$ k logs svclb-traefik-1628ae6b-vc7j8 -p -n kube-system                                  
Defaulted container "lb-tcp-80" out of: lb-tcp-80, lb-tcp-443
+ trap exit TERM INT
+ echo 10.43.12.129
+ grep -Eq :
+ cat /proc/sys/net/ipv4/ip_forward
+ '[' 1 '!=' 1 ]
+ iptables -t nat -I PREROUTING '!' -s 10.43.12.129/32 -p TCP --dport 80 -j DNAT --to 10.43.12.129:80
+ iptables -t nat -I POSTROUTING -d 10.43.12.129/32 -p TCP -j MASQUERADE
+ '[' '!' -e /pause ]
+ mkfifo /pause
/usr/bin/entry: line 27: can't open /pause: Interrupted system call
+ 
+ exit
$ k logs traefik-7cd4fcff68-8s2tg -p -n kube-system
time="2022-07-17T02:34:13Z" level=info msg="Configuration loaded from flags."
time="2022-07-17T02:34:14Z" level=error msg="accept tcp [::]:8000: use of closed network connection" entryPointName=web
time="2022-07-17T02:34:14Z" level=error msg="accept tcp [::]:9100: use of closed network connection" entryPointName=metrics
time="2022-07-17T02:34:14Z" level=error msg="accept tcp [::]:8443: use of closed network connection" entryPointName=websecure
time="2022-07-17T02:34:14Z" level=error msg="accept tcp [::]:9000: use of closed network connection" entryPointName=traefik
time="2022-07-17T02:34:14Z" level=error msg="close tcp [::]:8443: use of closed network connection" entryPointName=websecure
time="2022-07-17T02:34:14Z" level=error msg="close tcp [::]:8000: use of closed network connection" entryPointName=web
time="2022-07-17T02:34:14Z" level=error msg="close tcp [::]:9100: use of closed network connection" entryPointName=metrics
time="2022-07-17T02:34:14Z" level=error msg="close tcp [::]:9000: use of closed network connection" entryPointName=traefik
$ k describe node zombie                                   
Name:               zombie
Roles:              control-plane,master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=k3s
                    beta.kubernetes.io/os=linux
                    egress.k3s.io/cluster=true
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=zombie
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=true
                    node-role.kubernetes.io/master=true
                    node.kubernetes.io/instance-type=k3s
Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"ca:10:b7:59:64:03"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.1.164
                    k3s.io/hostname: zombie
                    k3s.io/internal-ip: 192.168.1.164
                    k3s.io/node-args: ["server","--kubelet-arg","cgroup-driver=systemd"]
                    k3s.io/node-config-hash: 6FBPL7ZNB3BFG32NEVA44JMMFWZ34N6AY4YKS3OHY45APNNIJBJQ====
                    k3s.io/node-env: {"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/a2527f9db03e21bee3a56f440fb6ea3cd6e7796abf2bb0c3428db9f447b522e5"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sat, 16 Jul 2022 15:57:20 -0400
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  zombie
  AcquireTime:     <unset>
  RenewTime:       Sat, 16 Jul 2022 22:35:52 -0400
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Sat, 16 Jul 2022 22:33:16 -0400   Sat, 16 Jul 2022 15:57:20 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Sat, 16 Jul 2022 22:33:16 -0400   Sat, 16 Jul 2022 15:57:20 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Sat, 16 Jul 2022 22:33:16 -0400   Sat, 16 Jul 2022 15:57:20 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Sat, 16 Jul 2022 22:33:16 -0400   Sat, 16 Jul 2022 15:57:31 -0400   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.1.164
  Hostname:    zombie
Capacity:
  cpu:                24
  ephemeral-storage:  479081160Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             65780800Ki
  pods:               110
Allocatable:
  cpu:                24
  ephemeral-storage:  466050152083
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             65780800Ki
  pods:               110
System Info:
  Machine ID:                 440d6715398c4a968f8638b885267bf9
  System UUID:                b9c28570-b664-0000-0000-000000000000
  Boot ID:                    9a01a61d-e374-446d-8595-5fc100db7d98
  Kernel Version:             5.15.53
  OS Image:                   NixOS 22.11 (Raccoon)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.5.13-k3s1
  Kubelet Version:            v1.24.2+k3s2
  Kube-Proxy Version:         v1.24.2+k3s2
PodCIDR:                      10.42.0.0/24
PodCIDRs:                     10.42.0.0/24
ProviderID:                   k3s://zombie
Non-terminated Pods:          (5 in total)
  Namespace                   Name                                       CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                       ------------  ----------  ---------------  -------------  ---
  kube-system                 metrics-server-668d979685-m5mwm            100m (0%)     0 (0%)      70Mi (0%)        0 (0%)         6h38m
  kube-system                 svclb-traefik-1628ae6b-vc7j8               0 (0%)        0 (0%)      0 (0%)           0 (0%)         6h38m
  kube-system                 local-path-provisioner-7b7dc8d6f5-rqd5j    0 (0%)        0 (0%)      0 (0%)           0 (0%)         6h38m
  kube-system                 traefik-7cd4fcff68-8s2tg                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         6h38m
  kube-system                 coredns-b96499967-4vhc8                    100m (0%)     0 (0%)      70Mi (0%)        170Mi (0%)     6h38m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                200m (0%)   0 (0%)
  memory             140Mi (0%)  170Mi (0%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:              <none>

Additional context

If you want to look at my entire config, it's hosted here https://github.com/collinarnett/brew

Notify maintainers

@euank
@superherointj
@Mic92
@kalbasit

Metadata

$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 5.15.53, NixOS, 22.11 (Raccoon)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.9.1`
 - channels(collin): `""`
 - channels(root): `"nixos-21.11.335130.386234e2a61"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
@collinarnett collinarnett added the 0.kind: bug Something is broken label Jul 17, 2022
@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/kubernetes-bringup-on-current-unstable-2022-07-17/20403/2

@superherointj
Contributor

superherointj commented Jul 18, 2022

Hi @collinarnett

I have reproduced a problem with an empty cluster:

[screenshot: pods crashing in the empty cluster]

The testing method needs to be improved/fixed as well. Tests should have failed.

Have you been able to solve the problem (with the information from Discourse)?

@collinarnett
Author

I have not been able to test it yet. I will get a chance to test this evening 😄

@bryanasdev000
Member

bryanasdev000 commented Jul 18, 2022

It seems that a CNI is missing; if there is no CNI, pods will crash and burn.

EDIT: It seems that k3s runs flannel as a process, not as a pod.

@bryanasdev000
Member

bryanasdev000 commented Jul 18, 2022

ip a to check for flannel interfaces and journalctl -u k3s may help answer the question of why it is broken.

EDIT: Will try to reproduce in my system to help.

@bryanasdev000
Member

bryanasdev000 commented Jul 19, 2022

@collinarnett @superherointj

  • Can you access kubernetes service?
  • Can you get a describe from restarting pods? (exec kubectl describe pod $POD -n $NAMESPACE)
  • Can you check that you have flannel interface? (exec ip a | grep flannel)
  • Can you check kubernetes events? In special for SandboxChanged. (exec kubectl get events | grep SandboxChanged)
  • Can you check if other pods can run ok? (exec kubectl create deployment nginx --image=nginx --replicas=10 --port=80)
  • Can you check iptables version? (exec iptables --version)

To try to access kubernetes service you can use the following command:

kubectl run -it --rm=True --restart=Never -n default --image alpine tmpshell -- sh

Once inside, do a apk add curl and curl -kv https://kubernetes or curl -kv https://10.43.0.1.

In my test, pods fail to access K8S API Server, which is a little bit strange, so maybe I am missing something (just copied the K3S NixOS Wiki entry).

In the case of the OP, what is strange to me is the SIGTERM at coredns and the local path provisioner.

@bryanasdev000
Member

Maybe related: #179741

@euank
Member

euank commented Jul 19, 2022

Thanks for the detailed report, @collinarnett!

I think I was able to reproduce and work around / fix this on my machine.

For me, the problem indeed showed up as "SandboxChanged":

$ sudo k3s kubectl describe po -n kube-system -l k8s-app=kube-dns

....
SandboxChanged  4m31s (x3 over 4m50s)   kubelet            Pod sandbox changed, it will be killed and re-created.

# with `services.k3s.extraFlags = "--kubelet-arg v=4";` set
$ sudo journalctl -u k3s.service

kuberuntime_manager.go:488] "No ready sandbox for pod can be found. Need to start a new one" pod="kube-system/coredns-b96499967-tb2cd"

The logs from the kubelet ended up not being all that useful. However, as soon as I ran sudo k3s server to try and repro it, I noticed that omitting the --kubelet-arg=cgroup-driver=systemd flag resolves the issue, which pointed me in the right direction.

The k3s NixOS module currently adds --kubelet-arg=cgroup-driver=systemd:

++ (optional (config.systemd.enableUnifiedCgroupHierarchy) "--kubelet-arg=cgroup-driver=systemd")

However, containerd for me didn't think it was using the systemd driver; running sudo k3s crictl info | grep -i systemd showed systemdCgroup: false.

Also /var/lib/rancher/k3s/agent/etc/containerd/config.toml had SystemdCgroup = false set.
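The mismatch is easy to check mechanically. As a minimal sketch, the snippet below greps a trimmed, hypothetical excerpt of the crictl info JSON instead of the real command (on a live node you would pipe sudo k3s crictl info in):

```shell
# Trimmed, hypothetical excerpt of `sudo k3s crictl info` output on this
# setup; only the SystemdCgroup field matters for the check.
crictl_info='{"config":{"containerd":{"runtimes":{"runc":{"options":{"SystemdCgroup":false}}}}}}'
# Kubelet driver as forced by the NixOS module's --kubelet-arg at the time
kubelet_driver="systemd"

# Pull the boolean out of the JSON excerpt
containerd_systemd=$(printf '%s' "$crictl_info" | grep -o '"SystemdCgroup":[a-z]*' | cut -d: -f2)

echo "kubelet cgroup driver:    $kubelet_driver"
echo "containerd SystemdCgroup: $containerd_systemd"
if [ "$kubelet_driver" = "systemd" ] && [ "$containerd_systemd" = "false" ]; then
  echo "MISMATCH: kubelet uses systemd, containerd uses cgroupfs"
fi
```

When the two sides disagree like this, the kubelet keeps deciding the pod sandbox is wrong and recreates it, which matches the SandboxChanged events above.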

This was already reported on the upstream k3s repo here: k3s-io/k3s#5454

Removing the --kubelet-arg=cgroup-driver=systemd bit from the module switched it back to cgroupfs, and that worked.

The issue also suggested another workaround, which also worked (collapsed since it's verbose):

Workaround
  1. Start k3s
  2. Write the following to /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl
[plugins.opt]
  path = "{{ .NodeConfig.Containerd.Opt }}"
[plugins.cri]
  stream_server_address = "127.0.0.1"
  stream_server_port = "10010"
  enable_selinux = {{ .NodeConfig.SELinux }}
  enable_unprivileged_ports = true
  enable_unprivileged_icmp = true
{{- if .DisableCgroup}}
  disable_cgroup = true
{{end}}
{{- if .IsRunningInUserNS }}
  disable_apparmor = true
  restrict_oom_score_adj = true
{{end}}
{{- if .NodeConfig.AgentConfig.PauseImage }}
  sandbox_image = "{{ .NodeConfig.AgentConfig.PauseImage }}"
{{end}}
{{- if .NodeConfig.AgentConfig.Snapshotter }}
[plugins.cri.containerd]
  snapshotter = "{{ .NodeConfig.AgentConfig.Snapshotter }}"
  disable_snapshot_annotations = {{ if eq .NodeConfig.AgentConfig.Snapshotter "stargz" }}false{{else}}true{{end}}
{{ if eq .NodeConfig.AgentConfig.Snapshotter "stargz" }}
{{ if .NodeConfig.AgentConfig.ImageServiceSocket }}
[plugins.stargz]
cri_keychain_image_service_path = "{{ .NodeConfig.AgentConfig.ImageServiceSocket }}"
[plugins.stargz.cri_keychain]
enable_keychain = true
{{end}}
{{ if .PrivateRegistryConfig }}
{{ if .PrivateRegistryConfig.Mirrors }}
[plugins.stargz.registry.mirrors]{{end}}
{{range $k, $v := .PrivateRegistryConfig.Mirrors }}
[plugins.stargz.registry.mirrors."{{$k}}"]
  endpoint = [{{range $i, $j := $v.Endpoints}}{{if $i}}, {{end}}{{printf "%q" .}}{{end}}]
{{if $v.Rewrites}}
  [plugins.stargz.registry.mirrors."{{$k}}".rewrite]
{{range $pattern, $replace := $v.Rewrites}}
    "{{$pattern}}" = "{{$replace}}"
{{end}}
{{end}}
{{end}}
{{range $k, $v := .PrivateRegistryConfig.Configs }}
{{ if $v.Auth }}
[plugins.stargz.registry.configs."{{$k}}".auth]
  {{ if $v.Auth.Username }}username = {{ printf "%q" $v.Auth.Username }}{{end}}
  {{ if $v.Auth.Password }}password = {{ printf "%q" $v.Auth.Password }}{{end}}
  {{ if $v.Auth.Auth }}auth = {{ printf "%q" $v.Auth.Auth }}{{end}}
  {{ if $v.Auth.IdentityToken }}identitytoken = {{ printf "%q" $v.Auth.IdentityToken }}{{end}}
{{end}}
{{ if $v.TLS }}
[plugins.stargz.registry.configs."{{$k}}".tls]
  {{ if $v.TLS.CAFile }}ca_file = "{{ $v.TLS.CAFile }}"{{end}}
  {{ if $v.TLS.CertFile }}cert_file = "{{ $v.TLS.CertFile }}"{{end}}
  {{ if $v.TLS.KeyFile }}key_file = "{{ $v.TLS.KeyFile }}"{{end}}
  {{ if $v.TLS.InsecureSkipVerify }}insecure_skip_verify = true{{end}}
{{end}}
{{end}}
{{end}}
{{end}}
{{end}}
{{- if not .NodeConfig.NoFlannel }}
[plugins.cri.cni]
  bin_dir = "{{ .NodeConfig.AgentConfig.CNIBinDir }}"
  conf_dir = "{{ .NodeConfig.AgentConfig.CNIConfDir }}"
{{end}}
[plugins.cri.containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"
[plugins.cri.containerd.runtimes.runc.options]
	SystemdCgroup = true
{{ if .PrivateRegistryConfig }}
{{ if .PrivateRegistryConfig.Mirrors }}
[plugins.cri.registry.mirrors]{{end}}
{{range $k, $v := .PrivateRegistryConfig.Mirrors }}
[plugins.cri.registry.mirrors."{{$k}}"]
  endpoint = [{{range $i, $j := $v.Endpoints}}{{if $i}}, {{end}}{{printf "%q" .}}{{end}}]
{{if $v.Rewrites}}
  [plugins.cri.registry.mirrors."{{$k}}".rewrite]
{{range $pattern, $replace := $v.Rewrites}}
    "{{$pattern}}" = "{{$replace}}"
{{end}}
{{end}}
{{end}}
{{range $k, $v := .PrivateRegistryConfig.Configs }}
{{ if $v.Auth }}
[plugins.cri.registry.configs."{{$k}}".auth]
  {{ if $v.Auth.Username }}username = {{ printf "%q" $v.Auth.Username }}{{end}}
  {{ if $v.Auth.Password }}password = {{ printf "%q" $v.Auth.Password }}{{end}}
  {{ if $v.Auth.Auth }}auth = {{ printf "%q" $v.Auth.Auth }}{{end}}
  {{ if $v.Auth.IdentityToken }}identitytoken = {{ printf "%q" $v.Auth.IdentityToken }}{{end}}
{{end}}
{{ if $v.TLS }}
[plugins.cri.registry.configs."{{$k}}".tls]
  {{ if $v.TLS.CAFile }}ca_file = "{{ $v.TLS.CAFile }}"{{end}}
  {{ if $v.TLS.CertFile }}cert_file = "{{ $v.TLS.CertFile }}"{{end}}
  {{ if $v.TLS.KeyFile }}key_file = "{{ $v.TLS.KeyFile }}"{{end}}
  {{ if $v.TLS.InsecureSkipVerify }}insecure_skip_verify = true{{end}}
{{end}}
{{end}}
{{end}}
{{range $k, $v := .ExtraRuntimes}}
[plugins.cri.containerd.runtimes."{{$k}}"]
  runtime_type = "{{$v.RuntimeType}}"
[plugins.cri.containerd.runtimes."{{$k}}".options]
  BinaryName = "{{$v.BinaryName}}"
{{end}}

(Note: this is just the stock template, but with SystemdCgroup = {{ .SystemdCgroup }} replaced with SystemdCgroup = true)

  1. sudo systemctl restart k3s
  2. Done
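The workaround amounts to a single substitution in the template. As an illustration only, the snippet below applies it to a sample line rather than editing the real file under /var/lib/rancher/k3s/agent/etc/containerd/:

```shell
# Sample line as it appears in the stock template
sample='  SystemdCgroup = {{ .SystemdCgroup }}'
# Pin the value to true, as the workaround's config.toml.tmpl does
patched=$(printf '%s\n' "$sample" | sed 's/{{ .SystemdCgroup }}/true/')
echo "$patched"
```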

So, where does that leave us?

Well, according to the upstream issue, systemd detection should be fixed soon upstream such that it defaults to the right containerd cgroup driver, which presumably will cause the kubelet to do the right thing too. That "soon", though, is the next release (maybe a month? I don't know the exact release schedule), not the just-released 1.24.3.

In the meanwhile, I think the easiest resolution for us is probably to drop --kubelet-arg=cgroup-driver=systemd from the module, and wait on upstream to default to that for us.

The original motivation for cgroup-driver=systemd was because with docker, it was easy to mismatch drivers, and that would lead to k3s failing entirely. Now, even without that, k3s starts up and appears to function.

I'll put up a PR with that change, and hopefully that fixes things here! I'm optimistic, since I do think I'm observing the same issue in my repro, and this does seem to fix it for me.

euank added a commit to euank/nixpkgs that referenced this issue Jul 19, 2022
Setting `cgroup-driver=systemd` was originally necessary to match with
docker, else the kubelet would not start (NixOS#111835)

However, since then, docker support has been dropped from k3s (NixOS#177790).
As such, this option is much less necessary.

More importantly, it now seems to be actively causing issues. Due to an
upstream k3s bug, it's resulting in the kubelet and containerd having
different cgroup drivers, which seems to result in some difficult to
debug failure modes.

See
NixOS#181790 (comment)
for a description of this problem.

Removing this flag entirely seems reasonable to me, and it results in
k3s working again on my machine.
@superherointj
Contributor

@collinarnett Can you test from master now?

@superherointj
Contributor

superherointj commented Jul 19, 2022

@euank

First, thanks for coming to save the day. Your expertise is always appreciated. :-)

Would you consider some improvement to the k3s testing to avoid this from happening again?
Somehow we should be checking for this situation of the cluster not working properly.

@collinarnett
Author

@collinarnett Can you test from master now?

$ k get deployments -n kube-system
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
local-path-provisioner   1/1     1            1           2m8s
coredns                  1/1     1            1           2m8s
metrics-server           1/1     1            1           2m7s
traefik                  1/1     1            1           105s

Thank you all for your help 😄 At some point I hope I can help with these issues, since I think Kubernetes on NixOS is the way to go. I might also start a discussion on Discourse about a Matrix channel, since there seems to be a small cluster of kube-nix people lurking around.

If there is no more discussion to be had on this issue feel free to close it.

@bryanasdev000
Member

@euank

First, thanks for coming to save the day. Your expertise is always appreciated. :-)

Would you consider some improvement to the k3s testing to avoid this from happening again? Somehow we should be checking for this situation of the cluster not working properly.

If we have network in NixOS tests, maybe using some greps or jq to check pods status, deploy some NGINX pods and check connectivity between pods/services.

@superherointj superherointj changed the title All pods crashing with latest k3s k3s: All pods crashing with latest k3s Jul 19, 2022
@superherointj superherointj changed the title k3s: All pods crashing with latest k3s k3s: All pods crashing with latest version Jul 19, 2022
@euank
Member

euank commented Jul 20, 2022

Would you consider some improvement to the k3s testing to avoid this from happening again?
Somehow we should be checking for this situation of the cluster not working properly.

Yup, absolutely, we should have tests that would catch this.

I keep meaning to write a reasonable multi-node test, but keep not finding the time... If anyone wants to write one, I'd be happy to review, and if not, I'll probably eventually get to it 😅

For this one, I think any test that ran "long enough" would work, since I think the single-node test we have didn't catch it just because it ran so quickly, before the kubelet got the chance to poll and decide to recreate the pod.

@bryanasdev000
Member

Would you consider some improvement to the k3s testing to avoid this from happening again?
Somehow we should be checking for this situation of the cluster not working properly.

Yup, absolutely, we should have tests that would catch this.

I keep meaning to write a reasonable multi-node test, but keep not finding the time... If anyone wants to write one, I'd be happy to review, and if not, I'll probably eventually get to it 😅

For this one, I think any test that ran "long enough" would work, since I think the single-node test we have didn't catch it just because it ran so quickly, before the kubelet got the chance to poll and decide to recreate the pod.

We could use the best solution to any race condition, sleep!

Seriously now, we can:

  • Setup and start k3s
  • Wait X minutes and poll base components status
  • Deploy some pods to check if we can schedule them, and check their status too
  • Check DNS/Network trying to speak to these pods and maybe API Server too
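The "wait and poll" step above can be sketched as a shell loop; get_pod_statuses here is a hypothetical stand-in for the real kubectl query, stubbed so the sketch is self-contained:

```shell
# Stub standing in for something like `kubectl get pods -A` parsing;
# prints one pod phase per line.
get_pod_statuses() { printf 'Running\nRunning\n'; }

# Poll until every pod reports Running, or give up after $1 seconds,
# checking every $2 seconds.
wait_for_ready() {
  timeout=$1; interval=$2; elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # succeed once no pod reports a phase other than Running
    if ! get_pod_statuses | grep -qv '^Running$'; then
      echo "all pods ready after ${elapsed}s"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "timed out after ${timeout}s"
  return 1
}

wait_for_ready 60 5
```

In a NixOS test this loop would be the "long enough" runtime euank mentions: the poll keeps the VM alive past the point where the kubelet would start recreating sandboxes.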

@bryanasdev000
Member

Now, talking about hacks, how "ugly" is it to have network access in NixOS tests?

@euank
Member

euank commented Jul 22, 2022

poll base components status

That ends up being a bit of a challenge, since running base components naively requires networking to download all those images.

The existing test right now just skips all those components for that reason:

services.k3s.extraFlags = "--no-deploy coredns,servicelb,traefik,local-storage,metrics-server --pause-image test.local/pause:local";

There is a potential solution there though: k3s upstream has an "airgapped images" tarball, and I think we could fetch that into the nix store, which we can then access in the test VM and then ctr image import all the images.

Anyway, I did get a multi-node test written (over here #182445), but I don't have much confidence it would have caught this particular issue, though it should catch flannel issues, and anything that would crashloop earlier or quicker.

@superherointj superherointj added the 6.topic: k3s Kubernates distribution (https://k3s.io/) label May 9, 2024