
k3s: All pods crashing with latest version #181790

Closed
collinarnett opened this issue Jul 17, 2022 · 16 comments
Labels
0.kind: bug Something is broken 6.topic: k3s Kubernates distribution (https://k3s.io/)

Comments

@collinarnett

Describe the bug

This is going to be a long description since I'm not entirely sure where the bounds of k3s are when it comes to statefulness. I'm currently running the latest version of k3s in nixpkgs and I am unable to stand up the cluster without all pods failing after helm deploys traefik. I have searched upstream's issues and there doesn't seem to be anything relevant there. I have spent quite a lot of time scouring the logs of both the pods and k3s itself. I also tried reverting the package to a previous commit, and that did not work either.

I can't see anything obvious in the k3s logs; there are quite a few warnings and errors, but I'm not sure which are genuine errors and which are just the kubelet complaining about the state not being ready. The pods don't seem to indicate anything explicit either.

Steps To Reproduce

Here is the current config I have:

{ lib, pkgs, ... }:
let
  # https://github.com/NixOS/nixpkgs/pull/176520
  k3s = pkgs.k3s.overrideAttrs
    (old: rec { buildInputs = old.buildInputs ++ [ pkgs.ipset ]; });
in {
  networking.firewall.allowedTCPPorts = [ 6443 80 443 10250 ];
  networking.firewall.allowedUDPPorts = [ 8472 ];
  services.k3s = {
    enable = true;
    role = "server";
    package = k3s;
  };
  environment.systemPackages = [
    (pkgs.writeShellScriptBin "k3s-reset-node"
      (builtins.readFile ./k3s-reset-node))
  ];
}

Expected behavior

Pods come up in the kube-system namespace.

Screenshots

journalctl
k3s_logs.txt

pods

$ k get pods -A                                                                                                   
[sudo] password for collin: 
NAMESPACE     NAME                                      READY   STATUS             RESTARTS         AGE
kube-system   helm-install-traefik-crd-ntl24            0/1     Completed          0                124m
kube-system   helm-install-traefik-xg6tn                0/1     Completed          1                124m
kube-system   traefik-7cd4fcff68-8s2tg                  0/1     CrashLoopBackOff   28 (3m38s ago)   124m
kube-system   local-path-provisioner-7b7dc8d6f5-rqd5j   0/1     CrashLoopBackOff   23 (2m46s ago)   124m
kube-system   coredns-b96499967-4vhc8                   0/1     CrashLoopBackOff   26 (116s ago)    124m
kube-system   svclb-traefik-1628ae6b-vc7j8              0/2     CrashLoopBackOff   50 (41s ago)     124m
kube-system   metrics-server-668d979685-m5mwm           1/1     Running            34 (5m54s ago)   124m
$ k logs metrics-server-668d979685-m5mwm -p -n kube-system
I0717 00:24:27.948191       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0717 00:24:27.948198       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0717 00:24:27.948240       1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0717 00:24:27.948243       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0717 00:24:27.948204       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0717 00:24:27.948258       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0717 00:24:27.948381       1 dynamic_serving_content.go:130] Starting serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
I0717 00:24:27.948519       1 secure_serving.go:202] Serving securely on [::]:4443
I0717 00:24:27.948563       1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0717 00:24:28.049036       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file 
I0717 00:24:28.049050       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file 
I0717 00:24:28.049088       1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController 
I0717 00:24:28.170843       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:29.171668       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:29.362833       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:31.363846       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:33.363524       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:35.363869       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:37.362718       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:39.363017       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:24:41.362617       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0717 00:26:05.487063       1 requestheader_controller.go:183] Shutting down RequestHeaderAuthRequestController
I0717 00:26:05.487084       1 configmap_cafile_content.go:223] Shutting down client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0717 00:26:05.487091       1 configmap_cafile_content.go:223] Shutting down client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0717 00:26:05.487177       1 tlsconfig.go:255] Shutting down DynamicServingCertificateController
I0717 00:26:05.487229       1 secure_serving.go:246] Stopped listening on [::]:4443
I0717 00:26:05.487237       1 dynamic_serving_content.go:145] Shutting down serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
$ k logs local-path-provisioner-7b7dc8d6f5-rqd5j -p -n kube-system                               
I0717 00:26:57.595741       1 controller.go:773] Starting provisioner controller rancher.io/local-path_local-path-provisioner-7b7dc8d6f5-rqd5j_0677ce3a-05da-446b-9492-3a9eb140c921!
I0717 00:26:57.696339       1 controller.go:822] Started provisioner controller rancher.io/local-path_local-path-provisioner-7b7dc8d6f5-rqd5j_0677ce3a-05da-446b-9492-3a9eb140c921!
time="2022-07-17T00:29:50Z" level=info msg="Receive terminated to exit" 
time="2022-07-17T00:29:50Z" level=info msg="stop watching config file" 
$ k logs coredns-b96499967-4vhc8 -p -n kube-system
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
.:53
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
[INFO] plugin/reload: Running configuration SHA512 = b941b080e5322f6519009bb49349462c7ddb6317425b0f6a83e5451175b720703949e3f3b454a24e77f3ffe57fd5e9c6130e528a5a1dd00d9000e4afd6c1108d
CoreDNS-1.9.1
linux/amd64, go1.17.8, 4b597f8
[INFO] SIGTERM: Shutting down servers then terminating
$ k logs svclb-traefik-1628ae6b-vc7j8 -p -n kube-system                                  
Defaulted container "lb-tcp-80" out of: lb-tcp-80, lb-tcp-443
+ trap exit TERM INT
+ echo 10.43.12.129
+ grep -Eq :
+ cat /proc/sys/net/ipv4/ip_forward
+ '[' 1 '!=' 1 ]
+ iptables -t nat -I PREROUTING '!' -s 10.43.12.129/32 -p TCP --dport 80 -j DNAT --to 10.43.12.129:80
+ iptables -t nat -I POSTROUTING -d 10.43.12.129/32 -p TCP -j MASQUERADE
+ '[' '!' -e /pause ]
+ mkfifo /pause
/usr/bin/entry: line 27: can't open /pause: Interrupted system call
+ 
+ exit
$ k logs traefik-7cd4fcff68-8s2tg -p -n kube-system
time="2022-07-17T02:34:13Z" level=info msg="Configuration loaded from flags."
time="2022-07-17T02:34:14Z" level=error msg="accept tcp [::]:8000: use of closed network connection" entryPointName=web
time="2022-07-17T02:34:14Z" level=error msg="accept tcp [::]:9100: use of closed network connection" entryPointName=metrics
time="2022-07-17T02:34:14Z" level=error msg="accept tcp [::]:8443: use of closed network connection" entryPointName=websecure
time="2022-07-17T02:34:14Z" level=error msg="accept tcp [::]:9000: use of closed network connection" entryPointName=traefik
time="2022-07-17T02:34:14Z" level=error msg="close tcp [::]:8443: use of closed network connection" entryPointName=websecure
time="2022-07-17T02:34:14Z" level=error msg="close tcp [::]:8000: use of closed network connection" entryPointName=web
time="2022-07-17T02:34:14Z" level=error msg="close tcp [::]:9100: use of closed network connection" entryPointName=metrics
time="2022-07-17T02:34:14Z" level=error msg="close tcp [::]:9000: use of closed network connection" entryPointName=traefik
$ k describe node zombie                                   
Name:               zombie
Roles:              control-plane,master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=k3s
                    beta.kubernetes.io/os=linux
                    egress.k3s.io/cluster=true
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=zombie
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=true
                    node-role.kubernetes.io/master=true
                    node.kubernetes.io/instance-type=k3s
Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"ca:10:b7:59:64:03"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.1.164
                    k3s.io/hostname: zombie
                    k3s.io/internal-ip: 192.168.1.164
                    k3s.io/node-args: ["server","--kubelet-arg","cgroup-driver=systemd"]
                    k3s.io/node-config-hash: 6FBPL7ZNB3BFG32NEVA44JMMFWZ34N6AY4YKS3OHY45APNNIJBJQ====
                    k3s.io/node-env: {"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/a2527f9db03e21bee3a56f440fb6ea3cd6e7796abf2bb0c3428db9f447b522e5"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sat, 16 Jul 2022 15:57:20 -0400
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  zombie
  AcquireTime:     <unset>
  RenewTime:       Sat, 16 Jul 2022 22:35:52 -0400
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Sat, 16 Jul 2022 22:33:16 -0400   Sat, 16 Jul 2022 15:57:20 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Sat, 16 Jul 2022 22:33:16 -0400   Sat, 16 Jul 2022 15:57:20 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Sat, 16 Jul 2022 22:33:16 -0400   Sat, 16 Jul 2022 15:57:20 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Sat, 16 Jul 2022 22:33:16 -0400   Sat, 16 Jul 2022 15:57:31 -0400   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.1.164
  Hostname:    zombie
Capacity:
  cpu:                24
  ephemeral-storage:  479081160Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             65780800Ki
  pods:               110
Allocatable:
  cpu:                24
  ephemeral-storage:  466050152083
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             65780800Ki
  pods:               110
System Info:
  Machine ID:                 440d6715398c4a968f8638b885267bf9
  System UUID:                b9c28570-b664-0000-0000-000000000000
  Boot ID:                    9a01a61d-e374-446d-8595-5fc100db7d98
  Kernel Version:             5.15.53
  OS Image:                   NixOS 22.11 (Raccoon)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.5.13-k3s1
  Kubelet Version:            v1.24.2+k3s2
  Kube-Proxy Version:         v1.24.2+k3s2
PodCIDR:                      10.42.0.0/24
PodCIDRs:                     10.42.0.0/24
ProviderID:                   k3s://zombie
Non-terminated Pods:          (5 in total)
  Namespace                   Name                                       CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                       ------------  ----------  ---------------  -------------  ---
  kube-system                 metrics-server-668d979685-m5mwm            100m (0%)     0 (0%)      70Mi (0%)        0 (0%)         6h38m
  kube-system                 svclb-traefik-1628ae6b-vc7j8               0 (0%)        0 (0%)      0 (0%)           0 (0%)         6h38m
  kube-system                 local-path-provisioner-7b7dc8d6f5-rqd5j    0 (0%)        0 (0%)      0 (0%)           0 (0%)         6h38m
  kube-system                 traefik-7cd4fcff68-8s2tg                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         6h38m
  kube-system                 coredns-b96499967-4vhc8                    100m (0%)     0 (0%)      70Mi (0%)        170Mi (0%)     6h38m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                200m (0%)   0 (0%)
  memory             140Mi (0%)  170Mi (0%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:              <none>

Additional context

If you want to look at my entire config, it's hosted here https://github.com/collinarnett/brew

Notify maintainers

@euank
@superherointj
@Mic92
@kalbasit

Metadata

$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 5.15.53, NixOS, 22.11 (Raccoon)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.9.1`
 - channels(collin): `""`
 - channels(root): `"nixos-21.11.335130.386234e2a61"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
@collinarnett collinarnett added the 0.kind: bug Something is broken label Jul 17, 2022
@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/kubernetes-bringup-on-current-unstable-2022-07-17/20403/2

@superherointj
Contributor

superherointj commented Jul 18, 2022

Hi @collinarnett

I have reproduced a problem with an empty cluster:

[screenshot: pods crashing in the empty cluster]

The testing method needs to be improved/fixed as well. Tests should have failed.

Have you been able to solve the problem (with the information from Discourse)?

@collinarnett
Author

I have not been able to test it yet. I will get a chance to test this evening 😄

@bryanasdev000
Member

bryanasdev000 commented Jul 18, 2022

It seems that a CNI is missing; if there is no CNI, pods will crash and burn.

EDIT: It seems that k3s runs flannel as a process, not as a pod.

@bryanasdev000
Member

bryanasdev000 commented Jul 18, 2022

ip a to check for flannel interfaces and journalctl -u k3s may help answer the question of why it is broken.

EDIT: Will try to reproduce in my system to help.

@bryanasdev000
Member

bryanasdev000 commented Jul 19, 2022

@collinarnett @superherointj

  • Can you access kubernetes service?
  • Can you get a describe from restarting pods? (exec kubectl describe pod $POD -n $NAMESPACE)
  • Can you check that you have flannel interface? (exec ip a | grep flannel)
  • Can you check kubernetes events? In special for SandboxChanged. (exec kubectl get events | grep SandboxChanged)
  • Can you check if other pods can run ok? (exec kubectl create deployment nginx --image=nginx --replicas=10 --port=80)
  • Can you check iptables version? (exec iptables --version)

To try to access kubernetes service you can use the following command:

kubectl run -it --rm=True --restart=Never -n default --image alpine tmpshell -- sh

Once inside, do a apk add curl and curl -kv https://kubernetes or curl -kv https://10.43.0.1.

In my test, pods fail to access K8S API Server, which is a little bit strange, so maybe I am missing something (just copied the K3S NixOS Wiki entry).

In the case of the OP, what is strange to me is the SIGTERM at coredns and the local path provisioner.

@bryanasdev000
Member

Maybe related: #179741

@euank
Member

euank commented Jul 19, 2022

Thanks for the detailed report, @collinarnett!

I think I was able to reproduce and work around / fix this on my machine.

For me, the problem indeed showed up as "SandboxChanged":

$ sudo k3s kubectl describe po -n kube-system -l k8s-app=kube-dns

....
SandboxChanged  4m31s (x3 over 4m50s)   kubelet            Pod sandbox changed, it will be killed and re-created.

# with `services.k3s.extraFlags = "--kubelet-arg v=4";` set
$ sudo journalctl -u k3s.service

kuberuntime_manager.go:488] "No ready sandbox for pod can be found. Need to start a new one" pod="kube-system/coredns-b96499967-tb2cd"

The logs from the kubelet ended up not being all that useful. However, as soon as I ran sudo k3s server to try and repro it, I noticed that omitting the --kubelet-arg=cgroup-driver=systemd flag resolves the issue, which pointed me in the right direction.

The k3s NixOS module currently adds --kubelet-arg=cgroup-driver=systemd:

++ (optional (config.systemd.enableUnifiedCgroupHierarchy) "--kubelet-arg=cgroup-driver=systemd")

However, containerd for me didn't think it was using the systemd driver; running sudo k3s crictl info | grep -i systemd showed systemdCgroup: false.

Also /var/lib/rancher/k3s/agent/etc/containerd/config.toml had SystemdCgroup = false set.
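The mismatch is easy to check mechanically. As a minimal sketch, the snippet below greps a trimmed, hypothetical excerpt of the crictl info JSON instead of the real command (on a live node you would pipe sudo k3s crictl info in):

```shell
# Trimmed, hypothetical excerpt of `sudo k3s crictl info` output on this
# setup; only the SystemdCgroup field matters for the check.
crictl_info='{"config":{"containerd":{"runtimes":{"runc":{"options":{"SystemdCgroup":false}}}}}}'
# Kubelet driver as forced by the NixOS module's --kubelet-arg at the time
kubelet_driver="systemd"

# Pull the boolean out of the JSON excerpt
containerd_systemd=$(printf '%s' "$crictl_info" | grep -o '"SystemdCgroup":[a-z]*' | cut -d: -f2)

echo "kubelet cgroup driver:    $kubelet_driver"
echo "containerd SystemdCgroup: $containerd_systemd"
if [ "$kubelet_driver" = "systemd" ] && [ "$containerd_systemd" = "false" ]; then
  echo "MISMATCH: kubelet uses systemd, containerd uses cgroupfs"
fi
```

When the two sides disagree like this, the kubelet keeps deciding the pod sandbox is wrong and recreates it, which matches the SandboxChanged events above.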

This was already reported on the upstream k3s repo here: k3s-io/k3s#5454

Removing the --kubelet-arg=cgroup-driver=systemd bit from the module switched it back to cgroupfs, and that worked.

The issue also suggested another workaround, which also worked (collapsed since it's verbose):

Workaround
  1. Start k3s
  2. Write the following to /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl
[plugins.opt]
  path = "{{ .NodeConfig.Containerd.Opt }}"
[plugins.cri]
  stream_server_address = "127.0.0.1"
  stream_server_port = "10010"
  enable_selinux = {{ .NodeConfig.SELinux }}
  enable_unprivileged_ports = true
  enable_unprivileged_icmp = true
{{- if .DisableCgroup}}
  disable_cgroup = true
{{end}}
{{- if .IsRunningInUserNS }}
  disable_apparmor = true
  restrict_oom_score_adj = true
{{end}}
{{- if .NodeConfig.AgentConfig.PauseImage }}
  sandbox_image = "{{ .NodeConfig.AgentConfig.PauseImage }}"
{{end}}
{{- if .NodeConfig.AgentConfig.Snapshotter }}
[plugins.cri.containerd]
  snapshotter = "{{ .NodeConfig.AgentConfig.Snapshotter }}"
  disable_snapshot_annotations = {{ if eq .NodeConfig.AgentConfig.Snapshotter "stargz" }}false{{else}}true{{end}}
{{ if eq .NodeConfig.AgentConfig.Snapshotter "stargz" }}
{{ if .NodeConfig.AgentConfig.ImageServiceSocket }}
[plugins.stargz]
cri_keychain_image_service_path = "{{ .NodeConfig.AgentConfig.ImageServiceSocket }}"
[plugins.stargz.cri_keychain]
enable_keychain = true
{{end}}
{{ if .PrivateRegistryConfig }}
{{ if .PrivateRegistryConfig.Mirrors }}
[plugins.stargz.registry.mirrors]{{end}}
{{range $k, $v := .PrivateRegistryConfig.Mirrors }}
[plugins.stargz.registry.mirrors."{{$k}}"]
  endpoint = [{{range $i, $j := $v.Endpoints}}{{if $i}}, {{end}}{{printf "%q" .}}{{end}}]
{{if $v.Rewrites}}
  [plugins.stargz.registry.mirrors."{{$k}}".rewrite]
{{range $pattern, $replace := $v.Rewrites}}
    "{{$pattern}}" = "{{$replace}}"
{{end}}
{{end}}
{{end}}
{{range $k, $v := .PrivateRegistryConfig.Configs }}
{{ if $v.Auth }}
[plugins.stargz.registry.configs."{{$k}}".auth]
  {{ if $v.Auth.Username }}username = {{ printf "%q" $v.Auth.Username }}{{end}}
  {{ if $v.Auth.Password }}password = {{ printf "%q" $v.Auth.Password }}{{end}}
  {{ if $v.Auth.Auth }}auth = {{ printf "%q" $v.Auth.Auth }}{{end}}
  {{ if $v.Auth.IdentityToken }}identitytoken = {{ printf "%q" $v.Auth.IdentityToken }}{{end}}
{{end}}
{{ if $v.TLS }}
[plugins.stargz.registry.configs."{{$k}}".tls]
  {{ if $v.TLS.CAFile }}ca_file = "{{ $v.TLS.CAFile }}"{{end}}
  {{ if $v.TLS.CertFile }}cert_file = "{{ $v.TLS.CertFile }}"{{end}}
  {{ if $v.TLS.KeyFile }}key_file = "{{ $v.TLS.KeyFile }}"{{end}}
  {{ if $v.TLS.InsecureSkipVerify }}insecure_skip_verify = true{{end}}
{{end}}
{{end}}
{{end}}
{{end}}
{{end}}
{{- if not .NodeConfig.NoFlannel }}
[plugins.cri.cni]
  bin_dir = "{{ .NodeConfig.AgentConfig.CNIBinDir }}"
  conf_dir = "{{ .NodeConfig.AgentConfig.CNIConfDir }}"
{{end}}
[plugins.cri.containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"
[plugins.cri.containerd.runtimes.runc.options]
	SystemdCgroup = true
{{ if .PrivateRegistryConfig }}
{{ if .PrivateRegistryConfig.Mirrors }}
[plugins.cri.registry.mirrors]{{end}}
{{range $k, $v := .PrivateRegistryConfig.Mirrors }}
[plugins.cri.registry.mirrors."{{$k}}"]
  endpoint = [{{range $i, $j := $v.Endpoints}}{{if $i}}, {{end}}{{printf "%q" .}}{{end}}]
{{if $v.Rewrites}}
  [plugins.cri.registry.mirrors."{{$k}}".rewrite]
{{range $pattern, $replace := $v.Rewrites}}
    "{{$pattern}}" = "{{$replace}}"
{{end}}
{{end}}
{{end}}
{{range $k, $v := .PrivateRegistryConfig.Configs }}
{{ if $v.Auth }}
[plugins.cri.registry.configs."{{$k}}".auth]
  {{ if $v.Auth.Username }}username = {{ printf "%q" $v.Auth.Username }}{{end}}
  {{ if $v.Auth.Password }}password = {{ printf "%q" $v.Auth.Password }}{{end}}
  {{ if $v.Auth.Auth }}auth = {{ printf "%q" $v.Auth.Auth }}{{end}}
  {{ if $v.Auth.IdentityToken }}identitytoken = {{ printf "%q" $v.Auth.IdentityToken }}{{end}}
{{end}}
{{ if $v.TLS }}
[plugins.cri.registry.configs."{{$k}}".tls]
  {{ if $v.TLS.CAFile }}ca_file = "{{ $v.TLS.CAFile }}"{{end}}
  {{ if $v.TLS.CertFile }}cert_file = "{{ $v.TLS.CertFile }}"{{end}}
  {{ if $v.TLS.KeyFile }}key_file = "{{ $v.TLS.KeyFile }}"{{end}}
  {{ if $v.TLS.InsecureSkipVerify }}insecure_skip_verify = true{{end}}
{{end}}
{{end}}
{{end}}
{{range $k, $v := .ExtraRuntimes}}
[plugins.cri.containerd.runtimes."{{$k}}"]
  runtime_type = "{{$v.RuntimeType}}"
[plugins.cri.containerd.runtimes."{{$k}}".options]
  BinaryName = "{{$v.BinaryName}}"
{{end}}

(Note: this is just the stock template, but with SystemdCgroup = {{ .SystemdCgroup }} replaced with SystemdCgroup = true)

  1. sudo systemctl restart k3s
  2. Done
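The workaround amounts to a single substitution in the template. As an illustration only, the snippet below applies it to a sample line rather than editing the real file under /var/lib/rancher/k3s/agent/etc/containerd/:

```shell
# Sample line as it appears in the stock template
sample='  SystemdCgroup = {{ .SystemdCgroup }}'
# Pin the value to true, as the workaround's config.toml.tmpl does
patched=$(printf '%s\n' "$sample" | sed 's/{{ .SystemdCgroup }}/true/')
echo "$patched"
```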

So, where does that leave us?

Well, according to the upstream issue, systemd detection should be fixed soon upstream such that it defaults to the right containerd cgroup driver, which presumably will cause the kubelet to do the right thing too. That "soon", though, is the next release (maybe a month? I don't know the exact release schedule), not the just-released 1.24.3.

In the meanwhile, I think the easiest resolution for us is probably to drop --kubelet-arg=cgroup-driver=systemd from the module, and wait on upstream to default to that for us.

The original motivation for cgroup-driver=systemd was because with docker, it was easy to mismatch drivers, and that would lead to k3s failing entirely. Now, even without that, k3s starts up and appears to function.

I'll put up a PR with that change, and hopefully that fixes things here! I'm optimistic, since I do think I'm observing the same issue in my repro, and this does seem to fix it for me.

euank added a commit to euank/nixpkgs that referenced this issue Jul 19, 2022
Setting `cgroup-driver=systemd` was originally necessary to match with
docker, else the kubelet would not start (NixOS#111835)

However, since then, docker support has been dropped from k3s (NixOS#177790).
As such, this option is much less necessary.

More importantly, it now seems to be actively causing issues. Due to an
upstream k3s bug, it's resulting in the kubelet and containerd having
different cgroup drivers, which seems to result in some difficult to
debug failure modes.

See
NixOS#181790 (comment)
for a description of this problem.

Removing this flag entirely seems reasonable to me, and it results in
k3s working again on my machine.
@superherointj
Contributor

@collinarnett Can you test from master now?

@superherointj
Contributor

superherointj commented Jul 19, 2022

@euank

First, thanks for coming to save the day. Your expertise is always appreciated. :-)

Would you consider some improvement to the k3s testing to avoid this from happening again?
Somehow we should be checking for this situation of the cluster not working properly.

@collinarnett
Author

@collinarnett Can you test from master now?

$ k get deployments -n kube-system
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
local-path-provisioner   1/1     1            1           2m8s
coredns                  1/1     1            1           2m8s
metrics-server           1/1     1            1           2m7s
traefik                  1/1     1            1           105s

Thank you all for your help 😄 At some point I hope I can help with these issues, since I think Kubernetes on NixOS is the way to go. I might also start a discussion on Discourse about a Matrix channel, since there seems to be a small cluster of kube-nix people lurking around.

If there is no more discussion to be had on this issue feel free to close it.

@bryanasdev000
Member

@euank

First, thanks for coming to save the day. Your expertise is always appreciated. :-)

Would you consider some improvement to the k3s testing to avoid this from happening again? Somehow we should be checking for this situation of the cluster not working properly.

If we have network in NixOS tests, maybe using some greps or jq to check pods status, deploy some NGINX pods and check connectivity between pods/services.

@superherointj superherointj changed the title All pods crashing with latest k3s k3s: All pods crashing with latest k3s Jul 19, 2022
@superherointj superherointj changed the title k3s: All pods crashing with latest k3s k3s: All pods crashing with latest version Jul 19, 2022
@euank
Member

euank commented Jul 20, 2022

Would you consider some improvement to the k3s testing to avoid this from happening again?
Somehow we should be checking for this situation of the cluster not working properly.

Yup, absolutely, we should have tests that would catch this.

I keep meaning to write a reasonable multi-node test, but keep not finding the time... If anyone wants to write one, I'd be happy to review, and if not, I'll probably eventually get to it 😅

For this one, I think any test that ran "long enough" would work, since I think the single-node test we have didn't catch it just because it ran so quickly, before the kubelet got the chance to poll and decide to recreate the pod.

@bryanasdev000
Member

Would you consider some improvement to the k3s testing to avoid this from happening again?
Somehow we should be checking for this situation of the cluster not working properly.

Yup, absolutely, we should have tests that would catch this.

I keep meaning to write a reasonable multi-node test, but keep not finding the time... If anyone wants to write one, I'd be happy to review, and if not, I'll probably eventually get to it 😅

For this one, I think any test that ran "long enough" would work, since I think the single-node test we have didn't catch it just because it ran so quickly, before the kubelet got the chance to poll and decide to recreate the pod.

We could use the best solution to any race condition, sleep!

Seriously now, we can:

  • Setup and start k3s
  • Wait X minutes and poll base components status
  • Deploy some pods to check if we can schedule them, and check their status too
  • Check DNS/Network trying to speak to these pods and maybe API Server too
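The "wait and poll" step above can be sketched as a shell loop; get_pod_statuses here is a hypothetical stand-in for the real kubectl query, stubbed so the sketch is self-contained:

```shell
# Stub standing in for something like `kubectl get pods -A` parsing;
# prints one pod phase per line.
get_pod_statuses() { printf 'Running\nRunning\n'; }

# Poll until every pod reports Running, or give up after $1 seconds,
# checking every $2 seconds.
wait_for_ready() {
  timeout=$1; interval=$2; elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # succeed once no pod reports a phase other than Running
    if ! get_pod_statuses | grep -qv '^Running$'; then
      echo "all pods ready after ${elapsed}s"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "timed out after ${timeout}s"
  return 1
}

wait_for_ready 60 5
```

In a NixOS test this loop would be the "long enough" runtime euank mentions: the poll keeps the VM alive past the point where the kubelet would start recreating sandboxes.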

@bryanasdev000
Member

Now, talking about hacks, how "ugly" is it to have network access in NixOS tests?

@euank
Member

euank commented Jul 22, 2022

poll base components status

That ends up being a bit of a challenge, since running base components naively requires networking to download all those images.

The existing test right now just skips all those components for that reason:

services.k3s.extraFlags = "--no-deploy coredns,servicelb,traefik,local-storage,metrics-server --pause-image test.local/pause:local";

There is a potential solution there though: k3s upstream has an "airgapped images" tarball, and I think we could fetch that into the nix store, which we can then access in the test VM and then ctr image import all the images.

Anyway, I did get a multi-node test written (over here #182445), but I don't have much confidence it would have caught this particular issue, though it should catch flannel issues, and anything that would crashloop earlier or quicker.

@superherointj superherointj added the 6.topic: k3s Kubernates distribution (https://k3s.io/) label May 9, 2024