
[release-1.21] Sending MAINPID to systemd causes systemd to not restart k3s when started with --log #4196

Closed
brandond opened this issue Oct 12, 2021 · 2 comments
Labels: kind/dev-validation (Dev will be validating this issue)

Comments

@brandond
Member

Backport the fix for "Sending MAINPID to systemd causes systemd to not restart k3s when started with --log" to the release-1.21 and engine-1.21 branches.
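
For context, here is a minimal Go sketch of the sd_notify protocol (see sd_notify(3)) that a Type=notify service uses to talk to systemd. The function name and program structure are illustrative assumptions, not the actual k3s code, and abstract sockets (NOTIFY_SOCKET values starting with "@") are not handled. The point: if the PID sent as MAINPID is wrong, for example when a process forks to handle --log output, systemd tracks the wrong process and stop/restart no longer reach the real server.

// Illustrative sketch only, not the k3s implementation.
package main

import (
	"fmt"
	"net"
	"os"
)

// notifyMainPID tells systemd which PID to track as the service's main
// process. If the reported PID is wrong (for example, a process that
// exits after setting up logging), systemd loses track of the real
// server: "stop" no longer kills it, and restarts hit the stale process.
func notifyMainPID(pid int) error {
	socketPath := os.Getenv("NOTIFY_SOCKET")
	if socketPath == "" {
		return fmt.Errorf("NOTIFY_SOCKET not set; not running under systemd Type=notify")
	}
	conn, err := net.DialUnix("unixgram", nil,
		&net.UnixAddr{Name: socketPath, Net: "unixgram"})
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = fmt.Fprintf(conn, "MAINPID=%d\n", pid)
	return err
}

func main() {
	if err := notifyMainPID(os.Getpid()); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}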

@galal-hussein
Contributor

Testing with version: v1.21.6-rc2+k3s1

I am seeing some odd behavior:

  • add the --log flag to k3s
  • restart k3s several times

You will get an error:

time="2021-11-01T18:46:13.292637985Z" level=info msg="Starting k3s v1.21.6-rc2+k3s1 (254d2f69)"
time="2021-11-01T18:46:13.294866287Z" level=info msg="Cluster bootstrap already complete"
time="2021-11-01T18:46:13.309286137Z" level=fatal msg="starting kubernetes: preparing server: init cluster datastore and https: listen tcp :6443: bind: address already in use"

Even after k3s is stopped, a process is still left running. I don't see this behavior when --log is omitted. The k3s server process is also not killed when I stop the systemd service for k3s.
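
A minimal, hypothetical Go probe (not part of k3s, just for illustration) makes the failure concrete: as long as the leftover server still holds :6443, any fresh bind fails with the same "address already in use" error shown above.

// Standalone probe; assumes the default k3s API port 6443.
package main

import (
	"fmt"
	"net"
)

func main() {
	ln, err := net.Listen("tcp", ":6443")
	if err != nil {
		// With the stale process still running, this prints something like:
		// listen tcp :6443: bind: address already in use
		fmt.Println(err)
		return
	}
	ln.Close()
	fmt.Println(":6443 is free")
}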

@galal-hussein
Contributor

Validation with version: v1.21.6-rc3+k3s1

  • Tested starting k3s as a systemd service with the --log flag; logs are written to the specified log file.
  • Tested starting k3s as a systemd service with the --log flag; it is restarted properly when killed.
root@ip-172-31-14-63:~# cat /etc/rancher/k3s/config.yaml 
log: /var/lib/rancher/k3s/k3s.log
tail -f /var/lib/rancher/k3s/k3s.log 
I1102 20:00:29.370463   80776 cache.go:32] Waiting for caches to sync for AvailableConditionController controller
I1102 20:00:29.376954   80776 controller.go:85] Starting OpenAPI controller
I1102 20:00:29.377115   80776 naming_controller.go:291] Starting NamingConditionController
I1102 20:00:29.377224   80776 establishing_controller.go:76] Starting EstablishingController
I1102 20:00:29.377337   80776 nonstructuralschema_controller.go:192] Starting NonStructuralSchemaConditionController
I1102 20:00:29.377434   80776 apiapproval_controller.go:186] Starting KubernetesAPIApprovalPolicyConformantConditionController
I1102 20:00:29.377545   80776 crd_finalizer.go:266] Starting CRDFinalizer
I1102 20:00:29.382310   80776 dynamic_cafile_content.go:155] "Starting controller" name="client-ca-bundle::/var/lib/rancher/k3s/server/tls/client-ca.crt"
I1102 20:00:29.382398   80776 dynamic_cafile_content.go:155] "Starting controller" name="request-header::/var/lib/rancher/k3s/server/tls/request-header-ca.crt"
I1102 20:00:29.382711   80776 crdregistration_controller.go:111] Starting crd-autoregister controller
I1102 20:00:29.382730   80776 shared_informer.go:240] Waiting for caches to sync for crd-autoregister

I1102 20:00:29.463698   80776 cache.go:39] Caches are synced for APIServiceRegistrationController controller
I1102 20:00:29.464441   80776 apf_controller.go:317] Running API Priority and Fairness config worker
I1102 20:00:29.470058   80776 cache.go:39] Caches are synced for autoregister controller
I1102 20:00:29.470529   80776 cache.go:39] Caches are synced for AvailableConditionController controller
I1102 20:00:29.472937   80776 shared_informer.go:247] Caches are synced for cluster_authentication_trust_controller 
I1102 20:00:29.482790   80776 shared_informer.go:247] Caches are synced for crd-autoregister 
I1102 20:00:29.495751   80776 shared_informer.go:247] Caches are synced for node_authorizer 
E1102 20:00:29.872051   80776 controller.go:156] Unable to remove old endpoints from kubernetes service: no master IPs were listed in storage, refusing to erase all endpoints for the kubernetes service
I1102 20:00:30.383080   80776 storage_scheduling.go:148] all system priority classes are created successfully or already exist.
I1102 20:00:30.402514   80776 controller.go:132] OpenAPI AggregationController: action for item : Nothing (removed from the queue).
I1102 20:00:30.402555   80776 controller.go:132] OpenAPI AggregationController: action for item k8s_internal_local_delegation_chain_0000000000: Nothing (removed from the queue).

  • Restarting k3s multiple times with the --log option doesn't break k3s; stopping the systemd service also kills k3s properly.
