
Sending MAINPID to systemd causes systemd to not restart k3s when started with --log #4189

Closed
brandond opened this issue Oct 11, 2021 · 3 comments
Labels: kind/bug

brandond (Member) commented Oct 11, 2021

#4115 introduced a regression that prevents systemd from properly handling K3s crashes when k3s is started with the --log flag. Sending MAINPID to systemd breaks systemd's exit detection: it stops watching the original pid, but cannot watch the new pid because that process is not a child of systemd itself. As a result, k3s will not be restarted by systemd when it exits unexpectedly.

Since MAINPID causes other issues, and the child process can't send the notification itself, the best we can do is notify systemd when execing the child process.
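
For illustration, a minimal sketch of that approach (not the actual k3s code), assuming the go-systemd daemon package: the parent process that systemd is watching sends only READY=1 before spawning the log-redirected child, and never sends MAINPID, so systemd keeps tracking a pid that is actually its own child. The binary path and log path below are placeholders.

// Sketch only: the parent notifies systemd and supervises the child itself.
package main

import (
	"os"
	"os/exec"

	"github.com/coreos/go-systemd/v22/daemon"
)

func main() {
	// Placeholder log path; k3s takes this from the --log flag.
	logFile, err := os.OpenFile("/var/lib/rancher/k3s/k3s.log",
		os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0600)
	if err != nil {
		os.Exit(1)
	}
	defer logFile.Close()

	// Notify readiness from the pid systemd is already watching. Sending
	// MAINPID=<child pid> here is what broke systemd's exit detection,
	// so only READY=1 is sent. SdNotify is a no-op outside of systemd.
	_, _ = daemon.SdNotify(false, daemon.SdNotifyReady)

	// Run the child with stdout/stderr redirected to the log file. The
	// parent stays alive as systemd's main pid and mirrors the child's
	// exit status, so an unexpected exit still triggers Restart=.
	cmd := exec.Command("/usr/local/bin/k3s", os.Args[1:]...)
	cmd.Stdout = logFile
	cmd.Stderr = logFile
	if err := cmd.Run(); err != nil {
		if exitErr, ok := err.(*exec.ExitError); ok {
			os.Exit(exitErr.ExitCode())
		}
		os.Exit(1)
	}
}

Assuming the unit uses Type=notify with a Restart= policy, systemd keeps watching the parent pid, which is the behavior validated in the comments below.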

galal-hussein (Contributor) commented:

Tested on commit ID: 702fe24

  • Tested starting k3s as a systemd service with the --log flag; verified that logs are written to the specified log file.
  • Tested starting k3s as a systemd service with the --log flag; verified that it is restarted properly when killed.
root@ip-172-31-14-63:~# cat /etc/rancher/k3s/config.yaml 
log: /var/lib/rancher/k3s/k3s.log
tail -f /var/lib/rancher/k3s/k3s.log 
I1101 18:01:50.494896    4388 shared_informer.go:247] Caches are synced for attach detach 
I1101 18:01:50.534817    4388 controller.go:611] quota admission added evaluator for: endpointslices.discovery.k8s.io
I1101 18:01:50.561330    4388 shared_informer.go:247] Caches are synced for disruption 
I1101 18:01:50.561355    4388 disruption.go:371] Sending events to api server.
I1101 18:01:50.610834    4388 shared_informer.go:247] Caches are synced for resource quota 
I1101 18:01:50.629848    4388 shared_informer.go:247] Caches are synced for ReplicationController 
I1101 18:01:50.637386    4388 shared_informer.go:247] Caches are synced for resource quota 
I1101 18:01:51.097685    4388 shared_informer.go:247] Caches are synced for garbage collector 
I1101 18:01:51.113668    4388 shared_informer.go:247] Caches are synced for garbage collector 
I1101 18:01:51.113696    4388 garbagecollector.go:151] Garbage collector: all resource monitors have synced. Proceeding to collect garbage

brandond (Member, Author) commented Nov 1, 2021

Reopening due to regression noted at #4195 (comment)

brandond reopened this Nov 1, 2021
galal-hussein (Contributor) commented:

Validated with commit: aa33320

  • Tested starting k3s as a systemd service with the --log flag; verified that logs are written to the specified log file.
  • Tested starting k3s as a systemd service with the --log flag; verified that it is restarted properly when killed.
root@ip-172-31-14-63:~# cat /etc/rancher/k3s/config.yaml 
log: /var/lib/rancher/k3s/k3s.log
tail -f /var/lib/rancher/k3s/k3s.log 
I1102 20:00:29.370463   80776 cache.go:32] Waiting for caches to sync for AvailableConditionController controller
I1102 20:00:29.376954   80776 controller.go:85] Starting OpenAPI controller
I1102 20:00:29.377115   80776 naming_controller.go:291] Starting NamingConditionController
I1102 20:00:29.377224   80776 establishing_controller.go:76] Starting EstablishingController
I1102 20:00:29.377337   80776 nonstructuralschema_controller.go:192] Starting NonStructuralSchemaConditionController
I1102 20:00:29.377434   80776 apiapproval_controller.go:186] Starting KubernetesAPIApprovalPolicyConformantConditionController
I1102 20:00:29.377545   80776 crd_finalizer.go:266] Starting CRDFinalizer
I1102 20:00:29.382310   80776 dynamic_cafile_content.go:155] "Starting controller" name="client-ca-bundle::/var/lib/rancher/k3s/server/tls/client-ca.crt"
I1102 20:00:29.382398   80776 dynamic_cafile_content.go:155] "Starting controller" name="request-header::/var/lib/rancher/k3s/server/tls/request-header-ca.crt"
I1102 20:00:29.382711   80776 crdregistration_controller.go:111] Starting crd-autoregister controller
I1102 20:00:29.382730   80776 shared_informer.go:240] Waiting for caches to sync for crd-autoregister

I1102 20:00:29.463698   80776 cache.go:39] Caches are synced for APIServiceRegistrationController controller
I1102 20:00:29.464441   80776 apf_controller.go:317] Running API Priority and Fairness config worker
I1102 20:00:29.470058   80776 cache.go:39] Caches are synced for autoregister controller
I1102 20:00:29.470529   80776 cache.go:39] Caches are synced for AvailableConditionController controller
I1102 20:00:29.472937   80776 shared_informer.go:247] Caches are synced for cluster_authentication_trust_controller 
I1102 20:00:29.482790   80776 shared_informer.go:247] Caches are synced for crd-autoregister 
I1102 20:00:29.495751   80776 shared_informer.go:247] Caches are synced for node_authorizer 
E1102 20:00:29.872051   80776 controller.go:156] Unable to remove old endpoints from kubernetes service: no master IPs were listed in storage, refusing to erase all endpoints for the kubernetes service
I1102 20:00:30.383080   80776 storage_scheduling.go:148] all system priority classes are created successfully or already exist.
I1102 20:00:30.402514   80776 controller.go:132] OpenAPI AggregationController: action for item : Nothing (removed from the queue).
I1102 20:00:30.402555   80776 controller.go:132] OpenAPI AggregationController: action for item k8s_internal_local_delegation_chain_0000000000: Nothing (removed from the queue).

  • Restarting k3s multiple times with the --log option doesn't break k3s; stopping the systemd service also appears to kill k3s properly.
