fix(e2core): Deal with dead processes #429

javorszky · 2023-05-03T17:42:17Z

Related to #426

This is a quality-of-life improvement that handles crashed processes and their recovery.

When a process is started with command.Start(), the call returns immediately and the process runs in the background. We're supposed to call command.Wait() to find out when the process has exited, whether it finished normally or died for some other reason.

This changeset collects the wait functions and runs one for each sub process. When one of them unblocks, that process's port is added to a died map, and ports in that map are then removed automatically from the watcher's bookkeeping.

I've tested this manually by:

creating an image for e2core using make docker/dev
starting up everything with se2's docker compose up (this is internal)
following the logs with docker compose logs e2core --follow (this is internal)
making sure that I can execute an existing function
hopping into the e2core docker container with docker exec -i -t a063f35d5c2b /bin/bash, where a063f35d5c2b is the container ID you get if you list the running containers with docker ps for e2core

once inside, listing the process IDs with ls -l /proc/*/exe, which produces output similar to this:

lrwxrwxrwx 1 e2core e2core 0 May  3 17:20 /proc/1/exe -> /usr/local/bin/e2core
lrwxrwxrwx 1 e2core e2core 0 May  3 17:21 /proc/16/exe -> /usr/local/bin/e2core
lrwxrwxrwx 1 e2core e2core 0 May  3 17:31 /proc/38/exe -> /bin/bash
lrwxrwxrwx 1 e2core e2core 0 May  3 17:33 /proc/self/exe -> /bin/ls
lrwxrwxrwx 1 e2core e2core 0 May  3 17:33 /proc/thread-self/exe -> /bin/ls

noting that, of those, the second e2core entry is the sub process, with process ID 16
terminating it with kill -9 16
checking the logs to see that the dead instance was reaped and that a new one was started during the reconcile step
checking again that I can execute the same function by sending the POST request to the local edge endpoint

ospencer

LGTM. I would probably call it a dead list instead of a died list, but that's not enough for me to block this change.

e2core/backend/satbackend/orchestrator.go

Co-authored-by: Oscar Spencer <oscar@grain-lang.org>

e2core/backend/satbackend/watcher.go

javorszky · 2023-05-03T17:55:00Z

I would probably call it a dead list instead of a died list

heck yeah simple past vs past participle! lemme change that around, it's smol effort

e2core/backend/satbackend/watcher.go

Deal with dead processes

4a780a3

ospencer approved these changes May 3, 2023

e2core/backend/satbackend/orchestrator.go (outdated, resolved)

Remove superfluous wording from log message

0f5d119

Co-authored-by: Oscar Spencer <oscar@grain-lang.org>

cohix reviewed May 3, 2023

e2core/backend/satbackend/watcher.go (resolved)

Rename died -> dead

6f12d5f

callahad reviewed May 3, 2023

e2core/backend/satbackend/watcher.go (outdated, resolved)

cohix approved these changes May 3, 2023

Stylistic fix on an error message

657fea8

javorszky merged commit beca41d into main May 3, 2023

javorszky deleted the gabor/426-2-handle-cmd-wait branch May 3, 2023 18:37
