Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(e2core): Deal with dead processes #429

Merged
merged 4 commits into from
May 3, 2023
Merged

Conversation

javorszky
Copy link
Contributor

@javorszky javorszky commented May 3, 2023

Related to #426

This is a quality of life improvement that deals with crashed processes and recovery.

When using the command.Start() function call, the process starts in the background. We're supposed to use the command.Wait() to determine when either the process finished, or exited for some other reason.

This changeset accommodates collecting the wait functions, starts them for each sub process, and when they unblock, their ports are added to a died map, which will then be removed automatically from watcher's bookkeeping.

I've tested this manually by:

  1. creating an image for e2core using make docker/dev
  2. starting up everything with se2's docker compose up (this is internal)
  3. start following the logs with docker compose logs e2core --follow (this is internal)
  4. making sure that I can execute an existing function
  5. hopping into the e2core docker container with docker exec -i -t a063f35d5c2b /bin/bash where the a06... is the image ID you get if you list the images with docker ps for e2core
  6. once inside, listing the process IDs with ls -l /proc/*/exe, which will have a similar output:
    lrwxrwxrwx 1 e2core e2core 0 May  3 17:20 /proc/1/exe -> /usr/local/bin/e2core
    lrwxrwxrwx 1 e2core e2core 0 May  3 17:21 /proc/16/exe -> /usr/local/bin/e2core
    lrwxrwxrwx 1 e2core e2core 0 May  3 17:31 /proc/38/exe -> /bin/bash
    lrwxrwxrwx 1 e2core e2core 0 May  3 17:33 /proc/self/exe -> /bin/ls
    lrwxrwxrwx 1 e2core e2core 0 May  3 17:33 /proc/thread-self/exe -> /bin/ls
    
  7. of those, the second e2core is the sub process with process id 16
  8. kill -9 16 to terminate that
  9. look at the logs to see that the termination and reaping of instance was completed and a new one was started during reconcile step
  10. check again that I can execute the same function by sending the post request to the local edge endpoint

Copy link
Contributor

@ospencer ospencer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I would probably call it a dead list instead of a died list, but that's not enough for me to block this change.

e2core/backend/satbackend/orchestrator.go Outdated Show resolved Hide resolved
Co-authored-by: Oscar Spencer <oscar@grain-lang.org>
@javorszky
Copy link
Contributor Author

I would probably call it a dead list instead of a died list

heck yeah smple past vs past participle! lemme change that around, it's smol effort

@javorszky javorszky merged commit beca41d into main May 3, 2023
@javorszky javorszky deleted the gabor/426-2-handle-cmd-wait branch May 3, 2023 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants