[Heartbeat] Zombie-ish processes created under agent #32363

Closed
andrewvc opened this issue Jul 14, 2022 · 3 comments · Fixed by #32393
Labels: bug, Heartbeat, Team:obs-ds-hosted-services (Label for the Observability Hosted Services team)

Comments

@andrewvc
Contributor

When running under Elastic Agent, Heartbeat sometimes receives a signal that causes it to restart, leaving behind orphaned node processes from browser runs (they end up re-parented to tini in our container). We should use either Go's prctl support or some other strategy to ensure they are properly reaped.

image

botelastic bot added the needs_team label (Indicates that the issue/PR needs a Team:* label) Jul 14, 2022
andrewvc added the Team:obs-ds-hosted-services label (Label for the Observability Hosted Services team) Jul 14, 2022
@elasticmachine
Collaborator

Pinging @elastic/uptime (Team:Uptime)

@vigneshshanmugam
Member

vigneshshanmugam commented Jul 19, 2022

For my own understanding, is the issue here that:

  • Not propagating the kill signals to the child process from the parent process OR
  • Chromium process keeps hanging even after receiving those signals?

The Playwright server process handles SIGINT, SIGTERM, and SIGHUP by default, so it should be cleaned up on restarts. Could we enable debug logs and check whether these signals are being propagated to the child process?

Running Synthetics with DEBUG=pw:browser should print all browser process logs which would be helpful here.

andrewvc added a commit that referenced this issue Jul 22, 2022
…32393)

Fixes #32363 by instructing the Linux kernel to automatically kill node subprocesses if their parents die. In testing it appears Chromium always dies as well, although I'm not entirely sure why. Either Chrome sets the right flags itself, or the death signal propagates. Either way, in testing this works very solidly.

We don't have sufficient automated test infrastructure to write a good automated test here, so this will have to rely on manual testing.
@andrewvc andrewvc removed their assignment Aug 1, 2022
@lucasfcosta lucasfcosta self-assigned this Aug 4, 2022
@lucasfcosta
Contributor

May have found a problem on Mac. See #32393 (comment).

Just wanted to check whether I missed anything in my tests; otherwise I'll open a separate issue for this.

chrisberkhout pushed a commit that referenced this issue Jun 1, 2023
…32393)

Fixes #32363 by instructing the Linux kernel to automatically kill node subprocesses if their parents die. In testing it appears Chromium always dies as well, although I'm not entirely sure why. Either Chrome sets the right flags itself, or the death signal propagates. Either way, in testing this works very solidly.

We don't have sufficient automated test infrastructure to write a good automated test here, so this will have to rely on manual testing.