Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Use SIGTERM instead of SIGKILL on PVF worker version mismatch #6981

Merged
merged 1 commit into from
Mar 30, 2023

Conversation

s0me0ne-unkn0wn
Copy link
Contributor

Closes #6979

@s0me0ne-unkn0wn s0me0ne-unkn0wn added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. T4-parachains_engineering This PR/Issue is related to Parachains performance, stability, maintenance. labels Mar 30, 2023
@mrcnski
Copy link
Contributor

mrcnski commented Mar 30, 2023

I think SIGTERM is not as forceful as SIGKILL, so worth mentioning that this depends on our signal handlers to work.

And is it worth adding a doc to the handlers that references SIGTERM, so it can be grepped? Edit: if we had this doc, then someone could find the handlers after reading this code.

@@ -428,7 +428,7 @@ pub(crate) fn kill_parent_node_in_emergency() {
// some corner cases, which is checked. `kill()` never fails.
let ppid = libc::getppid();
if ppid > 1 {
libc::kill(ppid, libc::SIGKILL);
libc::kill(ppid, libc::SIGTERM);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is safer to try SIGTERM first and wait for the node to exit. If for some weird reason or bug it doesn't exit we could force terminate with SIGKILL .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if the node continues to run, nothing bad should happen? The node will also take up to 60 seconds if there is a future stalled.

@s0me0ne-unkn0wn
Copy link
Contributor Author

I think SIGTERM is not as forceful as SIGKILL, so worth mentioning that this depends on our signal handlers to work.

And is it worth adding a doc to the handlers that references SIGTERM, so it can be grepped? Edit: if we had this doc, then someone could find the handlers after reading this code.

We're basing Polkadot code on Substrate, and using Substrate features including signal handling logic, so I believe it should be properly documented on the Substrate side. I wouldn't be surprised if it is already documented somewhere :)

@s0me0ne-unkn0wn s0me0ne-unkn0wn merged commit 632f1ee into master Mar 30, 2023
@s0me0ne-unkn0wn s0me0ne-unkn0wn deleted the s0me0ne/worker-version-sigterm branch March 30, 2023 19:41
ordian added a commit that referenced this pull request Apr 11, 2023
* master: (28 commits)
  Remove years from copyright notes (#7034)
  Onchain scraper in `dispute-coordinator` will scrape `SCRAPED_FINALIZED_BLOCKS_COUNT` blocks before finality (#7013)
  PVF: Minor refactor in workers code (#7012)
  Expose WASM bulk memory extension in execution environment parameters (#7008)
  Co #13699: Remove old calls (#7003)
  Companion for paritytech/substrate#13811 (#6998)
  PR review rules, include all rs files except weights (#6990)
  Substrate companion: Remove deprecated batch verification (#6999)
  Added `origin` to config for `universal_origin` benchmark (#6986)
  Cache `SessionInfo` on new activated leaf in `dispute-distribution` (#6993)
  Update Substrate to fix Substrate companions (#6994)
  Consolidate subsystem spans so they are all children of the leaf-activated root span (#6458)
  Avoid redundant clone. (#6989)
  bump zombienet version (#6985)
  avoid triggering unwanted room_id for the release notifs (#6984)
  Add crowdloan to SafeCallFilter (#6903)
  Drop timers for new requests of active participations (#6974)
  Use `SIGTERM` instead of `SIGKILL` on PVF worker version mismatch (#6981)
  Tighter bound on asset types teleported so that weight is cheaper (#6980)
  staking miner: less aggresive submissions (#6978)
  ...
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. T4-parachains_engineering This PR/Issue is related to Parachains performance, stability, maintenance.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A global overseer shutdown handle
4 participants