Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: supervise spawned tasks #4716

Merged
merged 8 commits into from
Sep 9, 2024

Conversation

0x009922
Copy link
Contributor

Description

This PR introduces a lightweight Supervisor for Tokio tasks.

What it does:

  • Monitors multiple children
  • Provides a single shutdown signal for everything
  • Supports graceful shutdown timeout before aborting a child
  • If a child panics, initiates shutdown and exits with an error
  • If a child exits before shutdown signal, also initiates shutdown and exits with an error. Note: this might not be always the desirable behaviour, but currently there are no other cases in Iroha. This behaviour could be easily extended to support refined per-child strategies.
  • Logs children's lifecycle and panics

What it doesn't:

  • Doesn't support restarting children. To implement that, we need a formal actor system.

Linked issue

Closes #4698
Related to #4516 (helps to identify failures)

Benefits

  • Clearer shutdown traces
  • Clearer Iroha lifecycle
  • Less unnoticed panics

@0x009922 0x009922 added Enhancement New feature or request Refactor Improvement to overall code quality labels Jun 11, 2024
@0x009922 0x009922 self-assigned this Jun 11, 2024
@github-actions github-actions bot added the api-changes Changes in the API for client libraries label Jun 11, 2024
@Erigara Erigara self-assigned this Jun 11, 2024
@0x009922 0x009922 removed the api-changes Changes in the API for client libraries label Jun 11, 2024
@Erigara
Copy link
Contributor

Erigara commented Jun 11, 2024

Like the interface.

However i find it a bit hard to track the logic of Supervisor itself maybe we can simplify it's implementation?
My idea is to handle all tasks inside SupervisorTask itself and reduce amount of task spawning, here is rough sketch of my idea.
What do you think?

torii/src/lib.rs Outdated Show resolved Hide resolved
futures/src/supervisor.rs Outdated Show resolved Hide resolved
futures/src/supervisor.rs Outdated Show resolved Hide resolved
futures/src/supervisor.rs Outdated Show resolved Hide resolved
futures/src/supervisor.rs Show resolved Hide resolved
futures/src/supervisor.rs Outdated Show resolved Hide resolved
@github-actions github-actions bot added the api-changes Changes in the API for client libraries label Jun 12, 2024
@0x009922
Copy link
Contributor Author

However i find it a bit hard to track the logic of Supervisor itself maybe we can simplify it's implementation?
My idea is to handle all tasks inside SupervisorTask itself and reduce amount of task spawning, here is rough sketch of my idea.
What do you think?

I agree that the current implementation of Supervisor is somewhat clunky and not easy to follow.

Tried to rewrite with less amount of spawns and using FuturesUnordered, as in your sketch, but it turned out to be also very confusing when it came to shutdown timeouts. Abandoned this way. If you can implement it in a way so that all current tests will pass, and the implementation will also be easy to understand - please, welcome =)

Anyway, I found a middle ground and made the implementation simpler. Now Supervisor relies on JoinSet, and for each monitored child it spawns 2 tasks: one to report the result of the child, and one to implement child shutdown logic. Supervisor itself waits until this JoinSet finishes all tasks, and also waits for messages from with children results to initiate shutdown if something went wrong.

Please re-review!

@0x009922 0x009922 requested a review from Erigara June 12, 2024 05:59
torii/src/lib.rs Outdated Show resolved Hide resolved
core/src/snapshot.rs Show resolved Hide resolved
futures/src/supervisor.rs Outdated Show resolved Hide resolved
futures/src/supervisor.rs Outdated Show resolved Hide resolved
@Erigara
Copy link
Contributor

Erigara commented Jun 13, 2024

Anyway, I found a middle ground and made the implementation simpler.

Indeed, implementation became much simpler and easier to grasp, i like it.

futures/src/supervisor.rs Outdated Show resolved Hide resolved
Erigara
Erigara previously approved these changes Jun 17, 2024
Copy link
Contributor

@Erigara Erigara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Erigara Erigara dismissed their stale review June 17, 2024 11:01

Static check failing

Erigara
Erigara previously approved these changes Jun 18, 2024
@0x009922
Copy link
Contributor Author

Rebased & ran all tests locally - works except 3 extra functional.

@mversic mversic assigned DCNick3 and unassigned mversic Aug 26, 2024
futures/src/supervisor.rs Outdated Show resolved Hide resolved
futures/src/supervisor.rs Outdated Show resolved Hide resolved
core/src/block_sync.rs Outdated Show resolved Hide resolved
core/src/gossiper.rs Outdated Show resolved Hide resolved
Signed-off-by: 0x009922 <43530070+0x009922@users.noreply.github.com>
Signed-off-by: 0x009922 <43530070+0x009922@users.noreply.github.com>
Signed-off-by: 0x009922 <43530070+0x009922@users.noreply.github.com>
Signed-off-by: 0x009922 <43530070+0x009922@users.noreply.github.com>
Signed-off-by: 0x009922 <43530070+0x009922@users.noreply.github.com>
Signed-off-by: 0x009922 <43530070+0x009922@users.noreply.github.com>
Signed-off-by: 0x009922 <43530070+0x009922@users.noreply.github.com>
Signed-off-by: 0x009922 <43530070+0x009922@users.noreply.github.com>
@0x009922 0x009922 merged commit 23a7755 into hyperledger:main Sep 9, 2024
8 checks passed
@0x009922 0x009922 deleted the 4698-supervisor branch September 9, 2024 23:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-changes Changes in the API for client libraries Enhancement New feature or request Refactor Improvement to overall code quality
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Improve handling of task panics
4 participants