Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shutdown tokio runtime gracefully when panic occurs in spawned future #535

Merged
merged 12 commits into from
Dec 17, 2018

Conversation

fordN
Copy link
Contributor

@fordN fordN commented Oct 29, 2018

Resolves #314

This PR replaces tokio::run() with a more custom build of a multi-threaded, Tokio runtime in order to allow panics in spawned futures to shutdown the entire runtime gracefully.

A runtime is instantiated and provided the necessary components to setup a context for running our main function. Default components are setup for the context using the respective <>::with_default() method. Included with the runtime: reactor handle, executor context, and timer instance.
The PR also updates the futures library and replaces its now deprecated deadline() function with timeout().

Issues created in tokio:
Tough to track down issue setting up the Timer: tokio-rs/tokio#727
Errors resulting from the fact that block_on does not set default runtime context components: tokio-rs/tokio#728

@ghost ghost assigned fordN Oct 29, 2018
@fordN fordN force-pushed the ford/tokio-blockon branch from ddfed8a to b65f741 Compare October 29, 2018 22:56
@fordN fordN changed the title Shutdown tokio runtime gracefully when panic occurs in spawned thread Shutdown tokio runtime gracefully when panic occurs in spawned future Oct 29, 2018
@fordN fordN force-pushed the ford/tokio-blockon branch from b65f741 to 80e181b Compare October 30, 2018 07:49
@leoyvens
Copy link
Collaborator

@fordN is this still WIP? I imagine having direct access to the runtime and executor will be crucial to solving #314 but this PR isn't there yet, the only change in behaviour I could observe when testing is that we shutdown the runtime when idle, which never happens in practice.

@fordN
Copy link
Contributor Author

fordN commented Oct 30, 2018

@leodasvacas, I tested this by panicking within a future spawned on the runtime.

My experience testing:

  • Before the changes: the panic log would be written out (but not always) and the runtime would continue to hobble along.
  • After the changes: the panic is logged and the application shuts down immediately.

@timmclean
Copy link
Contributor

Looks like you need to rebase this onto the new retry helper

@fordN
Copy link
Contributor Author

fordN commented Nov 12, 2018

@leodasvacas I updated the shutdown process to send a shutdown signal to a thread that calls runtime.shutdown_now in order to stop all futures, and shutdown the threadpool and reactor. This update solves several of my test cases, but there are still situations where a future continues to run. I believe this is because tokio workers do not maintain a reference to the future in order to shutdown. I've found several related issues.

Issue with futures holding references: rust-lang/futures-rs#853
Brainstorm about tokio implementing a dedicated worker thread to run a reactor instance which among other things allows a worker to track all futures that it owns and drop incomplete futures when it shuts down: tokio-rs/tokio#424
Open PR for dropping incomplete tasks on shutdown: tokio-rs/tokio#722

@fordN fordN force-pushed the ford/tokio-blockon branch 2 times, most recently from 8879d0a to df8c829 Compare November 30, 2018 09:01
Copy link
Collaborator

@leoyvens leoyvens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to see this revisited! The code doesn't quite reflect what we discussed (but it's close), let me know if you have questions, I'm also curious to know how your test cases went after this change.

pub fn register_panic_hook(panic_logger: slog::Logger) {
pub fn register_panic_hook(
panic_logger: slog::Logger,
shutdown_sender: Mutex<Option<oneshot::Sender<()>>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove the Mutex and the Option around this, and then construct the mutex inside this function, to simplify the signature.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing, we should probably set the exit code to 1 inside the panic hook, since a panic means we'll terminate the process abnormally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the sleep and exit into the panic_hook, so that now it sends the shutdown signal to main, sleeps, and then exits the process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler didn't like constructing the Mutex<Option<>> inside of register_panic_hook()., so I am still passing the it in as an input.

Copy link
Collaborator

@leoyvens leoyvens Dec 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be because you did it inside panic::set_hook rather than right before it. But it's fine, this is not important.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, updated.

@@ -69,5 +74,15 @@ pub fn register_panic_hook(panic_logger: slog::Logger) {
);
}
};

if let Ok(ref mut mutex) = shutdown_sender.lock() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mutex may be unwrapped, since it will only error if a panic happens while holding the lock, but in this case that would be a double panic which will abort.

node/src/main.rs Outdated Show resolved Hide resolved
@leoyvens
Copy link
Collaborator

leoyvens commented Dec 3, 2018

#620 caused conflicts. The Cargo.lock you'll want to just get the version on master.

@fordN fordN force-pushed the ford/tokio-blockon branch from df8c829 to 8e877eb Compare December 4, 2018 19:23
Copy link
Collaborator

@leoyvens leoyvens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good, just a few extra comments. Sorry for the never-ending review here!

graph/src/util/log.rs Outdated Show resolved Hide resolved
node/src/main.rs Outdated Show resolved Hide resolved
graph/src/util/log.rs Show resolved Hide resolved
graph/src/util/log.rs Outdated Show resolved Hide resolved
node/src/main.rs Show resolved Hide resolved
@fordN fordN force-pushed the ford/tokio-blockon branch from 570cc99 to e342396 Compare December 14, 2018 11:20
@timmclean timmclean added this to the Hosted Service milestone Dec 14, 2018
@timmclean
Copy link
Contributor

Adding this to the Hosted Service milestone, as we need this so that nodes don't get stuck in a broken state due to a transient panic

@leoyvens
Copy link
Collaborator

Needs to be fmt'ed.

Copy link
Collaborator

@leoyvens leoyvens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a comment about dead code and this needs to be rustfmt'ed, otherwise it's good to go!

node/src/main.rs Outdated
// Create components for tokio context: multi-threaded runtime, reactor reference,
// executor context on the runtime, and Timer handle.
let runtime = tokio::runtime::Runtime::new().expect("Failed to create runtime");
let reactor = tokio_reactor::Handle::current();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this variable unused? Looks like the whole tokio_reactor dependency is unused now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep 👍, I've now removed the entire tokio_reactor dependency.

@fordN fordN force-pushed the ford/tokio-blockon branch from e342396 to 0ab2119 Compare December 14, 2018 23:29
Copy link
Collaborator

@leoyvens leoyvens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still some tokio-reactor left standing.

graph/src/lib.rs Outdated
@@ -21,7 +21,10 @@ extern crate slog_envlogger;
extern crate slog_term;
extern crate tiny_keccak;
pub extern crate tokio;
pub extern crate tokio_executor;
pub extern crate tokio_reactor;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dead code.

graph/Cargo.toml Outdated
@@ -30,5 +30,8 @@ slog-envlogger = "2.1.0"
slog-term = "2.4.0"
tiny-keccak = "1.0"
tokio = "0.1.11"
tokio-executor = "0.1.5"
tokio-reactor = "0.1.6"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dependency is dead.

@fordN fordN force-pushed the ford/tokio-blockon branch from 0ab2119 to 108b9c4 Compare December 17, 2018 08:44
Copy link
Collaborator

@leoyvens leoyvens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good now, will be great to have this!

@Jannis
Copy link
Contributor

Jannis commented Dec 17, 2018

🎉

@fordN fordN merged commit e1b2582 into master Dec 17, 2018
@timmclean
Copy link
Contributor

Nice! So this means that we abort the process now on panics?

By the way, it looks like the two Tokio issues have comments on them

@fordN fordN deleted the ford/tokio-blockon branch January 17, 2019 00:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants