-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shutdown tokio runtime gracefully when panic occurs in spawned future #535
Conversation
ddfed8a
to
b65f741
Compare
b65f741
to
80e181b
Compare
@leodasvacas, I tested this by panicking within a future spawned on the runtime. My experience testing:
|
Looks like you need to rebase this onto the new retry helper |
@leodasvacas I updated the shutdown process to send a shutdown signal to a thread that calls Issue with futures holding references: rust-lang/futures-rs#853 |
8879d0a
to
df8c829
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to see this revisited! The code doesn't quite reflect what we discussed (but it's close), let me know if you have questions, I'm also curious to know how your test cases went after this change.
graph/src/util/log.rs
Outdated
pub fn register_panic_hook(panic_logger: slog::Logger) { | ||
pub fn register_panic_hook( | ||
panic_logger: slog::Logger, | ||
shutdown_sender: Mutex<Option<oneshot::Sender<()>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can remove the Mutex
and the Option
around this, and then construct the mutex inside this function, to simplify the signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing, we should probably set the exit code to 1 inside the panic hook, since a panic means we'll terminate the process abnormally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the sleep and exit into the panic_hook, so that now it sends the shutdown signal to main, sleeps, and then exits the process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler didn't like constructing the Mutex<Option<>>
inside of register_panic_hook()
., so I am still passing the it in as an input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be because you did it inside panic::set_hook
rather than right before it. But it's fine, this is not important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, updated.
graph/src/util/log.rs
Outdated
@@ -69,5 +74,15 @@ pub fn register_panic_hook(panic_logger: slog::Logger) { | |||
); | |||
} | |||
}; | |||
|
|||
if let Ok(ref mut mutex) = shutdown_sender.lock() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mutex may be unwrapped, since it will only error if a panic happens while holding the lock, but in this case that would be a double panic which will abort.
#620 caused conflicts. The |
df8c829
to
8e877eb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good, just a few extra comments. Sorry for the never-ending review here!
Exit process after running shutdown and waiting for 3 seconds to ensure complete shutdown of all workers.
570cc99
to
e342396
Compare
Adding this to the Hosted Service milestone, as we need this so that nodes don't get stuck in a broken state due to a transient panic |
Needs to be fmt'ed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have a comment about dead code and this needs to be rustfmt'ed, otherwise it's good to go!
node/src/main.rs
Outdated
// Create components for tokio context: multi-threaded runtime, reactor reference, | ||
// executor context on the runtime, and Timer handle. | ||
let runtime = tokio::runtime::Runtime::new().expect("Failed to create runtime"); | ||
let reactor = tokio_reactor::Handle::current(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this variable unused? Looks like the whole tokio_reactor
dependency is unused now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep 👍, I've now removed the entire tokio_reactor
dependency.
e342396
to
0ab2119
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still some tokio-reactor
left standing.
graph/src/lib.rs
Outdated
@@ -21,7 +21,10 @@ extern crate slog_envlogger; | |||
extern crate slog_term; | |||
extern crate tiny_keccak; | |||
pub extern crate tokio; | |||
pub extern crate tokio_executor; | |||
pub extern crate tokio_reactor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is dead code.
graph/Cargo.toml
Outdated
@@ -30,5 +30,8 @@ slog-envlogger = "2.1.0" | |||
slog-term = "2.4.0" | |||
tiny-keccak = "1.0" | |||
tokio = "0.1.11" | |||
tokio-executor = "0.1.5" | |||
tokio-reactor = "0.1.6" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This dependency is dead.
Removed .expect()s from Shutdown sending instead logging failed or already sent status.
0ab2119
to
108b9c4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good now, will be great to have this!
🎉 |
Nice! So this means that we abort the process now on panics? By the way, it looks like the two Tokio issues have comments on them |
Resolves #314
This PR replaces
tokio::run()
with a more custom build of a multi-threaded, Tokio runtime in order to allow panics in spawned futures to shutdown the entire runtime gracefully.A runtime is instantiated and provided the necessary components to setup a context for running our main function. Default components are setup for the context using the respective
<>::with_default()
method. Included with the runtime: reactor handle, executor context, and timer instance.The PR also updates the futures library and replaces its now deprecated
deadline()
function withtimeout()
.Issues created in tokio:
Tough to track down issue setting up the Timer: tokio-rs/tokio#727
Errors resulting from the fact that block_on does not set default runtime context components: tokio-rs/tokio#728