-
Notifications
You must be signed in to change notification settings - Fork 6.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core][compiled graphs] Wait for monitor thread to join before shutting down #48808
Conversation
This reverts commit bbc748a.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
monitor.teardown(kill_actors=kill_actors) | ||
monitor.join(timeout=ctx.teardown_timeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the PR description,
To prevent this from happening, this fix joins the monitor thread before destructing the CoreWorker.
I don't understand when the CoreWorker
destruction happens and why calling monitor.join
in __del__
ensures that the monitor thread joins before the CoreWorker
is destructed.
I guess the CoreWorker will be terminated when ray.shutdown()
is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, CoreWorker is destructed in ray.shutdown()
, and the monitor thread could still be running at that point.
…ng down (ray-project#48808) Each compiled graph starts a monitor thread to tear down the DAG upon detecting an error in one of the workers' task loops. Currently, during driver shutdown, this thread can live past the lifetime of the C++ CoreWorker. This causes a silent process exit when the thread later tries to call on the CoreWorker but it has already been destructed. To prevent this from happening, this fix joins the monitor thread *before* destructing the CoreWorker. ## Related issue number Closes ray-project#48288. --------- Signed-off-by: Stephanie Wang <smwang@cs.washington.edu> Signed-off-by: Connor Sanders <connor@elastiflow.com>
…ng down (ray-project#48808) Each compiled graph starts a monitor thread to tear down the DAG upon detecting an error in one of the workers' task loops. Currently, during driver shutdown, this thread can live past the lifetime of the C++ CoreWorker. This causes a silent process exit when the thread later tries to call on the CoreWorker but it has already been destructed. To prevent this from happening, this fix joins the monitor thread *before* destructing the CoreWorker. ## Related issue number Closes ray-project#48288. --------- Signed-off-by: Stephanie Wang <smwang@cs.washington.edu> Signed-off-by: hjiang <dentinyhao@gmail.com>
Why are these changes needed?
Each compiled graph starts a monitor thread to tear down the DAG upon detecting an error in one of the workers' task loops. Currently, during driver shutdown, this thread can live past the lifetime of the C++ CoreWorker. This causes a silent process exit when the thread later tries to call on the CoreWorker but it has already been destructed. To prevent this from happening, this fix joins the monitor thread before destructing the CoreWorker.
Related issue number
Closes #48288.