[core] Check if python is exiting on a core worker on non-main Python thread #49547
Merged
6 commits
d579bcd  Exit because python is finalizing in check_signals (dayshah)
0d94a6d  Exit because python is finalizing in check_signals (dayshah)
42978aa  use sys.is_finalizing (dayshah)
3e8b368  remove test teardown (dayshah)
e8147ef  uncomment old segfaulting test (dayshah)
1ffe912  Merge branch 'master' into python-finalizing-exit (dayshah)
I remember that the segfault occurs in `with gil` because it dereferences something that has already been released. Is it possible to avoid acquiring the GIL in that situation? I guess it is impossible to determine the state of the Python interpreter without the GIL. If it is impossible, I am fine with the current solution.
It would be helpful if you could test it with a reproduction both with and without this PR so that we can know whether this solution helps, because it's hard to write tests for this PR.
Yeah, there's no way to get the Python interpreter state without the GIL, so the other option is reworking the signal checking in C++, as I attempted in #49319, but that will require some more work to handle some edge-case tests where we rely on knowing the Python SystemExit state to exit out of the core worker.
Ideally this should fix it, because `sys.is_finalizing` works on a non-main thread and should allow us to exit before Python starts freeing the resources needed for `with gil`. But yeah, I will try more to repro with vs. without this change, and update.
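As a rough illustration of the idea above (not the code in this PR), the sketch below polls `sys.is_finalizing()` from a non-main thread and bails out once the interpreter reports that it is shutting down. The names `should_exit_worker`, `poll_until_stopped`, and `run_demo` are hypothetical and exist only for this example.

```python
import sys
import threading


def should_exit_worker() -> bool:
    """Return True once the interpreter has begun shutting down.

    sys.is_finalizing() is a plain function call and can be made from
    any thread, not only the main one.
    """
    return sys.is_finalizing()


def poll_until_stopped(stop_event: threading.Event, results: list) -> None:
    # Hypothetical polling loop standing in for a periodic signal check.
    # Event.wait returns True as soon as the event is set.
    while not stop_event.wait(timeout=0.01):
        if should_exit_worker():
            # Leave early instead of touching interpreter state any further.
            results.append("finalizing")
            return
    results.append("stopped")


def run_demo() -> list:
    results = []
    stop_event = threading.Event()
    worker = threading.Thread(target=poll_until_stopped, args=(stop_event, results))
    worker.start()
    stop_event.set()  # normal shutdown path; the interpreter is not finalizing here
    worker.join()
    return results
```

In a normal run the interpreter is not finalizing, so the loop ends through the stop event; the early-return branch only matters during interpreter shutdown.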
Similar logic is used in ray/python/ray/experimental/channel/common.py (line 48 in 31592f9), from #47702. That code only checks for system exit on ChannelTimeout, though, instead of each time through.
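To make the "only on timeout" pattern concrete, here is a minimal sketch that uses a standard-library queue in place of a Ray channel. It is an assumption-laden stand-in, not the contents of common.py: `read_with_exit_check` and its parameters are hypothetical, and `queue.Empty` plays the role that ChannelTimeout plays in the comment above.

```python
import queue
import sys


def read_with_exit_check(q: queue.Queue, timeout_s: float = 0.05, max_attempts: int = 3):
    """Read one item, checking for interpreter shutdown only after a timeout."""
    for _ in range(max_attempts):
        try:
            return q.get(timeout=timeout_s)
        except queue.Empty:
            # The shutdown check runs only on the timeout path,
            # not on every successful read.
            if sys.is_finalizing():
                raise SystemExit("interpreter is finalizing")
    raise TimeoutError(f"no item after {max_attempts} attempts")
```

The trade-off is that a check placed on the timeout path costs nothing on the fast path, while a check on every iteration notices shutdown sooner.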
If it's hard to reproduce, I'm fine with running a RayCG program hundreds of times to check that no segfault related to acquiring the GIL occurs.
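One way to run a program many times and look for segfaults is a small harness like the one below. This is only a sketch: `count_segfaults` is a hypothetical helper, the command to run is whatever reproduction script is available, and it assumes a POSIX system, where `subprocess` reports a child killed by a signal as a negative return code.

```python
import signal
import subprocess
import sys


def count_segfaults(cmd: list, runs: int) -> int:
    """Run cmd repeatedly and count how many runs were killed by SIGSEGV."""
    segfaults = 0
    for _ in range(runs):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # On POSIX, a child terminated by signal N has returncode == -N.
        if proc.returncode == -signal.SIGSEGV:
            segfaults += 1
    return segfaults


if __name__ == "__main__":
    # Placeholder command; substitute the actual reproduction script.
    command = [sys.executable, "-c", "print('ok')"]
    print(count_segfaults(command, runs=5))
```

Running the same harness on a build with and without the change gives a direct comparison of segfault counts.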