-
-
Notifications
You must be signed in to change notification settings - Fork 31.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-118209: Add structured exception handling to mmap module #118213
Conversation
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR looks good so far, though as this is a draft I'll leave it with you for now.
One other possibility may be to protect PyBytes_FromString
internally, which could cut off an entire class of crashes instead of just one. Thoughts?
That's a possibility too. However I don't know if there's a big performance impact if the exception handling is used there. |
Native exceptions are zero cost until an exception is actually thrown, so I wouldn't worry about that. The compiler generates a static table that the OS refers to in order to figure out which handler to invoke. That said, a number of the other functions you mentioned won't go through that path, so there's probably going to be a need to add it directly into mmap, particularly for writing. Maybe a helper function to do a protected memcpy is the way to go? That would let us entirely scope the exception handling and make sure we aren't mismanaging any state inside of it (which is the usual issue). It also means we can have a plain implementation on non-Windows (for now) and keep the code that uses it looking fairly consistent. |
I think that's probably a good idea. However there are still some rogue pointer derefs (like the one-character optimization mentioned above) which can still throw. EDIT: This is out-of-scope for this PR. But Linux probably has similar issues. I couldn't find good docs regarding the situtation (I am more familiar with Windows), but Linux will probably send some signal like SIGBUS/SIGSEV in this scenario. Are you supposed to handle them when mmapping? |
I'm in the same position as you, and agree that it's out of scope for this PR. But if we set things up so that it's easy to multiplex per-platform, we'll at least set up someone who does know to be able to do it. What that means to me is that we want to contain |
Yes, on POSIX |
I used It will be very difficult to handle all the functions in this way though. For example there would need to be additional functions like |
rename HandlePageException to filter_page_exception Co-authored-by: Eryk Sun <eryksun@gmail.com>
By the way regarding the ntstatus to win32 error conversion # pip install ctypes-windows-sdk tqdm
from cwinsdk.um.winternl import RtlNtStatusToDosError
from cwinsdk.um.ntsecapi import LsaNtStatusToWinError
from tqdm import trange
for i in trange(-(2**31), 2**31):
a = RtlNtStatusToDosError(i)
b = LsaNtStatusToWinError(i)
if a != b:
raise AssertionError(f"NTSTATUS={i}, RtlNtStatusToDosError={a}, LsaNtStatusToWinError={b}") and it shows that the output of both functions is the same for the full range of ints. |
I forgot about that one. It's not exactly surprising that it behaves the same. 😉
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like how this is looking. Let us know when you'd like CI run and we can hit the button to get it going, and feel free to start adapting other cases in this file.
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
Co-authored-by: Eryk Sun <eryksun@gmail.com>
!buildbot .windows. |
🤖 New build scheduled with the buildbot fleet by @zooba for commit 9737f22 🤖 The command will test the builders whose names match following regular expression: The builders matched are:
|
I'm torn between just merging (to make beta 1) and being more thorough, but I think we should be more thorough. This can easily go in for beta 2, so the only urgency is to make sure we get good test coverage. |
Looks like we may need to use or switch to
(Possibly that condition has to be broken up, but I think it ought to work.) |
We can also simply remove |
Works for me. We'll get this in for beta 2 at this stage. |
Co-authored-by: Eryk Sun <eryksun@gmail.com>
Congratulations! |
…dule (pythonGH-118213) (cherry picked from commit e85e8de) Co-authored-by: Dobatymo <Dobatymo@users.noreply.github.com>
GH-118887 is a backport of this pull request to the 3.13 branch. |
If
DONT_USE_SEH
is not defined, catch page errors and access violations when reading/writing the mmap pointer and convert it into a Python exception.Methods done:
mmap_read_method
mmap_read_byte_method
mmap_read_line_method
mmap_write_method
mmap_write_byte_method
mmap_item
mmap_subscript
mmap_ass_item
mmap_ass_subscript
mmap_move_method
mmap_gfind
(hopefully I didn't miss any resource handling here)Other comments:
LsaNtStatusToWinError
is equivalent toRtlNtStatusToDosError
(see comment below). So I am using this since it's easier to call.mmap_read_line_method
. Previously the file pointer would still be advanced in case of a memory allocation error.Issues with the PR:
mmap_resize_method
I don't know how to fix:
mmap_buffer_getbuf
: this creates a buffer object without actually reading from memory (yet). This will cause issues when the buffer is read in other parts of the code. The only way to fix this would be make all buffer methods SEH aware also... :(See