Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pthread_exit & PyThread_exit_thread from PyEval_RestoreThread etc. are harmful #87135

Open
gpshead opened this issue Jan 19, 2021 · 29 comments
Open
Assignees
Labels
3.11 only security fixes 3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@gpshead
Copy link
Member

gpshead commented Jan 19, 2021

BPO 42969
Nosy @gpshead, @pitrou, @vstinner, @colesbury, @izbyshev, @jbms
PRs
  • gh-87135: Hang non-main threads that attempt to acquire the GIL during finalization #28525
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2021-01-19.19:35:46.531>
    labels = ['interpreter-core', 'type-bug', '3.9', '3.10', '3.11']
    title = 'pthread_exit & PyThread_exit_thread from PyEval_RestoreThread etc. are harmful'
    updated_at = <Date 2021-11-15.22:28:59.110>
    user = 'https://github.com/gpshead'

    bugs.python.org fields:

    activity = <Date 2021-11-15.22:28:59.110>
    actor = 'colesbury'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2021-01-19.19:35:46.531>
    creator = 'gregory.p.smith'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 42969
    keywords = ['patch', '3.2regression']
    message_count = 26.0
    messages = ['385288', '385289', '396669', '396679', '401886', '401922', '401958', '401959', '402218', '402219', '402220', '402264', '402266', '402474', '402475', '402487', '402509', '402519', '402556', '402558', '402559', '402560', '402571', '402608', '402684', '406363']
    nosy_count = 6.0
    nosy_names = ['gregory.p.smith', 'pitrou', 'vstinner', 'colesbury', 'izbyshev', 'jbms']
    pr_nums = ['28525']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue42969'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    Linked PRs

    @gpshead
    Copy link
    Member Author

    gpshead commented Jan 19, 2021

    BACKGROUND

    PyThread_exit_thread() calls pthread_exit() and is in turn called from a variety of APIs as documented in the C-API doc update from https://bugs.python.org/issue36427.

    The pthread_exit() call was originally introduced as a way "resolve" a crashes from daemon threads during shutdown https://bugs.python.org/issue1856. It did that. That fix to that even landed in 2.7.8 but was rolled back before 2.7.9 due to a bug in an existing application it exposed at the time (did we miss the implications of that? maybe). It remained in 3.x.

    PROBLEM

    pthread_exit() cannot be used blindly by any application. All code in the threaded application needs to be on board with it and always prepared for any API call they make to potentially lead to thread termination. Quoting a colleague: "pthread_exit() does not result in stack unwind or local variable destruction". This means that any code up the stack from the ultimate pthread_exit() call that has process-wide state implications that did not go out of its way to register cleanup with pthread_cleanup_push() could lead to deadlocks or lost resources. Something implausible to assume that code does.

    We're seeing this happen with C/C++ code. Our C++ builds with -fno-exceptions so uncatchable exception based stack unwinding as some pthread_exit implementations may trigger does not happen (and cannot be guaranteed anyways, see bpo-42888). Said C/C++ code is calling back into Python from a thread and thus must use PyEval_RestoreThread() or similar APIs before performing Python C API calls. If the interpreter is being finalized from another thread... these enter a codepath that ultimately calls pthread_exit() leaving corrupt state in the process. In this case that unexpected thread simply disappearing can lead to a deadlock in our process.

    Fundamentally I do not believe the CPython VM should ever call pthread_exit() when non-CPython frames are anywhere in the C stack. This may mean we should never call pthread_exit() at all (unsure; but it'd be ideal).

    The documentation suggests that all callers in user code of the four C-APIs with the documented pthread_exit() caveats need auditing and pre-call _Py_IsFinalizing() API checks. But... I do not believe that would fix anything even if it were done. _Py_IsFinalizing() called without the GIL held means that it could change by the time the PyEval_RestoreThreads() API calls it internally do determine if it should exit the thread. Thus the race condition window would merely be narrowed, not eliminated. Not good enough.

    CURRENT WORKAROUND (Big Hammer)

    Change CPython to call abort() instead of pthread_exit() as that situation is unresolvable and the process dying is better than hanging, partially alive. That solution isn't friendly, but is better than being silent and allowing deadlock. A failing process is always better than a hung process, especially a partially hung process.

    SEMI RELATED WORK

    https://bugs.python.org/issue42888 - appears to be avoiding some PyThread_exit_thread() calls to stop some crashes due to libgcc_s being loaded on demand upon thread exit.

    @gpshead gpshead added 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Jan 19, 2021
    @gpshead
    Copy link
    Member Author

    gpshead commented Jan 19, 2021

    C-APIs such as PyEval_RestoreThreads() are insufficient for the task they are asked to do. They return void, yet have a failure mode.

    They call pthread_exit() on failure today.

    Instead, they need to return an error to the calling application to indicate that "The Python runtime is no longer available."

    Callers need to act on that in whatever way is most appropriate to them.

    @gpshead gpshead changed the title pthread_exit & PyThread_exit_thread are harmful pthread_exit & PyThread_exit_thread from PyEval_RestoreThread etc. are harmful Jan 19, 2021
    @vstinner
    Copy link
    Member

    See also bpo-44434: "_thread module: Remove redundant PyThread_exit_thread() call to avoid glibc fatal error: libgcc_s.so.1 must be installed for pthread_cancel to work".

    New changeset 45a78f9 by Victor Stinner in branch 'main':
    bpo-44434: Don't call PyThread_exit_thread() explicitly (GH-26758)
    45a78f9

    @vstinner
    Copy link
    Member

    See also a discussion about the usefulness of daemon threads:
    python-trio/trio#2046

    I'm more in favor of deprecating daemon threads (in any interpreter, not only in subinterpreters). The current implementation is too fragile. There are still corner cases like the one described in this issue.

    @jbms
    Copy link
    Mannequin

    jbms mannequin commented Sep 15, 2021

    Another possible resolution would to simply make threads that attempt to acquire the GIL after Python starts to finalize hang (i.e. sleep until the process exits). Since the GIL can never be acquired again, this is in some sense the simplest way to fulfill the contract. This also ensures that any data stored on the thread call stack and referenced from another thread remains valid. As long as nothing on the main thread blocks waiting for one of these hung threads, there won't be deadlock.

    I have a case right now where a background thread (created from C++, which is similar to a daemon Python thread) acquires the GIL, and calls "call_soon_threadsafe" on an asycnio event loop. I think that causes some Python code internally to release the GIL at some point, after triggering some code to run on the main thread which happens to cause the program to exit. While Py_FinalizeEx is running, the call to "call_soon_threadsafe" completes on the background thread, attempts to re-acquire the GIL, which triggers a call to pthread_exit. That unwinds the C++ stack, which results in a call to Py_DECREF without the GIL held, leading to a crash.

    @vstinner
    Copy link
    Member

    Change CPython to call abort() instead of pthread_exit() as that situation is unresolvable and the process dying is better than hanging, partially alive. That solution isn't friendly, but is better than being silent and allowing deadlock. A failing process is always better than a hung process, especially a partially hung process.

    The last time someone proposed to always call abort(), I proposed to add a hook instead: I added sys.unraisablehook. See bpo-36829.

    If we adopt this option, it can be a callback in C, something like: Py_SetThreadExitCallback(func) which would call func() rather than pthread_exit() in ceval.c.

    --

    Another option would be to add an option to disable daemon thread.

    concurrent.futures has been modified to no longer use daemon threads: bpo-39812.

    It is really hard to write a reliable implementation of daemon threads with Python subintepreters. See bpo-40234 "[subinterpreters] Disallow daemon threads in subinterpreters optionally".

    There is already a private flag for that in subinterpreters to disallow spawning processes or threads: an "isolated" subintepreter. Example with _thread.start_new_thread():

        PyInterpreterState *interp = _PyInterpreterState_GET();
        if (interp->config._isolated_interpreter) {
            PyErr_SetString(PyExc_RuntimeError,
                            "thread is not supported for isolated subinterpreters");
            return NULL;
        }

    Or os.fork():

        if (interp->config._isolated_interpreter) {
            PyErr_SetString(PyExc_RuntimeError,
                            "fork not supported for isolated subinterpreters");
            return NULL;
        }

    See also my article on fixing crashes with daemon threads:

    @jbms
    Copy link
    Mannequin

    jbms mannequin commented Sep 16, 2021

    Regarding your suggestion of adding a hook like Py_SetThreadExitCallback, it seems like there are 4 plausible behaviors that such a callback may implement:

    1. Abort the process immediately with an error.

    2. Exit immediately with the original exit code specified by the user.

    3. Hang the thread.

    4. Attempt to unwind the thread, like pthread_exit, calling pthread thread cleanup functions and C++ destructors.

    5. Terminate the thread immediately without any cleanup or C++ destructor calls.

    The current behavior is (4) on POSIX platforms (pthread_exit), and (5) on Windows (_endthreadex).

    In general, achieving a clean shutdown will require the cooperation of all relevant code in the program, particularly code using the Python C API. Commonly the Python C API is used more by library code rather than application code, while it would presumably be the application that is responsible for setting this callback. Writing a library that supports multiple different thread shutdown behaviors would be particularly challenging.

    I think the callback is useful, but we would still need to discuss what the default behavior should be (hopefully different from the current behavior), and what guidance would be provided as far as what the callback is allowed to do.

    Option (1) is highly likely to result in a user-visible error --- a lot of Python programs that previously exited successfully will now, possibly only some of the time, exit with an error. The advantage is the user is alerted to the fact that some threads were not cleanly exited, but a lot of previously working code is now broken. This seems like a reasonable policy for a given application to impose (effectively requiring the use of an atexit handler to terminate all daemon threads), but does not seem like a reasonable default given the existing use of daemon threads.

    Option (2) would likely do the right thing in many cases, but main thread cleanup that was previously run would now be silently skipped. This again seems like a reasonable policy for a given application to impose, but does not seem like a reasonable default.

    Option (3) avoids the possibility of crashes and memory corruption. Since the thread stack remains allocated, any pointers to the thread stack held in global data structures or by other threads remain valid. There is a risk that the thread may be holding a lock, or otherwise block progress of the main thread, resulting in silent deadlock. That can be mitigated by registering an atexit handler.

    Option (4) in theory would allow cleanup handlers to be registered in order to avoid deadlock due to locks held. In practice, though, it causes a lot of problems:

    • The CPython codebase itself contains no such cleanup handlers, and I expect the vast majority of existing C extensionss are also not designed to properly handle the stack unwind triggered by pthread_exit. Without proper cleanup handlers, this option reverts to option (5), where there is a risk of memory corruption due to other threads accessing pointers to the freed thread stack. There is also the same risk of deadlock as in option (3).
    • Stack unwinding interacts particularly badly with common C++ usage because the very first thing most people want to do when using the Python C API from C++ is create a "smart pointer" type for holding a PyObject pointer that handles the reference counting automatically (calls Py_INCREF when copied, Py_DECREF in the destructor). When the stack unwinds due to pthread_exit, the current thread will NOT hold the GIL, and these Py_DECREF calls result in a crash / memory corruption. We would need to either create a new finalizing-safe version of Py_DECREF, that is a noop when called from a non-main thread if _Py_IsFinalizing() is true (and then existing C++ libraries like pybind11 would need to be changed to use it), or modify the existing Py_DECREF to always have that additional check. Other calls to Python C APIs in destructors are also common.
    • When writing code that attempts to be safe in the presence of stack unwinding due to pthread_exit, it is not merely explicitly GIL-related calls that are a concern. Virtually any Python C API function can transitively release and acquire the GIL and therefore you must defend against unwind from virtually all Python C API functions.
    • Some C++ functions in the call stack may unintentionally catch the exception thrown by pthread_exit and then return normally. If they return back to a CPython stack frame, memory corruption/crashing is likely.
    • Alternatively, some C++ functions in the call stack may be marked noexcept. If the unwinding reaches such a function, then we end up with option (1).
    • In general this option seems to require auditing and fixing a very large amount of existing code, and introduces a lot of complexity. For that reasons, I think this option should be avoided. Even on a per-application basis, this option should not be used because it requires that every C extension specifically support it.

    Option (5) has the risk of memory corruption due to other threads accessing pointers to the freed thread stack. There is also the same risk of deadlock as in option (3). It avoids the problem of calls to Python C APIs in C++ destructors. I would consider this options strictly worse than option (3), since there is the same risk of deadlock, but the additional risk of memory corruption. We free the thread stack slightly sooner, but since the program is exiting soon anyway that is not really advantageous.

    The fact that the current behavior differs between POSIX and Windows is particularly unfortunate.

    I would strongly urge that the default behavior be changed to (3). If Py_SetThreadExitCallback is added, the documentation could indicate that the callback is allowed to terminate the process or hang, but must not attempt to terminate the thread.

    @jbms
    Copy link
    Mannequin

    jbms mannequin commented Sep 16, 2021

    Regarding your suggestion of banning daemon threads: I happened to come across this bug not because of daemon threads but because of threads started by C++ code directly that call into Python APIs. The solution I am planning to implement is to add an atexit handler to prevent this problem.

    I do think it is reasonable to suggest that users should ensure daemon threads are exited cleanly via an atexit handler. However, in some cases that may be challenging to implement, and there is also the issue of backward compatibility.

    @vstinner
    Copy link
    Member

    PyThread_exit_thread() is exposed as _thread.exit() and _thread.exit_thread().

    PyThread_exit_thread() is only called in take_gil() (at 3 places in the function) if tstate_must_exit(tstate) is true. It happens in two cases:

    • (by design) at Python exit if a daemon thread tries to "take the GIL": PyThread_exit_thread() is called.

    • (under an user action) at Python exit if threading._shutdown() is interrupted by CTRL+C: Python (regular) threads will continue to run while Py_Finalize() is running. In this case, when a (regular) thread tries to "take the GIL", PyThread_exit_thread() is called.

    @vstinner
    Copy link
    Member

    I don't think that there is a "good default behavior" where Python currently calls PyThread_exit_thread().

    IMO we should take the problem from the other side and tries to reduce cases when Python can reach this case. Or even make it impossible if possible. For example, *removing* daemon threads would remove the most common case when Python has to call PyThread_exit_thread().

    I'm not sure how to make this case less likely when threading._shutdown() is interrupted by CTRL+C. This function can likely hang if a thread is stuck for whatever reason. It's important than an user is able to "interrupt" or kill a stuck process with CTRL+C (SIGINT). It's a common expectation on Unix, at least for me.

    Maybe threading._shutdown() should be less nice and call os._exit() in this case: exit *immediately* the process in this case. Or Python should restore the default SIGINT handler: on Unix, the default SIGINT handler immediately terminate the process (like os._exit() does).

    I don't think that abort() should be called here (raise SIGABRT signal), since the intent of an user pressing CTRL+C is to silently terminate the process. It's not an application bug, but an user action.

    @vstinner
    Copy link
    Member

    See also bpo-13077 "Windows: Unclear behavior of daemon threads on main thread exit".

    @jbms
    Copy link
    Mannequin

    jbms mannequin commented Sep 20, 2021

    It looks like the _thread module does not actually expose PyThread_exit_thread --- the similarly named thread_PyThread_exit_thread just raises SystemExit.

    From a search in the codebase, it appears PyThread_exit_thread is currently used only to kill threads when they attempt to acquire the GIL during finalization.

    Also, if it is changed to no longer kill the thread, it would probably make sense to rename it, e.g. to PyThread_stop_thread_during_finalization.

    @vstinner
    Copy link
    Member

    It looks like the _thread module does not actually expose PyThread_exit_thread --- the similarly named thread_PyThread_exit_thread just raises SystemExit.

    Oh right, I was confused by the function name: "thread_PyThread_exit_thread()". It's a good thing that it's not exposed in Python :-)

    @gpshead
    Copy link
    Member Author

    gpshead commented Sep 23, 2021

    I believe jbms is right that pausing the threads is the only right thing to do when they see tstate_must_exit. The PR is likely correct.

    @gpshead gpshead added 3.9 only security fixes 3.11 only security fixes labels Sep 23, 2021
    @jbms
    Copy link
    Mannequin

    jbms mannequin commented Sep 23, 2021

    I suppose calling Py_Initialize, Py_FinalizeEx, then Py_Initialize again, then Py_FinalizeEx again in an embedding application, was already not particularly well supported, since it would leak memory.

    However, with this change it also leaks threads. That is a bit unfortunate, but I suppose it is just another form of memory leak, and the user can avoid it by ensuring there are no daemon threads (of course even previously, the presence of any daemon threads meant additional memory leaking).

    @vstinner
    Copy link
    Member

    I'm not comfortable with PR 28525 which always hang threads which attempt to acquire the GIL after Python exited.

    I would prefer to keep the current behavior by default, but give the ability to applications embedding Python to decide what to do.

    With my Py_SetThreadExitCallback(func) idea, PyThread_exit_thread() would call func() and then pthread_exit(0). Applications can hang threads, log a message, call abort(), or whatever else.

    I'm not comfortable with writing a portable function to "hang a thread". For example, I don't understand why PR 28525 processes Windows messages on hang threads.

    Well, it's a complex problem :-(

    @gpshead
    Copy link
    Member Author

    gpshead commented Sep 23, 2021

    A PR adding a Py_SetThreadExitCallback(func) API guaranteeing the callback is called before pthread_exit() would allow anyone wanting to deal with their deadlocks to register abort() or while(1) pause(); as their exit callback. I'd be okay with that. Anyone care to make a PR for that?

    What should it do when SetThreadExitCallback has already been called? Is that an error? Are the callbacks chained? In which order? If someone passes nullptr does that undo it (please no!). It is process global state that many things could wind up having an opinion on each with their own reason to require theirs to be the only one. I vote for returning an error if a callback has already been set. And not allowing unsetting a callback.

    What we'd do internally at work is always guarantee our codebase's early application startup code (because we have such a thing) calls that to setup whichever exit callback we deem appropriate for everyone instead of today's default deadlock potential.

    On pausing... agreed, it doesn't feel _comfortable_. To me when faced with a known potential deadlock situation the only comfortable thing is to abort() as a process dying is always more useful than process hanging (or worse, partially hanging).

    Sleeping on the problem, I don't really understand how while(1) pause(); is significantly different than pthread_exit() when it comes to deadlocks, as it seems like an instantly terminated thread having state that needs cleanup should lead to a similar outcome as a thread with stuff needing cleanup that is now stuck in an infinite pause loop. Neither of them is going to cleanup whatever (presumably a lock they hold) that leads something else to deadlock? I guess the difference is that the thread stack memory is at least not released back for potential reuse while other threads and pointers could still be referring to it when pausing as opposed to a pthread_exit?

    @jbms
    Copy link
    Mannequin

    jbms mannequin commented Sep 23, 2021

    In general, I view hanging threads as the least bad thing to do when faced with either acquiring the GIL or not returning at all. There is a lot of existing usage of Python that currently poses a risk of random crashes and memory corruption while Python is exiting, and I would like to fix that.

    However, I would certainly recommend that code using the Python C API attempt to avoid threads getting to that state in the first place. I added a "finalize block" mechanism to that PR which is intended to provide a way to attempt to acquire the GIL in a way that ensures the GIL won't get hung. I would welcome feedback on that. A common use case for that API might be a non-Python created thread that wants to invoke some sort of asynchronous callback handler via Python APIs.

    For Python daemon threads that you control, you can avoid them hanging by registering an atexit function that signals them to exit and then waits until they do.

    vsteinner: Regarding processing the Windows messages, I updated the PR to include a link to the MSDN documentation that led me to think it was a good idea.

    vstinner: As for random code outside of Python itself that is using PyThread_exit_thread: although I suppose there are legitimate use cases for pthread_exit and _endthreadex, these functions are only safe with the cooperation of the entire call stack of the thread. Additionally, while pthread_exit and _endthreadex have similar behavior for C code, they don't have the same behavior for C++ code, and that difference seems like a likely source of problems. Finally, I would say Python itself does not guarantee that its call stacks safely cooperate with pthread_exit (at the very least, there are sure to be memory leaks). Therefore, I think Python should not encourage its use by leaving it as a non-deprecated public API. If a user wishes to terminate a thread, they can invoke pthread_exit or _endthreadex directly, ideally without any Python functions in the call stack, and can refer to the documentation of those functions to understand the implications.

    gps: The reasons I believe hanging the thread is better than pthread_exit:

    • pthread_exit essentially throws an exception. In theory that means you could do proper cleanup via C++ destructors and/or re-throwing catch blocks. But in practice existing extension code is not designed to do that, and it would be quite a complex task to modify it to do proper cleanup, and on Windows the cleanup wouldn't run anyway.
    • Additionally, throwing an exception means if there is a noexcept function in the call stack, the program terminates. We would need to document that you aren't allowed to call Python APIs if there is a noexcept function in the call stack. If you have a catch(...) in the call stack, then you may inadvertently catch the exception and return control back to Python at a point that assumes it owns the GIL, which will cause all sorts of havoc. We would likewise need to document that you can't have a non-rethrowing catch(...) block in the call stack (I believe pybind11 has some of those).
    • Throwing an exception also means C++ destructors run. pybind11 has a smart pointer type that holds a PyObject and whose destructor calls Py_DECREF. That causes a crash when pthread_exit unwinds the stack, since the thread doesn't own the GIL.

    Those are the additional problems specific to pthread_exit. As gps noted, there is the additional problem of memory corruption common to both pthread_exit and _endthreadex:

    • Data structures that are accessible from other threads may contain pointers to data on the thread's stack. For example, with certain types of locks/signalling mechanisms it is common to store a linked list node on the stack that as then added to a list of waiting threads. If we destroy the thread stack without proper cleanup (and that proper cleanup definitely won't happen with _endthreadex, and probably in most cases still won't happen with pthread_exit), the data structure has now become corrupted.

    I don't think hanging the thread really increases the risk of deadlock over the status quo. In theory someone could have a C++ destructor that does some cleanup that safely pevents deadlock, but that is not portable to Windows, and I expect that properly implemented pthread_exit-safe code is extremely rare.

    I think we would want to ensure that Python itself is implemented in such a way as to not deadlock if Python-created threads with only Python functions in the call stack hang. Mostly that would amount to not holding mutexes while calling functions that may transitively attempt to acquire the GIL (or release and then re-acquire the GIL). That is probably a good practice for avoiding deadlock even when not finalizing.

    We would also want to document that external code using the Python API should be safe from deadlock if a thread hangs, or should use the new "finalize block" mechanism to ensure that it doesn't hang, but I would say that is much easier to achieve than pthread_exit/_endthreadex safety, and I expect is already satisfied by most code.

    Regarding vstinner's point that we would leak hung threads in an application that embeds Python and keeps running after Py_FinalizeEx:
    What are the existing use cases for initializing/finalizing the interpreter multiple times? Under the status quo we leak a bunch of memory when finalizing, e.g. because any Python object references held by the threads that are killed will be leaked, among many other things. That would seem to rule out its suitability for use in production code. At best it might be suitable for test code. Leaking a hung thread basically just amounts to leaking the thread stack. Additionally, if the thread is not created by Python, and the application is not exiting, then only the application should decide whether to kill one of its threads.

    Adding a Py_SetThreadExitCallback(func) function seems reasonable, but I would propose that the only acceptable behaviors are: abort, quick exit, and hang. I don't think we should promise that Python's own code base is safe for use with pthread_exit / _endthreadex, nor should we require that extension code is safe for that. Furthermore, abort would seem to only be useful if you have already taken measures (such as signaling daemon threads to exit and using the "finalize block" mechanism) to ensure it is never reached, and it just serves as an additional sanity check.

    @vstinner
    Copy link
    Member

    In this case that unexpected thread simply disappearing can lead to a deadlock in our process.

    This problem also remains me the very complex case of bpo-6721: "Locks in the standard library should be sanitized on fork". The issue title looks simple, but 12 years after the issue was created, it's still open.

    This issue is being solved by adding atfork callbacks to modules which must do something at fork in the child process (sometimes also in the parent process).

    I added threading.Lock._at_fork_reinit() private method to simplify the implementation of these callbacks.

    Such problem has no silver bullet solution, so it's better to let developers design their own solution with their specific requirements.

    @vstinner
    Copy link
    Member

    Gregory P. Smith: Python has many API using callbacks:

    PEP-445 added PyMem_SetAllocator() to set memory allocator. Adding PyMem_GetAllocator() also made possible to chain allocators and to "hook" into an existing allocator to execute code before and after it's called (the PEP contains an example).

    I'm not sure if it's important or useless to chain callbacks with Py_SetThreadExitCallback(). I suggest to always override the previously set callback.

    It would matter if library A sets a callback to emit log and library B sets a callback to hang threads. It maybe be nice to first emit a log and then hang the thread. But then the order in which callbacks are set starts to matter a lot :-)

    I'm fine with adding Py_GetThreadExitCallback() if you consider that it matters.

    If someone passes nullptr does that undo it (please no!).

    I don't think that we should bother with adding a special case. I prefer to consider developers as adults and let them make their own mistakes if they consider that they understand the code well enough ;-)

    _PyEval_SetTrace() allows to remove the current trace function. It's a legit use case.

    If library C is annoyed by library A and library B installed annoying callbacks, IMO it's also ok to let it "remove" the previously set callback, no?

    IMO Py_SetThreadExitCallback(NULL) should simply set the callback to NULL, so restore the default behavior: call pthread_exit().

    @vstinner
    Copy link
    Member

    Another example where a developer asks to call abort() to notice bugs, whereas Python previously silently ignored it: bpo-36829. Calling abort() is a legit use case, but not really the best default behavior. Again, the problem was solved by letting developers setting their own callback: sys.unraisablehook.

    If I understood correctly, pytest doesn't override it but "took" into the default implementation: it chains its own code with the default implementation. It's possible because there is a way to "get" the current hook: just read sys.unraisablehook ;-)

    Another argument in favor of also adding Py_GetThreadExitCallback() ;-)

    @vstinner
    Copy link
    Member

    Jeremy Maitin-Shepard: "In general, I view hanging threads as the least bad thing to do when faced with either acquiring the GIL or not returning at all. There is a lot of existing usage of Python that currently poses a risk of random crashes and memory corruption while Python is exiting, and I would like to fix that."

    Showing warnings by default or not was discussed many times in Python. It was decided to *hide* DeprecationWarning by default. The PEP-565 is a minor trade-off to show them in the __main__ module.

    For me, more generally, Python default behavior is designed for *users* who don't want to be annoyed by warnings or anything which would make sense for *developers*. That's why I designed a new "Python Development Mode" (-X dev):
    https://docs.python.org/dev/library/devmode.html

    Maybe in development mode, the behavior could be changed to call abort(). But honestly, I'm not really excited by that. I'm not embedding Python in a C++ application. I'm almot only use Python directly: the Unix command "python3". For this use case, IMO it's fine to call pthread_exit() by default.

    @gpshead
    Copy link
    Member Author

    gpshead commented Sep 25, 2021

    If we add a callback API for this, I agree with vstinner's https://bugs.python.org/issue42969#msg402558 commentary on the other callback APIs. We can do this one similarly and keep it simple. - Why? It's the initial simplicity that provides those running into this problem a way to solve their problem today. Without getting hung up on the details of the new default behavior.

    That said the only conclusion I'm coming to for what the safest behavior of Python is in this situation is to hang the threads, effectively as PR 28525 is proposing.

    So even if we added the callback API, I'd expect code using Python with C++ where these issues might ever crop up to always register a thread hanging callback. Which really suggests it is the good default. So do we even need the callback?

    ...But lets put the default behavior choice aside for a moment, there's something valuable regardless...

    There are New C APIs to enhance PyGILState_Ensure proposed in PR 28525 to do with GIL acquisition from threads. These appear useful as is. They provide a way for threads to discover that they will never be able to get the GIL and re-enter the Python interpreter. Rather than today's unexpected behavior of PyGILState_Ensure mercilessly terminating their thread, or given a Callback API whatever effect such a Callback API would have. That allows code to be written with pre-problem-detection that avoids entering this point of no return state. That is good for everyone. We should add those "GIL-curious" APIs no matter what. This bit could become its own PR separate from the other parts.

    If we cannot agree that blocking a non-main daemon-or-C/C++ thread forever that is guaranteed to not acquire the GIL because the Python interpreter is going away is the right default behavior instead of blindly killing it under the unsupportable assumption that it has nothing to clean up: We can proceed with such a callback API. I'm having a hard time imagining any other behavior that makes sense, so I'd expect lots of Python extension interface code to start carrying a copy of an implementation of a hang callback and sporting a Py_SetThreadExitCallback(xxx) call in their module Init function. (especially if pybind11 finds needs to generate this as boilerplate)

    I think the pthread_exit() call added in bpo-1856's https://hg.python.org/cpython/rev/c892b0321d23 were well intentioned, but not aware of the full unsupportable ramifications of that API call. That the attempt at releasing a backport of the pthread_exit call to 2.7.8 broke an application (that was admittedly doing something unwise) and had to be reverted as 2.7.9 was a sign for us to revisit this in 3.x releases as well. Which we're finally doing here.

    @jbms
    Copy link
    Mannequin

    jbms mannequin commented Sep 26, 2021

    Yes, I would agree that the new APIs are a useful addition regardless of the PyThread_exit_thread change.

    As far as the proposed Py_SetThreadExitCallback that seems like a fine thing for applications to use, as long as it doesn't impact how extensions need to be written to be safe from crashes/memory corruption. So for example if the default is to hang, then changing it to log and then hang, or optionally log and terminate the program, would be fine, since extensions aren't affected either way.

    Conversely, if one of the possible behaviors may be _endthreadex or pthread_exit, then libraries must be written to be safe under that behavior anyway, which is unfortunate. Furthermore, say for a library that only supports POSIX, maybe it is written to be safe under pthread_exit because it uses destructors to do cleanup, but then it will cause deadlock if the callback chooses to hang the thread instead. Thus, I think allowing the callback to change the behavior in a way that could impact extensions is not a great idea.

    The callback also doesn't seem like a very good mechanism for an extension that is incompatible with pthread_exit or _endthreadex, such as one using pybind11, to try to mitigate that problem, since an individual library shouldn't be changing application-wide behavior unless the library is specifically being used by the application for that purpose.

    @colesbury
    Copy link
    Contributor

    The pthread_exit behavior has been a problem for PyTorch and related libraries since Python 3.9. The PyTorch team has tried working around the problems without success (i.e. they keep getting bug reports involving crashes in PyEval_SaveThread/RestoreThread).

    The hang/paused the thread behavior suggested by jbms and gps seems like the only reliable option. This is also what the Java VM does when returning from native code and the JVM has exited.

    I believe it's not difficult to hang a thread in a cross-platform way: create a mutex, acquire it in the main thread (before setting PyRuntime._finalizing), never release it. Other threads can acquire that same mutex to block until the application exits.

    The crashes can occur even without daemon threads if the user presses ctrl-c while _thread_shutdown is running.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    tscmoo added a commit to facebookresearch/moolib that referenced this issue Sep 18, 2022
    This works around some nasty thread exit behavior from python, described in more detail at python/cpython#87135 .
    This behavior was the source of moolib crashes and deadlocks on exit.
    Luckily, we can essentially unload moolib in an atexit handler - this waits for any callbacks etc to finish, and prevents any more callbacks from being issued. Any python Rpc objects will still be valid, but closed (any operations on them will silently fail).
    CPython will also then not have an opportunity to exit our async threads, which would result in deadlocks.
    tscmoo added a commit to facebookresearch/moolib that referenced this issue Sep 18, 2022
    This works around some nasty thread exit behavior from python, described in more detail at python/cpython#87135 .
    This behavior was the source of moolib crashes and deadlocks on exit.
    Luckily, we can essentially unload moolib in an atexit handler - this waits for any callbacks etc to finish, and prevents any more callbacks from being issued. Any python Rpc objects will still be valid, but closed (any operations on them will silently fail).
    CPython will also then not have an opportunity to exit our async threads, which would result in deadlocks.
    @gpshead gpshead added the 3.12 bugs and security fixes label Sep 20, 2022
    @gpshead gpshead moved this to In Progress in Threading issues 🧵 May 25, 2023
    @gpshead gpshead self-assigned this May 25, 2023
    jbms added a commit to jbms/cpython that referenced this issue Jun 14, 2023
    jbms added a commit to jbms/cpython that referenced this issue Jun 14, 2023
    jbms added a commit to jbms/cpython that referenced this issue Jun 15, 2023
    @gpshead gpshead added 3.13 bugs and security fixes and removed 3.10 only security fixes 3.9 only security fixes labels Oct 3, 2023
    @gpshead
    Copy link
    Member Author

    gpshead commented Sep 26, 2024

    I think #124149 and #123940 and related PR #124150 may be seeing this?

    gpshead added a commit that referenced this issue Oct 2, 2024
    …g finalization (GH-105805)
    
    Instead of surprise crashes and memory corruption, we now hang threads that attempt to re-enter the Python interpreter after Python runtime finalization has started. These are typically daemon threads (our long standing mis-feature) but could also be threads spawned by extension modules that then try to call into Python. This marks the `PyThread_exit_thread` public C API as deprecated as there is no plausible safe way to accomplish that on any supported platform in the face of things like C++ code with finalizers anywhere on a thread's stack. Doing this was the least bad option.
    
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
    @gpshead
    Copy link
    Member Author

    gpshead commented Oct 2, 2024

    merged for 3.14. Now considering if we can apply a version of that PR to bugfix releases such as 3.13.1 and 3.12.8.

    @gpshead gpshead added the 3.14 new features, bugs and security fixes label Oct 23, 2024
    @encukou
    Copy link
    Member

    encukou commented Jan 13, 2025

    @gpshead, do you have any update on the consideration?
    Should I dig into this?

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.11 only security fixes 3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    Status: In Progress
    Development

    No branches or pull requests

    4 participants