Deadlock while updating type cache (free-threading) #119525

Closed
colesbury opened this issue May 24, 2024 · 0 comments
Labels: 3.13 (bugs and security fixes), 3.14 (new features, bugs and security fixes), topic-free-threading, type-bug (an unexpected behavior, bug, or error)

colesbury commented May 24, 2024

Bug report

Deadlock when running pool_in_threads.py (from test_multiprocessing_pool_circular_import)

This was observed with the GIL enabled (PYTHON_GIL=1) in the free-threaded build. I'm not sure if it could happen when the GIL is disabled.

The problem is due to:

  • update_cache() calls Py_DECREF(old_name) while holding the type lock. Even though the name is just a unicode object with a simple destructor, the decref may try to acquire other locks because of the biased reference counting inter-thread queue (see thread 2's stack trace, where _Py_brc_queue_object blocks on a mutex and the thread then has to reacquire the GIL).
  • A reader of the type cache seqlock spins (see thread 1's stack trace) and never releases the GIL while spinning.
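
To make the second point concrete, here is a minimal sketch of a seqlock in C11. It is a toy, not the code in Python/lock.c: every `toy_*` name is hypothetical, a single writer is assumed, and the reader gives up after a bounded number of attempts so the example terminates, whereas the reader in thread 1's stack trace keeps yielding until the writer is done.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy seqlock (hypothetical): an odd sequence number means a writer is active. */
typedef struct {
    atomic_uint_fast32_t sequence;
} toy_seqlock;

static void toy_seqlock_init(toy_seqlock *sl)
{
    atomic_init(&sl->sequence, 0);
}

/* Writer side (single writer assumed): make the sequence odd. */
static void toy_seqlock_lock_write(toy_seqlock *sl)
{
    uint_fast32_t prev =
        atomic_fetch_add_explicit(&sl->sequence, 1, memory_order_acquire);
    assert((prev & 1) == 0);  /* must not already be write-locked */
    (void)prev;
}

/* Writer side: make the sequence even again, publishing the update. */
static void toy_seqlock_unlock_write(toy_seqlock *sl)
{
    uint_fast32_t prev =
        atomic_fetch_add_explicit(&sl->sequence, 1, memory_order_release);
    assert((prev & 1) == 1);  /* must have been write-locked */
    (void)prev;
}

/* Reader side: spin until no writer is active, up to max_spins attempts.
 * Nothing in this loop gives up any other resource the caller holds,
 * which is the property that matters for the deadlock described above. */
static bool toy_seqlock_begin_read_bounded(toy_seqlock *sl, int max_spins,
                                           uint_fast32_t *seq_out)
{
    for (int i = 0; i < max_spins; i++) {
        uint_fast32_t seq =
            atomic_load_explicit(&sl->sequence, memory_order_acquire);
        if ((seq & 1) == 0) {
            *seq_out = seq;   /* even: safe to start reading */
            return true;
        }
        /* odd: a writer is active, try again */
    }
    return false;             /* the writer never finished */
}
```

If the writer is itself blocked on something the spinning reader holds (here, the GIL), an unbounded version of this loop never exits.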

It's probably easier to move the Py_DECREF() outside of the lock than to adjust the seqlock.

Py_DECREF(old_name);
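
The suggestion to move the Py_DECREF() outside of the lock can be sketched roughly as below. This is not the actual CPython patch: the `toy_*` types and functions are hypothetical stand-ins, the "lock" is a plain flag, and the counter exists only to show that no reference is dropped while the lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted object. */
typedef struct {
    int refcount;
} toy_object;

/* Hypothetical stand-in for a type cache entry. */
typedef struct {
    bool write_locked;       /* stands in for the type cache seqlock */
    toy_object *name;        /* owned reference to the cached name */
    int decrefs_under_lock;  /* instrumentation for this example only */
} toy_cache_entry;

static void toy_incref(toy_object *ob)
{
    ob->refcount++;
}

/* Drops a reference and records whether the lock was held at the time. */
static void toy_decref(toy_cache_entry *entry, toy_object *ob)
{
    if (entry->write_locked) {
        entry->decrefs_under_lock++;
    }
    ob->refcount--;
}

/* Swaps in the new name while locked and hands the old one back
 * to the caller instead of releasing it here. */
static toy_object *toy_update_cache_locked(toy_cache_entry *entry,
                                           toy_object *new_name)
{
    assert(entry->write_locked);
    toy_object *old_name = entry->name;
    toy_incref(new_name);
    entry->name = new_name;
    return old_name;
}

static void toy_update_cache(toy_cache_entry *entry, toy_object *new_name)
{
    entry->write_locked = true;
    toy_object *old_name = toy_update_cache_locked(entry, new_name);
    entry->write_locked = false;

    /* Release the old name only after the lock has been dropped, so a
     * decref that blocks cannot stall readers spinning on the lock. */
    if (old_name != NULL) {
        toy_decref(entry, old_name);
    }
}
```

The design choice is simply to shrink the critical section to the pointer swap and defer anything that might block until after the unlock.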

Thread 1 (holds GIL, spinning on type lock):

#0  0x00007f4c35093c9b in sched_yield () at ../sysdeps/unix/syscall-template.S:120
#1  0x00005607b1aeed82 in _Py_yield () at Python/lock.c:46
#2  0x00005607b1aef6e3 in _PySeqLock_BeginRead (seqlock=seqlock@entry=0x5607b1e6e5e4 <_PyRuntime+236068>) at Python/lock.c:515
#3  0x00005607b19a2c2f in _PyType_LookupRef (type=type@entry=0x20002782c10, name=name@entry='_handle_workers') at Objects/typeobject.c:5239
#4  0x00005607b19a9f15 in _Py_type_getattro_impl (type=0x20002782c10, name='_handle_workers', suppress_missing_attribute=suppress_missing_attribute@entry=0x0) at Objects/typeobject.c:5440
#5  0x00005607b19aa081 in _Py_type_getattro (type=<optimized out>, name=<optimized out>) at Objects/typeobject.c:5491

Thread 2 (holds type lock, waiting on GIL):

#5  0x00005607b1ac695a in PyCOND_TIMEDWAIT (us=<optimized out>, mut=0x5607b1e55778 <_PyRuntime+134072>, cond=0x5607b1e55748 <_PyRuntime+134024>) at Python/condvar.h:74
#6  take_gil (tstate=tstate@entry=0x5607b2deb250) at Python/ceval_gil.c:331
#7  0x00005607b1ac6f5b in _PyEval_AcquireLock (tstate=tstate@entry=0x5607b2deb250) at Python/ceval_gil.c:585
#8  0x00005607b1b0d376 in _PyThreadState_Attach (tstate=tstate@entry=0x5607b2deb250) at Python/pystate.c:2074
#9  0x00005607b1ac6fda in PyEval_AcquireThread (tstate=tstate@entry=0x5607b2deb250) at Python/ceval_gil.c:602
#10 0x00005607b1af8085 in _PySemaphore_Wait (sema=sema@entry=0x7f4c2f7fd2f0, timeout=timeout@entry=-1, detach=detach@entry=1) at Python/parking_lot.c:215
#11 0x00005607b1af81ff in _PyParkingLot_Park (addr=addr@entry=0x5607b1e56b70 <_PyRuntime+139184>, expected=expected@entry=0x7f4c2f7fd387, size=size@entry=1, timeout_ns=timeout_ns@entry=-1, park_arg=park_arg@entry=0x7f4c2f7fd390, detach=detach@entry=1) at Python/parking_lot.c:316
#12 0x00005607b1aeefe7 in _PyMutex_LockTimed (m=m@entry=0x5607b1e56b70 <_PyRuntime+139184>, timeout=timeout@entry=-1, flags=flags@entry=_PY_LOCK_DETACH) at Python/lock.c:112
#13 0x00005607b1aef0e8 in _PyMutex_LockSlow (m=m@entry=0x5607b1e56b70 <_PyRuntime+139184>) at Python/lock.c:53
#14 0x00005607b1a64332 in PyMutex_Lock (m=0x5607b1e56b70 <_PyRuntime+139184>) at ./Include/internal/pycore_lock.h:75
#15 _Py_brc_queue_object (ob=ob@entry='__subclasses__') at Python/brc.c:67
#16 0x00005607b1952246 in _Py_DecRefSharedDebug (o=o@entry='__subclasses__', filename=filename@entry=0x5607b1c23d8b "Objects/typeobject.c", lineno=lineno@entry=5179) at Objects/object.c:359
#17 0x00005607b1992747 in Py_DECREF (op='__subclasses__', lineno=5179, filename=0x5607b1c23d8b "Objects/typeobject.c") at ./Include/object.h:894
#18 update_cache (entry=entry@entry=0x5607b1e6e5e0 <_PyRuntime+236064>, name=name@entry='_handle_workers', version_tag=version_tag@entry=131418, value=value@entry=<classmethod at remote 0x20002037540>) at Objects/typeobject.c:5179
#19 0x00005607b1992792 in update_cache_gil_disabled (entry=entry@entry=0x5607b1e6e5e0 <_PyRuntime+236064>, name=name@entry='_handle_workers', version_tag=version_tag@entry=131418, value=value@entry=<classmethod at remote 0x20002037540>) at Objects/typeobject.c:5199
#20 0x00005607b19a2f13 in _PyType_LookupRef (type=type@entry=0x20002782c10, name=name@entry='_handle_workers') at Objects/typeobject.c:5312
#21 0x00005607b19a9f15 in _Py_type_getattro_impl (type=0x20002782c10, name='_handle_workers', suppress_missing_attribute=suppress_missing_attribute@entry=0x0) at Objects/typeobject.c:5440
#22 0x00005607b19aa081 in _Py_type_getattro (type=<optimized out>, name=<optimized out>) at Objects/typeobject.c:5491
#23 0x00005607b1954c1a in PyObject_GetAttr (v=v@entry=<type at remote 0x20002782c10>, name='_handle_workers') at Objects/object.c:1175

cc @DinoV

colesbury added the type-bug, 3.13, topic-free-threading, and 3.14 labels May 24, 2024
colesbury added a commit to colesbury/cpython that referenced this issue May 24, 2024
The deadlock only affected the free-threaded build and only occurred
when the GIL was enabled at runtime. The `Py_DECREF(old_name)` call
might temporarily release the GIL while holding the type seqlock.
Another thread may spin trying to acquire the seqlock while holding the
GIL.

The deadlock occurred roughly 1 in ~1,000 runs of `pool_in_threads.py`
from `test_multiprocessing_pool_circular_import`.
colesbury added a commit that referenced this issue May 29, 2024
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 29, 2024
…onGH-119527)

(cherry picked from commit c22323c)

Co-authored-by: Sam Gross <colesbury@gmail.com>
colesbury added a commit that referenced this issue May 29, 2024
…119527) (#119746)

(cherry picked from commit c22323c)

Co-authored-by: Sam Gross <colesbury@gmail.com>
noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024
…on#119527)

estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
…on#119527)
