The main thread hangs at drop_gil due to FORCE_SWITCHING after a daemon thread exits #96387
Reproducible example
This issue can be reproduced as follows:
1. There is no need for __del__ to call self.s.connect(('127.0.0.1', 5678)).
2. It is enough to create the socket in __init__ by calling self.s = socket.socket().
The scripts used are:
runtime.py

```python
import socket


class TestUnclosedSocket:
    def __init__(self):
        self.s = socket.socket()


ins = TestUnclosedSocket()
```

service.py

```python
import threading
import runtime
import sys

sys.setswitchinterval(0.001)


def calc():
    sum = 0
    for i in range(100000):
        sum += i * i


def daemon_func():
    while True:
        calc()


if __name__ == '__main__':
    threading.Thread(target=daemon_func, name="daemon", daemon=True).start()
    calc()
```

I changed the GIL switch interval to 1 ms to increase the probability of the hang. The gdb trace is:
Reference: https://bugs.python.org/issue39877
This looks like a duplicate of #91414.
Semi-related to #87135, given that it concerns daemon threads and finalization.
Excuse me, is this issue similar to #87135? Is there a plan to fix it? If no fix is planned in the short term, how can users avoid the problem: by calling os._exit() directly, by turning the thread into a coroutine instead of using multithreading, or by some other method? Please tell us which approach you think is best for avoiding this problem.
See also issue #95820.
You should avoid daemon threads. There are other, safer ways to implement a similar feature:
Daemon threads cause various very complicated issues at Python exit. mod_wsgi is trying a different approach: don't attempt to shut down Python cleanly; just exit the process immediately: GrahamDumpleton/mod_wsgi#730
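One common way to avoid a daemon thread, sketched here as an illustration rather than the specific alternative the comment above had in mind, is a regular (non-daemon) thread that polls a threading.Event and is joined before the main thread returns, so no thread is still running when the interpreter starts finalizing. It assumes the background work can be split into short units such as calc(); the worker() function is hypothetical.

```python
import threading


def calc():
    total = 0
    for i in range(100000):
        total += i * i
    return total


def worker(stop):
    # Check the stop flag between short units of work instead of looping forever.
    while not stop.is_set():
        calc()


stop = threading.Event()
t = threading.Thread(target=worker, args=(stop,), name="worker")  # non-daemon
t.start()

calc()       # the main thread's own work
stop.set()   # ask the worker to finish its current unit and return
t.join()     # wait for it before the interpreter starts finalizing
print("worker alive:", t.is_alive())  # worker alive: False
```

The event is set explicitly by the main thread: an atexit handler would be too late, because the interpreter waits for non-daemon threads to finish before it runs atexit callbacks.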
In Python 3.12, first we call _PyRuntimeState_SetFinalizing():
Then we delete modules:
Python finalization has evolved a lot in recent years in an attempt to make it more reliable. Please test Python 3.12, or at least Python 3.11 or 3.10. My notes about Python finalization: https://pythondev.readthedocs.io/finalization.html
I failed to reproduce your issue with Python 3.9, 3.11 or 3.12. Can you please explain how to reproduce it? What is your OS? Is the hang more likely if the system is busy?
Please wait a minute; I'll gather the environment information needed to reproduce it.
Your environment
Reproducible example
There are three files: test.sh, service.py and runtime.py. test.sh calls service.py, and service.py imports runtime. The scripts used are:
test.sh

```sh
for i in `seq 1 100000`;
do
    ../target2/bin/python3.9 service.py
    echo call cpython$i
done
```

runtime.py

```python
import socket


class TestUnclosedSocket:
    def __init__(self):
        self.s = socket.socket()


ins = TestUnclosedSocket()
```

service.py

```python
import threading
import runtime
import sys

sys.setswitchinterval(0.001)


def calc():
    sum = 0
    for i in range(100000):
        sum += i * i


def daemon_func():
    while True:
        calc()


if __name__ == '__main__':
    threading.Thread(target=daemon_func, name="daemon", daemon=True).start()
    calc()
```

I changed the GIL switch interval to 1 ms to increase the probability of the hang. The symptom is that the script's output stops; in my attempt, it got stuck at the 20th iteration.
The trace is:
The hang reappears after a few hundred attempts. I reproduced it 5 minutes ago.
Thanks for test.sh. I managed to reproduce the hang by running this script 6 times in parallel (in 6 terminal tabs) while doing other things on my computer. With the bug, the hang occurs in less than a minute: fewer than 250 iterations if I recall correctly, on a Python debug build (Python 3.12).
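Running test.sh in several terminals and watching for the output to stop can also be automated. The sketch below is a hypothetical Python harness (run_until_hang() and parallel_stress() are illustrative names, not part of CPython) that treats any run exceeding a timeout as a hang. The 10 s default timeout is an assumption: a normal service.py run should finish far sooner.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_until_hang(cmd, iterations, timeout=10.0):
    """Run `cmd` (an argv list) up to `iterations` times.

    Return the 1-based iteration that exceeded `timeout` seconds, or None if
    no run hung. Exit codes are ignored: only hangs are of interest here.
    """
    for i in range(1, iterations + 1):
        try:
            # On timeout, subprocess.run() kills the child before re-raising.
            subprocess.run(cmd, timeout=timeout,
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        except subprocess.TimeoutExpired:
            return i
    return None


def parallel_stress(cmd, workers=6, iterations=1000, timeout=10.0):
    """Run several independent loops at once, like test.sh in several tabs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_until_hang, cmd, iterations, timeout)
                   for _ in range(workers)]
        return [f.result() for f in futures]
```

For example, parallel_stress(["../target2/bin/python3.9", "service.py"], workers=6) mirrors running test.sh in 6 terminal tabs; each entry of the returned list is the iteration at which that loop hung, or None.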
Oh, you made an accurate and correct analysis, well done! I wrote PR #96869 to fix the issue. Can you please check with test.sh that my change fixes your hang? Tell me if you need help testing the fix. Also, please review my PR; I'm not sure if I wrote the correct name in the changelog ;-)
I think that would fix my issue, and the name is correct. Thank you :-)
At Python exit, a thread holding the GIL can sometimes wait forever for a thread (usually a daemon thread) that requested the GIL drop, even though that thread has already exited. To fix the race condition, the thread that requested the GIL drop now resets its request before exiting. To fix a race condition with drop_gil(), take_gil() now calls RESET_GIL_DROP_REQUEST() before PyThread_exit_thread() if it called SET_GIL_DROP_REQUEST(). Issue discovered and analyzed by Mingliang ZHAO.
At Python exit, a thread holding the GIL can sometimes wait forever for a thread (usually a daemon thread) that requested the GIL drop, even though that thread has already exited. To fix the race condition, the thread that requested the GIL drop now resets its request before exiting. To fix a race condition with drop_gil(), take_gil() now calls RESET_GIL_DROP_REQUEST() before PyThread_exit_thread() if it called SET_GIL_DROP_REQUEST(). Issue discovered and analyzed by Mingliang ZHAO. (cherry picked from commit 04f4977)
…96869) (pythonGH-96941) At Python exit, a thread holding the GIL can sometimes wait forever for a thread (usually a daemon thread) that requested the GIL drop, even though that thread has already exited. To fix the race condition, the thread that requested the GIL drop now resets its request before exiting. To fix a race condition with drop_gil(), take_gil() now calls RESET_GIL_DROP_REQUEST() before PyThread_exit_thread() if it called SET_GIL_DROP_REQUEST(). Issue discovered and analyzed by Mingliang ZHAO. (cherry picked from commit 04f4977) (cherry picked from commit 6ff5471) Co-authored-by: Victor Stinner <vstinner@python.org>
…6941) At Python exit, a thread holding the GIL can sometimes wait forever for a thread (usually a daemon thread) that requested the GIL drop, even though that thread has already exited. To fix the race condition, the thread that requested the GIL drop now resets its request before exiting. To fix a race condition with drop_gil(), take_gil() now calls RESET_GIL_DROP_REQUEST() before PyThread_exit_thread() if it called SET_GIL_DROP_REQUEST(). Issue discovered and analyzed by Mingliang ZHAO. (cherry picked from commit 04f4977) (cherry picked from commit 6ff5471) Co-authored-by: Victor Stinner <vstinner@python.org>
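The race described in these commit messages can be illustrated with a toy model. The sketch below is hypothetical and far simpler than CPython's real GIL code: it keeps only the last GIL holder and the drop-request flag, runs every step sequentially in one thread, and uses a short timeout to stand in for "waits forever". The ToyGil class and its methods are illustrative names; only the macro and function names in the comments come from the discussion above.

```python
import threading


class ToyGil:
    """Toy model of the GIL drop-request handshake (not CPython's code)."""

    def __init__(self):
        self.switch_cond = threading.Condition()
        self.last_holder = None
        self.drop_request = False

    def take(self, name):
        with self.switch_cond:
            self.last_holder = name
            self.drop_request = False
            # Wake a holder waiting in drop() for a switch to happen.
            self.switch_cond.notify_all()

    def request_drop_then_exit(self, reset_before_exit):
        """A waiting thread asks for the GIL, then sees that the runtime is
        finalizing and exits without ever taking the GIL."""
        with self.switch_cond:
            self.drop_request = True           # SET_GIL_DROP_REQUEST
            if reset_before_exit:
                self.drop_request = False      # RESET_GIL_DROP_REQUEST (the fix)
        # PyThread_exit_thread(): the requesting thread is gone from here on.

    def drop(self, name, timeout):
        """Return True if drop() completes, False if it would block forever."""
        with self.switch_cond:
            if self.drop_request and self.last_holder == name:
                self.drop_request = False
                # FORCE_SWITCHING: wait until another thread takes the GIL.
                return self.switch_cond.wait_for(
                    lambda: self.last_holder != name, timeout=timeout)
        return True


def simulate(reset_before_exit):
    gil = ToyGil()
    gil.take("main")
    gil.request_drop_then_exit(reset_before_exit)
    return gil.drop("main", timeout=0.1)


print("without fix:", "completes" if simulate(False) else "hangs")  # without fix: hangs
print("with fix:", "completes" if simulate(True) else "hangs")      # with fix: completes
```

In the unfixed path, the main thread's FORCE_SWITCHING wait can only end when another thread takes the GIL, and the only thread that could have done so has already exited.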
changelog: https://docs.python.org/3.10/whatsnew/changelog.html#python-3-10-10-final Should fix some unexpected deadlocks during restarts via python/cpython#96387
Bug report
The main thread hangs at drop_gil after a daemon thread exits. My Python version is 3.9.2, but I think the same issue exists in the newest Python 3.10 version.
Rough analysis
In Python 3.9.2, take_gil is implemented as follows:
A daemon thread may call SET_GIL_DROP_REQUEST and then exit through PyThread_exit_thread, so this thread never actually gets the GIL. This happens if the main thread is in the finalization procedure and calls _PyRuntimeState_SetFinalizing in Py_FinalizeEx before the daemon thread gets the GIL (the daemon thread waits at most 5 ms before exiting). After that, the gil_drop_request flag is still set, but only the main thread is left. Then, if the main thread runs into drop_gil again, it hangs at FORCE_SWITCHING:
Reproducible example
This issue could be reproduced by:
And the scripts used are:
runtime.py
service.py
I changed the GIL switch interval to 1 ms to increase the probability of the hang.
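The 5 ms bound in the analysis presumably corresponds to CPython's default GIL switch interval, which is also why lowering it makes the hang more likely: a thread waiting for the GIL times out and sets the drop request more often. A quick way to check the value, assuming a fresh CPython process where nothing else has changed it:

```python
import sys

# CPython's default GIL switch interval is 5 ms (returned in seconds).
default_interval = sys.getswitchinterval()
print(default_interval)  # 0.005 by default

# service.py lowers it to 1 ms, so threads waiting for the GIL time out
# (and set the drop request) more often.
sys.setswitchinterval(0.001)
print(sys.getswitchinterval())  # 0.001
```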