Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

asyncio.run sometimes hangs with cancelled subprocesses #125502

Closed
obeattie opened this issue Oct 15, 2024 · 6 comments
Closed

asyncio.run sometimes hangs with cancelled subprocesses #125502

obeattie opened this issue Oct 15, 2024 · 6 comments
Labels
topic-asyncio type-bug An unexpected behavior, bug, or error

Comments

@obeattie
Copy link

obeattie commented Oct 15, 2024

Bug report

Bug description:

When using asyncio to launch subprocesses, and cancelling them before they are done, asyncio.run sometimes hangs forever.

I can reproduce this fairly reliably on both macOS and Linux, though you may need to run this script a few times to hit the hang, so run it in a loop (eg. for i in $(seq 1 10); do python repro.py; done):

import asyncio
import logging
from subprocess import PIPE


async def run_sleep():
    proc = await asyncio.create_subprocess_exec(
        "sleep",
        "0.002",
        stdout=PIPE,
    )
    await proc.communicate()


async def run_loop():
    print("-- run_loop")
    try:
        async with asyncio.timeout(1):
            pending = set[asyncio.Task]()
            while True:
                while len(pending) < 20:
                    pending.add(asyncio.create_task(run_sleep()))
                _, pending = await asyncio.wait(
                    pending, return_when=asyncio.FIRST_COMPLETED
                )
    except asyncio.TimeoutError:
        pass
    finally:
        print(f"-- exiting run_loop")


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    asyncio.run(run_loop())

When you hit this issue, you will see the program hangs forever instead of exiting. If I send SIGTERM, I can see a traceback like this, showing it's stuck waiting for coroutines to finish cancelling:

Traceback (most recent call last):
  File "/Users/obeattie/Desktop/repro.py", line 34, in <module>
    asyncio.run(run_loop())
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 193, in run
    with Runner(debug=debug, loop_factory=loop_factory) as runner:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 62, in __exit__
    self.close()
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 70, in close
    _cancel_all_tasks(loop)
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 205, in _cancel_all_tasks
    loop.run_until_complete(tasks.gather(*to_cancel, return_exceptions=True))
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 674, in run_until_complete
    self.run_forever()
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 641, in run_forever
    self._run_once()
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 1948, in _run_once
    event_list = self._selector.select(timeout)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/selectors.py", line 566, in select
    kev_list = self._selector.control(None, max_ev, timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
ERROR:asyncio:Task was destroyed but it is pending!
task: <Task cancelling name='Task-4049' coro=<run_sleep() running at /Users/obeattie/Desktop/repro.py:7> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[gather.<locals>._done_callback() at /opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py:767]>
ERROR:asyncio:Task was destroyed but it is pending!
task: <Task cancelling name='Task-4050' coro=<run_sleep() running at /Users/obeattie/Desktop/repro.py:7> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[gather.<locals>._done_callback() at /opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py:767]>
ERROR:asyncio:Task was destroyed but it is pending!
task: <Task cancelling name='Task-4048' coro=<run_sleep() running at /Users/obeattie/Desktop/repro.py:7> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[gather.<locals>._done_callback() at /opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py:767]>

CPython versions tested on:

3.12

Operating systems tested on:

Linux, macOS

@obeattie obeattie added the type-bug An unexpected behavior, bug, or error label Oct 15, 2024
@obeattie
Copy link
Author

Interestingly, removing the stdout=PIPE seems to make the issue go away.

@graingert
Copy link
Contributor

the tasks seems to be getting stuck in

await transp._wait()

@graingert
Copy link
Contributor

graingert commented Oct 15, 2024

the problem occurs when asyncio.runners._cancel_all_tasks is run at an inopportune instant when connecting pipes:

This task gets cancelled:

self._loop.create_task(self._connect_pipes(waiter))

which means self._pending_calls is never run:

loop.call_soon(self._protocol.connection_made, self)
for callback, data in self._pending_calls:
loop.call_soon(callback, *data)
self._pending_calls = None

so when _try_finish appends self._call_connection_lost to self._pending_calls:

self._call(self._call_connection_lost, None)

call_connection_lost is never called, which means self._exit_waiters are never woken:

def _call_connection_lost(self, exc):
try:
self._protocol.connection_lost(exc)
finally:
# wake up futures waiting for wait()
for waiter in self._exit_waiters:
if not waiter.cancelled():
waiter.set_result(self._returncode)
self._exit_waiters = None
self._loop = None
self._proc = None
self._protocol = None

Here's a demo that hangs every time for me:

import sys
import inspect
import asyncio
from subprocess import PIPE


async def run_sleep():
    proc = await asyncio.create_subprocess_exec(
        "sleep",
        "0.002",
        stdout=PIPE,
    )
    await proc.communicate()


async def amain():
    loop = asyncio.get_running_loop()
    task = asyncio.current_task(loop)
    coro = task.get_coro()

    called_cancel = False

    def cancel_eventually():
        my_coro = coro
        while inspect.iscoroutine(my_coro.cr_await):
            my_coro = my_coro.cr_await
        if my_coro.cr_code is loop._make_subprocess_transport.__code__:
            print("_cancel_all_tasks")
            tasks = asyncio.all_tasks()
            for task in tasks:
                task.cancel()
        else:
            loop.call_soon(cancel_eventually)

    loop.call_soon(cancel_eventually)
    await run_sleep()


def main():
    asyncio.run(amain())


if __name__ == "__main__":
    sys.exit(main())

@graingert
Copy link
Contributor

graingert commented Oct 16, 2024

@obeattie you can work around this problem by using a TaskGroup:

import asyncio
import logging
from subprocess import PIPE


async def run_sleep():
    proc = await asyncio.create_subprocess_exec(
        "sleep",
        "0.002",
        stdout=PIPE,
    )
    await proc.communicate()


async def run_loop():
    print("-- run_loop")

    async def run_sleep_forever():
        while True:
            await run_sleep()

    try:
        async with asyncio.timeout(1), asyncio.TaskGroup() as tg:
            for _ in range(20):
                tg.create_task(run_sleep_forever())
    except TimeoutError:
        pass
    finally:
        print(f"-- exiting run_loop")


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    asyncio.run(run_loop())

@obeattie
Copy link
Author

Thanks for confirming this is an issue and for the workaround @graingert. I already wired up a workaround which is a bit different to this – though I am sure others who hit this will appreciate that until the issue is fixed.

@kumaraditya303 kumaraditya303 self-assigned this Oct 17, 2024
@kumaraditya303
Copy link
Contributor

Duplicate of #103847

@kumaraditya303 kumaraditya303 marked this as a duplicate of #103847 Oct 19, 2024
@kumaraditya303 kumaraditya303 closed this as not planned Won't fix, can't repro, duplicate, stale Oct 19, 2024
@github-project-automation github-project-automation bot moved this from Todo to Done in asyncio Oct 19, 2024
@kumaraditya303 kumaraditya303 removed their assignment Oct 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-asyncio type-bug An unexpected behavior, bug, or error
Projects
Status: Done
Development

No branches or pull requests

4 participants