Propagating cancellation through threads #606

Closed
njsmith opened this issue Aug 16, 2018 · 1 comment · Fixed by #2392

Comments


njsmith commented Aug 16, 2018

While writing up #607, I was thinking about a neat feature that curio has. I'm not sure if it's very important, but I envy it a bit, so I might as well open an issue to dump thoughts out where we can see them.

The feature is: say you use run_sync_in_worker_thread to enter a thread, and then from the thread you use BlockingTrioPortal.run to re-enter trio. So conceptually, this is all one stack that starts in trio, goes into a thread, and then goes back into trio. Now suppose the original run_sync_in_worker_thread is cancelled. It would be neat if this caused the code inside the BlockingTrioPortal.run call to raise a Cancelled error that propagated all the way back out of trio, through the thread, and back into trio.
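
For concreteness, here's a minimal sketch of that pattern using the modern spellings trio.to_thread.run_sync and trio.from_thread.run (the function names query_db and blocking_work are just placeholders):

```
import trio

async def query_db():
    # Placeholder for async work we want to run back on the Trio side.
    await trio.sleep(1)

def blocking_work():
    # Runs in the worker thread, then hops back into Trio.
    trio.from_thread.run(query_db)

async def main():
    # Conceptually one stack: Trio -> thread -> Trio again. The question here
    # is what should happen to query_db() if the cancel scope around
    # to_thread.run_sync() gets cancelled while we're parked inside it.
    with trio.move_on_after(5):
        await trio.to_thread.run_sync(blocking_work)

trio.run(main)
```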

See #607 for more details about why this is tricky, and hard to fix even by extending the current cancel API.

I guess either we need some way to reenter a given cancel binding (which feels very weird and unlikely to have other use cases), or else some way to capture the exception in run_sync_in_worker_thread (<- this part we have) and then inject it into the eventual task (<- this part we don't have). Well... unless we do it by making a new cancel scope inside the eventual task, cancel it, and then re-raise after the inner cancellation unwinds. (We could even fudge the traceback if we want to be really tricky.) Or... we could switch from attaching Cancelled objects to specific scopes/bindings, and make it so that each with block checks if it is associated with the outermost cancelled scope, and if so it catches all Cancelled exceptions, and then it would be fine to manually capture a Cancelled from inside the re-entry task and pass it over to the thread, before it gets caught. Except, ugh, that won't work if the exception isn't Cancelled but rather KeyboardInterrupt.

It wouldn't be terrible to manually cancel the re-entry task, then call the raise_cancel function to get the real exception, and attach the internal exception to it as a __context__ (though this would require #285).

Alternatively... can we avoid all this rigmarole? What if run_sync_in_worker_thread created a nursery, and when BTP.run re-entered, it put tasks into that nursery instead of the system nursery? Then any cancel scope that contained the run_sync_in_worker_thread call would also automatically contain any re-entry tasks.

KeyboardInterrupt might still require some careful handling, since it doesn't follow the normal cancel scope scoping rules. I guess we could do some shenanigans like pushing the actual wait-for-thread into a child task? Actually this would be pretty simple: open a nursery, start a child to do the actual thread waiting stuff, then park in the nursery __aexit__; KeyboardInterrupt will be delivered to __aexit__ and cause the nursery to be cancelled. (Actually, this is an interesting possible strategy for delivering KeyboardInterrupt in general: pick a nursery and inject it directly there, as if it were raised by a task inside that nursery. Or do this to the system nursery, though that feels weird.) I'd be a little nervous about adding overhead to run_sync_in_worker_thread, but maybe it wouldn't be that bad.
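
A rough sketch of that shape, where _wait_for_thread_to_finish is a hypothetical stand-in for the existing thread-spawning-and-waiting machinery:

```
import trio

async def run_sync_in_worker_thread(sync_fn, *args):
    result = []

    async def waiter():
        # Hypothetical helper: spawns the OS thread and waits for it to finish.
        result.append(await _wait_for_thread_to_finish(sync_fn, args))

    async with trio.open_nursery() as nursery:
        nursery.start_soon(waiter)
        # Nothing else to do here, so this task parks in the nursery's
        # __aexit__; a KeyboardInterrupt delivered there cancels the nursery,
        # and any re-entry tasks started in this nursery automatically share
        # our cancel scopes.
    return result[0]
```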

oremanj added a commit to oremanj/trio that referenced this issue Feb 6, 2019
Relevant to python-trio#886, python-trio#606, python-trio#285, python-trio#147, python-trio#70, python-trio#58, maybe others.

I was continuing my effort to shoehorn linked cancel scopes and graceful cancellation into `CancelScope` earlier today and it was feeling like too much of a mess, so I decided to explore other options. This PR is the result. It makes major changes to Trio's cancellation internals, but barely any to Trio's cancellation semantics -- all tests pass except for one that is especially persnickety about `cancel_called`. No new tests or docs yet, as I wanted to get feedback on the approach before polishing.

An overview:
* New class `CancelBinding` manages a single lexical context (a `with` block or a task) that might get a different cancellation treatment than its surroundings. "All plumbing, no policy."
* Each cancel binding has an effective deadline, a _single_ task, and links to parent and child bindings. Each parent lexically encloses its children. The only cancel bindings with multiple children are the ones immediately surrounding nurseries, and they have one child binding per nursery child task plus maybe one in the nested child.
* Each cancel binding calculates its effective deadline based on its parent's effective deadline and some additional data. The actual calculation is performed by an associated `CancelLogic` instance (a small ABC); see the sketch just after this list.
* `CancelScope` now implements `CancelLogic`, providing the deadline/shield semantics we know and love. It manages potentially-multiple `CancelBinding`s.
* Cancel stacks are gone. Instead, each task has an "active" (innermost) cancel binding, which changes as the task moves in and out of cancellation regions. The active cancel binding's effective deadline directly determines whether and when `Cancelled` is raised in the task.
* `Runner.deadlines` stores tasks instead of cancel scopes. There is no longer a meaningful state of "deadline is in the past but scope isn't cancelled yet" (this is what the sole failing test doesn't like). If the effective deadline of a task's active cancel binding is non-infinite and in the future, it goes in Runner.deadlines. If it's in the past, the task has a pending cancellation by definition.
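
For concreteness, the `CancelLogic` hook could be as small as this (a sketch matching the `compute_effective_deadline` signature shown below, not the PR's exact code):

```
from abc import ABC, abstractmethod

class CancelLogic(ABC):
    """Policy object consulted by a CancelBinding (sketch)."""

    @abstractmethod
    def compute_effective_deadline(
        self, parent_effective_deadline, parent_extra_info, task
    ):
        """Return (effective_deadline, extra_info), given the parent binding's values."""
```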

Potential advantages:
* Cancellation becomes extensible without changes to _core, via users writing their own CancelLogic and wrapping a core CancelBinding(s) around it. We could even move CancelScope out of _core if we want to make a point.
* Nursery.start() is much simpler.
* Splitting shielding into a separate object from cancellation becomes trivial (they'd be two kinds of CancelLogic).
* Most operations that are performed frequently take constant time: checking whether you're cancelled, checking what your deadline is, entering and leaving a cancel binding. I haven't benchmarked, so it's possible we're losing on constant factors or something, but in theory this should be faster than the old approach.
* Since tasks now have well-defined root cancel bindings, I think python-trio#606 becomes straightforward via providing a way to spawn a system task whose cancel binding is a child of something other than the system nursery's cancel binding.

Caveats:
* We call `current_time()` a lot. Not sure if this is worth worrying about, and could probably be cached if so.
* There are probably bugs, because aren't there always?

Current cancel logic:
```
    def compute_effective_deadline(
        self, parent_effective_deadline, parent_extra_info, task
    ):
        # A shielded scope ignores any deadline imposed from outside; a scope
        # whose cancel() has been called acts as if its own deadline is -inf.
        incoming_deadline = inf if self._shield else parent_effective_deadline
        my_deadline = -inf if self._cancel_called else self._deadline
        return min(incoming_deadline, my_deadline), parent_extra_info
```
Want to support a grace period? I'm pretty sure it would work with something like
```
    def compute_effective_deadline(
        self, parent_effective_deadline, parent_extra_info, task
    ):
        # The parent's cleanup deadline defaults to its effective deadline if
        # no enclosing scope has established one.
        parent_cleanup_deadline = parent_extra_info.get("effective_cleanup_deadline", parent_effective_deadline)
        if self._shield:
            parent_effective_deadline = parent_cleanup_deadline = inf
        # Cleanup starts at whichever comes first: our deadline expiring or an
        # explicit cancel() call.
        my_cleanup_start = min(self._deadline, self._cancel_called_at)
        # Hard cutoff: the parent's cleanup deadline or our cleanup start plus
        # the grace period, whichever is sooner.
        merged_cleanup_deadline = min(parent_cleanup_deadline, my_cleanup_start + self._grace_period)
        # Pass the cleanup deadline down to child bindings via extra_info.
        my_extra_info = parent_extra_info.set("effective_cleanup_deadline", merged_cleanup_deadline)
        if self._shield_during_cleanup:
            # Work shielded during cleanup only has to stop by the cleanup deadline.
            effective_deadline = merged_cleanup_deadline
        else:
            # Otherwise we become cancellable as soon as cleanup starts (or when
            # the parent's effective deadline arrives, if that's earlier).
            effective_deadline = min(parent_effective_deadline, my_cleanup_start)
        return effective_deadline, my_extra_info
```
Maybe that's not quite _simple_ but it is miles better than what I was looking at before. :-)

njsmith commented Jul 27, 2019

In retrospect, I think I was making this too complicated. The case where we want to propagate cancellation is when we're inside a trio.to_thread.run_sync worker thread. And in that case, we already have a task on the Trio side: the one that called to_thread.run_sync. It already has the right cancellation context. And it's just sitting around doing nothing, waiting for the thread. So we might as well give it something to do.

Basically, after spawning the thread, it would sit in a loop like:

```
while True:
    msg_from_thread = await wait_task_rescheduled(...)
    if msg_from_thread.type is THREAD_DONE:
        return msg_from_thread.result.unwrap()
    elif msg_from_thread.type is RUN:
        result = await outcome.acapture(msg_from_thread.async_fn)
        to_thread_queue.put(result)
    ...
```

We'd stash the communications channels in thread-local variables, like we do the trio token, and then the thread would send messages back to report when it's finished, or when it has some work it needs to do in trio context.
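
As a rough illustration of the thread side, something like the following could sit behind the portal API (every name here -- _local, reschedule_parent, the RUN tuple -- is hypothetical):

```
import queue
import threading

# Hypothetical thread-locals, populated by the Trio-side task before it
# spawns the worker thread.
_local = threading.local()

def run_in_host_task(async_fn):
    # Called from the worker thread: hand async_fn to the parked Trio task
    # and block until it ships the result back.
    reply_queue = queue.Queue()
    # reschedule_parent would wake the task sitting in wait_task_rescheduled()
    # with a RUN message; TrioToken.run_sync_soon is how a thread pokes the
    # Trio run loop.
    _local.trio_token.run_sync_soon(
        _local.reschedule_parent, ("RUN", async_fn, reply_queue)
    )
    return reply_queue.get().unwrap()  # the Trio side sends back an outcome.Outcome
```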

Interesting special cases that we'll want to make sure we handle correctly:

* Entering Trio from a foreign thread (in this case we do need to spawn a new task, like now)
* cancellable=True + attempting to re-enter into a task that's gone
