Propagating cancellation through threads #606

Closed
njsmith opened this issue Aug 16, 2018 · 1 comment · Fixed by #2392

Comments


njsmith commented Aug 16, 2018

While writing up #607, I was thinking about a neat feature that curio has. I'm not sure if it's very important, but I envy it a bit, so I might as well open an issue to dump thoughts out where we can see them.

The feature is: say you use run_sync_in_worker_thread to enter a thread, and then from the thread you use BlockingTrioPortal.run to re-enter trio. So conceptually, this is all one stack that starts in trio, goes into a thread, and then goes back into trio. Now suppose the original run_sync_in_worker_thread is cancelled. It would be neat if this caused the code inside the BlockingTrioPortal.run call to raise a Cancelled error that propagated all the way back out of trio, through the thread, and back into trio.
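
For concreteness, here's a minimal sketch of that pattern using the modern spellings trio.to_thread.run_sync and trio.from_thread.run (the function names query_db and blocking_work are just placeholders):

```
import trio

async def query_db():
    # Placeholder for async work we want to run back on the Trio side.
    await trio.sleep(1)

def blocking_work():
    # Runs in the worker thread, then hops back into Trio.
    trio.from_thread.run(query_db)

async def main():
    # Conceptually one stack: Trio -> thread -> Trio again. The question here
    # is what should happen to query_db() if the cancel scope around
    # to_thread.run_sync() gets cancelled while we're parked inside it.
    with trio.move_on_after(5):
        await trio.to_thread.run_sync(blocking_work)

trio.run(main)
```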

See #607 for more details about why this is tricky, and hard to fix even by extending the current cancel API.

I guess either we need some way to reenter a given cancel binding (which feels very weird and unlikely to have other use cases), or else some way to capture the exception in run_sync_in_worker_thread (<- this part we have) and then inject it into the eventual task (<- this part we don't have). Well... unless we do it by making a new cancel scope inside the eventual task, cancel it, and then re-raise after the inner cancellation unwinds. (We could even fudge the traceback if we want to be really tricky.) Or... we could switch from attaching Cancelled objects to specific scopes/bindings, and make it so that each with block checks if it is associated with the outermost cancelled scope, and if so it catches all Cancelled exceptions, and then it would be fine to manually capture a Cancelled from inside the re-entry task and pass it over to the thread, before it gets caught. Except, ugh, that won't work if the exception isn't Cancelled but rather KeyboardInterrupt.

It wouldn't be terrible to manually cancel the re-entry task, then call the raise_cancel function to get the real exception, and attach the internal exception to it as a __context__ (though this would require #285).

Alternatively... can we avoid all this rigmarole? What if run_sync_in_worker_thread created a nursery, and when BTP.run re-entered, it put tasks into that nursery instead of the system nursery? Then any cancel scope that contained the run_sync_in_worker_thread call would also automatically contain any re-entry tasks.

KeyboardInterrupt might still require some careful handling, since it doesn't follow the normal cancel scope scoping rules. I guess we could do some shenanigans like pushing the actual wait-for-thread into a child task? Actually this would be pretty simple: open a nursery, start a child to do the actual thread waiting stuff, then park in the nursery __aexit__; KeyboardInterrupt will be delivered to __aexit__ and cause the nursery to be cancelled. (Actually, this is an interesting possible strategy for delivering KeyboardInterrupt in general: pick a nursery and inject it directly there, as if it were raised by a task inside that nursery. Or do this to the system nursery, though that feels weird.) I'd be a little nervous about adding overhead to run_sync_in_worker_thread, but maybe it wouldn't be that bad.
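
A rough sketch of that shape, where _wait_for_thread_to_finish is a hypothetical stand-in for the existing thread-spawning-and-waiting machinery:

```
import trio

async def run_sync_in_worker_thread(sync_fn, *args):
    result = []

    async def waiter():
        # Hypothetical helper: spawns the OS thread and waits for it to finish.
        result.append(await _wait_for_thread_to_finish(sync_fn, args))

    async with trio.open_nursery() as nursery:
        nursery.start_soon(waiter)
        # Nothing else to do here, so this task parks in the nursery's
        # __aexit__; a KeyboardInterrupt delivered there cancels the nursery,
        # and any re-entry tasks started in this nursery automatically share
        # our cancel scopes.
    return result[0]
```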

oremanj added a commit to oremanj/trio that referenced this issue Feb 6, 2019
Relevant to python-trio#886, python-trio#606, python-trio#285, python-trio#147, python-trio#70, python-trio#58, maybe others.

I was continuing my effort to shoehorn linked cancel scopes and graceful cancellation into `CancelScope` earlier today and it was feeling like too much of a mess, so I decided to explore other options. This PR is the result. It makes major changes to Trio's cancellation internals, but barely any to Trio's cancellation semantics -- all tests pass except for one that is especially persnickety about `cancel_called`. No new tests or docs yet, as I wanted to get feedback on the approach before polishing.

An overview:
* New class `CancelBinding` manages a single lexical context (a `with` block or a task) that might get a different cancellation treatment than its surroundings. "All plumbing, no policy."
* Each cancel binding has an effective deadline, a _single_ task, and links to parent and child bindings. Each parent lexically encloses its children. The only cancel bindings with multiple children are the ones immediately surrounding nurseries, and they have one child binding per nursery child task plus maybe one in the nested child.
* Each cancel binding calculates its effective deadline based on its parent's effective deadline and some additional data. The actual calculation is performed by an associated `CancelLogic` instance (a small ABC); see the sketch just after this list.
* `CancelScope` now implements `CancelLogic`, providing the deadline/shield semantics we know and love. It manages potentially-multiple `CancelBinding`s.
* Cancel stacks are gone. Instead, each task has an "active" (innermost) cancel binding, which changes as the task moves in and out of cancellation regions. The active cancel binding's effective deadline directly determines whether and when `Cancelled` is raised in the task.
* `Runner.deadlines` stores tasks instead of cancel scopes. There is no longer a meaningful state of "deadline is in the past but scope isn't cancelled yet" (this is what the sole failing test doesn't like). If the effective deadline of a task's active cancel binding is non-infinite and in the future, it goes in Runner.deadlines. If it's in the past, the task has a pending cancellation by definition.
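
For concreteness, the `CancelLogic` hook could be as small as this (a sketch matching the `compute_effective_deadline` signature shown below, not the PR's exact code):

```
from abc import ABC, abstractmethod

class CancelLogic(ABC):
    """Policy object consulted by a CancelBinding (sketch)."""

    @abstractmethod
    def compute_effective_deadline(
        self, parent_effective_deadline, parent_extra_info, task
    ):
        """Return (effective_deadline, extra_info), given the parent binding's values."""
```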

Potential advantages:
* Cancellation becomes extensible without changes to _core, via users writing their own CancelLogic and wrapping a core CancelBinding(s) around it. We could even move CancelScope out of _core if we want to make a point.
* Nursery.start() is much simpler.
* Splitting shielding into a separate object from cancellation becomes trivial (they'd be two kinds of CancelLogic).
* Most operations that are performed frequently take constant time: checking whether you're cancelled, checking what your deadline is, entering and leaving a cancel binding. I haven't benchmarked, so it's possible we're losing on constant factors or something, but in theory this should be faster than the old approach.
* Since tasks now have well-defined root cancel bindings, I think python-trio#606 becomes straightforward via providing a way to spawn a system task whose cancel binding is a child of something other than the system nursery's cancel binding.

Caveats:
* We call `current_time()` a lot. Not sure if this is worth worrying about, and could probably be cached if so.
* There are probably bugs, because aren't there always?

Current cancel logic:
```
    def compute_effective_deadline(
        self, parent_effective_deadline, parent_extra_info, task
    ):
        # A shielded scope ignores any deadline imposed from outside; a scope
        # whose cancel() has been called acts as if its own deadline is -inf.
        incoming_deadline = inf if self._shield else parent_effective_deadline
        my_deadline = -inf if self._cancel_called else self._deadline
        return min(incoming_deadline, my_deadline), parent_extra_info
```
Want to support a grace period? I'm pretty sure it would work with something like
```
    def compute_effective_deadline(
        self, parent_effective_deadline, parent_extra_info, task
    ):
        # The parent's cleanup deadline defaults to its effective deadline if
        # no enclosing scope has established one.
        parent_cleanup_deadline = parent_extra_info.get("effective_cleanup_deadline", parent_effective_deadline)
        if self._shield:
            parent_effective_deadline = parent_cleanup_deadline = inf
        # Cleanup starts at whichever comes first: our deadline expiring or an
        # explicit cancel() call.
        my_cleanup_start = min(self._deadline, self._cancel_called_at)
        # Hard cutoff: the parent's cleanup deadline or our cleanup start plus
        # the grace period, whichever is sooner.
        merged_cleanup_deadline = min(parent_cleanup_deadline, my_cleanup_start + self._grace_period)
        # Pass the cleanup deadline down to child bindings via extra_info.
        my_extra_info = parent_extra_info.set("effective_cleanup_deadline", merged_cleanup_deadline)
        if self._shield_during_cleanup:
            # Work shielded during cleanup only has to stop by the cleanup deadline.
            effective_deadline = merged_cleanup_deadline
        else:
            # Otherwise we become cancellable as soon as cleanup starts (or when
            # the parent's effective deadline arrives, if that's earlier).
            effective_deadline = min(parent_effective_deadline, my_cleanup_start)
        return effective_deadline, my_extra_info
```
Maybe that's not quite _simple_ but it is miles better than what I was looking at before. :-)

njsmith commented Jul 27, 2019

In retrospect, I think I was making this too complicated. The case where we want to propagate cancellation is when we're inside a trio.to_thread.run_sync worker thread. And in that case, we already have a task on the Trio side: the one that called to_thread.run_sync. It already has the right cancellation context. And it's just sitting around doing nothing, waiting for the thread. So we might as well give it something to do.

Basically, after spawning the thread, it would sit in a loop like:

```
while True:
    msg_from_thread = await wait_task_rescheduled(...)
    if msg_from_thread.type is THREAD_DONE:
        return msg_from_thread.result.unwrap()
    elif msg_from_thread.type is RUN:
        result = await outcome.acapture(msg_from_thread.async_fn)
        to_thread_queue.put(result)
    ...
```

We'd stash the communications channels in thread-local variables, like we do the trio token, and then the thread would send messages back to report when it's finished, or when it has some work it needs to do in trio context.
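
As a rough illustration of the thread side, something like the following could sit behind the portal API (every name here -- _local, reschedule_parent, the RUN tuple -- is hypothetical):

```
import queue
import threading

# Hypothetical thread-locals, populated by the Trio-side task before it
# spawns the worker thread.
_local = threading.local()

def run_in_host_task(async_fn):
    # Called from the worker thread: hand async_fn to the parked Trio task
    # and block until it ships the result back.
    reply_queue = queue.Queue()
    # reschedule_parent would wake the task sitting in wait_task_rescheduled()
    # with a RUN message; TrioToken.run_sync_soon is how a thread pokes the
    # Trio run loop.
    _local.trio_token.run_sync_soon(
        _local.reschedule_parent, ("RUN", async_fn, reply_queue)
    )
    return reply_queue.get().unwrap()  # the Trio side sends back an outcome.Outcome
```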

Interesting special cases that we'll want to make sure we handle correctly:

* Entering Trio from a foreign thread (in this case we do need to spawn a new task, like now)
* cancellable=True + attempting to re-enter into a task that's gone
