Reduce or eliminate quadratic behavior in cancellation system? #58

Closed
njsmith opened this issue Feb 24, 2017 · 0 comments · Fixed by #958

njsmith commented Feb 24, 2017

Currently, checking for cancellation is O(n), where n is the depth of the task's cancel stack. We do this check frequently, including every time a task yields and in `yield_if_cancelled`. This isn't so bad because cancel stacks are generally not terribly deep, but it is worrisome. (Note that the stack is always at least as deep as the supervision tree, so while it's good practice to have only one explicit timeout at the top of your code, that doesn't mean cancel stacks are only one level deep.)
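To make the linear cost concrete, here is a sketch of that scan (illustrative names, not necessarily Trio's actual internals):

```
def pending_cancel_scope(cancel_stack):
    """Return the outermost cancelled-but-unshielded scope, or None.

    A walk like this over the whole stack, outermost to innermost,
    is what makes every cancellation check O(n).
    """
    pending = None
    for scope in cancel_stack:
        if scope.shield:
            pending = None  # a shield hides cancellations from outside it
        if pending is None and scope.cancel_called:
            pending = scope
    return pending
```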

This isn't really urgent until we start seeing cancellation-related operations showing up in profiles, but I suspect that eventually it might be worth reworking this code to improve the asymptotics. It's a bit tricky given the richness of our semantics (in particular shielding), but we can certainly do better than we are right now.

A simple change would be to have a per-task cache that says whether (and where) the task is cancelled, and recompute it (a) any time one of the task's cancel scopes is cancelled, (b) any time one of the cancel scopes' shield state is mutated, (c) any time the cancel stack is popped. (I guess we can optimize slightly further by observing that setting shield = True cannot create a cancellation where there wasn't one before, setting shield = False cannot remove one, and popping can never create a cancellation. But I suspect it's very rare for shields to be used by multiple tasks, or to be set/unset very often, so it doesn't much matter how fast that operation is -- the big win, if any, will be from reducing the overhead on yield. And popping is always restricted to just one task.)
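As a rough sketch of that caching scheme (hypothetical names, not a patch against Trio), the O(n) scan would move into a recompute hook that runs only on those three events, leaving an O(1) lookup at every yield:

```
class Task:
    def __init__(self):
        self._cancel_stack = []
        self._pending_cancel = None  # cached result of the O(n) scan

    def _recompute_pending_cancel(self):
        # Runs only when (a) a scope on the stack is cancelled,
        # (b) a scope's shield is toggled, or (c) the stack is popped.
        pending = None
        for scope in self._cancel_stack:
            if scope.shield:
                pending = None
            if pending is None and scope.cancel_called:
                pending = scope
        self._pending_cancel = pending

    def pending_cancel_scope(self):
        # O(1) at every yield and in yield_if_cancelled.
        return self._pending_cancel
```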

njsmith added the polish label Feb 24, 2017
oremanj added a commit to oremanj/trio that referenced this issue Feb 6, 2019
Relevant to python-trio#886, python-trio#606, python-trio#285, python-trio#147, python-trio#70, python-trio#58, maybe others.

I was continuing my effort to shoehorn linked cancel scopes and graceful cancellation into `CancelScope` earlier today, and it was feeling like too much of a mess, so I decided to explore other options. This PR is the result. It makes major changes to Trio's cancellation internals, but barely any to Trio's cancellation semantics -- all tests pass except for one that is especially persnickety about `cancel_called`. No new tests or docs yet, as I wanted to get feedback on the approach before polishing.

An overview:
* New class `CancelBinding` manages a single lexical context (a `with` block or a task) that might get a different cancellation treatment than its surroundings. "All plumbing, no policy."
* Each cancel binding has an effective deadline, a _single_ task, and links to parent and child bindings. Each parent lexically encloses its children. The only cancel bindings with multiple children are the ones immediately surrounding nurseries, and they have one child binding per nursery child task plus maybe one in the nested child.
* Each cancel binding calculates its effective deadline based on its parent's effective deadline and some additional data. The actual calculation is performed by an associated `CancelLogic` instance (a small ABC); a sketch of this split follows the list.
* `CancelScope` now implements `CancelLogic`, providing the deadline/shield semantics we know and love. It manages potentially-multiple `CancelBinding`s.
* Cancel stacks are gone. Instead, each task has an "active" (innermost) cancel binding, which changes as the task moves in and out of cancellation regions. The active cancel binding's effective deadline directly determines whether and when `Cancelled` is raised in the task.
* `Runner.deadlines` stores tasks instead of cancel scopes. There is no longer a meaningful state of "deadline is in the past but scope isn't cancelled yet" (this is what the sole failing test doesn't like). If the effective deadline of a task's active cancel binding is non-infinite and in the future, it goes in `Runner.deadlines`. If it's in the past, the task has a pending cancellation by definition.
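Here's a minimal sketch of how I picture the pieces fitting together, assuming plain dicts for the extra info; apart from the `CancelBinding` and `CancelLogic` names and the `compute_effective_deadline` signature, the details are guesses rather than the PR's actual code:

```
from abc import ABC, abstractmethod
from math import inf

class CancelLogic(ABC):
    @abstractmethod
    def compute_effective_deadline(
        self, parent_effective_deadline, parent_extra_info, task
    ):
        """Return (effective_deadline, extra_info) for one binding."""

class CancelBinding:
    def __init__(self, logic, parent, task):
        self.logic = logic
        self.parent = parent  # lexically enclosing binding, or None
        self.children = []    # >1 child only around a nursery
        self.task = task
        self.effective_deadline = inf
        self.extra_info = {}

    def recompute(self):
        # Pull the parent's effective deadline, apply this binding's
        # CancelLogic, and push the result down to any children.
        if self.parent is None:
            parent_deadline, parent_info = inf, {}
        else:
            parent_deadline = self.parent.effective_deadline
            parent_info = self.parent.extra_info
        self.effective_deadline, self.extra_info = (
            self.logic.compute_effective_deadline(
                parent_deadline, parent_info, self.task
            )
        )
        for child in self.children:
            child.recompute()
```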

Potential advantages:
* Cancellation becomes extensible without changes to _core, via users writing their own `CancelLogic` and wrapping a core `CancelBinding` (or several) around it. We could even move `CancelScope` out of _core if we want to make a point.
* `Nursery.start()` is much simpler.
* Splitting shielding into a separate object from cancellation becomes trivial (they'd be two kinds of `CancelLogic`).
* Most operations that are performed frequently take constant time: checking whether you're cancelled, checking what your deadline is, entering and leaving a cancel binding. I haven't benchmarked, so it's possible we're losing on constant factors or something, but in theory this should be faster than the old approach.
* Since tasks now have well-defined root cancel bindings, I think python-trio#606 becomes straightforward via providing a way to spawn a system task whose cancel binding is a child of something other than the system nursery's cancel binding.

Caveats:
* We call `current_time()` a lot. Not sure if this is worth worrying about; if so, it could probably be cached.
* There are probably bugs, because aren't there always?

Current cancel logic:
```
    def compute_effective_deadline(
        self, parent_effective_deadline, parent_extra_info, task
    ):
        # inf/-inf as in `from math import inf`.
        # Shielding makes us ignore any deadline imposed from outside.
        incoming_deadline = inf if self._shield else parent_effective_deadline
        # Once cancel() has been called, our own deadline is effectively -inf.
        my_deadline = -inf if self._cancel_called else self._deadline
        return min(incoming_deadline, my_deadline), parent_extra_info
```
Want to support a grace period? I'm pretty sure it would work with something like:
```
    def compute_effective_deadline(
        self, parent_effective_deadline, parent_extra_info, task
    ):
        # The enclosing scopes' cleanup deadline, falling back to their
        # hard deadline if none of them has started a grace period.
        parent_cleanup_deadline = parent_extra_info.get(
            "effective_cleanup_deadline", parent_effective_deadline
        )
        if self._shield:
            parent_effective_deadline = parent_cleanup_deadline = inf
        # _cancel_called_at is assumed to be +inf until cancel() is called.
        my_cleanup_start = min(self._deadline, self._cancel_called_at)
        merged_cleanup_deadline = min(
            parent_cleanup_deadline, my_cleanup_start + self._grace_period
        )
        # parent_extra_info is assumed to be an immutable mapping whose
        # .set() returns an updated copy.
        my_extra_info = parent_extra_info.set(
            "effective_cleanup_deadline", merged_cleanup_deadline
        )
        if self._shield_during_cleanup:
            effective_deadline = merged_cleanup_deadline
        else:
            effective_deadline = min(parent_effective_deadline, my_cleanup_start)
        return effective_deadline, my_extra_info
```
Maybe that's not quite _simple_ but it is miles better than what I was looking at before. :-)