
Problems with trio.Lock() across multiple tasks #3035

Open

Zac-HD opened this issue Jul 16, 2024 · 15 comments · May be fixed by #3081

Comments

@Zac-HD
Member

Zac-HD commented Jul 16, 2024

Some time ago - never mind how long precisely - having a little coordination between tasks and nothing particular to interest me in semaphores, I saw our RuntimeError("can't release a Lock you don't own").

I approve pretty strongly of keeping things local by default; it prevents all sorts of bugs and trio.Semaphore is right there if you need something without this limitation. But two further observations prompted this issue:

  1. trio.Semaphore() is considerably more powerful and flexible in general. By the principle of least power, it might be nice to have trio.Lock(*, allow_outside_release: bool = False) or something along those lines.
  2. If the owning task finishes (returns or raises) without releasing the lock, then you're stuck forever. The toy program below demonstrates. Perhaps we should raise an error or at least emit a warning when a task which owns a lock finishes?
    import trio

    @trio.run
    async def main():
        lock = trio.Lock()
        async with trio.open_nursery() as nursery:
            nursery.start_soon(lock.acquire)
        # The lock is now held, and can only be released by the acquiring task.
        # However, that task is now finished, so we can never release the lock!
        lock.release()  # raises RuntimeError("can't release a Lock you don't own")

@A5rocks
Contributor

A5rocks commented Aug 2, 2024

I'm not sure how I feel about adding a keyword argument to trio.Lock. Does trio.Semaphore with a limit of 1 really have any differences other than this? I like that trio.Lock is more easily debuggable and harder to shoot yourself in the foot with. (And if we make acquire error when the owner is dead, even better!)

Perhaps we should raise an error or at least emit a warning when a task which owns a lock finishes?

I think this could error when you call lock.acquire_nowait(), i.e. acquire_nowait could check whether self._owner is still running. This doesn't handle the following sequence, though (sketched below):

  • task 1 calls lock.acquire
  • task 2 calls lock.acquire
  • task 1 exits
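
For concreteness, a sketch of that ordering as a runnable program (the sleeps just force the scheduling described above; this hangs):

    import trio

    async def owner(lock):
        await lock.acquire()
        await trio.sleep(0.2)  # stay alive long enough for task 2 to park
        # exits here, still holding the lock

    async def main():
        lock = trio.Lock()
        async with trio.open_nursery() as nursery:
            nursery.start_soon(owner, lock)   # task 1
            await trio.sleep(0.1)             # task 1 now holds the lock
            nursery.start_soon(lock.acquire)  # task 2 parks; a liveness check
            # in acquire_nowait would pass, since task 1 is still alive here
        # task 1 exits at t=0.2s and task 2 stays parked forever

    trio.run(main)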

@Zac-HD
Member Author

Zac-HD commented Aug 5, 2024

Let's set Semaphore aside then; I'm more concerned that you can easily get a deadlock from Lock, e.g.

    lock = trio.Lock()
    async with trio.open_nursery() as nursery:
        nursery.start_soon(lock.acquire)
        nursery.start_soon(lock.acquire)

IMO instead of hanging forever, we should raise BrokenResourceError in the task(s) waiting on a lock which can never be released, and perhaps also when exiting the task which holds that lock.

@A5rocks
Contributor

A5rocks commented Aug 5, 2024

I agree; BrokenResourceError (or RuntimeError if we want) is better than a deadlock. Though I think that if a lock gets acquired and never released, but nobody else wants it, it's OK. (Though it's definitely a sign of a bug? Should that warn?)

@Zac-HD
Member Author

Zac-HD commented Aug 5, 2024

I think so long as we get a BrokenResourceError instead of hanging, and can identify the offending lock + task if that happens, the other decisions are less crucial - e.g. I'd prefer to warn-or-error on task exit but only if that's not going to cost us performance in the happy case.

@jakkdl
Member

jakkdl commented Aug 27, 2024

Erroring on

    lock = trio.Lock()
    async with trio.open_nursery() as nursery:
        nursery.start_soon(lock.acquire)
    await lock.acquire()

is easy & straightforward: add a check in acquire_nowait for self._owner not in self._owner._parent_nursery._children.
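
The same idea can be sketched from the outside with the public Task.parent_nursery and Nursery.child_tasks attributes (a hypothetical wrapper, not trio's implementation, and it still has the ordering hole above, since it only checks liveness at acquire time):

    import trio

    class OwnerCheckingLock:
        """Hypothetical wrapper: fail fast if the owner already exited."""

        def __init__(self):
            self._lock = trio.Lock()
            self._owner = None  # trio.lowlevel.Task, once acquired

        async def acquire(self):
            owner = self._owner
            if owner is not None:
                nursery = owner.parent_nursery  # may be None once it exits
                if nursery is None or owner not in nursery.child_tasks:
                    raise trio.BrokenResourceError(
                        f"{owner.name} exited without releasing the lock"
                    )
            await self._lock.acquire()
            self._owner = trio.lowlevel.current_task()

        def release(self):
            self._owner = None
            self._lock.release()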

But the other cases are definitely more complicated, since the other task(s) might start waiting before the task owner exits. I think the way to go is to

  1. Add logic to Runner.task_exited which checks if the exiting task is holding any lock
    • which I think means we need to store held locks in the task.
  2. notify all other tasks waiting on those locks that they should error
    • Could make this a method in the public API on locks, error_other_holders
    • also gotta look at how you cancel and inject an error into those tasks

I don't think this requires a big performance hit, but we'd likely need to add some overhead whenever acquiring/releasing a lock, some of which could perhaps be traded off against more logic in task_exited.
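
Step 1 can at least be prototyped from the outside via the public trio.abc.Instrument hook of the same name. A rough sketch, where the held_locks bookkeeping is hypothetical (something acquire/release would have to maintain):

    import warnings
    import trio

    # hypothetical bookkeeping: acquire() appends here, release() removes
    held_locks: dict[trio.lowlevel.Task, list[trio.Lock]] = {}

    class LockLeakDetector(trio.abc.Instrument):
        def task_exited(self, task):
            if held_locks.get(task):
                warnings.warn(
                    f"{task.name} exited while holding {held_locks[task]!r}"
                )

    # usage: trio.run(main, instruments=[LockLeakDetector()])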

@A5rocks
Contributor

A5rocks commented Aug 27, 2024

I think better than a parent nursery check is just checking if the task is running. But yeah I'm not sure how to handle the other cases. Should there be a way to "poison" a parking lot?

@jakkdl
Member

jakkdl commented Sep 2, 2024

Another way of implementing it (inspired by @mikenerone) is having an Event that fires when a Task exits; that event would wake up any other tasks waiting inside acquire. This initially sounded conceptually simple, but when I started thinking about how to actually implement it (acquire waiting on either that Event or its ParkingLot.park(), and the lock's event changing each time the lock owner changes), I'm not sure how clean it would be.
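
Roughly the shape that would take per waiter (a hypothetical sketch; the messy part is that owner_exited would have to be swapped out every time the lock changes hands):

    import trio

    async def park_or_break(lot: trio.lowlevel.ParkingLot, owner_exited: trio.Event):
        """Hypothetical: park, but wake with an error if the owner exits first."""
        broken = False

        async def watch_owner():
            nonlocal broken
            await owner_exited.wait()
            broken = True
            nursery.cancel_scope.cancel()  # wake the parked branch

        async with trio.open_nursery() as nursery:
            nursery.start_soon(watch_owner)
            await lot.park()
            nursery.cancel_scope.cancel()  # we were unparked; stop the watcher
        if broken:
            raise trio.BrokenResourceError("lock owner exited without releasing")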

@A5rocks
Contributor

A5rocks commented Sep 5, 2024

Here's my latest idea: we can have a global dictionary of task -> a list of parking lots (a list, not a set, because we need to allow multiple of the same element), with an interface like trio.lowlevel.add_parking_lot_breaker(task, lot) and trio.lowlevel.remove_parking_lot_breaker(task, lot).

Then lock uses that to have its parking lot broken if the task gets killed without releasing the lock. We can document this for use by anything that expects a task to release something before it exits.
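
A minimal user-space sketch of that registry (the eventual trio API may well differ; the names mirror the proposal):

    from collections import defaultdict

    import trio

    # task -> list of lots; a list rather than a set so one task can
    # register the same lot more than once
    _breakers: dict[
        trio.lowlevel.Task, list[trio.lowlevel.ParkingLot]
    ] = defaultdict(list)

    def add_parking_lot_breaker(task, lot):
        _breakers[task].append(lot)

    def remove_parking_lot_breaker(task, lot):
        _breakers[task].remove(lot)
        if not _breakers[task]:
            del _breakers[task]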

@jakkdl
Member

jakkdl commented Sep 5, 2024

Then lock uses that to have its parking lot broken if the task gets killed without releasing the lock

The lock initiates it?

Trying to flesh this out:

  1. _LockImpl.acquire_nowait calls trio.lowlevel.add_parking_lot_breaker(self._owner, self._lot)
  2. _LockImpl.release calls trio.lowlevel.remove_parking_lot_breaker(self._owner, self._lot)
  3. Runner.task_exited calls trio.lowlevel.break_parking_lots(task)
    1. This calls ParkingLot.break() on all associated lots
    2. ParkingLot.break calls Runner.reschedule(..., next_send=Outcome.Error(BrokenResourceError(...))) for each parked entity (if that's how you inject errors into other Tasks).

@A5rocks
Contributor

A5rocks commented Sep 5, 2024

Then lock uses that to have its parking lot broken if the task gets killed without releasing the lock

The lock initiates it?

Yeah, if I understand what you're asking.

Trying to flesh this out:

  1. _LockImpl.acquire_nowait calls trio.lowlevel.add_parking_lot_breaker(self._owner, self._lot)
  2. _LockImpl.release calls trio.lowlevel.remove_parking_lot_breaker(self._owner, self._lot)
  3. Runner.task_exited calls trio.lowlevel.break_parking_lots(task)

We don't have to expose trio.lowlevel.break_parking_lots; it's fine if the runner just uses the global dict.

i. This calls ParkingLot.break() on all associated lots
ii. ParkingLot.break calls Runner.reschedule(..., next_send=Outcome.Error(BrokenResourceError(...))) for each parked entity (if that's how you inject errors into other Tasks).

ParkingLot.break also:

  1. needs a different name because break is a keyword
  2. should make any future ParkingLot.park calls error.

The error should probably contain a reference to the task that caused things to break. Maybe a subclass of BrokenResourceError.
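
Putting those pieces together, a from-scratch sketch on top of the public trio.lowlevel rescheduling primitives (naming and placement are up to the PR; break_lot is just one keyword-safe option):

    import outcome
    import trio
    from trio.lowlevel import Abort, current_task, reschedule, wait_task_rescheduled

    class LotBreakError(trio.BrokenResourceError):
        """Hypothetical subclass carrying the task that broke the lot."""

        def __init__(self, breaker):
            super().__init__(f"parking lot broken by {breaker.name}")
            self.breaker = breaker

    class MiniBreakableLot:
        """Sketch of a parking lot whose waiters can be woken with an error."""

        def __init__(self):
            self._parked: set[trio.lowlevel.Task] = set()
            self._breaker = None

        async def park(self):
            if self._breaker is not None:
                raise LotBreakError(self._breaker)  # future parks error out
            task = current_task()
            self._parked.add(task)

            def abort_fn(raise_cancel):
                self._parked.discard(task)
                return Abort.SUCCEEDED

            await wait_task_rescheduled(abort_fn)

        def break_lot(self, breaker):  # not "break": that's a keyword
            self._breaker = breaker
            parked, self._parked = self._parked, set()
            for task in parked:
                # inject the error into every currently-parked task
                reschedule(task, outcome.Error(LotBreakError(breaker)))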

@CoolCat467
Member

How should we handle future improvements where multiple guest runs can run at once?

@A5rocks
Contributor

A5rocks commented Sep 5, 2024

Tasks from different runs shouldn't compare the same, and if they do then ouch, that's something we should fix (given there's no way to start a task on two loops). So it's fine to use the Task as a key.

@jakkdl jakkdl linked a pull request Sep 6, 2024 that will close this issue
@jakkdl
Member

jakkdl commented Sep 6, 2024

we can have a global dictionary of task -> a list of parking lots (a list, not a set, because we need to allow multiple of the same element)

why is it necessary to allow multiple of the same element? It's not required for lock, so I'm guessing you're thinking of some external scenario?

Opened #3081 with an initial implementation. It needs some polish, but the overall approach looks solid.

@A5rocks
Contributor

A5rocks commented Sep 6, 2024

we can have a global dictionary of task -> a list of parking lots (a list, not a set, because we need to allow multiple of the same element)

why is it necessary to allow multiple of the same element? It's not required for lock, so I'm guessing you're thinking of some external scenario?

Yeah I was thinking of a semaphore (cause it's still bad form to forget to release) and reentering within the same task.

@jakkdl
Member

jakkdl commented Sep 9, 2024

we can have a global dictionary of task -> a list of parking lots (a list, not a set, because we need to allow multiple of the same element)

why is it necessary to allow multiple of the same element? It's not required for lock, so I'm guessing you're thinking of some external scenario?

Yeah I was thinking of a semaphore (cause it's still bad form to forget to release) and reentering within the same task.

This made me look more into the other sync primitives, and CapacityLimiter might be a pretty good fit for an optional parameter that enables the same functionality (optional because of the _on_behalf_of methods). Semaphore would be a worse fit, but I suppose it could also have it. (At that point we'd probably move the logic from _LockImpl to AsyncContextManagerMixin, or an intermediate class between those.)
