Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock detection and resolution? #26

Closed
alvestrand opened this issue Jan 8, 2018 · 6 comments
Closed

Deadlock detection and resolution? #26

alvestrand opened this issue Jan 8, 2018 · 6 comments

Comments

@alvestrand
Copy link

If two processes execute

await navigator.locks.acquire('A', async lock => {
await navigator.locks.acquire('B')
})

and

await navigator.locks.aquire('B', async lock => {
await navigator.locks.acquire('A')
})

standard processing theory says that they will sooner or later end up in a sad state (deadlock).

  1. can we detect this?
  2. can we resolve this?

On the detection side, it worries me slightly that the lock snapshot state doesn't identify holders in any way; there's no way to differentiate between "X holds A and Y holds B" and "X holds A and B, Y waits for A and B".

@inexorabletash
Copy link
Member

The naive answer is of course: that's a bug in the web app, which should be coded to avoid deadlocks, e.g. by always requesting multiple locks in a strict order. (See #20). But yeah, what further assistance can we provide...

As noted, even a web app that is defending against its own bugs can't tell by state inspection what is holding the locks/making the requests; logging the query() snapshot to the server may not provide enough to debug. Ideas that come to mind: (1) allowing apps to specify an arbitrary identifier with each request which is reflected in the snapshot, (2) including a the user-agent assigned id in the snapshot. I don't think we expose the latter outside of SWs at the moment, so it wouldn't correlate with much, although it would at least show that A and B are held by/requested by different contexts, thus clearly showing the deadlock.

@alvestrand
Copy link
Author

I think it's inevitable that locks will be used inside libraries.

A side effect of "a lock is a name" is that the system has no idea what locks can exist - so some library can always use an internal lock that would fit in lexical order in a surprising way - so there's no way for the browser to predict what locks are going to exist, which defeats most schemes for avoiding deadlock.

What I'd like to see is:
a) a part of the spec that talks about the danger of deadlocks, and that acquisition in some hierarchical order and not holding them longer than required are both Good Practices - won't prevent them, but will at least warn the users that they should google the word "deadlock" when their app stops suddenly and mysteriously :-)
b) a part of the spec that allows diagnostics. Including a context ID in the lock listing (I'd go for an internally generated one - don't burden the simple use cases with managing an ID space, and don't let the user shoot themselves in the foot by having two contexts with the same name) seems like a good start.

@inexorabletash
Copy link
Member

re: spec/deadlocks - absolutely, everything you listed is awesome. I'm signing you up to review it when written. :)

re: diagnostics - In theory, including the context's environment id would be straightforward (one more thing plumbed through from request). @domenic - is this a reasonable use of that data?

(In Blink's implementation, it looks like we currently mint the client ID as a UUID in the "service worker host" i.e. a SW-specific bit of code associated with each client (a.k.a. agent), rather than actually having something that multiple services could easily share. I.e. some refactoring would be needed, but that's not a spec problem.)

@domenic
Copy link

domenic commented Jan 8, 2018

I don't know much about environment ID; I think it was added for service workers. @jungkees or maybe @annevk would be able to help understand that better. Although I guess you figured most of it out by digging through the Blink code :)

inexorabletash added a commit that referenced this issue Jan 8, 2018
@inexorabletash
Copy link
Member

Took a stab at adding clientId to the snapshot in the proto-spec.

I have an implementation/tests in blink (up for review shortly) that adds the property, although it mints the ids on its own (like serviceworker).

@inexorabletash
Copy link
Member

Closing this for now - I added a section to the explainer (README) about deadlocks; everything in the explainer should end up in the eventual spec so that'll be the seed.

Thanks @alvestrand !

MXEBot pushed a commit to mirror/chromium that referenced this issue Jan 12, 2018
The proposed API includes a query() method which returns a snapshot of
the state of the lock manager for an origin - which locks are held,
and which locks are requested. This can be used by developers to debug
their web applications, either locally or in the field.

This CL introduces a clientId field giving a unique identifier for
each execution context, giving more context about the system. For
example, this can be used to determine that a deadlock has been
introduced.

NOTE: The identifier should be the "environment's id" in spec language,
and thus the same clientId exposed in Service Workers for client
enumeration and fetch events, but currently these id namespaces are
not unified.

Proposal: https://github.com/inexorabletash/web-locks/
Discussion: w3c/web-locks#26

Bug: 161072
Change-Id: I56df39acd7ca12d6c0062c3efcd21d97c5992770
Reviewed-on: https://chromium-review.googlesource.com/855576
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Reviewed-by: Victor Costan <pwnall@chromium.org>
Commit-Queue: Joshua Bell <jsbell@chromium.org>
Cr-Commit-Position: refs/heads/master@{#528797}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants