Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bind quirk #329

Closed
aantron opened this issue Apr 2, 2017 · 19 comments
Closed

Bind quirk #329

aantron opened this issue Apr 2, 2017 · 19 comments
Labels
Milestone

Comments

@aantron
Copy link
Collaborator

aantron commented Apr 2, 2017

In Lwt.bind p f, if p is already resolved and f raises, the bind raises. This is not consistent with the behavior of Lwt.map f p in the same circumstances: map pushes the exception into the new promise it evaluates to.

It is also not consistent with the behavior of either bind or map if p is not yet resolved. However, that inconsistency is a bit more defensible, because bind and map then can't run f before returning. But still, at least map has the same behavior whether or not p is resolved when map is called.

open Lwt.Infix

let () =
  let resolved = Lwt.return () in
  let pending, resolve = Lwt.wait () in

  (* This first bind (p1) raises Exit, while the other binds and maps fail the
     resulting promises (p2, p3, p4) with Exit instead. *)
  let p1 = resolved >>= (fun () -> raise Exit) in
  let p2 = resolved >|= (fun () -> raise Exit) in

  let p3 = pending >>= (fun () -> raise Exit) in
  let p4 = pending >|= (fun () -> raise Exit) in

  Lwt.wakeup resolve ();

  assert (Lwt.state p2 = Lwt.Fail Exit);
  assert (Lwt.state p3 = Lwt.Fail Exit);
  assert (Lwt.state p4 = Lwt.Fail Exit)

(* ocamlfind opt -linkpkg -package lwt bind.ml && ./a.out *)

In particular, as a consequence of the first paragraph, one cannot define

let map' f p = Lwt.bind p (fun v -> Lwt.return (f v))

and get the same semantics as Lwt.map, because map' will raise "eagerly" like Lwt.bind does.

I think this is a bug. Thoughts?

@hcarty
Copy link
Contributor

hcarty commented Apr 2, 2017

I agree that this seems like a bug. Good find!

@mfp
Copy link
Collaborator

mfp commented Apr 2, 2017 via email

@aantron
Copy link
Collaborator Author

aantron commented Apr 2, 2017

Yes, I realize that this makes the call to f a tail call, but it's not clear to me that that is very important, in particular that its importance in bind is so much greater than in map, that it's worth having different semantics between the two.

I've also been coding with awareness of how bind works, but I hadn't realized that map is different.

The function given to Lwt.map does not create new promises, so there's no problem (and it is indeed safer) to wrap the exception in the Lwt monad.

It's not clear to me that this is an advantage of map that makes it safer to not do a tail call. It seems more like a disadvantage: since map has to wrap the result of f in a promise, a tail call just happens not to be possible in map anyway.

EDIT: Ok, that paragraph is not directly replying to the quote, more like replying to my opinion of the difference between the two functions, and why I think at least one of them is being too clever.

resolved promise (is that the correct terminology?)

Yes :)

@aantron
Copy link
Collaborator Author

aantron commented Apr 2, 2017

Ah @mfp, I inserted a small edit into my comment above; I forgot that you might not see it if you reply only by email. Sorry, I won't edit like that again in the future.

@mfp
Copy link
Collaborator

mfp commented Apr 2, 2017 via email

@aantron
Copy link
Collaborator Author

aantron commented Apr 2, 2017

No, those are valid points. It helps to have them clearly stated :) I also originally expected bind to capture "immediate" exceptions, when I was first learning Lwt.

IMO, it is a sign of a serious design flaw that that we have to choose between:

  1. sneakily blowing up the stack, and
  2. offering a misleading and confusing interface w.r.t "immediate" exceptions.

The smallest reasonable change is probably to make map like bind, but that seems like making map worse just because we don't have bind right. So definitely I agree on that.

I haven't fully thought this through, but it seems to me that a way to resolve this is not to guarantee that bind will run f right away, i.e. do something like what wakeup_later does. wakeup_later resolver checks if Lwt is already "wakening" (in current terminology of lwt.ml): a phase in which Lwt sets promise states to returned or failed, and calls the functions waiting on those promises. If not already wakening, wakeup_later resolves the promise associated to resolver right away, otherwise it queues the resolution to happen later, when the current wakening phase ends. This "wakening" loop is essentially the top level of Lwt.

When you use wakeup_later, the promise is resolved at one of two stack depths:

  • If already wakening, immediately below the stack frame of the current wakening loop, which is above the current call to wakeup_later.
  • If not already wakening, wakeup_later pushes a stack frame for a new wakening loop, and resolves the promise there.

Basically, the current behavior of bind p f is to run the "waiting" function f right away, just because p happens to be already resolved, without minding the wakening loop at all. Then, we are forced to rely on tail recursion to cover up for the fact that almost none of Lwt can be safely re-entered in this way.

This is somehow notionally equivalent to running wakeup in a loop, which also doesn't care about whether we are already in a wakening phase or not, and is not tail-recursive. If you wakeup a promise, which then wakeups another promise, and so on, you will have stack overflow. I've already de-emphasized wakeup in the new draft docs for this reason. IMO, wakeup_later should be the primary function for manually resolving a promise, and perhaps bind needs to have the same semantics.

Essentially, tail recursion is not available in most of Lwt, including in map and wakeup, and while it seems possible in bind, it actually turns out not to be possible once you consider the present issue.

Sorry if I wrote too much :)

@aantron
Copy link
Collaborator Author

aantron commented Apr 3, 2017

I guess this is an issue in catch, try_bind, and finalize as well. For example:

open Lwt.Infix

let () =
  let failed = Lwt.fail_with "foo" in
  let pending, resolve = Lwt.wait () in

  (* This catch leaks an exception from the handler, while the one for [p2]
     pushes the exception into [p2]. *)
  let p1 =
    Lwt.catch
      (fun () -> failed)
      (fun exn -> raise Exit)
  in

  let p2 =
    Lwt.catch
      (fun () -> pending)
      (fun exn -> raise Exit)
  in
  Lwt.wakeup_exn resolve (Failure "foo");

  assert (Lwt.state p2 = Lwt.Fail Exit);

(* ocamlfind opt -linkpkg -package lwt catch.ml && ./a.out *)

@aantron
Copy link
Collaborator Author

aantron commented Apr 3, 2017

Another example of tail calls not happening in Lwt when they could be naively expected, this time due to wrapping function calls in async_exception_hook: #206 (comment)

@aantron aantron modified the milestone: 4.0.0 Apr 13, 2017
@mfp
Copy link
Collaborator

mfp commented Apr 20, 2017

This is probably something that Async got right (it doesn't "bind eagerly").

A related (and very nice) property is that it guarantees context switches can only happen in bind, which in Lwt's case would be akin to wakeup and wakeup_later being fused into a single function with the semantics the latter has when already wakening (similar changes required in wakeup_exn, cancel, and others).

Maybe the "bind quirk" wrt. to map can be resolved (hah!) the other way...

I think that both ensuring that OCaml-level exceptions are captured properly in the RHS of a bind (and the other functions you listed) and guaranteeing the context switch property by changing the wakeup semantics are valuable goals. Two conses come to mind:

  1. it might break existing code that relies on the current eager evaluation semantics
  2. it imposes some performance cost

Any others?

(2) seems a priori trivial for practical uses cases, but (1) is more troubling. I used to think it was way too late for such a change, but after seeing how the deprecation scheme worked for 3.0, maybe, and if the change is deemed worthy, something similar could be done to e.g. introduce a new Lwt.resolve function with the new semantics and deprecate wakeup_* so that the switch is completed by 4.0 or 5.0? (This needs much further consideration of course, I'm just pointing it out as a possibility).

@aantron
Copy link
Collaborator Author

aantron commented Apr 20, 2017

Yes, this is pretty much exactly what I want to do to all those functions.

it might break existing code that relies on the current eager evaluation semantics

For this, I thought to do some kind of study (exhaustive or probabilistic) of at least the code in OPAM. My hunch is that it will break very little actual code, though code in test cases or other contexts that don't depend on Lwt_main.run in some projects might be more susceptible.

Also, I don't think we would be breaking the documented API, though breaking its undocumented behavior is still dangerous. Unfortunately, it's some kind of well-known fact in the community that f will run immediately if p is ready. I hope most code authors assumed that they can't know when f will run, because that's the general case, and so is easier to think about.

it imposes some performance cost

I would guess that the performance cost would be from extra allocations of deferred resolution queue nodes, and from caching issues due to resolving things in a different order than now.

But, I thought about this in the last couple weeks, and we have at least one lever to mitigate this – we can make Lwt "finitely reentrant," where, let's say, up to 42 resolutions of a promise can be run without deferring. The extra cost would be maintaining this counter, which should be extremely cheap, and setting up an exception handler around each immediate call, which should also be cheap. After the 42nd resolution down into the current stack, the next resolution goes in the queue, i.e. it is treated like binding on something that isn't yet ready.

In what I originally proposed above, 42 is instead 0 (no immediate resolutions allowed). The current semantics allow an unbounded number of immediate resolutions, which forces us to use tail recursion in bind.

I am pretty sure we can make this change using a combination of study, soft breakage (3.0.0 style), and performance tweaks, the latter of which might not even be necessary. The payoff would be much more predictable and teachable semantics.

The above applies to bind, catch, etc. Lwt.wakeup is a different beast, because it is explicitly supposed to run the resolution now, though in a weird way: because the Lwt.wakeup_later docs say that wakeup_later is not guaranteed to run callbacks immediately. I suspect that the real reason for most usage of Lwt.wakeup is just that the name is shorter than Lwt.wakeup_later, but there is likely code that actually depends on Lwt.wakeup running immediately. I have also written such code myself.

But, since we already have these functions, we can just deprecate Lwt.wakeup, leave its functionality alone and warn people about it, and encourage usage of Lwt.wakeup_later. We can give wakeup_later a new name, but I don't think it's necessary. Maybe once the promise (or whatever) terminology is truly settled, we can think about which functions to strategically rename (without removing the old names, of course), as a separate issue.

@mfp
Copy link
Collaborator

mfp commented Apr 20, 2017

But, I thought about this in the last couple weeks, and we have at least one lever to mitigate this – we can make Lwt "finitely reentrant," where, let's say, up to 42 resolutions of a promise can be run without deferring.

That's clever. It's nice to be able to make the presumed overhead arbitrarily small (modulo the costs you mentioned).

I suspect that the real reason for most usage of Lwt.wakeup is just that the name is shorter than Lwt.wakeup_later, but there is likely code that actually depends on Lwt.wakeup running immediately. I have also written such code myself.

Also the fact that Lwt.wakeup has existed AFAIK since Lwt's early days (in unison's tree), and Lwt.wakeup_later was introduced much later in 2011.

But, since we already have these functions, we can just deprecate Lwt.wakeup, leave its functionality alone and warn people about it, and encourage usage of Lwt.wakeup_later. We can give wakeup_later a new name, but I don't think it's necessary.

Yes on second thought just promoting Lwt.wakeup_latershould do. There's a minuscule chance somebody out there could be relying on Lwt.wakeup_later's precise (but undocumented!) semantics and it behaving like Lwt.wakeup when not in wakeup phase, but it seems most unlikely: it cannot be in library code anyway, where there's no telling whether the code will be invoked in a 'wakeup' (and going out of one's way to detect whether it's the case would be borderline criminal behavior :)

@aantron
Copy link
Collaborator Author

aantron commented Apr 20, 2017

Also the fact that Lwt.wakeup has existed AFAIK since Lwt's early days (in unison's tree), and Lwt.wakeup_later was introduced much later in 2011.

That too :) Hadn't seen that.

behaving like Lwt.wakeup when not in wakeup phase

Actually, now I think it will still have to do that. If this is the top-level call to Lwt.wakeup_later, I think the promise's callbacks must be called immediately, because once Lwt.wakeup_later returns, Lwt loses control, potentially forever (sounds a bit over-dramatic...).

So it will be something like: run callbacks now, unless already running callbacks. I guess it will be that for bind too, so the max number of nested immediate wakeups in the original proposal would be 1, not 0. Of course, if we have to resort to the tweak, it will be like: run callbacks now, unless this is the 43rd nested attempt to run callbacks. Which sounds ridiculous :p

@aantron
Copy link
Collaborator Author

aantron commented Apr 20, 2017

To be more precise above, s/top-level call to Lwt.wakeup_later/top-level entry into Lwt/, where entry means running any callback of a promise (including by bind and elsewhere), and therefore running the Lwt book-keeping associated with that.

@aantron
Copy link
Collaborator Author

aantron commented Apr 20, 2017

On the matter of how to make such a change relatively gently, since we want to reuse the infix names and the PPX syntax (i.e. change their semantics), we can't really deprecate the existing names and ask everyone to patch their code repeatedly.

I think, if the revdeps study shows that very few users will be affected, then we can go ahead with this. We will make some minor (n.N.0) release where we announce that this change is coming up. The announcement goes in the changelog, the mailing list, messages in the opam file, and anywhere else we can think of.

Simultaneously, we can make a branch of Lwt available with the new semantics already applied, and provide clear instructions on how to pin that branch, and a clear, justified explanation of what the change is and why we are doing it – a summary of this discussion.

With the pin, a user can exercise their code base (run test suite, etc.) against the new semantics by running their normal build process. If exercising the code seems unproblematic (hopefully the typical outcome), then the user doesn't have to do anything except unpin Lwt. Otherwise, the user can constrain Lwt to < N.0.0, adjust their code so that it works with both current and new semantics, or come to the repo and tell us why this course of action is idiotic. And, in the latter case, if the user is right, we will have time to agree and change our minds :)

@mfp
Copy link
Collaborator

mfp commented May 13, 2017

What would be needed for the 3.1.0 milestone wrt. this?

Making the branch with the new semantics available? Not sure how it ties to 3.1.0 effectively; it could be made available at any time (with enough advance) before (hypothetically) 4.0. Or to put it in another way, it doesn't seem to me 3.1.0 would block on this.

What we can and probably should do already in 3.1.0 is try to encourage the transition from wakeup to wakeup_later. We can mention this in the ANN and the mli, but ideally we'd want an attribute with the same semantics as ocaml.deprecated and with an arbitrary message of ours. ocaml.ppwarning doesn't do since it would only fire at definition (not usage) site AFAICS: http://caml.inria.fr/pub/docs/manual-ocaml/extn.html#sec247

@aantron
Copy link
Collaborator Author

aantron commented May 13, 2017

I agree that 3.1.0 doesn't block on this. To answer a hypothetical broader query, any issue can be dropped from a milestone if we don't have enough time, etc. – I just assigned this issue to 3.1.0 as a reminder to give the issue a hard look, in case we want to do 4.0.0 three months after 3.1.0, without an intervening minor release, and want to commit this change in 4.0.0. That might not be realistic anyway, because the revdeps study is going to take a while.

But, to answer the actual specific concern that I believe you have: in 3.1.0 (or whatever minor release we have enough time to prepare this for), we should probably

  • add an announcement to opam, so that anyone installing or upgrading to that minor release will get a warning and link to the branch.
  • That requires the branch be made available before the minor release is published to OPAM.

So basically, it's because that minor release's announcements are maybe the best time to announce the change and the branch, and the opam file is one case where I think it is the only reasonable time.

I agree that we should probably discourage using wakeup, independently of this whole process with the semantics. I actually think deprecating it is fine. @@ocaml.deprecated seems right to me, and we probably need to write something in an issue or on the wiki explaining why "wakeup considered harmful," and link to that from the deprecation message. Are you concerned that deprecated doesn't do the right thing?

@mfp
Copy link
Collaborator

mfp commented May 13, 2017

I seemed to remember you'd rejected the idea of deprecating it before, but it was actually the renaming of wakeup_later to resolve you deemed unnecessary. (The reason I put that possibility forward is that a rename would raise attention towards the new semantics and its rationale, the name would be shorter and consistent with the new terminology, but it's not clear whether it is worth the effort).

@aantron
Copy link
Collaborator Author

aantron commented May 13, 2017

Well, I think it's unnecessary as part of this specific issue, but I would like resolve at some point, in general :) I'm just worried about the terminology. In the lwt.ml refactoring, I ended up using "resolve" to correspond to Result.Ok, "fail" to correspond to Result.Error, and "complete" to correspond to either one of these two actions. I don't like "complete," and it seems like "resolve" would be promoted to the meaning of complete once there is only one way to resolve a promise, in a theoretical Lwt that does not conflate exception handling with concurrency. But I don't want to get into all these issues now, and I'm scared to use nice names like "resolve" that we want to save for later, now, before the semantics are settled, because we might have to throw them away upon doing more work :/

Basically, I want to reserve nice names like "resolve" for potential even larger semantic changes.

@aantron
Copy link
Collaborator Author

aantron commented Mar 24, 2018

Closing this for now as Lwt won't resolve it in the immediate future. See https://discuss.ocaml.org/t/1337/13 and #453 (comment).

@aantron aantron closed this as completed Mar 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants