Bind quirk #329

aantron · 2017-04-02T16:16:16Z

In Lwt.bind p f, if p is already resolved and f raises, the bind raises. This is not consistent with the behavior of Lwt.map f p in the same circumstances: map pushes the exception into the new promise it evaluates to.

It is also not consistent with the behavior of either bind or map if p is not yet resolved. However, that inconsistency is a bit more defensible, because bind and map then can't run f before returning. But still, at least map has the same behavior whether or not p is resolved when map is called.

open Lwt.Infix

let () =
  let resolved = Lwt.return () in
  let pending, resolve = Lwt.wait () in

  (* This first bind raises Exit out of the call itself, while the other binds
     and maps reject the resulting promises (p2, p3, p4) with Exit instead.
     The eager raise is caught here so that the remaining asserts run. *)
  let p1_raised =
    try ignore (resolved >>= (fun () -> raise Exit)); false
    with Exit -> true
  in
  let p2 = resolved >|= (fun () -> raise Exit) in

  let p3 = pending >>= (fun () -> raise Exit) in
  let p4 = pending >|= (fun () -> raise Exit) in

  Lwt.wakeup resolve ();

  assert p1_raised;
  assert (Lwt.state p2 = Lwt.Fail Exit);
  assert (Lwt.state p3 = Lwt.Fail Exit);
  assert (Lwt.state p4 = Lwt.Fail Exit)

(* ocamlfind opt -linkpkg -package lwt bind.ml && ./a.out *)

In particular, as a consequence of the first paragraph, one cannot define

let map' f p = Lwt.bind p (fun v -> Lwt.return (f v))

and get the same semantics as Lwt.map, because map' will raise "eagerly" like Lwt.bind does.
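A minimal stdlib-only model of the asymmetry, assuming a toy promise type that is only ever already resolved or already failed (the case this issue is about). The names `return`, `bind`, `map`, and `map'` mirror Lwt's, but this is an illustration, not Lwt's implementation:

```ocaml
(* Toy promises that are already settled; not Lwt's real representation. *)
type 'a promise = Resolved of 'a | Failed of exn

let return v = Resolved v

(* Eager bind: on a resolved promise, f runs right away in tail position,
   so an exception raised by f escapes from bind itself. *)
let bind p f =
  match p with
  | Resolved v -> f v
  | Failed e -> Failed e

(* map: an exception raised by f is captured in the resulting promise. *)
let map f p =
  match p with
  | Resolved v -> (try Resolved (f v) with e -> Failed e)
  | Failed e -> Failed e

(* map' from the issue: f v is evaluated inside bind's callback, before
   return is reached, so it inherits bind's eager raising. *)
let map' f p = bind p (fun v -> return (f v))

(* Helper: does evaluating the thunk raise Exit? *)
let raises_exit thunk =
  try ignore (thunk ()); false with Exit -> true
```

In this model, `map (fun () -> raise Exit) (return ())` evaluates to `Failed Exit`, while the same callback passed through `bind` or `map'` raises `Exit` out of the call.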

I think this is a bug. Thoughts?


hcarty · 2017-04-02T16:35:06Z

I agree that this seems like a bug. Good find!

mfp · 2017-04-02T17:02:44Z

On Sun, Apr 02, 2017 at 09:16:16AM -0700, Anton Bachin wrote: In `Lwt.bind p f`, if `p` is already resolved and `f` raises, the `bind` raises. This is not consistent with the behavior of `Lwt.map f p` in the same circumstances: `map` pushes the exception into the new promise it evaluates to. It is also not consistent with the behavior of either `bind` or `map` if `p` is not yet resolved. However, that inconsistency is a bit more defensible, because `bind` and `map` then can't run `f` before returning. But still, at least `map` has the same behavior whether or not `p` is resolved when `map` is called.

[...]

I think this is a bug. Thoughts?

This seems deliberate to me -- this is largely why Lwt.wrap exists -- and something I keep in mind when I code. This semantics of Lwt.bind is required to support tail calls properly -- otherwise, each bind on a resolved promise (is that the correct terminology?) would push a stack frame for the corresponding try ... with. The function given to Lwt.map does not create new promises, so there's no problem (and it is indeed safer) to wrap the exception in the Lwt monad.
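The trade-off can be sketched with the same kind of toy model (hypothetical names, stdlib only). Here `wrap` plays the role of Lwt.wrap by turning a raised exception into a failed promise, and `bind_capturing` shows what bind would look like if it caught exceptions itself: the call to `f` sits inside a `try ... with`, so it is no longer a tail call.

```ocaml
(* Toy settled promises; not Lwt's representation. *)
type 'a promise = Resolved of 'a | Failed of exn

(* Eager bind: f v is a tail call, so exceptions from f escape. *)
let bind p f =
  match p with
  | Resolved v -> f v
  | Failed e -> Failed e

(* Toy counterpart of Lwt.wrap: run f, capturing any exception. *)
let wrap f = try Resolved (f ()) with e -> Failed e

(* map rebuilt from bind by running the user function under wrap. *)
let map_via_wrap f p = bind p (fun v -> wrap (fun () -> f v))

(* A bind that captures exceptions itself. The handler must stay on the
   stack until f v returns, so f v is no longer in tail position: a long
   chain of such binds on resolved promises grows the stack. *)
let bind_capturing p f =
  match p with
  | Resolved v -> (try f v with e -> Failed e)
  | Failed e -> Failed e
```

The user-facing fix (`wrap` around the callback's body) costs one handler per call, which is harmless for a map-like function that does not itself chain further binds.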

…

-- Mauricio Fernández

aantron · 2017-04-02T17:38:29Z

Yes, I realize that this makes the call to f a tail call, but it's not clear to me that that is very important, in particular that its importance in bind is so much greater than in map, that it's worth having different semantics between the two.

I've also been coding with awareness of how bind works, but I hadn't realized that map is different.

The function given to Lwt.map does not create new promises, so there's no problem (and it is indeed safer) to wrap the exception in the Lwt monad.

It's not clear to me that this is an advantage of map that makes it safer to not do a tail call. It seems more like a disadvantage: since map has to wrap the result of f in a promise, a tail call just happens not to be possible in map anyway.

EDIT: Ok, that paragraph is not directly replying to the quote, more like replying to my opinion of the difference between the two functions, and why I think at least one of them is being too clever.

resolved promise (is that the correct terminology?)

Yes :)

aantron · 2017-04-02T17:42:54Z

Ah @mfp, I inserted a small edit into my comment above; I forgot that you might not see it if you reply only by email. Sorry, I won't edit like that again in the future.

mfp · 2017-04-02T18:31:03Z

On Sun, Apr 02, 2017 at 10:38:29AM -0700, Anton Bachin wrote: Yes, I realize that this makes the call to `f` a tail call, but it's not clear to me that that is very important, in particular that its importance in `bind` is so much greater than in `map`, that it's worth having different semantics between the two.

The difference is that an Lwt program is a really, really long chain of Lwt.bind calls evaluated by Lwt_main.run, while Lwt.map takes just an "immediate" function that is expected to be well-behaved regarding stack usage :-) Consider for instance `Lwt_list.iter_s f l` where f evaluates immediately and l is a long-ish list.
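The `Lwt_list.iter_s` case can be sketched with a toy eager bind; `iter_s` below is a hypothetical stand-in, not Lwt_list's code. Because `bind` calls its callback in tail position, and the callback calls `iter_s` in tail position, walking a long list of immediately-resolved steps should not grow the stack:

```ocaml
(* Toy settled promises; not Lwt's representation. *)
type 'a promise = Resolved of 'a | Failed of exn

let return v = Resolved v

(* Eager bind: f v is a tail call. *)
let bind p f =
  match p with
  | Resolved v -> f v
  | Failed e -> Failed e

(* Sequential iteration in the style of Lwt_list.iter_s. When every f x is
   already resolved, iter_s -> bind -> callback -> iter_s is a chain of tail
   calls, so the stack does not grow with the length of the list. *)
let rec iter_s f = function
  | [] -> return ()
  | x :: rest -> bind (f x) (fun () -> iter_s f rest)
```

Replacing `bind` with a version that wraps `f v` in `try ... with` would add one stack frame per element, which is the overflow risk Mauricio points out.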

> The function given to Lwt.map does not create new promises, so there's no problem (and it is indeed safer) to wrap the exception in the Lwt monad. It's not clear to me that this is an *advantage* of `map` that makes it safer to not do a tail call. It seems more like a *disadvantage*: since `map` has to wrap the result of `f` in a promise, a tail call just happens not to be possible in `map` anyway.

My take is this: ideally, we'd want bind to capture "immediate" exceptions in the right-hand function (and when I started using Lwt, that was the semantics I expected), but alas we can't have it in practice, because we cannot rule out the possibility of a long chain of binds being non-blocking. For the sake of consistency, we could give `map` the same semantics as `bind` (Edit: with no gain regarding stack usage, because we have to wrap the result anyway, as you said), but its current semantics is the one we'd want Lwt.bind to have if that weren't impossible for practical reasons that don't hold for Lwt.map.

PS: my apologies if what I'm saying is obvious, I'm not too lucid today :)

-- Mauricio Fernández

aantron · 2017-04-02T21:05:45Z

No, those are valid points. It helps to have them clearly stated :) I also originally expected bind to capture "immediate" exceptions, when I was first learning Lwt.

IMO, it is a sign of a serious design flaw that we have to choose between:

sneakily blowing up the stack, and
offering a misleading and confusing interface w.r.t. "immediate" exceptions.

The smallest reasonable change is probably to make map like bind, but that seems like making map worse just because we don't have bind right. So definitely I agree on that.

I haven't fully thought this through, but it seems to me that a way to resolve this is not to guarantee that bind will run f right away, i.e. do something like what wakeup_later does. wakeup_later resolver checks if Lwt is already "wakening" (in current terminology of lwt.ml): a phase in which Lwt sets promise states to returned or failed, and calls the functions waiting on those promises. If not already wakening, wakeup_later resolves the promise associated with resolver right away; otherwise it queues the resolution to happen later, when the current wakening phase ends. This "wakening" loop is essentially the top level of Lwt.

When you use wakeup_later, the promise is resolved at one of two stack depths:

If already wakening, immediately below the stack frame of the current wakening loop, which is above the current call to wakeup_later.
If not already wakening, wakeup_later pushes a stack frame for a new wakening loop, and resolves the promise there.

Basically, the current behavior of bind p f is to run the "waiting" function f right away, just because p happens to be already resolved, without minding the wakening loop at all. Then, we are forced to rely on tail recursion to cover up for the fact that almost none of Lwt can be safely re-entered in this way.

This is somehow notionally equivalent to running wakeup in a loop, which also doesn't care whether we are already in a wakening phase or not, and is not tail-recursive. If you wake up a promise, which then wakes up another promise, and so on, you will get a stack overflow. I've already de-emphasized wakeup in the new draft docs for this reason. IMO, wakeup_later should be the primary function for manually resolving a promise, and perhaps bind needs to have the same semantics.
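A toy model of the two resolution strategies, assuming a single global wakening flag and queue. The names echo Lwt's `wakeup` and `wakeup_later`, but this is a sketch, not lwt.ml. In a chain where resolving promise i resolves promise i + 1, eager `wakeup` nests one level deeper per link, while `wakeup_later` stays at depth 1 because nested resolutions are queued for the active wakening loop:

```ocaml
(* Toy promises with callbacks; exceptions from callbacks are not handled,
   to keep the sketch short. *)
type 'a promise = {
  mutable value : 'a option;
  mutable callbacks : ('a -> unit) list;
}

let create () = { value = None; callbacks = [] }

let on_resolve p f =
  match p.value with
  | Some v -> f v
  | None -> p.callbacks <- f :: p.callbacks

(* Instrumentation: current and deepest nesting of running callbacks. *)
let depth = ref 0
let max_depth = ref 0

let run_callbacks p v =
  p.value <- Some v;
  let callbacks = List.rev p.callbacks in
  p.callbacks <- [];
  List.iter
    (fun f ->
      incr depth;
      if !depth > !max_depth then max_depth := !depth;
      f v;
      decr depth)
    callbacks

(* wakeup: run callbacks now, however deep the current stack already is. *)
let wakeup p v = run_callbacks p v

(* wakeup_later: if a wakening loop is already active, queue the resolution
   for that loop; otherwise become the loop and drain the queue. *)
let wakening = ref false
let queued : (unit -> unit) Queue.t = Queue.create ()

let wakeup_later p v =
  if !wakening then Queue.push (fun () -> run_callbacks p v) queued
  else begin
    wakening := true;
    run_callbacks p v;
    while not (Queue.is_empty queued) do
      (Queue.pop queued) ()
    done;
    wakening := false
  end

(* Chain of n + 1 promises: resolving promise i resolves promise i + 1.
   Returns the deepest callback nesting observed. *)
let chain_depth resolve n =
  depth := 0;
  max_depth := 0;
  let ps = Array.init (n + 1) (fun _ -> create ()) in
  for i = 0 to n - 1 do
    on_resolve ps.(i) (fun () -> resolve ps.(i + 1) ())
  done;
  resolve ps.(0) ();
  !max_depth
```

With a chain of 1000 links, `chain_depth wakeup 1000` reaches depth 1000, while `chain_depth wakeup_later 1000` never exceeds 1.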

Essentially, tail recursion is not available in most of Lwt, including in map and wakeup, and while it seems possible in bind, it actually turns out not to be possible once you consider the present issue.

Sorry if I wrote too much :)

aantron · 2017-04-03T05:36:12Z

I guess this is an issue in catch, try_bind, and finalize as well. For example:

let () =
  let failed = Lwt.fail_with "foo" in
  let pending, resolve = Lwt.wait () in

  (* The catch for p1 leaks the exception raised by its handler, while the one
     for [p2] pushes that exception into [p2]. The leak is caught here so that
     the remaining assert runs. *)
  let p1_raised =
    try
      ignore (Lwt.catch (fun () -> failed) (fun _exn -> raise Exit));
      false
    with Exit -> true
  in

  let p2 =
    Lwt.catch
      (fun () -> pending)
      (fun _exn -> raise Exit)
  in
  Lwt.wakeup_exn resolve (Failure "foo");

  assert p1_raised;
  assert (Lwt.state p2 = Lwt.Fail Exit)
(* ocamlfind opt -linkpkg -package lwt catch.ml && ./a.out *)

aantron · 2017-04-03T16:46:59Z

Another example of tail calls not happening in Lwt when they could be naively expected, this time due to wrapping function calls in async_exception_hook: #206 (comment)

mfp · 2017-04-20T18:58:14Z

This is probably something that Async got right (it doesn't "bind eagerly").

A related (and very nice) property is that it guarantees context switches can only happen in bind, which in Lwt's case would be akin to wakeup and wakeup_later being fused into a single function with the semantics the latter has when already wakening (similar changes required in wakeup_exn, cancel, and others).

Maybe the "bind quirk" wrt. map can be resolved (hah!) the other way...

I think that both ensuring that OCaml-level exceptions are captured properly in the RHS of a bind (and the other functions you listed) and guaranteeing the context switch property by changing the wakeup semantics are valuable goals. Two cons come to mind:

(1) it might break existing code that relies on the current eager evaluation semantics
(2) it imposes some performance cost

Any others?

(2) seems a priori trivial for practical use cases, but (1) is more troubling. I used to think it was way too late for such a change, but after seeing how the deprecation scheme worked for 3.0, maybe, and if the change is deemed worthy, something similar could be done to e.g. introduce a new Lwt.resolve function with the new semantics and deprecate wakeup_* so that the switch is completed by 4.0 or 5.0? (This needs much further consideration of course, I'm just pointing it out as a possibility).

aantron · 2017-04-20T19:22:31Z

Yes, this is pretty much exactly what I want to do to all those functions.

it might break existing code that relies on the current eager evaluation semantics

For this, I thought to do some kind of study (exhaustive or probabilistic) of at least the code in OPAM. My hunch is that it will break very little actual code, though code in test cases or other contexts that don't depend on Lwt_main.run in some projects might be more susceptible.

Also, I don't think we would be breaking the documented API, though breaking its undocumented behavior is still dangerous. Unfortunately, it's some kind of well-known fact in the community that f will run immediately if p is ready. I hope most code authors assumed that they can't know when f will run, because that's the general case, and so is easier to think about.

it imposes some performance cost

I would guess that the performance cost would be from extra allocations of deferred resolution queue nodes, and from caching issues due to resolving things in a different order than now.

But, I thought about this in the last couple weeks, and we have at least one lever to mitigate this – we can make Lwt "finitely reentrant," where, let's say, up to 42 resolutions of a promise can be run without deferring. The extra cost would be maintaining this counter, which should be extremely cheap, and setting up an exception handler around each immediate call, which should also be cheap. After the 42nd resolution down into the current stack, the next resolution goes in the queue, i.e. it is treated like binding on something that isn't yet ready.

In what I originally proposed above, 42 is instead 0 (no immediate resolutions allowed). The current semantics allow an unbounded number of immediate resolutions, which forces us to use tail recursion in bind.
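The "finitely reentrant" idea can be sketched as follows. Everything here is hypothetical: the toy types, `bind`, `run_deferred`, and the `limit` of 42 taken from the comment. Bind on a resolved promise runs its callback immediately, under an exception handler, only while fewer than `limit` such calls are nested; past that it returns a pending promise and queues the work, as if the promise had not been ready. With `limit` set to 0, every callback is deferred to the queue.

```ocaml
(* Toy promises; not Lwt's representation. *)
type 'a state =
  | Resolved of 'a
  | Failed of exn
  | Pending of ('a state -> unit) list

type 'a promise = { mutable state : 'a state }

let resolved v = { state = Resolved v }

(* Settle a pending promise and run its callbacks in registration order. *)
let complete p st =
  match p.state with
  | Pending callbacks ->
    p.state <- st;
    List.iter (fun cb -> cb st) (List.rev callbacks)
  | _ -> invalid_arg "complete: promise already settled"

let on_complete p cb =
  match p.state with
  | Pending callbacks -> p.state <- Pending (cb :: callbacks)
  | st -> cb st

let limit = 42                 (* max nested immediate callback runs *)
let nesting = ref 0            (* immediate runs currently on the stack *)
let max_nesting = ref 0        (* deepest nesting observed *)
let deferred : (unit -> unit) Queue.t = Queue.create ()

(* Run f v under an exception handler, tracking nesting depth. This is not
   a tail call, which is acceptable because nesting is bounded. *)
let run_guarded f v =
  incr nesting;
  if !nesting > !max_nesting then max_nesting := !nesting;
  let result = try f v with e -> { state = Failed e } in
  decr nesting;
  result

let bind p f =
  match p.state with
  | Resolved v when !nesting < limit -> run_guarded f v
  | Failed e -> { state = Failed e }
  | _ ->
    (* Pending, or resolved but too deep: return a pending promise and
       finish the work later. *)
    let result = { state = Pending [] } in
    let run = function
      | Resolved v ->
        on_complete (run_guarded f v) (fun st -> complete result st)
      | Failed e -> complete result (Failed e)
      | Pending _ -> assert false
    in
    (match p.state with
     | Resolved _ -> Queue.push (fun () -> run p.state) deferred
     | _ -> on_complete p run);
    result

(* The top-level loop: run deferred callbacks until none remain. *)
let run_deferred () =
  while not (Queue.is_empty deferred) do
    (Queue.pop deferred) ()
  done
```

In this sketch, a loop of 10,000 binds on resolved promises never nests more than 42 callback runs deep, and `bind (resolved ()) (fun () -> raise Exit)` yields a promise whose state is `Failed Exit` instead of raising.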

I am pretty sure we can make this change using a combination of study, soft breakage (3.0.0 style), and performance tweaks, the latter of which might not even be necessary. The payoff would be much more predictable and teachable semantics.

The above applies to bind, catch, etc. Lwt.wakeup is a different beast, because it is supposed to run the resolution now, though this is only implied: the Lwt.wakeup_later docs say that wakeup_later is not guaranteed to run callbacks immediately, which suggests by contrast that wakeup is. I suspect that the real reason for most usage of Lwt.wakeup is just that the name is shorter than Lwt.wakeup_later, but there is likely code that actually depends on Lwt.wakeup running immediately. I have also written such code myself.

But, since we already have these functions, we can just deprecate Lwt.wakeup, leave its functionality alone and warn people about it, and encourage usage of Lwt.wakeup_later. We can give wakeup_later a new name, but I don't think it's necessary. Maybe once the promise (or whatever) terminology is truly settled, we can think about which functions to strategically rename (without removing the old names, of course), as a separate issue.

mfp · 2017-04-20T21:37:30Z

But, I thought about this in the last couple weeks, and we have at least one lever to mitigate this – we can make Lwt "finitely reentrant," where, let's say, up to 42 resolutions of a promise can be run without deferring.

That's clever. It's nice to be able to make the presumed overhead arbitrarily small (modulo the costs you mentioned).

I suspect that the real reason for most usage of Lwt.wakeup is just that the name is shorter than Lwt.wakeup_later, but there is likely code that actually depends on Lwt.wakeup running immediately. I have also written such code myself.

Also the fact that Lwt.wakeup has existed AFAIK since Lwt's early days (in unison's tree), and Lwt.wakeup_later was introduced much later in 2011.

But, since we already have these functions, we can just deprecate Lwt.wakeup, leave its functionality alone and warn people about it, and encourage usage of Lwt.wakeup_later. We can give wakeup_later a new name, but I don't think it's necessary.

Yes, on second thought, just promoting Lwt.wakeup_later should do. There's a minuscule chance somebody out there could be relying on Lwt.wakeup_later's precise (but undocumented!) semantics and it behaving like Lwt.wakeup when not in wakeup phase, but it seems most unlikely: it cannot be in library code anyway, where there's no telling whether the code will be invoked in a 'wakeup' (and going out of one's way to detect whether it's the case would be borderline criminal behavior :)

aantron · 2017-04-20T21:50:20Z

Also the fact that Lwt.wakeup has existed AFAIK since Lwt's early days (in unison's tree), and Lwt.wakeup_later was introduced much later in 2011.

That too :) Hadn't seen that.

behaving like Lwt.wakeup when not in wakeup phase

Actually, now I think it will still have to do that. If this is the top-level call to Lwt.wakeup_later, I think the promise's callbacks must be called immediately, because once Lwt.wakeup_later returns, Lwt loses control, potentially forever (sounds a bit over-dramatic...).

So it will be something like: run callbacks now, unless already running callbacks. I guess it will be that for bind too, so the max number of nested immediate wakeups in the original proposal would be 1, not 0. Of course, if we have to resort to the tweak, it will be like: run callbacks now, unless this is the 43rd nested attempt to run callbacks. Which sounds ridiculous :p

aantron · 2017-04-20T21:54:54Z

To be more precise above, s/top-level call to Lwt.wakeup_later/top-level entry into Lwt/, where entry means running any callback of a promise (including by bind and elsewhere), and therefore running the Lwt book-keeping associated with that.

aantron · 2017-04-20T22:16:05Z

On the matter of how to make such a change relatively gently, since we want to reuse the infix names and the PPX syntax (i.e. change their semantics), we can't really deprecate the existing names and ask everyone to patch their code repeatedly.

I think, if the revdeps study shows that very few users will be affected, then we can go ahead with this. We will make some minor (n.N.0) release where we announce that this change is coming up. The announcement goes in the changelog, the mailing list, messages in the opam file, and anywhere else we can think of.

Simultaneously, we can make a branch of Lwt available with the new semantics already applied, and provide clear instructions on how to pin that branch, and a clear, justified explanation of what the change is and why we are doing it – a summary of this discussion.

With the pin, a user can exercise their code base (run test suite, etc.) against the new semantics by running their normal build process. If exercising the code seems unproblematic (hopefully the typical outcome), then the user doesn't have to do anything except unpin Lwt. Otherwise, the user can constrain Lwt to < N.0.0, adjust their code so that it works with both current and new semantics, or come to the repo and tell us why this course of action is idiotic. And, in the latter case, if the user is right, we will have time to agree and change our minds :)

mfp · 2017-05-13T14:31:33Z

What would be needed for the 3.1.0 milestone wrt. this?

Making the branch with the new semantics available? Not sure how it ties to 3.1.0 effectively; it could be made available at any time (with enough advance) before (hypothetically) 4.0. Or to put it in another way, it doesn't seem to me 3.1.0 would block on this.

What we can and probably should do already in 3.1.0 is try to encourage the transition from wakeup to wakeup_later. We can mention this in the ANN and the mli, but ideally we'd want an attribute with the same semantics as ocaml.deprecated and with an arbitrary message of ours. ocaml.ppwarning won't do, since it would only fire at the definition (not usage) site AFAICS: http://caml.inria.fr/pub/docs/manual-ocaml/extn.html#sec247

aantron · 2017-05-13T15:46:26Z

I agree that 3.1.0 doesn't block on this. To answer a hypothetical broader query, any issue can be dropped from a milestone if we don't have enough time, etc. – I just assigned this issue to 3.1.0 as a reminder to give the issue a hard look, in case we want to do 4.0.0 three months after 3.1.0, without an intervening minor release, and want to commit this change in 4.0.0. That might not be realistic anyway, because the revdeps study is going to take a while.

But, to answer the actual specific concern that I believe you have: in 3.1.0 (or whatever minor release we have enough time to prepare this for), we should probably

add an announcement to opam, so that anyone installing or upgrading to that minor release will get a warning and link to the branch.
That requires the branch be made available before the minor release is published to OPAM.

So basically, it's because that minor release's announcements are maybe the best time to announce the change and the branch, and the opam file is one case where I think it is the only reasonable time.

I agree that we should probably discourage using wakeup, independently of this whole process with the semantics. I actually think deprecating it is fine. @@ocaml.deprecated seems right to me, and we probably need to write something in an issue or on the wiki explaining why "wakeup considered harmful," and link to that from the deprecation message. Are you concerned that deprecated doesn't do the right thing?
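A sketch of the deprecation route using the standard `ocaml.deprecated` attribute on a value declaration in a signature. The `Resolver` module is hypothetical, standing in for Lwt's resolver functions, and the message text is only an example. Unlike `ocaml.ppwarning`, the deprecation alert is reported where the value is used, not where it is declared:

```ocaml
(* Hypothetical stand-in for Lwt's resolver API: a resolver is just a
   mutable cell here, and both functions simply store the value. *)
module Resolver : sig
  type 'a t
  val create : unit -> 'a t
  val peek : 'a t -> 'a option
  val wakeup : 'a t -> 'a -> unit
    [@@ocaml.deprecated "Use wakeup_later; wakeup runs callbacks immediately, even inside another resolution."]
  val wakeup_later : 'a t -> 'a -> unit
end = struct
  type 'a t = 'a option ref
  let create () = ref None
  let peek r = !r
  let wakeup r v = r := Some v
  let wakeup_later r v = r := Some v
end

(* Compiling a use such as [Resolver.wakeup r 1] outside this module
   reports a "deprecated" alert carrying the message above; by default it
   is not an error, so existing code keeps building. *)
```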

mfp · 2017-05-13T16:46:05Z

I seemed to remember you'd rejected the idea of deprecating it before, but it was actually the renaming of wakeup_later to resolve you deemed unnecessary. (The reason I put that possibility forward is that a rename would raise attention towards the new semantics and its rationale, the name would be shorter and consistent with the new terminology, but it's not clear whether it is worth the effort).

aantron · 2017-05-13T16:54:35Z

Well, I think it's unnecessary as part of this specific issue, but I would like resolve at some point, in general :) I'm just worried about the terminology. In the lwt.ml refactoring, I ended up using "resolve" to correspond to Result.Ok, "fail" to correspond to Result.Error, and "complete" to correspond to either one of these two actions. I don't like "complete," and it seems like "resolve" would be promoted to the meaning of complete once there is only one way to resolve a promise, in a theoretical Lwt that does not conflate exception handling with concurrency. But I don't want to get into all these issues now, and I'm scared to use nice names like "resolve" that we want to save for later, now, before the semantics are settled, because we might have to throw them away upon doing more work :/

Basically, I want to reserve nice names like "resolve" for potential even larger semantic changes.

aantron · 2018-03-24T05:35:26Z

Closing this for now as Lwt won't resolve it in the immediate future. See https://discuss.ocaml.org/t/1337/13 and #453 (comment).

aantron added the question label Apr 2, 2017

aantron modified the milestone: 4.0.0 Apr 13, 2017

aantron added breaking and removed question labels Apr 29, 2017

aantron mentioned this issue May 9, 2017

turn a few Lwt_list functions into a tail recursive variant #347

Merged

aantron modified the milestones: 3.1.0, 4.0.0 May 12, 2017

aantron mentioned this issue May 14, 2017

lwt.ml: human-friendly edition – major refactoring and commenting of the Lwt core #354

Merged

aantron added the difficult label May 20, 2017

aantron modified the milestones: 3.1.0, 3.2.0 Jul 18, 2017

aantron modified the milestones: 3.2.0, 3.1.0 Jul 18, 2017

aantron added the in progress label Oct 14, 2017

This was referenced Nov 9, 2017

Improve exception safety of some Lwt_list APIs #499

Merged

Lwt semantic improvements: exception safety and stack overflows #500

Merged

aantron modified the milestones: 3.2.0, 4.0.0 Nov 10, 2017

aantron mentioned this issue Dec 23, 2017

[4.0.0] Switch to new callback deferral semantics #519

Closed

aantron closed this as completed Mar 24, 2018
