Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast effect forwarding #256

Merged
merged 4 commits into from
Nov 26, 2024
Merged

Fast effect forwarding #256

merged 4 commits into from
Nov 26, 2024

Conversation

frank-emrich
Copy link

This PR fundamentally changes how we handle tag/effect forwarding: Currently, at any suspend site, we unconditionally attempt to switch to the parent stack (or fail, if we are already on the the main stack). In the process, we pass the tag identifier that we suspended with. Then, we check if the resume clause in the parent had a handler for that tag. If it doesn't, we attempt to suspend again to the parent of the (original) parent. This process is repeated until we either find a handler, or reach the main stack (meaning that there was no appropriate handler).

The new design introduced by this PR works as follows: Every stack (main stack or continuation) carries a HandlerList around. This is effectively a vector of tag identifiers (i.e., *mut VMTagDefinition).

On resume $ct (on $tag_1 $block_1) ... (on $tag_n $block_n), we then fill the active stack's HandlerList with the following n entries:

<address of $tag_1>
...
<address of $tag_n>

On suspend $tag, we then traverse the chain of active continuations, inspecting the HandlerList of the parents of the currently active continuation. If we find the address of $tag as the _i_th entry of the handler list of some resume instruction in our parents, we directly switch to the stack that executed the corresponding resume instruction. In the process, we pass along the list index i. Back at the resume site, we can then directly jump to the _i_th handler block, using a jump table.
If the handler search does not succeed, we immediately trap at the suspend site.

Some implementation notes:

  • The search for an appropriate handler is implemented in wasmtime_cranelift::wasmfx::optimized::search_handler, which is used for emitting code at suspend sites.
  • While there are a lot of changes to translate_resume, it is actually simpler now! The rather annoying back edge from the forward block to the resume block is gone, and every block will at most be executed once.
  • ControlEffect is no longer using TaggedPointer. There are simply no more pointers involved here at all, the only payload to hand from suspend site to resume site (in addition to the return vs suspend signal) is the index of the handler we want to use.

In general, everything is built to be compatible with switch being added on top of this design: In particular, the HandlerList and aforementioned search_handler can easily be adapted to support switch, and the changes to translate_resume are made with switch in mind.

@frank-emrich frank-emrich requested a review from dhil November 18, 2024 17:52
crates/continuations/src/lib.rs Outdated Show resolved Hide resolved
/// contref = chain_link.get_contref()
/// parent_link = contref.parent
/// parent_csi = parent_link.get_common_stack_information();
/// handlers = parent_csi.handlers;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right to me. The handlers should sit on the current continuation reference. If the current one doesn't handle the suspension, then we try the parent, and so forth.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I understand what you are saying.

Consider the following example:

  ...
  
   ^
   | parent 
   
  c1
  
   ^
   | parent
   
  c2

where $c1 executed (resume $ct (on $t switch) ... (local.get $c2)) and $c2 is currently executing.

In the current design in this PR, continuation $c1 would have a handler list l with a switch entry for $t (as well as all other tags potentially occurring in the resume above).
Continuation $c2 is currently running and has no handler list.

Are you suggesting that the handler list should be attached to $c2 instead, meaning that each continuation stores what is handled by its immediate parent?

The disadvantage of that approach over the current one is that it requires additional work on switch instructions:
In the example above, if $c2 then executes (switch $ct' $t (local.get $c3)) for some $c3, we would have to update the handler lists such that l becomes the handler list of $c3 (and the handler list of $c2 is cleared for housekeeping).

This work isn't required if the handler list l is associated with $c1 instead (and stays there on switch).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right about querying the parent first. We do that in all our implementations, I see. On reflection, it must be like that, because the handler can grow its continuation, and when resuming a captured continuation it needs to be spliced with the handler's continuation.

Regarding switch and clearing the handlers on the continuation reference, I think the only thing that needs to happen for $c3 is that it has its parent adjusted (similar for the current continuation). We should never have to update the handlers of "switched" continuations, I think.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding switch and clearing the handlers on the continuation reference, I think the only thing that needs to happen for $c3 is that it has its parent adjusted (similar for the current continuation). We should never have to update the handlers of "switched" continuations, I think.

Yes, this is what the design in this PR ensures. You don't have to touch the handlers on switch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantics of switch is similar to the semantics of a deep handler, in the sense that the handler stays in place after resume. So any proper design should not have to modify the handler on switch.

let raw_parts = builder.block_params(handle_link);
let chain_link =
tc::StackChain::from_raw_parts([raw_parts[0], raw_parts[1]], env.pointer_type());
let is_main_stack = chain_link.is_main_stack(env, builder);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like this special case of the main stack. But we can fix that in another patch. Ideally, every stack is homogeneous.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no "special case for the main stack" here. We are traversing a linked list, whose last element is the main stack.
The only thing that is_main_stack is used for is detecting that we have reached the end of the list without finding a handler.
You could rename is_main_stack to has_no_parent. Such a check will always be required in this traversal.
The rest of the code treats continuations and the main stack is completely homogeneously.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment was conceptual. It is not about this particular code. We shouldn't have a notion of a main stack at all, is my point.


// Note that the control context we use for switching is not the one in
// (the stack of) resume_contref, but in (the stack of) last_ancestor!
let fiber_stack = last_ancestor.get_fiber_stack(env, builder);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the point of this "last ancestor" business. Please explain. See my other comment below.

/// | +----------------+
/// | ^
/// | | parent
/// last ancestor | |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so reading this drawing I may understand what you are on about with the ancestor business. What you call last_ancestor is really the initial frame of the delimited continuation that is being captured. I think we should name all this differently. The frame is not the "last" ancestor, as it may be adopted by another ancestor etc. However, it is the tail of the linked list that defines the continuation. Thus, I think it is better to use names such as "tail"/"head" to describe these objects. The data structure you are building is known as a "circular list". I think it is worthwhile to adopt this terminology to make the implementation easier to understand and more readily accessible to other people.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not particularly married to the name last_ancestor. However, this field is only ever set on the "head" continuation (bottom-most continuation in the picture) and only if it is suspended. Thus, if its set, it indeed points to the last ancestor.

I'm not particularly keen on the term tail, since it is somewhat overloaded in the context of linked lists. It is both used to refer to the next node in a linked list, but also to refer to the end of the list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy to consider anything but last ancestor. Am I correct that the continuation object points to the bottom stack in the chain (i.e. the stack that executed the suspension)? If so, then I think we can optimise it ever-so-slightly but returning the "last ancestor" instead and set its parent to be the bottom stack. We should save a field that way too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's possible, but I would prefer holding off from that. The reason is that it complicates how at suspend, we propagate the required information to the resume site.
In the current version of this PR, the following happens:

  1. We identify the currently running continuation c1 by reading the stack chain field of the VMContext. This will be the bottom of the continuation chain of the continuation object we are about to construct.
  2. The handler search gives us the other end of the chain, c2. We set c1.last_ancestor to c2.
  3. We perform the stack switch.
  4. Back in resume, we read c1 from the VMContext (it's still saved in the stack chain field) and obtain the other end (c2) from its last_ancestor field.

In the design you propose, we would instead make c2 point to c1 in step 2. But this means that in the 4th step above, we have no way to obtain c2. We would have to either temporarily write c2 into the VMContext or extend the switch instruction so that we can pass more payloads. I'd very much prefer avoiding that, at least for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, it strikes again. I understand. We really need to get rid of the auxiliary structures. We should be able to encode everything in one chain. Certainly we need this before/for the upstreaming.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue I described is orthogonal from the details of how we represent the chain of currently active continuations. In any design, there will always be a field in the VMContext that points to the currently active continuation. That's the field accessed in step 1 and 4 above.
In that sense, the design in this PR works without requiring any new channels (in the widest sense) to get information from the suspend to the resume site.

Copy link
Member

@dhil dhil Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not convinced. To me the current design/implementation has a certain spaghetti feeling to it. But it can be refactored later. Clarification: I am not talking about this patch, but our entire implementation. It is too convoluted in my opinion. It is an obvious consequence of how we have organically evolved the implementation too. I am positive a refactoring will reveal a much simpler design/implementation.

@dhil
Copy link
Member

dhil commented Nov 19, 2024

All this list business, reminds me of some tricks we should try. One trick used to speed up linked list searching is the "move to front" trick, where once an element has been found it is moved to the front of the list, such that next time you look for it, it may be one of the first elements. Obviously, we don't want exactly that, but a variant of it may speed up continuation materialisation even further, naming, what we could have on each continuation reference is a pointer to the most recent suspension handler (like caching the latest handler).

@dhil
Copy link
Member

dhil commented Nov 20, 2024

I guess we can merge this as-is.

Co-authored-by: Daniel Hillerström <daniel.hillerstrom@ed.ac.uk>
@frank-emrich
Copy link
Author

I guess we can merge this as-is.

I'm fine with that, but would you still prefer renaming the last_ancestor field in VMContRef? I can live with tail, but open for other suggestions.

@dhil
Copy link
Member

dhil commented Nov 26, 2024

I guess we can merge this as-is.

I'm fine with that, but would you still prefer renaming the last_ancestor field in VMContRef? I can live with tail, but open for other suggestions.

I don't think it is necessary now. I was just commenting that I think we should come up with a better name for when we upstream it.

@dhil dhil merged commit 76c23dd into wasmfx:main Nov 26, 2024
41 checks passed
dhil added a commit that referenced this pull request Dec 9, 2024
This PR provides the missing piece to support `switch` instructions, by
adding the necessary codegen (but only for the optimized implementation,
support in the baseline implementation is not included).

Thus, the main change is the addition of
`wasmtime_cranelift::wasmfx::optimized::translate_switch`.

In general, the implementation piggybacks on #256: On `resume`, we now
fill the `HandlerList` with _m_ entries for all tags with suspend
handlers, followed by _n_ entries for all tags with switch handlers. The
`search_handler` code is changed so that on `suspend` and `switch`, we
look in the correct part of the `HandlerList`. We could use two separate
`HandlerLists` instead, but then we would have yet another allocation to
manage, so putting both kinds of tags into the same list, and then only
searching part of it seems preferable.


A few more notes:
- `translate_resume` and `translate_switch` now return a `WasmResult`.
This is used so that the baseline implementation can bail out without
needing to `panic`.
- The test runner in `stack_switchting.rs` now takes an extra parameter
that allows us to enable the gc proposal, which is required for the
tests using recursive types.

---------

Co-authored-by: Daniel Hillerström <daniel.hillerstrom@ed.ac.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants