-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Codegen for switch instructions #264
Conversation
Co-authored-by: Daniel Hillerström <daniel.hillerstrom@ed.ac.uk>
let slot_size = ir::StackSlotData::new( | ||
ir::StackSlotKind::ExplicitSlot, | ||
CONTROL_CONTEXT_SIZE as u32, | ||
env.pointer_type().bytes() as u8, | ||
); | ||
let slot = builder.create_sized_stack_slot(slot_size); | ||
let tmp_control_context = builder.ins().stack_addr(env.pointer_type(), slot, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure how the semantics of StackSlotData::new
is; but I am worried that we allocate a stack slot per switch
instruction in the current context, so if my program is
(switch ...)
(switch ...)
(switch ...)
then we allocate three slots, even though only one is necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we currently allocate one stack slot per switch and they are not shared/re-used in the example you give.
I don't think that's terribly bad (we only need space for 3 pointers per switch
and I assume that few functions will contain more than one switch
), but it's relatively easy to add some shared state to FuncEnvironment
so that we re-use the slots between multiple switch
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have no data to validate or refute the assumption. I suggest we add a comment to remark that there may be an issue here, and if it turns out to be an issue then there is a straightforward cache-fix by extending the FuncEnvironment
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be interesting to experiment with using stack slots to encode "handlerlists". I don't know whether it would offer any speedups, but it would eliminate some heap memory dependency. Similarly, it ought to be possible to write resume/suspend/switch arguments directly on the target stack using the stack slots API, thus potentially eliminating the need for variable-sized vectors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a comment now about the possibility of re-using the stack slots.
It would be interesting to experiment with using stack slots to encode "handlerlists". [...] Similarly, it ought to be possible to write resume/suspend/switch arguments directly on the target stack using the stack slots API, [...]
Yes, I thought about that a while ago. It should be relatively straightforward, with one annoyance: For the initial payloads buffer (i.e., the one consumed by the array call trampoline, represented by the args
field in VMContRef
), we need to do the reservation of some stack space ourselves, somewhere at the top of the fiber stack. We cannot use the ordinary stack slot API for that. But this is mostly a matter of updating some ASCII art and tweaking the initialization functions.
@dhil Are you okay with this landing now? |
yes, lets do it now. I am minutes from opening another merge PR. |
This PR provides the missing piece to support
switch
instructions, by adding the necessary codegen (but only for the optimized implementation, support in the baseline implementation is not included).Thus, the main change is the addition of
wasmtime_cranelift::wasmfx::optimized::translate_switch
.In general, the implementation piggybacks on #256: On
resume
, we now fill theHandlerList
with m entries for all tags with suspend handlers, followed by n entries for all tags with switch handlers. Thesearch_handler
code is changed so that onsuspend
andswitch
, we look in the correct part of theHandlerList
. We could use two separateHandlerLists
instead, but then we would have yet another allocation to manage, so putting both kinds of tags into the same list, and then only searching part of it seems preferable.A few more notes:
translate_resume
andtranslate_switch
now return aWasmResult
. This is used so that the baseline implementation can bail out without needing topanic
.stack_switchting.rs
now takes an extra parameter that allows us to enable the gc proposal, which is required for the tests using recursive types.