Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Codegen for switch instructions #264

Merged
merged 10 commits into from
Dec 9, 2024
Merged

Codegen for switch instructions #264

merged 10 commits into from
Dec 9, 2024

Conversation

frank-emrich
Copy link

@frank-emrich frank-emrich commented Nov 29, 2024

This PR provides the missing piece to support switch instructions, by adding the necessary codegen (but only for the optimized implementation, support in the baseline implementation is not included).

Thus, the main change is the addition of wasmtime_cranelift::wasmfx::optimized::translate_switch.

In general, the implementation piggybacks on #256: On resume, we now fill the HandlerList with m entries for all tags with suspend handlers, followed by n entries for all tags with switch handlers. The search_handler code is changed so that on suspend and switch, we look in the correct part of the HandlerList. We could use two separate HandlerLists instead, but then we would have yet another allocation to manage, so putting both kinds of tags into the same list, and then only searching part of it seems preferable.

A few more notes:

  • translate_resume and translate_switch now return a WasmResult. This is used so that the baseline implementation can bail out without needing to panic.
  • The test runner in stack_switchting.rs now takes an extra parameter that allows us to enable the gc proposal, which is required for the tests using recursive types.

@frank-emrich frank-emrich changed the title Codegen for switch instruction Codegen for switch instructions Nov 29, 2024
@frank-emrich frank-emrich requested a review from dhil November 29, 2024 16:51
crates/cranelift/src/translate/code_translator.rs Outdated Show resolved Hide resolved
crates/cranelift/src/translate/code_translator.rs Outdated Show resolved Hide resolved
crates/cranelift/src/wasmfx/optimized.rs Outdated Show resolved Hide resolved
tests/all/stack_switching.rs Outdated Show resolved Hide resolved
Comment on lines +2512 to +2518
let slot_size = ir::StackSlotData::new(
ir::StackSlotKind::ExplicitSlot,
CONTROL_CONTEXT_SIZE as u32,
env.pointer_type().bytes() as u8,
);
let slot = builder.create_sized_stack_slot(slot_size);
let tmp_control_context = builder.ins().stack_addr(env.pointer_type(), slot, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how the semantics of StackSlotData::new is; but I am worried that we allocate a stack slot per switch instruction in the current context, so if my program is

 (switch ...)
 (switch ...)
 (switch ...)

then we allocate three slots, even though only one is necessary?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we currently allocate one stack slot per switch and they are not shared/re-used in the example you give.

I don't think that's terribly bad (we only need space for 3 pointers per switch and I assume that few functions will contain more than one switch), but it's relatively easy to add some shared state to FuncEnvironment so that we re-use the slots between multiple switch.

Copy link
Member

@dhil dhil Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have no data to validate or refute the assumption. I suggest we add a comment to remark that there may be an issue here, and if it turns out to be an issue then there is a straightforward cache-fix by extending the FuncEnvironment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be interesting to experiment with using stack slots to encode "handlerlists". I don't know whether it would offer any speedups, but it would eliminate some heap memory dependency. Similarly, it ought to be possible to write resume/suspend/switch arguments directly on the target stack using the stack slots API, thus potentially eliminating the need for variable-sized vectors.

Copy link
Author

@frank-emrich frank-emrich Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a comment now about the possibility of re-using the stack slots.

It would be interesting to experiment with using stack slots to encode "handlerlists". [...] Similarly, it ought to be possible to write resume/suspend/switch arguments directly on the target stack using the stack slots API, [...]

Yes, I thought about that a while ago. It should be relatively straightforward, with one annoyance: For the initial payloads buffer (i.e., the one consumed by the array call trampoline, represented by the args field in VMContRef), we need to do the reservation of some stack space ourselves, somewhere at the top of the fiber stack. We cannot use the ordinary stack slot API for that. But this is mostly a matter of updating some ASCII art and tweaking the initialization functions.

@frank-emrich
Copy link
Author

@dhil Are you okay with this landing now?

@dhil
Copy link
Member

dhil commented Dec 9, 2024

@dhil Are you okay with this landing now?

yes, lets do it now. I am minutes from opening another merge PR.

@dhil dhil merged commit 8b79a23 into wasmfx:main Dec 9, 2024
43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants