wazevo: initial impl of the new optimizing backend #1615

Merged
merged 4 commits into from
Aug 9, 2023

Conversation

mathetake
Member

@mathetake mathetake commented Aug 8, 2023

This commit introduces the initial implementation of our new optimizing
backend called "wazevo" (#1496), which I have worked on for the last 2 months.

The new backend is written completely from scratch, without reusing a single line of the
existing compiler, assembler, etc. The notable difference is that we employ a traditional
compilation pipeline (SSA-level IR, optimization passes, instruction selection,
register allocation, and so on) as opposed to the current single-pass compiler. As a result,
we can now use, for example, register-based calling conventions, which will definitely
contribute to the performance improvement. As a sneak preview of that improvement,
the following is the benchmark result of recursive_fibonacci(30), where wazevo takes
roughly 2.6x less time per operation than the old compiler.

Benchmark_wazevo
Benchmark_wazevo/old
Benchmark_wazevo/old-10         	     106	  10891053 ns/op
Benchmark_wazevo/wazevo
Benchmark_wazevo/wazevo-10      	     286	   4175793 ns/op

As expected, this is far from complete (in fact, it doesn't pass the spectest at all!).
Instead, the purpose of this commit is to set up the foundation and codebase that
multiple people can work on collaboratively. For now, it only has an AArch64 backend,
and the plan is to make it pass all spec tests before we start introducing an x64 backend.
That way we can design a better abstraction layer over ISA-specific code. In any case,
this intentionally leaves tens of TODOs, and we will iterate on this
new codebase over the next few months in subsequent PRs.

Notes

Pipeline

The pipeline has three main components:

  1. Frontend Compiler is in charge of lowering WebAssembly-level functions into SSA-level functions.
  2. SSA Builder provides the interface between the frontend and the backend, and is decoupled from Wasm-level concepts. It provides the logic and functions that the Frontend Compiler needs to lower Wasm-level functions into neutral SSA-IR functions. After the IR functions are constructed, the Backend also uses it to access the constructed functions.
  3. Backend Compiler lowers the SSA-IR-level functions into ISA-specific instruction sequences. It takes an SSA Builder as its input, and performs some optimizations during lowering as well as register allocation.

After going through all three components for all functions in a module, we also resolve the "relocations" of functions. In other words, we resolve local function calls as relative-address "direct branches", rather than indirect jumps.
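
The flow described above (per-function frontend and backend, then relocation once every function's offset is known) can be sketched as follows. This is only an illustration: every type and function name here (`wasmFunc`, `ssaFunc`, `frontend`, `backend`, `compileModule`) is a hypothetical stand-in, not wazevo's actual API, and the "machine code" is just placeholder bytes.

```go
package main

import "fmt"

// All types and functions here are hypothetical stand-ins, not wazevo's API.

// wasmFunc is a toy Wasm function: it only records which local functions it calls.
type wasmFunc struct{ calls []int }

// ssaFunc is the Wasm-independent form handed from the frontend to the backend.
type ssaFunc struct{ calls []int }

// relocation marks a call instruction whose branch offset is not known yet.
type relocation struct{ offset, callee int }

// frontend lowers a Wasm-level function into an SSA-level function.
func frontend(f wasmFunc) ssaFunc { return ssaFunc{calls: f.calls} }

// backend lowers an SSA function to machine code: one 4-byte placeholder per
// call (to be patched later) followed by an AArch64 `ret`.
func backend(f ssaFunc) (body []byte, rels []relocation) {
	for _, callee := range f.calls {
		rels = append(rels, relocation{offset: len(body), callee: callee})
		body = append(body, 0, 0, 0, 0)
	}
	body = append(body, 0xc0, 0x03, 0x5f, 0xd6) // `ret` in little-endian byte order
	return body, rels
}

// compileModule runs frontend and backend per function, then resolves every
// relocation into a relative byte distance once all function offsets are known.
func compileModule(funcs []wasmFunc) (executable []byte, branchOffsets []int) {
	funcStart := make([]int, len(funcs))
	var rels []relocation
	for i, wf := range funcs {
		body, funcRels := backend(frontend(wf))
		funcStart[i] = len(executable)
		for _, r := range funcRels {
			rels = append(rels, relocation{offset: funcStart[i] + r.offset, callee: r.callee})
		}
		executable = append(executable, body...)
	}
	for _, r := range rels {
		branchOffsets = append(branchOffsets, funcStart[r.callee]-r.offset)
	}
	return executable, branchOffsets
}

func main() {
	// Function 0 calls function 1, and function 1 calls function 0.
	exe, offsets := compileModule([]wasmFunc{{calls: []int{1}}, {calls: []int{0}}})
	fmt.Println(len(exe), offsets) // prints "16 [8 -8]"
}
```

The relocation step has to wait until the end because a call to a function that has not been compiled yet has no known target offset.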

The following diagram illustrates the overview of the pipeline:

graph TD;
    SSA_Builder[[SSA Builder]]
    Frontend_Compiler(Frontend Compiler)
    Backend_compiler(Backend Compiler)
    SSA_Function(SSA Function)
    Machine_interface[(Machine Interface)]
    ISA_arm64((AArch64))
    ISA_x64((x64))

    foo.wasm-- for each function -->Frontend_Compiler;
    Frontend_Compiler-->SSA_Function;
    SSA_Function-->Backend_compiler;
    Backend_compiler-- after all functions --> Relocations
    Machine_interface-->Backend_compiler
    ISA_arm64-->Machine_interface
    ISA_x64-->Machine_interface
    SSA_Builder---Frontend_Compiler;
    SSA_Builder---SSA_Function;
    SSA_Builder---Backend_compiler;
    ssa_passes[["CFG Analysis 
SSA-level optimizations
Block Ordering"]]
    ssa_passes--->SSA_Function
    backend_passes[["Instruction Selection
Liveness Analysis
Register Allocation
Branch Resolution
Binary Encoding"]]
    backend_passes--->Backend_compiler


SSA construction

We lower Wasm-level control flow and variables into SSA form based on the paper Simple and Efficient Construction of Static Single Assignment Form. We use the "block argument" variant, as MLIR does, instead of PHI instructions.
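
As a rough illustration of the block-argument idea, the sketch below resolves a variable read at a merge point by adding a block parameter and recording the value each predecessor passes for it. It is a heavily simplified take on the paper's approach, not the PR's code: it assumes all predecessors are known up front (no "sealing" of blocks) and skips the removal of trivial parameters, and all names (`builder`, `define`, `read`, `buildDiamond`) are made up for the example.

```go
package main

import "fmt"

type (
	value    int // SSA value ID
	variable int // source-level variable, e.g. a Wasm local index
)

type block struct {
	preds  []*block
	defs   map[variable]value // latest definition of each variable in this block
	params []value            // block parameters, playing the role of PHIs
	args   map[*block][]value // args[pred]: values pred passes when branching here
}

func newBlock(preds ...*block) *block {
	return &block{preds: preds, defs: map[variable]value{}, args: map[*block][]value{}}
}

type builder struct{ last value }

func (b *builder) newValue() value { b.last++; return b.last }

// define records that variable v now holds a fresh SSA value in blk.
func (b *builder) define(blk *block, v variable) value {
	val := b.newValue()
	blk.defs[v] = val
	return val
}

// read returns the SSA value of v at the end of blk, adding block parameters
// at merge points. It assumes every block's predecessors are already known.
func (b *builder) read(blk *block, v variable) value {
	if val, ok := blk.defs[v]; ok {
		return val
	}
	if len(blk.preds) == 0 {
		panic("variable read before any definition")
	}
	if len(blk.preds) == 1 {
		val := b.read(blk.preds[0], v)
		blk.defs[v] = val
		return val
	}
	param := b.newValue()
	blk.params = append(blk.params, param)
	blk.defs[v] = param // record first so that loops terminate
	for _, p := range blk.preds {
		blk.args[p] = append(blk.args[p], b.read(p, v))
	}
	return param
}

// buildDiamond models `x = ..; if (..) { x = .. }` followed by a read of x.
func buildDiamond() (merge, thenBlk, elseBlk *block, merged value) {
	b := &builder{}
	const x variable = 0
	entry := newBlock()
	b.define(entry, x) // value 1
	thenBlk, elseBlk = newBlock(entry), newBlock(entry)
	b.define(thenBlk, x) // value 2
	merge = newBlock(thenBlk, elseBlk)
	merged = b.read(merge, x) // value 3, a new block parameter
	return
}

func main() {
	merge, thenBlk, elseBlk, merged := buildDiamond()
	fmt.Println(merged, merge.args[thenBlk], merge.args[elseBlk]) // prints "3 [2] [1]"
}
```

Here the merge block receives one parameter for `x`; the "then" branch passes its own redefinition while the "else" branch passes the value from the entry block.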

Register Allocation

The implemented register allocation logic is a very simple variant of Chaitin's algorithm. The code works on the ISA-level IR, not the SSA-level one, but it abstracts away ISA specifics via interfaces. It runs a straightforward graph coloring algorithm on the interference graph, where interference is calculated at the CFG level, not on the linearized code.
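
For readers unfamiliar with the technique, here is a minimal sketch of Chaitin-style graph coloring (simplify, then select) over an interference graph. It is not the PR's allocator: the graph representation, the spill heuristic (highest remaining degree), and the names `graph`, `color`, and `exampleGraph` are assumptions made for illustration, and a real allocator would insert spill code and retry rather than just report spilled nodes.

```go
package main

import (
	"fmt"
	"sort"
)

// graph maps each virtual register to the set of virtual registers it interferes with.
type graph map[int]map[int]bool

func (g graph) addEdge(a, b int) {
	for _, n := range [2]int{a, b} {
		if g[n] == nil {
			g[n] = map[int]bool{}
		}
	}
	g[a][b], g[b][a] = true, true
}

// color tries to assign one of k registers to every node. Nodes that cannot be
// simplified are reported as spilled and receive no register.
func color(g graph, k int) (assigned map[int]int, spilled []int) {
	nodes := make([]int, 0, len(g))
	for n := range g {
		nodes = append(nodes, n)
	}
	sort.Ints(nodes) // deterministic order

	removed := map[int]bool{}
	degree := func(n int) (d int) {
		for m := range g[n] {
			if !removed[m] {
				d++
			}
		}
		return d
	}

	// Simplify: repeatedly remove a node with fewer than k remaining neighbors.
	// If there is none, spill the remaining node with the highest degree.
	var stack []int
	for len(removed) < len(nodes) {
		pick, spill, best := -1, -1, -1
		for _, n := range nodes {
			if removed[n] {
				continue
			}
			d := degree(n)
			if d < k {
				pick = n
				break
			}
			if d > best {
				spill, best = n, d
			}
		}
		if pick >= 0 {
			stack = append(stack, pick)
			removed[pick] = true
		} else {
			spilled = append(spilled, spill)
			removed[spill] = true
		}
	}

	// Select: pop nodes in reverse order and give each the lowest free register.
	assigned = map[int]int{}
	for i := len(stack) - 1; i >= 0; i-- {
		n := stack[i]
		used := make([]bool, k)
		for m := range g[n] {
			if c, ok := assigned[m]; ok {
				used[c] = true
			}
		}
		for c := 0; c < k; c++ {
			if !used[c] {
				assigned[n] = c
				break
			}
		}
	}
	return assigned, spilled
}

// exampleGraph is a triangle 0-1-2 plus node 3 interfering with node 0.
func exampleGraph() graph {
	g := graph{}
	g.addEdge(0, 1)
	g.addEdge(1, 2)
	g.addEdge(0, 2)
	g.addEdge(0, 3)
	return g
}

func main() {
	assigned, spilled := color(exampleGraph(), 2)
	fmt.Println(assigned, spilled) // prints "map[1:1 2:0 3:0] [0]"
}
```

With two registers the triangle cannot be colored, so one node is spilled; with three registers the same graph colors without spills.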

Before the allocation algorithm runs, we do the liveness analysis following the algorithm described in Chapter 9 of the SSA-based Compiler Design book.
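
As an illustration of liveness analysis on SSA form, the sketch below uses a path-exploration ("walk up from each use until the definition") approach. Whether the PR uses exactly this variant is an assumption; the `block` type, the string value names, and the helper names are all hypothetical. In a block-argument IR, values passed as branch arguments would count as uses in the predecessor, and block parameters as definitions in the block.

```go
package main

import "fmt"

type set map[string]bool

type block struct {
	name    string
	preds   []*block
	defs    set // values defined in this block (including its block parameters)
	uses    set // values used in this block that are not defined earlier in it
	liveIn  set
	liveOut set
}

func newBlock(name string, defs, uses []string, preds ...*block) *block {
	b := &block{name: name, preds: preds, defs: set{}, uses: set{}, liveIn: set{}, liveOut: set{}}
	for _, d := range defs {
		b.defs[d] = true
	}
	for _, u := range uses {
		b.uses[u] = true
	}
	return b
}

// upAndMark walks backward from a block where v is needed at entry, marking v
// live until the block that defines it is reached.
func upAndMark(b *block, v string) {
	if b.defs[v] || b.liveIn[v] {
		return // reached the definition, or this path was already explored
	}
	b.liveIn[v] = true
	for _, p := range b.preds {
		p.liveOut[v] = true
		upAndMark(p, v)
	}
}

// computeLiveness starts one backward walk per (block, used value) pair.
func computeLiveness(blocks []*block) {
	for _, b := range blocks {
		for v := range b.uses {
			upAndMark(b, v)
		}
	}
}

// example builds b0 -> b1 -> b2 with a self-loop on b1; x is defined in b0 and
// used in b2, so it must stay live across the loop.
func example() (b0, b1, b2 *block) {
	b0 = newBlock("b0", []string{"x"}, nil)
	b1 = newBlock("b1", nil, nil, b0)
	b1.preds = append(b1.preds, b1) // loop back edge
	b2 = newBlock("b2", nil, []string{"x"}, b1)
	computeLiveness([]*block{b0, b1, b2})
	return
}

func main() {
	b0, b1, b2 := example()
	for _, b := range []*block{b0, b1, b2} {
		fmt.Println(b.name, "live-in:", len(b.liveIn), "live-out:", len(b.liveOut))
	}
}
```

Two values interfere when their live ranges overlap, which is the information the graph-coloring allocator needs to build its interference graph.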

Frame Pointer usage

As a simple example of compilation result, the Wasm function which swaps two params:

(func (param i32 i32) (result i32 i32)
    local.get 1
    local.get 0
)

is compiled as

str x30, [sp, #-0x10]!
mov x0, x3
mov x1, x2
ldr x30, [sp], #0x10
ret

where you can see we only use sp, but not fp, which is unusual compared to other compilers (note that the x0 and x1 params are implementation-specific and not relevant here; the actual Wasm params start at x2/v2). Basically, FP is only
necessary when we want to integrate with native debuggers and profilers, which is not relevant in our case. Moreover, the typical way of saving/restoring the frame pointer (at prologue/epilogue) cannot be applied here because our stack is a Go-allocated byte slice, and therefore it "moves". As a consequence, saving the absolute address held in FP is dangerous, so we do not store it at all. As for SP, the caller and callee always know how much space is needed, and hence we only need to add/subtract a relative amount. (Of course, the same relative scheme could be applied to FP, but it's costly and not necessary at all!)
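
The danger of absolute addresses on a stack that moves can be shown with plain Go slices. This is only a sketch of the idea, not the PR's stack management: `growStack`, `slotAt`, and `demo` are hypothetical helpers, and the growth policy (copy old contents to the high-address end of a larger slice) is an assumption.

```go
package main

import "fmt"

// growStack mimics the reallocation of a Go-allocated stack: a larger slice is
// allocated and the old contents are copied to its high-address end.
func growStack(old []byte, newSize int) []byte {
	grown := make([]byte, newSize)
	copy(grown[newSize-len(old):], old)
	return grown
}

// slotAt returns the address of the byte located fromTop bytes below the top
// (highest address) of the stack, i.e. a position expressed relatively.
func slotAt(stack []byte, fromTop int) *byte { return &stack[len(stack)-fromTop] }

// demo returns the value seen through a stale absolute address and the value
// seen through a relative position, after the stack has moved.
func demo() (viaStale, viaRelative byte) {
	stack := make([]byte, 16)
	saved := slotAt(stack, 4) // absolute address, like a saved frame pointer
	*saved = 0xAA

	stack = growStack(stack, 32) // the stack "moves"
	*slotAt(stack, 4) = 0xBB     // running code updates the live slot

	return *saved, *slotAt(stack, 4)
}

func main() {
	stale, live := demo()
	fmt.Printf("%#x %#x\n", stale, live) // prints "0xaa 0xbb"
}
```

The saved pointer still refers to the abandoned backing array, so it no longer observes the live slot, while the relative position keeps working after the move.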

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
@mathetake mathetake marked this pull request as ready for review August 8, 2023 02:39
@codefromthecrypt
Contributor

well I have to say, thank god it isn't 24k lines 😎

Member Author

@mathetake mathetake left a comment

notes

Comment on lines +94 to +108
// Lower Wasm to SSA.
err := fe.LowerToSSA()
if err != nil {
	return fmt.Errorf("wasm->ssa: %v", err)
}

// Run SSA-level optimization passes.
ssaBuilder.RunPasses()

// Finalize the layout of SSA blocks which might use the optimization results.
ssaBuilder.LayoutBlocks()

// Now our ssaBuilder contains the necessary information to further lower them to
// machine code.
body, rels, goPreambleSize, err := be.Compile()
Member Author

this is the outermost of the pipeline

Comment on lines +143 to +144
// Resolve relocations for local function calls.
machine.ResolveRelocations(e.refToBinaryOffset, executable, e.rels)
Member Author

after compiling all local functions, we can resolve the relocations (relative offsets between the call instructions and the target functions).
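
To make "relative offsets" concrete, here is a sketch of how a local call could be patched as an AArch64 `bl` instruction once the target offset is known. The `bl` encoding (opcode `0x94000000` with a signed 26-bit word offset) is the architectural one; the `relocation` type and the `encodeBL` and `resolveRelocations` helpers are hypothetical and do not mirror the signature of `machine.ResolveRelocations` in the PR.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeBL returns the AArch64 `bl` instruction that branches from the call
// site to the target, both given as byte offsets into the same executable.
func encodeBL(callSite, target int64) (uint32, error) {
	diff := target - callSite
	if diff%4 != 0 {
		return 0, fmt.Errorf("branch offset %d is not 4-byte aligned", diff)
	}
	const limit = 1 << 27 // signed 26-bit word offset, i.e. +/-128 MiB in bytes
	if diff < -limit || diff >= limit {
		return 0, fmt.Errorf("branch offset %d is out of range for bl", diff)
	}
	return 0x94000000 | (uint32(diff/4) & 0x03ffffff), nil
}

// relocation is a hypothetical record of one call instruction to patch.
type relocation struct {
	offset int64 // byte offset of the call instruction in the executable
	callee int   // index of the called local function
}

// resolveRelocations overwrites each placeholder call with a direct branch.
func resolveRelocations(executable []byte, funcOffsets []int64, rels []relocation) error {
	for _, r := range rels {
		inst, err := encodeBL(r.offset, funcOffsets[r.callee])
		if err != nil {
			return err
		}
		binary.LittleEndian.PutUint32(executable[r.offset:], inst)
	}
	return nil
}

func main() {
	executable := make([]byte, 16) // two 8-byte functions with a call at the start of each
	funcOffsets := []int64{0, 8}
	rels := []relocation{{offset: 0, callee: 1}, {offset: 8, callee: 0}}
	if err := resolveRelocations(executable, funcOffsets, rels); err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", executable)
}
```

Because the encoded offset is relative to the call site, the patched code stays valid wherever the executable is mapped in memory.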

}
}

func (c *Compiler) lowerOpcode(op wasm.Opcode) {
Member Author

this is where we lower Wasm instructions into SSA instructions

//
// Note that passes suffixed with "Opt" are the optimization passes, meaning that they edit the instructions and blocks
// while the other passes are not, like passEstimateBranchProbabilities does not edit them, but only calculates the additional information.
func (b *builder) RunPasses() {
Member Author

this is where IR-level optimization passes run
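
The distinction drawn in the quoted doc comment, between passes that only compute information and "Opt" passes that edit the IR, can be sketched like this. The IR, the use-counting analysis, and the dead-code-elimination pass below are invented for the example and are not the passes the PR ships; only the "Opt" suffix convention comes from the source.

```go
package main

import "fmt"

type instr struct {
	dst        int   // ID of the value this instruction defines, or -1 for none
	args       []int // IDs of the values it uses
	sideEffect bool  // true for returns, stores, calls, ...
}

type function struct {
	instrs   []instr
	useCount map[int]int // filled in by passCountUses
}

// passCountUses is an analysis pass: it only computes additional information.
func passCountUses(f *function) {
	f.useCount = map[int]int{}
	for _, in := range f.instrs {
		for _, a := range in.args {
			f.useCount[a]++
		}
	}
}

// passDeadCodeEliminationOpt is an optimization pass: it edits the instruction
// list, dropping side-effect-free instructions whose results are never used.
func passDeadCodeEliminationOpt(f *function) {
	dead := make([]bool, len(f.instrs))
	// Walk backward so that removing a user can make its operands dead too.
	for i := len(f.instrs) - 1; i >= 0; i-- {
		in := f.instrs[i]
		if in.sideEffect || f.useCount[in.dst] > 0 {
			continue
		}
		dead[i] = true
		for _, a := range in.args {
			f.useCount[a]--
		}
	}
	var kept []instr
	for i, in := range f.instrs {
		if !dead[i] {
			kept = append(kept, in)
		}
	}
	f.instrs = kept
}

// runPasses runs the analysis first because the optimization depends on it.
func runPasses(f *function) {
	passCountUses(f)
	passDeadCodeEliminationOpt(f)
}

// example: v0 = const; v1 = const; v2 = add v0, v1 (unused); v3 = add v0, v0; return v3
func example() *function {
	return &function{instrs: []instr{
		{dst: 0},
		{dst: 1},
		{dst: 2, args: []int{0, 1}},
		{dst: 3, args: []int{0, 0}},
		{dst: -1, args: []int{3}, sideEffect: true},
	}}
}

func main() {
	f := example()
	runPasses(f)
	fmt.Println(len(f.instrs)) // prints "3"
}
```

In this example removing the unused `v2` also makes `v1` dead, which is why the optimization pass walks the instructions backward.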


type (
// Machine is a backend for a specific ISA machine.
Machine interface {
Member Author

This is the interface implemented per ISA (of course currently only arm64 tho)

Member Author

Interfaces in this file are implemented per ISA so that register allocation can run without ISA specifics


type (
// machine implements backend.Machine.
machine struct {
Member Author

arm64 implementation of Machine interface

Comment on lines +49 to +74
//
//              (high address)
//            +-----------------+
//            |     .......     |
//            |      ret Y      |
//            |     .......     |
//            |      ret 0      |
//            |      arg X      |
//            |     .......     |
//            |      arg 1      |
//            |      arg 0      |
//            |      xxxxx      |
//            |  ReturnAddress  |
//            +-----------------+   <<-|
//            |   ...........   |      |
//            |   spill slot M  |      | <--- spillSlotSize
//            |   ............  |      |
//            |   spill slot 2  |      |
//            |   spill slot 1  |   <<-+
//            |   clobbered N   |
//            |   ...........   |
//            |   clobbered 1   |
//            |   clobbered 0   |
//     SP---> +-----------------+
//               (low address)
//
Member Author

diagram to explain the stack layout
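
Reading the diagram from SP upward, the SP-relative offsets of each region follow from the sizes of the regions below it. The sketch below only restates that arithmetic: the `frameLayout` type and its methods are hypothetical, and the 16-byte size of the ReturnAddress plus "xxxxx" region is an assumption inferred from the `str x30, [sp, #-0x10]!` prologue shown in the PR description.

```go
package main

import "fmt"

// frameLayout is a hypothetical description of one frame in the diagram.
type frameLayout struct {
	clobberedSize int64 // bytes used to save clobbered registers (lowest region)
	spillSlotSize int64 // bytes reserved for spill slots, as labeled in the diagram
}

// returnAddressSlotSize covers ReturnAddress plus the "xxxxx" padding. The
// value 16 is an assumption based on the 0x10-byte push of x30 in the prologue.
const returnAddressSlotSize = 16

// spillSlotBase is the SP-relative offset of "spill slot 1".
func (f frameLayout) spillSlotBase() int64 { return f.clobberedSize }

// returnAddressOffset is the SP-relative offset of the saved return address.
func (f frameLayout) returnAddressOffset() int64 { return f.clobberedSize + f.spillSlotSize }

// argBase is the SP-relative offset of "arg 0", right above the return address region.
func (f frameLayout) argBase() int64 { return f.returnAddressOffset() + returnAddressSlotSize }

func main() {
	f := frameLayout{clobberedSize: 32, spillSlotSize: 48}
	fmt.Println(f.spillSlotBase(), f.returnAddressOffset(), f.argBase()) // prints "32 80 96"
}
```

Every location is expressed as an offset from SP, which is what lets the generated code work without a frame pointer.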

Member Author

this file implements the "trampoline" prologue executed right after being jumped from Go.

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
@mathetake
Member Author

mathetake commented Aug 9, 2023

now that 1.4.0 has been cut, I am landing this, though it has no impact on the main code anyway

@mathetake mathetake merged commit 0290087 into main Aug 9, 2023
59 checks passed
@mathetake mathetake deleted the wazevo branch August 9, 2023 01:45
jerbob92 pushed a commit to jerbob92/wazero that referenced this pull request Aug 9, 2023
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Jeroen Bobbeldijk <jeroen@klippa.com>