wazevo: initial impl of the new optimizing backend #1615

mathetake · 2023-08-08T02:13:16Z

This commit introduces the initial implementation of our new optimizing
backend called "wazevo" (#1496), which I have been working on for the last two months.

The new backend is written completely from scratch, without reusing a single line of the
existing compiler, assembler, etc. The notable difference is that we employ a traditional
compilation pipeline (SSA-level IR, optimization passes, instruction selection,
register allocation, etc.), as opposed to the current single-pass compiler. As a result,
we can now use, for example, a register-based calling convention, which should
contribute to the perf improvement. As a sneak preview of the perf improvement,
the following is the bench result of recursive_fibonacci(30), where wazevo is roughly 2.6x faster per op:

Benchmark_wazevo
Benchmark_wazevo/old
Benchmark_wazevo/old-10         	     106	  10891053 ns/op
Benchmark_wazevo/wazevo
Benchmark_wazevo/wazevo-10      	     286	   4175793 ns/op

As expected, this is far from complete (in fact, it doesn't pass the spectest at all!).
Instead, the purpose of this commit is to set up the foundation and a code base that
multiple people can work on collaboratively. For now, it only has an AArch64 backend,
and the plan is to make it pass all spec tests before we start introducing an x64 backend.
That way we can arrive at a better abstraction layer over ISA-specific code. In any case,
this intentionally leaves tens of TODOs, and we will iterate on this
new codebase over the next few months in subsequent PRs.

Notes

Pipeline

We have three main components:

Frontend Compiler is in charge of lowering WebAssembly-level functions into SSA-level functions.
SSA Builder provides the interface between the frontend and the backend and is decoupled from Wasm-level concepts. It provides the Frontend Compiler with the logic and functions needed to lower Wasm-level functions into neutral SSA-IR functions. After the IR functions are constructed, the Backend also uses it to access the constructed functions.
Backend Compiler lowers the SSA-IR-level functions to ISA-specific instruction sequences. It takes an SSA Builder as its input. It performs some optimizations during lowering, as well as register allocation.

After going through all three components for all functions in a module, we also resolve the "relocations" of functions. In other words, we resolve local function calls into relative-address "direct branches" rather than indirect jumps.

The following diagram illustrates the overview of the pipeline:

graph TD;
    SSA_Builder[[SSA Builder]]
    Frontend_Compiler(Frontend Compiler)
    Backend_compiler(Backend Compiler)
    SSA_Function(SSA Function)
    Machine_interface[(Machine Interface)]
    ISA_arm64((AArch64))
    ISA_x64((x64))

    foo.wasm-- for each function -->Frontend_Compiler;
    Frontend_Compiler-->SSA_Function;
    SSA_Function-->Backend_compiler;
    Backend_compiler-- after all functions --> Relocations
    Machine_interface-->Backend_compiler
    ISA_arm64-->Machine_interface
    ISA_x64-->Machine_interface
    SSA_Builder---Frontend_Compiler;
    SSA_Builder---SSA_Function;
    SSA_Builder---Backend_compiler;
    ssa_passes[["CFG Analysis 
SSA-level optimizations
Block Ordering"]]
    ssa_passes--->SSA_Function
    backend_passes[["Instruction Selection
Liveness Analysis
Register Allocation
Branch Resolution
Binary Encoding"]]
    backend_passes--->Backend_compiler

SSA construction

The way we lower Wasm-level control flow and variables into SSA form is based on the paper Simple and Efficient Construction of Static Single Assignment Form. We use the "block argument" variant, as MLIR does, instead of PHI instructions.
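As a rough illustration of the block-argument idea, the sketch below follows the paper's write/read-variable scheme in a heavily simplified form. All names (`builder`, `block`, `read`, `write`) are hypothetical and not this PR's API, and it assumes every block already knows all of its predecessors (no "unsealed" blocks) and skips the removal of trivial parameters.

```go
package main

import "fmt"

type value int // an SSA value ID

type block struct {
	name   string
	preds  []*block
	params []value            // block arguments, playing the role of PHIs
	defs   map[string]value   // current definition of each source variable
	args   map[*block][]value // values each predecessor passes for params
}

func newBlock(name string, preds ...*block) *block {
	return &block{name: name, preds: preds, defs: map[string]value{}, args: map[*block][]value{}}
}

type builder struct{ next value }

func (b *builder) newValue() value { v := b.next; b.next++; return v }

// write records that variable name is defined by v at the end of blk.
func (b *builder) write(blk *block, name string, v value) { blk.defs[name] = v }

// read returns the SSA value of variable name as seen in blk.
func (b *builder) read(blk *block, name string) value {
	if v, ok := blk.defs[name]; ok {
		return v // defined (or already resolved) locally
	}
	if len(blk.preds) == 0 {
		panic("variable " + name + " is read before being defined")
	}
	var v value
	if len(blk.preds) == 1 {
		// A single predecessor never needs a merge: just look it up there.
		v = b.read(blk.preds[0], name)
	} else {
		// Several predecessors: introduce a block parameter. It is recorded
		// before recursing so that loops terminate.
		v = b.newValue()
		blk.params = append(blk.params, v)
		blk.defs[name] = v
		for _, p := range blk.preds {
			blk.args[p] = append(blk.args[p], b.read(p, name))
		}
	}
	blk.defs[name] = v
	return v
}

func main() {
	b := &builder{}
	entry := newBlock("entry")
	thenB := newBlock("then", entry)
	elseB := newBlock("else", entry)
	merge := newBlock("merge", thenB, elseB)

	b.write(entry, "x", b.newValue()) // x defined on entry
	b.write(thenB, "x", b.newValue()) // x reassigned only in "then"

	x := b.read(merge, "x")
	fmt.Println("merge param:", x, "from then:", merge.args[thenB], "from else:", merge.args[elseB])
}
```

In the diamond above, reading `x` in `merge` creates one block parameter, and each incoming branch passes its own definition of `x` as the argument.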

Register Allocation

The implemented register allocation logic is a very simple variant of Chaitin's algorithm. The code works on the ISA-level IR, not the SSA-level one, but it abstracts away the ISA specifics via interfaces. It runs a straightforward graph coloring algorithm on the interference graph, where interference is calculated at the CFG level rather than on the linearized code.

Before the allocation algorithm runs, we do the liveness analysis following the algorithm described in Chapter 9 of the SSA-based Compiler Design book.
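For readers unfamiliar with graph-coloring allocation, here is a minimal sketch of the simplify/select idea on an interference graph. It is not the code in this PR: `colorGraph` is a hypothetical name, the graph is a plain symmetric adjacency list, spill candidates are pushed optimistically (a Briggs-style tweak rather than Chaitin's immediate spill), and there is no spill-code insertion or coalescing.

```go
package main

import "fmt"

// colorGraph assigns one of k registers (0..k-1) to each node of an
// interference graph given as a symmetric adjacency list. A node that
// cannot be colored gets -1, meaning it would have to be spilled.
func colorGraph(adj [][]int, k int) []int {
	n := len(adj)
	degree := make([]int, n)
	removed := make([]bool, n)
	for i := range adj {
		degree[i] = len(adj[i])
	}

	// Simplify: repeatedly remove a node and push it on the stack.
	stack := make([]int, 0, n)
	for len(stack) < n {
		pick := -1
		for i := 0; i < n; i++ {
			if !removed[i] && degree[i] < k {
				pick = i // fewer than k neighbors: always colorable later
				break
			}
		}
		if pick == -1 {
			// No such node: take the highest-degree one as a spill candidate.
			for i := 0; i < n; i++ {
				if !removed[i] && (pick == -1 || degree[i] > degree[pick]) {
					pick = i
				}
			}
		}
		removed[pick] = true
		stack = append(stack, pick)
		for _, m := range adj[pick] {
			if !removed[m] {
				degree[m]--
			}
		}
	}

	// Select: pop nodes and give each the lowest color unused by its neighbors.
	colors := make([]int, n)
	for i := range colors {
		colors[i] = -1
	}
	for i := n - 1; i >= 0; i-- {
		node := stack[i]
		used := make([]bool, k)
		for _, m := range adj[node] {
			if c := colors[m]; c >= 0 {
				used[c] = true
			}
		}
		for c := 0; c < k; c++ {
			if !used[c] {
				colors[node] = c
				break
			}
		}
	}
	return colors
}

func main() {
	// Three virtual registers that are all live at the same time.
	triangle := [][]int{{1, 2}, {0, 2}, {0, 1}}
	fmt.Println(colorGraph(triangle, 3)) // enough registers for everyone
	fmt.Println(colorGraph(triangle, 2)) // one node ends up as -1 (spill)
}
```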

Frame Pointer usage

As a simple example of a compilation result, the Wasm function that swaps two params:

(func (param i32 i32) (result i32 i32)
    local.get 1
    local.get 0
)

is compiled as

str x30, [sp, #-0x10]!
mov x0, x3
mov x1, x2
ldr x30, [sp], #0x10
ret

where you can see that we only use sp, not fp, which is unusual compared to other compilers (note that x0 and x1 hold implementation-specific params and are not relevant here; the actual Wasm params start at x2/v2). Basically, FP is only
necessary when we want to integrate native debuggers and profilers, which is not relevant in our case. Moreover, the typical way of saving/restoring the frame pointer (in the prologue/epilogue) cannot be applied here because our stack is a Go-allocated byte slice, and therefore it "moves". As a consequence, saving the absolute address held in FP is dangerous, so we do not store it at all. As for SP, the caller/callee always knows how much space is needed, and hence we only need to add/subtract a relative amount. (Of course, the same could be applied to FP, but it is costly and not necessary at all!)
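The "moving stack" hazard can be shown in plain Go. The `stack` type below is a hypothetical stand-in, not wazevo's actual stack handling: it only demonstrates that an absolute address taken before the backing slice is reallocated goes stale, while a position expressed relative to the stack top stays valid.

```go
package main

import "fmt"

// stack is a hypothetical downward-growing stack backed by a Go byte slice.
type stack struct {
	buf []byte
	sp  int // offset into buf; the stack top is at len(buf)
}

func (s *stack) push(b byte) {
	s.sp--
	s.buf[s.sp] = b
}

// grow moves the stack into a buffer twice as large, keeping the live
// contents at the high end and rebasing sp by the added size.
func (s *stack) grow() {
	old := s.buf
	s.buf = make([]byte, 2*len(old))
	copy(s.buf[len(old):], old)
	s.sp += len(old)
}

func main() {
	s := &stack{buf: make([]byte, 8), sp: 8}
	s.push(0xAA)

	stale := &s.buf[s.sp]      // absolute address, like a saved frame pointer
	depth := len(s.buf) - s.sp // distance from the stack top, a relative amount

	s.grow() // the backing array changes; stale still points into the old one

	s.buf[len(s.buf)-depth] = 0xBB // update the slot through the relative position
	fmt.Printf("via stale pointer: %#x, via relative offset: %#x\n", *stale, s.buf[len(s.buf)-depth])
}
```

After `grow`, the write through the relative position lands in the new buffer, while `*stale` still reads the old value from the abandoned one.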

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

codefromthecrypt · 2023-08-08T05:04:07Z

well I have to say, thank god it isn't 24k lines 😎

mathetake

notes

mathetake · 2023-08-08T06:25:04Z

internal/engine/wazevo/engine.go

+		// Lower Wasm to SSA.
+		err := fe.LowerToSSA()
+		if err != nil {
+			return fmt.Errorf("wasm->ssa: %v", err)
+		}
+
+		// Run SSA-level optimization passes.
+		ssaBuilder.RunPasses()
+
+		// Finalize the layout of SSA blocks which might use the optimization results.
+		ssaBuilder.LayoutBlocks()
+
+		// Now our ssaBuilder contains the necessary information to further lower them to
+		// machine code.
+		body, rels, goPreambleSize, err := be.Compile()


this is the outermost part of the pipeline

mathetake · 2023-08-08T06:25:53Z

internal/engine/wazevo/engine.go

+	// Resolve relocations for local function calls.
+	machine.ResolveRelocations(e.refToBinaryOffset, executable, e.rels)


after compiling all local functions, we can resolve the relocations (relative offsets between the call instructions and the target functions).

mathetake · 2023-08-08T06:26:58Z

internal/engine/wazevo/frontend/lower.go

+	}
+}
+
+func (c *Compiler) lowerOpcode(op wasm.Opcode) {


this is where we lower Wasm instructions into SSA instructions

mathetake · 2023-08-08T06:27:48Z

internal/engine/wazevo/ssa/pass.go

+//
+// Note that passes suffixed with "Opt" are the optimization passes, meaning that they edit the instructions and blocks
+// while the other passes are not, like passEstimateBranchProbabilities does not edit them, but only calculates the additional information.
+func (b *builder) RunPasses() {


this is where IR-level optimization passes run

mathetake · 2023-08-08T06:30:08Z

internal/engine/wazevo/backend/machine.go

+
+type (
+	// Machine is a backend for a specific ISA machine.
+	Machine interface {


This is the interface implemented per ISA (of course currently only arm64 tho)

mathetake · 2023-08-08T06:30:59Z

internal/engine/wazevo/backend/regalloc/api.go

Interfaces in this file are implemented per ISA so that register allocation can run without ISA specifics

mathetake · 2023-08-08T06:34:41Z

internal/engine/wazevo/backend/isa/arm64/machine.go

+
+type (
+	// machine implements backend.Machine.
+	machine struct {


arm64 implementation of Machine interface

mathetake · 2023-08-08T06:35:31Z

internal/engine/wazevo/backend/isa/arm64/machine.go

+		//
+		//            (high address)
+		//          +-----------------+
+		//          |     .......     |
+		//          |      ret Y      |
+		//          |     .......     |
+		//          |      ret 0      |
+		//          |      arg X      |
+		//          |     .......     |
+		//          |      arg 1      |
+		//          |      arg 0      |
+		//          |      xxxxx      |
+		//          |   ReturnAddress |
+		//          +-----------------+   <<-|
+		//          |   ...........   |      |
+		//          |   spill slot M  |      | <--- spillSlotSize
+		//          |   ............  |      |
+		//          |   spill slot 2  |      |
+		//          |   spill slot 1  |   <<-+
+		//          |   clobbered N   |
+		//          |   ...........   |
+		//          |   clobbered 1   |
+		//          |   clobbered 0   |
+		//   SP---> +-----------------+
+		//             (low address)
+		//


diagram to explain the stack layout

mathetake · 2023-08-08T06:36:04Z

internal/engine/wazevo/backend/isa/arm64/abi_go_entry.go

this file implements the "trampoline" prologue executed right after jumping in from Go.

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

mathetake · 2023-08-09T01:44:44Z

now that 1.4.0 has been cut, I am landing this, though it has no impact on the main code anyway

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> Signed-off-by: Jeroen Bobbeldijk <jeroen@klippa.com>

mathetake added 3 commits August 8, 2023 10:14

wazevo: initial impl of the new optimizing backend

10fbab6

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

skip

f695105

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

fuzz

1c881c3

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

mathetake marked this pull request as ready for review August 8, 2023 02:39

mathetake requested a review from codefromthecrypt as a code owner August 8, 2023 02:39

mathetake requested review from evacchi, achille-roussel and ncruces August 8, 2023 04:01

fuzz

4ae8a79

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>

mathetake merged commit 0290087 into main Aug 9, 2023
59 checks passed

mathetake deleted the wazevo branch August 9, 2023 01:45

jerbob92 pushed a commit to jerbob92/wazero that referenced this pull request Aug 9, 2023

wazevo: initial impl of the new optimizing backend (tetratelabs#1615)

e809784

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> Signed-off-by: Jeroen Bobbeldijk <jeroen@klippa.com>
