wazevo: initial impl of the new optimizing backend #1615
well I have to say, thank god it isn't 24k lines 😎
notes
// Lower Wasm to SSA.
err := fe.LowerToSSA()
if err != nil {
	return fmt.Errorf("wasm->ssa: %v", err)
}

// Run SSA-level optimization passes.
ssaBuilder.RunPasses()

// Finalize the layout of SSA blocks which might use the optimization results.
ssaBuilder.LayoutBlocks()

// Now our ssaBuilder contains the necessary information to further lower them to
// machine code.
body, rels, goPreambleSize, err := be.Compile()
this is the outermost layer of the pipeline
// Resolve relocations for local function calls.
machine.ResolveRelocations(e.refToBinaryOffset, executable, e.rels)
after compiling all local functions, we can resolve the relocations (relative offsets between the call instructions and the target functions).
}
}

func (c *Compiler) lowerOpcode(op wasm.Opcode) {
this is where we lower Wasm instructions into SSA instructions
//
// Note that passes suffixed with "Opt" are the optimization passes, meaning that they edit the instructions and blocks
// while the other passes are not, like passEstimateBranchProbabilities does not edit them, but only calculates the additional information.
func (b *builder) RunPasses() {
this is where IR-level optimization passes run
type (
	// Machine is a backend for a specific ISA machine.
	Machine interface {
This is the interface implemented per ISA (of course currently only arm64 tho)
Interfaces in this file are implemented per ISA so that register allocation can run without ISA specifics
type (
	// machine implements backend.Machine.
	machine struct {
arm64 implementation of Machine interface
//
//	          (high address)
//	        +-----------------+
//	        |     .......     |
//	        |      ret Y      |
//	        |     .......     |
//	        |      ret 0      |
//	        |      arg X      |
//	        |     .......     |
//	        |      arg 1      |
//	        |      arg 0      |
//	        |      xxxxx      |
//	        |  ReturnAddress  |
//	        +-----------------+ <<-|
//	        |   ...........   |    |
//	        |   spill slot M  |    | <--- spillSlotSize
//	        |   ............  |    |
//	        |   spill slot 2  |    |
//	        |   spill slot 1  | <<-+
//	        |   clobbered N   |
//	        |   ...........   |
//	        |   clobbered 1   |
//	        |   clobbered 0   |
//	SP--->  +-----------------+
//	          (low address)
//
diagram to explain the stack layout
this file implements the "trampoline" prologue executed right after being jumped from Go.
now that 1.4.0 has been cut, I am landing this, though it has no impact on the main code anyway
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com> Signed-off-by: Jeroen Bobbeldijk <jeroen@klippa.com>
This commit introduces the initial implementation of our new optimizing
backend called "wazevo" (#1496) which I have worked on in the last 2 months.
The new backend is written completely from scratch without reusing any line of the
existing compiler, assembler etc. The notable difference is that we employ a traditional
compilation pipeline (SSA-level IR, optimization passes, instruction selection,
register allocation, etc.) as opposed to the current single-pass compiler. As a result,
for example, we can now use register-based calling conventions, which will definitely
contribute to the perf improvement. As a sneak preview of the perf improvement,
the following is the bench result of recursive_fibonacci(30).
As expected, this is far from completion (in fact this doesn't pass spectest at all!).
Instead, the purpose of this commit is to set up the foundation and code base that
multiple people can work on collaboratively. For now, it only has an AArch64 backend,
and the plan is to make it pass all spec tests before we start introducing an x64 backend.
That way we could have a better abstraction layer over ISA-specific code. In any case,
this intentionally leaves tens of TODOs, and we will iterate on this
new codebase over the next few months in the subsequent PRs.
Notes
Pipeline
Basically, we mainly have three components:
After going through all three components for all functions in a module, we also resolve the "relocation" of functions. In other words, we resolve local function calls as relative-address "direct branches", rather than indirect jumps.
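As a rough illustration of what resolving a relocation means on AArch64, the following sketch patches each call site with a `BL` instruction whose 26-bit immediate is the signed word distance from the call to the callee. The `relocation` type and `resolveRelocations` function are hypothetical names made up for this example and are not the signatures used in this PR; only the `BL` encoding (opcode bits `0x94000000` plus `imm26`) is taken from the A64 instruction set.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// relocation is a hypothetical record of one call site that still needs patching.
type relocation struct {
	callOffset int // byte offset of the BL instruction inside the executable
	target     int // index of the callee in funcOffsets
}

// resolveRelocations overwrites each call site with an AArch64 BL instruction.
// BL stores the branch distance as a signed 26-bit count of 4-byte words,
// measured from the address of the BL instruction itself.
func resolveRelocations(executable []byte, funcOffsets []int, rels []relocation) error {
	for _, r := range rels {
		diff := funcOffsets[r.target] - r.callOffset
		if diff%4 != 0 {
			return fmt.Errorf("unaligned branch distance %d", diff)
		}
		imm := diff / 4
		if imm < -(1<<25) || imm >= 1<<25 {
			return fmt.Errorf("branch distance %d is out of BL range", diff)
		}
		// Truncating the signed value to 26 bits yields its two's complement form.
		inst := uint32(0x94000000) | uint32(imm)&0x03ffffff
		binary.LittleEndian.PutUint32(executable[r.callOffset:], inst)
	}
	return nil
}
```

Because the distance is relative, the patched code stays valid wherever the executable ends up in memory, which is what makes a direct branch usable here.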
The following diagram illustrates the overview of the pipeline:
SSA construction
The way we lower Wasm-level control flow and variables into SSA form is based on the paper Simple and Efficient Construction of Static Single Assignment Form. Like MLIR, we use the "block argument" variant instead of PHIs.
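To make the "block argument" idea concrete, here is a minimal sketch of the variable lookup from that paper, written for this note rather than copied from the PR. It assumes every block already knows all of its predecessors (no "unsealed" blocks) and skips the removal of trivial parameters; the `block`, `builder`, `read` and `write` names are hypothetical.

```go
package main

// value is an SSA value ID.
type value int

type block struct {
	preds  []*block
	params []value            // block arguments, the counterpart of PHIs
	args   map[*block][]value // args[succ]: values passed when branching to succ
	defs   map[string]value   // latest definition of each variable in this block
}

type builder struct{ next value }

func newBlock(preds ...*block) *block {
	return &block{preds: preds, args: map[*block][]value{}, defs: map[string]value{}}
}

func (b *builder) newValue() value { b.next++; return b.next }

// write records that variable name currently holds v in blk.
func (b *builder) write(blk *block, name string, v value) { blk.defs[name] = v }

// read returns the SSA value of name as seen from blk.
func (b *builder) read(blk *block, name string) value {
	if v, ok := blk.defs[name]; ok {
		return v // defined locally
	}
	var v value
	if len(blk.preds) == 1 {
		// A single predecessor needs no merge: reuse its value.
		v = b.read(blk.preds[0], name)
	} else {
		// A merge point (or an entry block with no definition) gets a new
		// block parameter, and each predecessor passes its own value for it.
		v = b.newValue()
		blk.params = append(blk.params, v)
		blk.defs[name] = v // record first so that loops terminate
		for _, p := range blk.preds {
			p.args[blk] = append(p.args[blk], b.read(p, name))
		}
	}
	blk.defs[name] = v
	return v
}
```

For an if/else diamond where only the "then" side reassigns `x`, reading `x` in the merge block creates one parameter, and the two predecessors pass the reassigned value and the entry value respectively.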
Register Allocation
The implemented register allocation logic is a very simple variant of Chaitin's algorithm. The code works on the ISA-level IR, not the SSA-level one, but it abstracts away the ISA specifics via interfaces. It runs a straightforward graph coloring algorithm on the interference graph, where interference is calculated at the CFG level, not on the linearized code.
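For readers unfamiliar with graph coloring allocation, the sketch below shows the simplify/select core on a plain adjacency list. It is an illustration under assumptions, not the allocator in this PR: `colorGraph` is a hypothetical name, the graph is assumed symmetric and free of duplicate edges, and a node that cannot be colored is simply reported as `-1` (spilled) instead of triggering spill-code insertion and a retry. Spill candidates are also colored optimistically, which is a common refinement rather than Chaitin's original formulation.

```go
package main

// colorGraph assigns one of k registers (0..k-1) to each node of an
// interference graph, or -1 when the node has to be spilled.
// adj[v] lists the nodes whose live ranges overlap with v.
func colorGraph(adj [][]int, k int) []int {
	n := len(adj)
	degree := make([]int, n)
	removed := make([]bool, n)
	for v := range adj {
		degree[v] = len(adj[v])
	}

	// Simplify: repeatedly remove a node with fewer than k remaining
	// neighbors. If none exists, remove the highest-degree node as a
	// spill candidate and keep going.
	stack := make([]int, 0, n)
	for len(stack) < n {
		pick := -1
		for v := 0; v < n; v++ {
			if removed[v] {
				continue
			}
			if degree[v] < k {
				pick = v
				break
			}
			if pick == -1 || degree[v] > degree[pick] {
				pick = v
			}
		}
		removed[pick] = true
		stack = append(stack, pick)
		for _, u := range adj[pick] {
			if !removed[u] {
				degree[u]--
			}
		}
	}

	// Select: pop nodes in reverse order and give each the lowest register
	// not used by an already-colored neighbor.
	colors := make([]int, n)
	for i := range colors {
		colors[i] = -1
	}
	for i := n - 1; i >= 0; i-- {
		v := stack[i]
		used := make([]bool, k)
		for _, u := range adj[v] {
			if c := colors[u]; c >= 0 {
				used[c] = true
			}
		}
		for c := 0; c < k; c++ {
			if !used[c] {
				colors[v] = c
				break
			}
		}
	}
	return colors
}
```

With a triangle of mutually interfering values plus one extra neighbor, three registers color everything, while two registers force exactly one value of the triangle to be spilled.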
Before the allocation algorithm runs, we run the liveness analysis following the algorithm described in Chapter 9 of the SSA-based Compiler Design book.
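As background on what liveness analysis computes, here is the textbook iterative backward data-flow formulation. Note that this is deliberately the classic fixed-point version and not the algorithm from the book chapter mentioned above, and that `blockInfo` and `liveness` are hypothetical names for this sketch. It assumes `uses` holds only upward-exposed uses, that is, variables read in the block before any definition in the same block.

```go
package main

// blockInfo summarizes one basic block for the analysis.
type blockInfo struct {
	succs []int    // indices of successor blocks
	defs  []string // variables defined in the block
	uses  []string // variables read before any definition in the block
}

// liveness solves the standard equations until nothing changes:
//
//	liveOut[b] = union of liveIn[s] for every successor s of b
//	liveIn[b]  = uses[b] + (liveOut[b] - defs[b])
func liveness(blocks []blockInfo) (liveIn, liveOut []map[string]bool) {
	n := len(blocks)
	liveIn = make([]map[string]bool, n)
	liveOut = make([]map[string]bool, n)
	for i := range blocks {
		liveIn[i] = map[string]bool{}
		liveOut[i] = map[string]bool{}
	}
	for changed := true; changed; {
		changed = false
		// Visiting blocks in reverse order usually converges faster
		// because information flows backwards.
		for i := n - 1; i >= 0; i-- {
			b := blocks[i]
			for _, s := range b.succs {
				for v := range liveIn[s] {
					if !liveOut[i][v] {
						liveOut[i][v] = true
						changed = true
					}
				}
			}
			defined := map[string]bool{}
			for _, d := range b.defs {
				defined[d] = true
			}
			for _, u := range b.uses {
				if !liveIn[i][u] {
					liveIn[i][u] = true
					changed = true
				}
			}
			for v := range liveOut[i] {
				if !defined[v] && !liveIn[i][v] {
					liveIn[i][v] = true
					changed = true
				}
			}
		}
	}
	return liveIn, liveOut
}
```

Two values interfere when one is defined at a point where the other is live, so these per-block sets are the input from which an interference graph can be built.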
Frame Pointer usage
As a simple example of compilation result, the Wasm function which swaps two params:
is compiled as
where you can see we only use sp, but not fp, which is kinda weird compared to other compilers (note that the x0 and x1 params are implementation specific and not relevant here; the actual Wasm params start at x2/v2). Basically, FP is only necessary when we want to integrate native debuggers and profilers, which is not relevant in our case. Moreover, the typical way of saving/restoring the frame pointer (at prologue/epilogue) cannot be applied here because our stack is a Go-allocated byte slice and therefore it "moves". As a consequence, saving the absolute address of FP is absolutely dangerous, so we do not store it at all. Speaking of SP, the caller/callee always knows how much space is needed, and hence we only need to add/sub a relative amount. (Of course that could be applied to FP as well, but it's costly and not necessary at all!)