diff --git a/debugging-modules/README.md b/debugging-modules/README.md new file mode 100644 index 0000000..9b09dd8 --- /dev/null +++ b/debugging-modules/README.md @@ -0,0 +1,283 @@ +# WebAssembly Debugging Modules + +This proposal defines interfaces for querying source-level debugging information +about a WebAssembly module. A WebAssembly runtime can leverage these interfaces +in its developer tools to translate from Wasm-level debugging information into a +user-friendly, source-level presentation. + +> **Note:** this is an incomplete, work-in-progress, strawperson proposal! +> Nothing here is set in stone and everything is likely to change! + +## Table of Contents + +- [Motivation](#motivation) +- [Overview](#overview) +- [Walkthrough](#walkthrough) +- [Interface Definitions](#interface-definitions) +- [FAQ](#faq) + +## Motivation + +Bugs abound, functions are slow, and developers need to figure out what's +happening and why. But developers aren't typically writing WebAssembly by hand, +they are writing in a high-level source language and compiling it down into +WebAssembly code. Even though the WebAssembly code is what is actually being +executed, developers prefer to debug and profile in terms of the source +language. This source-level experience is the norm for both native code in GDB, +LLDB, or `perf`, and for managed languages in their language-specific tooling, +like JavaScript in a Web browser's developer tools. This proposal aims unlock +the source-level debugging experience for Wasm code running inside a runtime. + +> **Note:** This proposal currently only defines APIs that are powerful enough +> to support source-level stepping in a debugger and symbolicating source-level +> locations and function names in other developer tools, such as a profiler or +> logging console. +> +> There are many possible debugging queries that this proposal does not +> currently define APIs for. It does not define APIs for inspecting scopes and +> recovering their bindings' values. It does not define APIs for expanding +> physical Wasm function frames into logical, possibly-inlined source function +> frames. We intend to support these use cases eventually, but are starting with +> just source-level location information, and can incrementally grow the +> debugging module API surface as time goes on. + +Additionally, the same way that WebAssembly itself is embedder agnostic, and +doesn't require (for example) a Web browser or JavaScript engine, so too should +debugging functionality be embedder agnostic. + +## Overview + +This proposal builds on the following existing or proposed WebAssembly concepts: + +* **Custom sections:** These are sections defined by the [core WebAssembly + spec's binary + format](https://webassembly.github.io/spec/core/binary/modules.html#custom-section) + whose contents are uninterpreted by the core spec but can be interpreted by + other tools or specifications (including this proposal) +* **Interface types:** The [WebAssembly interface + types](https://github.com/WebAssembly/interface-types/blob/master/proposals/interface-types/Explainer.md) + proposal allows defining shared-nothing, language-neutral interfaces. + +This proposal defines the following new concepts: + +* **Debuggee module:** A debuggee module is the Wasm module that is being + debugged or profiled. +* **Debugging module:** A debugging module is referenced from its debuggee Wasm + module, and is a separate Wasm module that translates between Wasm-level + debugging information and source-level debuggging information. A debugging + module is used by the engine's developer tools. +* **`WasmDebugger`:** The `WasmDebugger` interface provides raw, Wasm-level + debugging APIs for inspecting the debuggee Wasm module. It is implemented by + the Wasm engine and given to a debugging module's `SourceDebugger` interface. +* **`SourceDebugger`:** The `SourceDebugger` interface provides source-level + debugging APIs for inspecting the debuggee Wasm module. It is implemented by + the debugging module, translates between source-level information and + Wasm-level information, and wraps a `WasmDebugger` instance. +* **Wasm breakpoint:** A breakpoint associated with an instruction in the + debuggee module's code section. +* **Source breakpoint:** A breakpoint associated with a location in the source + text, and may map to multiple Wasm breakpoints. + +Collectively, these concepts and their relationships can be visualized together +in the following diagram: + +![](debugging-modules.png) + +## Walkthrough + +This section gives high-level walkthroughs of a few representative scenarios +where debugging modules are used by developer tools to provide source-level +information. + +### Symbolicating Profiler Stacks + +A sampling profiler will periodically pause Wasm execution, record its current +stack, and then resume execution. Each frame in a recorded stack contains the +Wasm module, function, and code offset. To display the profiler's results in a +source-level presentation, the developer tools would perform the following steps +for each frame of each stack: + +* Get or instantiate the debugger module `SourceDebugger` for the frame's Wasm + module. +* Let `sourceRanges` be the result of calling `SourceDebugger.getSourceRanges`. +* If `sourceRanges` is empty, leave the frame unsymbolicated. +* Otherwise, let the frame's source-level symbolication be the first entry of + `sourceRanges`. + +> **Note:** This algorithm and usage of `SourceDebugger.getSourceRanges` enables +> offline stack symbolication. It would not support symbolicating interpreted +> code stacks like C# stacks within Blazor. If we wanted to support +> symbolicating interpreted code stacks, it would likely require calling into +> the debuggee module or inspecting its state online during stack sampling. In +> turn, that would require only sampling at safe points of execution (similar to +> GC safe points) to avoid making intermediate states or optimizations +> observable. This can get a bit hairy, and whether we want to support this or +> not is an open question. + +### Listing Sources and Displaying Source Text + +Debugger GUIs often display a list of source files or a file and directory tree +of source files. To construct the complete list of source files for Wasm modules +in a runtime, perform the following steps: + +* For each debuggee module in the runtime: + * Get or create the debugging module `SourceDebugger` for the debuggee + module. + * Append the result of calling `SourceDebugger.listSources` to the source + list. + +When a user selects a source from the list of all source files, a debugger GUI +will typically display the selected source's text in a tab or panel. The text +for a source can be retrieved by calling `SourceDebugger.getSourceText`. + +### Setting a Breakpoint + +When a user asks the debugger to set a source-level breakpoint, the debugger +should perform the following steps: + +* Get the `SourceDebugger` associated with the breakpoint's source. +* Let `options` be the `SourceBreakpointOptions` describing the requested + breakpoint. +* Let `id` be the result of calling `SourceDebugger.setBreakpoint` with + `options`. + * If there is no expression at the breakpoint's location: + * The `SourceDebugger.setBreakpoint` implementation should return null. + * Otherwise: + * The `SourceDebugger` implementation should set as many Wasm + breakpoints as necessary to pause just before evaluation of the + breakpoint location's expression begins using + `WasmDebugger.setBreakpoint`. +* If `id` is null: + * Then no breakpoint was set (e.g because the requested location is within a + comment) and the UI should reflect this. +* Otherwise: + * Save the `id` for if/when the breakpoint gets hit or the user wants to + clear the breakpoint. + +### Hitting a Breakpoint + +* Let `dbg` be the `SourceDebugger` instance for the current frame. +* Construct the `WasmBreakInfo` for the Wasm breakpoint that was hit. +* Let `breakResult` be the result of calling `dbg.onBreak` with the + `WasmBreakInfo` +* If `breakResult.kind` is `"Continue"`: + * Resume the debuggee module's execution. +* Otherwise if `breakResult.kind` is `"Pause"`: + * Assert that `breakResult.location` is not null. + * Update the debugger UI to show that execution is paused at the source + location `breakResult.location`. + +### Stepping + +* Let `dbg` be the `SourceDebugger` instance for the current frame. +* Construct the `SourceStepOptions` for the step that the user requested. +* Call `dbg.onStep` with the `SourceStepOptions` + * The `onStep` implementation should use the `WasmDebugger` to set Wasm + breakpoint(s) where it determines execution should pause after taking the + requested step. +* Resume debuggee execution. + +Pausing after the step has been executed is equivalent to hitting a breakpoint, +as described above, although additionally `SourceDebugger` implementations +should delete any Wasm breakpoints that were created for the step with +`WasmDebugger.clearBreakpoint`. + +## Interface Definitions + +> **Note:** In order to be embedder agnostic — about whether it is a Web browser +> or JavaScript engine or not — we intend to eventually define these interfaces +> with [WebAssembly Interface +> Types](https://github.com/WebAssembly/interface-types/). However, since that +> standard is still coming together, we are temporarily describing the +> interfaces with Web IDL. + +The definitions for interfaces defined in this proposal are available in +[`debugging-modules.webidl`](./debugging-modules.webidl). + +## FAQ + +### Why not an existing debug info format like DWARF or PDB? + +There are a variety of reasons: + +* Future extensibility is harder with data formats than interfaces. With + interfaces, we can always add new, optional methods that debugging modules can + implement and use or not. With data formats, we need to design future + extensibility into the format, which requires more care and can also bloat its + encoded size with repetitive metadata and field tags. + +* Interpreting DWARF and PDB data is not straight forward, and the full + responsibility for that would lie completely on developer tools authors. In + contrast, debugging modules provide high-level methods for querying program + information, which are easier to work with for developer tools + authors. Toolchains can wrap DWARF or PDB interpreters in a debugging module + interface, and this debugging module implementation can be shared across any + toolchains that use that same format. Sharing debugging module implementations + is easier for toolchains than sharing DWARF/PDB interpretation logic is for + developer tools, since debugging modules talk to a `WasmDebugger` interface + and produce a well-known `SourceDebugger` interface that are used by all + toolchains, while developer tools would be calling into internal APIs that + aren't common across developer tools implementations. + +* Both debug info formats and standards ossify and evolve slowly. With debugging + modules, we allow interface implementations to experiment and iterate + separately from the standard. This allows innovations (for example more + compact data representations) to develop and ship without needing to go + through the slow standards process. + +* Static debug info formats don't provide a path forward for source-level + debugging of projects like Blazor, that run interpreted code inside their + Wasm. Debugging that interpreted code requires dynamic debugging APIs. + +### Why not a protocol instead of Wasm interface types? + +A wire protocol requires defining the same set of operations we want to support +that we define as interface methods in this proposal, and *also* a serialization +format. Defining a serialization format that is both compact and +future-extensible is no small task. Additionally, nothing about source-level +debugging *requires* over-the-wire communication or message passing, even if +that is often a good architectural decision. Implementations are free to proxy +this proposal's interface method calls across a protocol or to another +process. It doesn't make sense to bake a specific wire protocol into the +standard, when it can be left as an implementation detail. + +One might might be tempted to use a protocol to avoid an inter-standards +dependency on Wasm interface types. A protocol requires passing the serialized +data into and out of the debugging module. Passing that data in or out requires +knowledge of calling conventions and memory ownership (who mallocs and who +frees). This is a problem that Wasm interface types are already standardizing a +solution for, and which engines already intend to support. Duplicating standards +work done by another subgroup is far from ideal: it leads to more implementation +work for both toolchains and engines. + +The final thing to consider is the code size impact that using a protocol +implies. Incoming messages must be deserialized and outgoing messages must be +serialized, and both those things require non-trivial amounts of code. On the +other hand, with Wasm interface types most of the functionality is implemented +once in the Wasm engine, and doesn't bloat every module's code size. + +### Can debugging modules run outside of the debuggee process? + +Yes! Engines are free to run debugging modules in their own process and proxy +calls into `SourceDebugger`, or from `SourceDebugger` to `WasmDebugger`, across +an IPC channel (or even over a wire protocol). However, this is an +implementation detail for the Wasm engine and embedder, and not something that +this proposal needs to standardize. + +### What about Wasm that is ahead-of-time compiled into a native library or executable? + +This proposal does *not* intend to solve source-level debugging of a native +library or executable that is derived from Wasm that is in turn derived from +some high-level source language. That would most likely be best served by +providing the debugging information in the format that is usually expected for +native code on that platform (e.g. DWARF or PDB) so that all the usual tools for +debugging and profiling native code continue to Just Work without modification. + +Theoretically, this proposal could be leveraged in a Source->Wasm->Native +compilation pipeline for translating source-level information into Wasm-level +information, which the compiler then lowers into DWARF or PDB. Even though we +would not ultimately end up with a single debug format for all targets and all +situations, it is expected that a rising tide will lift all boats: some +debugging modules will undoubtedly use DWARF or PDB internally and having shared +tools and conventions for working with Wasm and those formats will help both the +AOT and runtime use cases. diff --git a/debugging-modules/debugging-modules.png b/debugging-modules/debugging-modules.png new file mode 100644 index 0000000..f489a54 Binary files /dev/null and b/debugging-modules/debugging-modules.png differ diff --git a/debugging-modules/debugging-modules.webidl b/debugging-modules/debugging-modules.webidl new file mode 100644 index 0000000..ddef9c4 --- /dev/null +++ b/debugging-modules/debugging-modules.webidl @@ -0,0 +1,224 @@ +typedef unsigned long WasmBreakpointId; +typedef unsigned long WasmCodeOffset; +typedef unsigned long SourceBreakpointId; +typedef unsigned long SourceLine; +typedef unsigned long SourceColumn; +typedef DOMString SourceFilename; +typedef DOMString SourceFunctionName; + +/** + * The raw debugging APIs for a Wasm module. + * + * Implementations of this interface are provided by the Wasm engine, and called + * by user-provided implementations of `SourceDebugger`. + */ +interface WasmDebugger { + /** + * Set a breakpoint at the instruction at the given offset into the Code + * section. + * + * Execution should be paused when reaching this offset, but before + * evaluating its associated instruction. + */ + WasmBreakpointId? setBreakpoint(WasmCodeOffset offset); + + /** + * Clear a breakpoint that was previously set. + */ + void clearBreakpoint(WasmBreakpointId breakpoint); + + /** + * Get the contents of a custom section, if it exists in this module. + * + * `SourceDebugger` implementations may use this method to read encoded line + * tables or other debug information from the Wasm module. + */ + ArrayBuffer? getCustomSection(USVString customSectionName); +}; + +/** + * Options provided by the debugger when requesting that a `SourceDebugger` set + * a breakpoint. + */ +dictionary SourceBreakpointOptions { + required SourceFilename filename; + required SourceLine line; + SourceLine column; +}; + +/** + * Information about a Wasm breakpoint that has been hit. + */ +dictionary WasmBreakInfo { + required WasmBreakpointId breakpoint; +}; + +/** + * The kind of source-level action to take after hitting a Wasm breakpoint. + */ +enum SourceBreakResultKind { + /** + * Continue execution and ignore the breakpoint. + */ + "Continue", + /** + * Pause execution because we hit a source-level breakpoint or did a step. + */ + "Pause", +}; + +/** + * How a Wasm breakpoint in the debuggee should be handled. + */ +dictionary SourceBreakResult { + /** + * The kind of action that should be taken: should the debuggee pause at a + * source-level breakpoint or continue execution? + */ + required SourceBreakResultKind kind; + + /** + * If the `kind` is `"Pause"`, this is the source-level breakpoint that the + * debuggee is paused at, if we hit a breakpoint. + */ + SourceBreakpointId breakpoint; + + /** + * If the `kind` is `"Pause"`, this is the source location where execution + * is paused at. + */ + SourceLocation location; +}; + +/** + * What kind of step should be taken. + */ +enum SourceStepKind { + /** + * If currently paused before a function call, continue to the first + * expression of that function. Otherwise, continue execution until the next + * line of code. + */ + "Into", + /** + * Continue execution until the next line of code. + */ + "Over", + /** + * Continue execution until the function returns. + */ + "Out", +}; + +dictionary SourceStepOptions { + required SourceStepKind kind; +}; + +/** + * A range within a source file. + * + * Inclusive of `startLine:startColumn`, and exclusive of `endLine:endColumn`. + */ +dictionary SourceRange { + required SourceFilename filename; + SourceFunctionName function; + + required SourceLine startLine; + required SourceColumn startColumn; + + required SourceLine endLine; + required SourceColumn endColumn; +}; + +/** + * A location inside some source text. + */ +dictionary SourceLocation { + required SourceFilename filename; + required SourceLine line; + SourceColumn column; +}; + +/** + * A range of instructions within a Wasm module's Code section. + * + * Inclusive of `start`, and exclusive of `end`. + */ +dictionary WasmCodeRange { + required WasmCodeOffset start; + required WasmCodeOffset end; +}; + +/** + * Source-level debugging APIs for a Wasm module. + * + * Debugging modules provide an implementation of this interface, wrapping and + * translating the given `WasmDebugger`'s raw Wasm-level debugging APIs into + * source-level debugging APIs which are used by various developer tools + * (debuggers, profilers, etc). + */ +[Constructor(WasmDebugger)] +interface SourceDebugger { + /** + * Set a source-level breakpoint at the given source position. + * + * Execution should be paused when reaching this location, but before + * evaluating its associated code. + */ + SourceBreakpointId? setBreakpoint(SourceBreakpointOptions options); + + /** + * Clear a source-level breakpoint that was previously set. + */ + void clearBreakpoint(SourceBreakpointId breakpoint); + + /** + * This method will be called whenever the debuggee hits a Wasm + * breakpoint. This allows the `SourceDebugger` to translate Wasm-level + * breaks into source-level breaks. + */ + SourceBreakResult onBreak(WasmBreakInfo); + + /** + * This method is called when the debuggee module is paused to step into, + * over, or out of a function. + * + * `SourceDebugger` implementations should set Wasm breakpoint(s) where + * execution should pause after taking the requested step. + */ + void onStep(SourceStepOptions options); + + /** + * Get the source range(s) for the given code offset, if any. + * + * Note that various code transformations (e.g. common subexpression + * elinimation) may result in an offset mapping to multiple source ranges. + * + * Note that an offset may also not map to any source range, for example if + * debug information is unavailable for a linked object file that + * contributed to this Wasm module's compilation. + */ + Sequence getSourceRanges(WasmCodeOffset offset); + + /** + * Get the code offset(s) for the given source location, if any. + * + * Note that a source location may map to multiple code ranges, for example + * if one of a location's subexpressions has been hoisted out of a loop. + * + * Note that a source location might not map to any code range. A trivial + * example is a source location within a comment in the source text. Another + * example is an expression that was dead code eliminated. + */ + Sequence getWasmCodeRanges(SourceLocation location); + + /** + * Get a list of the source files that this Wasm module was compiled from. + */ + Sequence listSources(); + + /** + * Get the source text for the given source file. + */ + SourceText? getSourceText(SourceFilename source); +};