This proposal defines interfaces for querying source-level debugging information about a WebAssembly module. A WebAssembly runtime can leverage these interfaces in its developer tools to translate from Wasm-level debugging information into a user-friendly, source-level presentation.
Note: this is an incomplete, work-in-progress, strawperson proposal! Nothing here is set in stone and everything is likely to change!
Bugs abound, functions are slow, and developers need to figure out what's
happening and why. But developers aren't typically writing WebAssembly by hand,
they are writing in a high-level source language and compiling it down into
WebAssembly code. Even though the WebAssembly code is what is actually being
executed, developers prefer to debug and profile in terms of the source
language. This source-level experience is the norm for both native code in GDB,
LLDB, or perf
, and for managed languages in their language-specific tooling,
like JavaScript in a Web browser's developer tools. This proposal aims unlock
the source-level debugging experience for Wasm code running inside a runtime.
Note: This proposal currently only defines APIs that are powerful enough to support source-level stepping in a debugger and symbolicating source-level locations and function names in other developer tools, such as a profiler or logging console.
There are many possible debugging queries that this proposal does not currently define APIs for. It does not define APIs for inspecting scopes and recovering their bindings' values. It does not define APIs for expanding physical Wasm function frames into logical, possibly-inlined source function frames. We intend to support these use cases eventually, but are starting with just source-level location information, and can incrementally grow the debugging module API surface as time goes on.
Additionally, the same way that WebAssembly itself is embedder agnostic, and doesn't require (for example) a Web browser or JavaScript engine, so too should debugging functionality be embedder agnostic.
This proposal builds on the following existing or proposed WebAssembly concepts:
- Custom sections: These are sections defined by the core WebAssembly spec's binary format whose contents are uninterpreted by the core spec but can be interpreted by other tools or specifications (including this proposal)
- Interface types: The WebAssembly interface types proposal allows defining shared-nothing, language-neutral interfaces.
This proposal defines the following new concepts:
- Debuggee module: A debuggee module is the Wasm module that is being debugged or profiled.
- Debugging module: A debugging module is referenced from its debuggee Wasm module, and is a separate Wasm module that translates between Wasm-level debugging information and source-level debuggging information. A debugging module is used by the engine's developer tools.
WasmDebugger
: TheWasmDebugger
interface provides raw, Wasm-level debugging APIs for inspecting the debuggee Wasm module. It is implemented by the Wasm engine and given to a debugging module'sSourceDebugger
interface.SourceDebugger
: TheSourceDebugger
interface provides source-level debugging APIs for inspecting the debuggee Wasm module. It is implemented by the debugging module, translates between source-level information and Wasm-level information, and wraps aWasmDebugger
instance.- Wasm breakpoint: A breakpoint associated with an instruction in the debuggee module's code section.
- Source breakpoint: A breakpoint associated with a location in the source text, and may map to multiple Wasm breakpoints.
Collectively, these concepts and their relationships can be visualized together in the following diagram:
This section gives high-level walkthroughs of a few representative scenarios where debugging modules are used by developer tools to provide source-level information.
A sampling profiler will periodically pause Wasm execution, record its current stack, and then resume execution. Each frame in a recorded stack contains the Wasm module, function, and code offset. To display the profiler's results in a source-level presentation, the developer tools would perform the following steps for each frame of each stack:
- Get or instantiate the debugger module
SourceDebugger
for the frame's Wasm module. - Let
sourceRanges
be the result of callingSourceDebugger.getSourceRanges
. - If
sourceRanges
is empty, leave the frame unsymbolicated. - Otherwise, let the frame's source-level symbolication be the first entry of
sourceRanges
.
Note: This algorithm and usage of
SourceDebugger.getSourceRanges
enables offline stack symbolication. It would not support symbolicating interpreted code stacks like C# stacks within Blazor. If we wanted to support symbolicating interpreted code stacks, it would likely require calling into the debuggee module or inspecting its state online during stack sampling. In turn, that would require only sampling at safe points of execution (similar to GC safe points) to avoid making intermediate states or optimizations observable. This can get a bit hairy, and whether we want to support this or not is an open question.
Debugger GUIs often display a list of source files or a file and directory tree of source files. To construct the complete list of source files for Wasm modules in a runtime, perform the following steps:
- For each debuggee module in the runtime:
- Get or create the debugging module
SourceDebugger
for the debuggee module. - Append the result of calling
SourceDebugger.listSources
to the source list.
- Get or create the debugging module
When a user selects a source from the list of all source files, a debugger GUI
will typically display the selected source's text in a tab or panel. The text
for a source can be retrieved by calling SourceDebugger.getSourceText
.
When a user asks the debugger to set a source-level breakpoint, the debugger should perform the following steps:
- Get the
SourceDebugger
associated with the breakpoint's source. - Let
options
be theSourceBreakpointOptions
describing the requested breakpoint. - Let
id
be the result of callingSourceDebugger.setBreakpoint
withoptions
.- If there is no expression at the breakpoint's location:
- The
SourceDebugger.setBreakpoint
implementation should return null.
- The
- Otherwise:
- The
SourceDebugger
implementation should set as many Wasm breakpoints as necessary to pause just before evaluation of the breakpoint location's expression begins usingWasmDebugger.setBreakpoint
.
- The
- If there is no expression at the breakpoint's location:
- If
id
is null:- Then no breakpoint was set (e.g because the requested location is within a comment) and the UI should reflect this.
- Otherwise:
- Save the
id
for if/when the breakpoint gets hit or the user wants to clear the breakpoint.
- Save the
- Let
dbg
be theSourceDebugger
instance for the current frame. - Construct the
WasmBreakInfo
for the Wasm breakpoint that was hit. - Let
breakResult
be the result of callingdbg.onBreak
with theWasmBreakInfo
- If
breakResult.kind
is"Continue"
:- Resume the debuggee module's execution.
- Otherwise if
breakResult.kind
is"Pause"
:- Assert that
breakResult.location
is not null. - Update the debugger UI to show that execution is paused at the source
location
breakResult.location
.
- Assert that
- Let
dbg
be theSourceDebugger
instance for the current frame. - Construct the
SourceStepOptions
for the step that the user requested. - Call
dbg.onStep
with theSourceStepOptions
- The
onStep
implementation should use theWasmDebugger
to set Wasm breakpoint(s) where it determines execution should pause after taking the requested step.
- The
- Resume debuggee execution.
Pausing after the step has been executed is equivalent to hitting a breakpoint,
as described above, although additionally SourceDebugger
implementations
should delete any Wasm breakpoints that were created for the step with
WasmDebugger.clearBreakpoint
.
Note: In order to be embedder agnostic — about whether it is a Web browser or JavaScript engine or not — we intend to eventually define these interfaces with WebAssembly Interface Types. However, since that standard is still coming together, we are temporarily describing the interfaces with Web IDL.
The definitions for interfaces defined in this proposal are available in
debugging-modules.webidl
.
There are a variety of reasons:
-
Future extensibility is harder with data formats than interfaces. With interfaces, we can always add new, optional methods that debugging modules can implement and use or not. With data formats, we need to design future extensibility into the format, which requires more care and can also bloat its encoded size with repetitive metadata and field tags.
-
Interpreting DWARF and PDB data is not straight forward, and the full responsibility for that would lie completely on developer tools authors. In contrast, debugging modules provide high-level methods for querying program information, which are easier to work with for developer tools authors. Toolchains can wrap DWARF or PDB interpreters in a debugging module interface, and this debugging module implementation can be shared across any toolchains that use that same format. Sharing debugging module implementations is easier for toolchains than sharing DWARF/PDB interpretation logic is for developer tools, since debugging modules talk to a
WasmDebugger
interface and produce a well-knownSourceDebugger
interface that are used by all toolchains, while developer tools would be calling into internal APIs that aren't common across developer tools implementations. -
Both debug info formats and standards ossify and evolve slowly. With debugging modules, we allow interface implementations to experiment and iterate separately from the standard. This allows innovations (for example more compact data representations) to develop and ship without needing to go through the slow standards process.
-
Static debug info formats don't provide a path forward for source-level debugging of projects like Blazor, that run interpreted code inside their Wasm. Debugging that interpreted code requires dynamic debugging APIs.
A wire protocol requires defining the same set of operations we want to support that we define as interface methods in this proposal, and also a serialization format. Defining a serialization format that is both compact and future-extensible is no small task. Additionally, nothing about source-level debugging requires over-the-wire communication or message passing, even if that is often a good architectural decision. Implementations are free to proxy this proposal's interface method calls across a protocol or to another process. It doesn't make sense to bake a specific wire protocol into the standard, when it can be left as an implementation detail.
One might might be tempted to use a protocol to avoid an inter-standards dependency on Wasm interface types. A protocol requires passing the serialized data into and out of the debugging module. Passing that data in or out requires knowledge of calling conventions and memory ownership (who mallocs and who frees). This is a problem that Wasm interface types are already standardizing a solution for, and which engines already intend to support. Duplicating standards work done by another subgroup is far from ideal: it leads to more implementation work for both toolchains and engines.
The final thing to consider is the code size impact that using a protocol implies. Incoming messages must be deserialized and outgoing messages must be serialized, and both those things require non-trivial amounts of code. On the other hand, with Wasm interface types most of the functionality is implemented once in the Wasm engine, and doesn't bloat every module's code size.
Yes! Engines are free to run debugging modules in their own process and proxy
calls into SourceDebugger
, or from SourceDebugger
to WasmDebugger
, across
an IPC channel (or even over a wire protocol). However, this is an
implementation detail for the Wasm engine and embedder, and not something that
this proposal needs to standardize.
This proposal does not intend to solve source-level debugging of a native library or executable that is derived from Wasm that is in turn derived from some high-level source language. That would most likely be best served by providing the debugging information in the format that is usually expected for native code on that platform (e.g. DWARF or PDB) so that all the usual tools for debugging and profiling native code continue to Just Work without modification.
Theoretically, this proposal could be leveraged in a Source->Wasm->Native compilation pipeline for translating source-level information into Wasm-level information, which the compiler then lowers into DWARF or PDB. Even though we would not ultimately end up with a single debug format for all targets and all situations, it is expected that a rising tide will lift all boats: some debugging modules will undoubtedly use DWARF or PDB internally and having shared tools and conventions for working with Wasm and those formats will help both the AOT and runtime use cases.