Initial strawperson proposal for debugging modules #6
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,283 @@ | ||
# WebAssembly Debugging Modules | ||
|
||
This proposal defines interfaces for querying source-level debugging information | ||
about a WebAssembly module. A WebAssembly runtime can leverage these interfaces | ||
in its developer tools to translate from Wasm-level debugging information into a | ||
user-friendly, source-level presentation. | ||
|
||
> **Note:** this is an incomplete, work-in-progress, strawperson proposal! | ||
> Nothing here is set in stone and everything is likely to change! | ||
|
||
## Table of Contents | ||
|
||
- [Motivation](#motivation) | ||
- [Overview](#overview) | ||
- [Walkthrough](#walkthrough) | ||
- [Interface Definitions](#interface-definitions) | ||
- [FAQ](#faq) | ||
|
||
## Motivation | ||
|
||
Bugs abound, functions are slow, and developers need to figure out what's | ||
happening and why. But developers aren't typically writing WebAssembly by hand; | ||
they are writing in a high-level source language and compiling it down into | ||
WebAssembly code. Even though the WebAssembly code is what is actually being | ||
executed, developers prefer to debug and profile in terms of the source | ||
language. This source-level experience is the norm for both native code in GDB, | ||
LLDB, or `perf`, and for managed languages in their language-specific tooling, | ||
like JavaScript in a Web browser's developer tools. This proposal aims to unlock | ||
the source-level debugging experience for Wasm code running inside a runtime. | ||
|
||
> **Note:** This proposal currently only defines APIs that are powerful enough | ||
> to support source-level stepping in a debugger and symbolicating source-level | ||
> locations and function names in other developer tools, such as a profiler or | ||
> logging console. | ||
> | ||
> There are many possible debugging queries that this proposal does not | ||
> currently define APIs for. It does not define APIs for inspecting scopes and | ||
> recovering their bindings' values. It does not define APIs for expanding | ||
> physical Wasm function frames into logical, possibly-inlined source function | ||
> frames. We intend to support these use cases eventually, but are starting with | ||
> just source-level location information, and can incrementally grow the | ||
> debugging module API surface as time goes on. | ||
|
||
Additionally, the same way that WebAssembly itself is embedder agnostic, and | ||
doesn't require (for example) a Web browser or JavaScript engine, so too should | ||
debugging functionality be embedder agnostic. | ||
|
||
## Overview | ||
|
||
This proposal builds on the following existing or proposed WebAssembly concepts: | ||
|
||
* **Custom sections:** These are sections defined by the [core WebAssembly | ||
spec's binary | ||
format](https://webassembly.github.io/spec/core/binary/modules.html#custom-section) | ||
whose contents are uninterpreted by the core spec but can be interpreted by | ||
other tools or specifications (including this proposal). | ||
* **Interface types:** The [WebAssembly interface | ||
types](https://github.com/WebAssembly/interface-types/blob/master/proposals/interface-types/Explainer.md) | ||
proposal allows defining shared-nothing, language-neutral interfaces. | ||
|
||
This proposal defines the following new concepts: | ||
|
||
* **Debuggee module:** A debuggee module is the Wasm module that is being | ||
debugged or profiled. | ||
* **Debugging module:** A debugging module is referenced from its debuggee Wasm | ||
module, and is a separate Wasm module that translates between Wasm-level | ||
debugging information and source-level debugging information. A debugging | ||
module is used by the engine's developer tools. | ||
* **`WasmDebugger`:** The `WasmDebugger` interface provides raw, Wasm-level | ||
debugging APIs for inspecting the debuggee Wasm module. It is implemented by | ||
the Wasm engine and given to a debugging module's `SourceDebugger` interface. | ||
* **`SourceDebugger`:** The `SourceDebugger` interface provides source-level | ||
debugging APIs for inspecting the debuggee Wasm module. It is implemented by | ||
the debugging module, translates between source-level information and | ||
Wasm-level information, and wraps a `WasmDebugger` instance. | ||
Review comment: The term [...]
Reply: Yeah, I suppose it is wrapping both the [...] Do you have any suggestions on wording or clarifications we can make? |
||
* **Wasm breakpoint:** A breakpoint associated with an instruction in the | ||
debuggee module's code section. | ||
* **Source breakpoint:** A breakpoint associated with a location in the source | ||
text. A source breakpoint may map to multiple Wasm breakpoints. | ||
|
||
Collectively, these concepts and their relationships can be visualized together | ||
in the following diagram: | ||
|
||
![](debugging-modules.png) | ||
|
||
## Walkthrough | ||
|
||
This section gives high-level walkthroughs of a few representative scenarios | ||
where debugging modules are used by developer tools to provide source-level | ||
information. | ||
|
||
### Symbolicating Profiler Stacks | ||
|
||
A sampling profiler will periodically pause Wasm execution, record its current | ||
stack, and then resume execution. Each frame in a recorded stack contains the | ||
Wasm module, function, and code offset. To display the profiler's results in a | ||
source-level presentation, the developer tools would perform the following steps | ||
for each frame of each stack: | ||
|
||
* Get or instantiate the debugging module `SourceDebugger` for the frame's Wasm | ||
module. | ||
* Let `sourceRanges` be the result of calling `SourceDebugger.getSourceRanges`. | ||
* If `sourceRanges` is empty, leave the frame unsymbolicated. | ||
* Otherwise, let the frame's source-level symbolication be the first entry of | ||
`sourceRanges`. | ||
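
The per-frame steps above can be sketched in TypeScript as follows. This is a minimal sketch under stated assumptions, not the proposal's API: the `SourceRange` and `WasmFrame` shapes, the `getDebugger` lookup, and the idea that `getSourceRanges` takes a code offset are all hypothetical, since the real signatures live in `debugging-modules.webidl`.

```typescript
// Hypothetical shapes for illustration only; the proposal's actual
// definitions are in debugging-modules.webidl and may differ.
interface SourceRange {
  filename: string;
  line: number;
  column: number;
}

interface SourceDebugger {
  // Assumed signature: map a Wasm code offset to zero or more source ranges.
  getSourceRanges(codeOffset: number): SourceRange[];
}

// A recorded profiler frame: Wasm module, function, and code offset.
interface WasmFrame {
  moduleId: string;
  functionIndex: number;
  codeOffset: number;
}

interface SymbolicatedFrame {
  frame: WasmFrame;
  source: SourceRange | null; // null means the frame stays unsymbolicated
}

function symbolicateStack(
  stack: WasmFrame[],
  getDebugger: (moduleId: string) => SourceDebugger | undefined,
): SymbolicatedFrame[] {
  return stack.map((frame) => {
    // Get the SourceDebugger for the frame's module, if there is one.
    const dbg = getDebugger(frame.moduleId);
    const sourceRanges = dbg ? dbg.getSourceRanges(frame.codeOffset) : [];
    // Empty result: leave unsymbolicated. Otherwise take the first entry.
    return { frame, source: sourceRanges[0] ?? null };
  });
}
```

Because the sketch only reads recorded frames and never touches the running debuggee, it can run after profiling has finished, which matches the offline symbolication case.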
|
||
> **Note:** This algorithm and usage of `SourceDebugger.getSourceRanges` enables | ||
> offline stack symbolication. It would not support symbolicating interpreted | ||
> code stacks like C# stacks within Blazor. If we wanted to support | ||
> symbolicating interpreted code stacks, it would likely require calling into | ||
> the debuggee module or inspecting its state online during stack sampling. In | ||
> turn, that would require only sampling at safe points of execution (similar to | ||
> GC safe points) to avoid making intermediate states or optimizations | ||
> observable. This can get a bit hairy, and whether we want to support this or | ||
> not is an open question. | ||
|
||
### Listing Sources and Displaying Source Text | ||
|
||
Debugger GUIs often display a list of source files or a file and directory tree | ||
of source files. To construct the complete list of source files for Wasm modules | ||
in a runtime, perform the following steps: | ||
|
||
* For each debuggee module in the runtime: | ||
* Get or create the debugging module `SourceDebugger` for the debuggee | ||
module. | ||
* Append the result of calling `SourceDebugger.listSources` to the source | ||
list. | ||
|
||
When a user selects a source from the list of all source files, a debugger GUI | ||
will typically display the selected source's text in a tab or panel. The text | ||
for a source can be retrieved by calling `SourceDebugger.getSourceText`. | ||
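
A minimal TypeScript sketch of the listing and lookup steps above. The method signatures are assumptions (the source only names `listSources` and `getSourceText`), and `SourceEntry`, `listAllSources`, and `textForSelection` are hypothetical helpers introduced for illustration.

```typescript
// Hypothetical signatures; see debugging-modules.webidl for the real ones.
interface SourceDebugger {
  listSources(): string[]; // assumed to return source names or URLs
  getSourceText(source: string): string; // assumed to return the full text
}

// Remember which SourceDebugger produced each source so that a later
// getSourceText call can be routed to the right debugging module.
interface SourceEntry {
  source: string;
  owner: SourceDebugger;
}

// One SourceDebugger per debuggee module in the runtime.
function listAllSources(debuggers: SourceDebugger[]): SourceEntry[] {
  const sourceList: SourceEntry[] = [];
  for (const dbg of debuggers) {
    for (const source of dbg.listSources()) {
      sourceList.push({ source, owner: dbg });
    }
  }
  return sourceList;
}

// Called when the user selects an entry in the source list.
function textForSelection(entry: SourceEntry): string {
  return entry.owner.getSourceText(entry.source);
}
```

Keeping the owning `SourceDebugger` next to each entry is one possible design; a GUI could equally keep a separate map from source to debugging module.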
|
||
### Setting a Breakpoint | ||
|
||
When a user asks the debugger to set a source-level breakpoint, the debugger | ||
should perform the following steps: | ||
|
||
* Get the `SourceDebugger` associated with the breakpoint's source. | ||
Review comment: I'd assume this should be a collection of [...]
Reply: Great point! This should work out since even if we end up setting many breakpoints in the "same" source that end up in different debuggee modules, only one Wasm breakpoint is ever hit at a time. I'll make an update in a little bit. |
||
* Let `options` be the `SourceBreakpointOptions` describing the requested | ||
breakpoint. | ||
* Let `id` be the result of calling `SourceDebugger.setBreakpoint` with | ||
`options`. | ||
* If there is no expression at the breakpoint's location: | ||
* The `SourceDebugger.setBreakpoint` implementation should return null. | ||
* Otherwise: | ||
* The `SourceDebugger` implementation should set as many Wasm | ||
breakpoints as necessary to pause just before evaluation of the | ||
breakpoint location's expression begins using | ||
`WasmDebugger.setBreakpoint`. | ||
* If `id` is null: | ||
* Then no breakpoint was set (e.g. because the requested location is within a | ||
comment) and the UI should reflect this. | ||
* Otherwise: | ||
* Save the `id` for if/when the breakpoint gets hit or the user wants to | ||
clear the breakpoint. | ||
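
Both sides of the steps above can be sketched in TypeScript. Everything beyond the names `SourceDebugger.setBreakpoint`, `SourceBreakpointOptions`, and `WasmDebugger.setBreakpoint` is an assumption: the option fields, the return types, and the toy line table (`LineTableSourceDebugger`) stand in for whatever debug information a real debugging module would consult.

```typescript
// Hypothetical shapes; the real ones are in debugging-modules.webidl.
interface SourceBreakpointOptions {
  filename: string;
  line: number;
}

interface WasmDebugger {
  // Assumed to take a code offset and return a Wasm breakpoint id.
  setBreakpoint(codeOffset: number): number;
}

type BreakpointId = number;

// Toy debugging-module side: a line table maps "file:line" to the Wasm
// code offsets where evaluation of that location's expression begins.
class LineTableSourceDebugger {
  private nextId = 1;
  private wasm: WasmDebugger;
  private lineTable: Record<string, number[]>;

  constructor(wasm: WasmDebugger, lineTable: Record<string, number[]>) {
    this.wasm = wasm;
    this.lineTable = lineTable;
  }

  setBreakpoint(options: SourceBreakpointOptions): BreakpointId | null {
    const offsets = this.lineTable[`${options.filename}:${options.line}`];
    // No expression at this location (e.g. a comment): report null.
    if (offsets === undefined || offsets.length === 0) return null;
    // Set as many Wasm breakpoints as the location needs.
    for (const offset of offsets) this.wasm.setBreakpoint(offset);
    return this.nextId++;
  }
}

// Debugger (UI) side: returns false when the UI should show that no
// breakpoint was set, and saves the id otherwise.
function requestBreakpoint(
  dbg: LineTableSourceDebugger,
  options: SourceBreakpointOptions,
  savedIds: BreakpointId[],
): boolean {
  const id = dbg.setBreakpoint(options);
  if (id === null) return false;
  savedIds.push(id);
  return true;
}
```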
|
||
### Hitting a Breakpoint | ||
|
||
* Let `dbg` be the `SourceDebugger` instance for the current frame. | ||
* Construct the `WasmBreakInfo` for the Wasm breakpoint that was hit. | ||
* Let `breakResult` be the result of calling `dbg.onBreak` with the | ||
`WasmBreakInfo`. | ||
* If `breakResult.kind` is `"Continue"`: | ||
* Resume the debuggee module's execution. | ||
* Otherwise if `breakResult.kind` is `"Pause"`: | ||
* Assert that `breakResult.location` is not null. | ||
* Update the debugger UI to show that execution is paused at the source | ||
location `breakResult.location`. | ||
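
A minimal TypeScript sketch of this dispatch. The `"Continue"` and `"Pause"` kinds and the `location` field come from the steps above; the fields of `WasmBreakInfo` and `SourceLocation`, and the `DebuggerHost` callbacks, are hypothetical.

```typescript
// Hypothetical shapes; the real ones are in debugging-modules.webidl.
interface SourceLocation {
  filename: string;
  line: number;
  column: number;
}

interface WasmBreakInfo {
  codeOffset: number; // assumed field
}

interface BreakResult {
  kind: "Continue" | "Pause";
  location: SourceLocation | null;
}

interface SourceDebugger {
  onBreak(info: WasmBreakInfo): BreakResult;
}

// Hypothetical hooks into the engine and the debugger UI.
interface DebuggerHost {
  resume(): void;
  showPausedAt(location: SourceLocation): void;
}

function handleWasmBreak(
  dbg: SourceDebugger,
  info: WasmBreakInfo,
  host: DebuggerHost,
): void {
  const breakResult = dbg.onBreak(info);
  if (breakResult.kind === "Continue") {
    // The debugging module decided this is not a source-level pause.
    host.resume();
    return;
  }
  // kind is "Pause": a location is required.
  if (breakResult.location === null) {
    throw new Error("a Pause result must carry a source location");
  }
  host.showPausedAt(breakResult.location);
}
```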
|
||
### Stepping | ||
|
||
* Let `dbg` be the `SourceDebugger` instance for the current frame. | ||
* Construct the `SourceStepOptions` for the step that the user requested. | ||
* Call `dbg.onStep` with the `SourceStepOptions`. | ||
* The `onStep` implementation should use the `WasmDebugger` to set Wasm | ||
breakpoint(s) where it determines execution should pause after taking the | ||
requested step. | ||
Review comment: It is not possible to implement "step-over" just by setting breakpoints, because the debugger does not know what is the next source line that will be executed. It might not be the one directly succeeding the current line, as we could be in a loop, or some conditional construct. |
||
* Resume debuggee execution. | ||
|
||
Pausing after the step has been executed is equivalent to hitting a breakpoint, | ||
as described above, although additionally `SourceDebugger` implementations | ||
should delete any Wasm breakpoints that were created for the step with | ||
`WasmDebugger.clearBreakpoint`. | ||
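
The bookkeeping described above (set temporary Wasm breakpoints in `onStep`, delete them once execution pauses) can be sketched in TypeScript. This is a sketch under assumptions: the `WasmDebugger` signatures and the `SourceStepOptions` field are hypothetical, `finishStep` is a made-up helper name, and `pauseCandidates` stands in for the hard part, namely deciding every Wasm offset where execution might next reach a new source location.

```typescript
// Hypothetical shapes; the real ones are in debugging-modules.webidl.
interface WasmDebugger {
  setBreakpoint(codeOffset: number): number; // assumed to return an id
  clearBreakpoint(id: number): void;
}

interface SourceStepOptions {
  kind: "into" | "over" | "out"; // assumed field
}

class SteppingSourceDebugger {
  private stepBreakpointIds: number[] = [];
  private wasm: WasmDebugger;
  // Assumption: returns all Wasm offsets where the step could end.
  private pauseCandidates: (options: SourceStepOptions) => number[];

  constructor(
    wasm: WasmDebugger,
    pauseCandidates: (options: SourceStepOptions) => number[],
  ) {
    this.wasm = wasm;
    this.pauseCandidates = pauseCandidates;
  }

  // Set temporary Wasm breakpoints; the caller then resumes the debuggee.
  onStep(options: SourceStepOptions): void {
    for (const offset of this.pauseCandidates(options)) {
      this.stepBreakpointIds.push(this.wasm.setBreakpoint(offset));
    }
  }

  // Call when execution pauses after the step: delete the Wasm
  // breakpoints that were created only for this step.
  finishStep(): void {
    for (const id of this.stepBreakpointIds) {
      this.wasm.clearBreakpoint(id);
    }
    this.stepBreakpointIds = [];
  }
}
```

Tracking the ids separately from user breakpoints means that clearing a step never removes a breakpoint the user set at the same offset, provided the engine hands out distinct ids.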
|
||
## Interface Definitions | ||
|
||
> **Note:** In order to be embedder agnostic — about whether it is a Web browser | ||
> or JavaScript engine or not — we intend to eventually define these interfaces | ||
> with [WebAssembly Interface | ||
> Types](https://github.com/WebAssembly/interface-types/). However, since that | ||
> standard is still coming together, we are temporarily describing the | ||
> interfaces with Web IDL. | ||
Review comment: It could be more clearly noted earlier in the document that this is not meant to be specific to an embedding that uses JS, as that is the impression I got from this and earlier documents until I saw this. |
||
|
||
The definitions for interfaces defined in this proposal are available in | ||
[`debugging-modules.webidl`](./debugging-modules.webidl). | ||
|
||
## FAQ | ||
|
||
### Why not an existing debug info format like DWARF or PDB? | ||
|
||
There are a variety of reasons: | ||
|
||
* Future extensibility is harder with data formats than interfaces. With | ||
interfaces, we can always add new, optional methods that debugging modules can | ||
implement and use or not. With data formats, we need to design future | ||
extensibility into the format, which requires more care and can also bloat its | ||
encoded size with repetitive metadata and field tags. | ||
|
||
* Interpreting DWARF and PDB data is not straightforward, and the full | ||
responsibility for that would lie completely on developer tools authors. In | ||
contrast, debugging modules provide high-level methods for querying program | ||
information, which are easier to work with for developer tools | ||
authors. Toolchains can wrap DWARF or PDB interpreters in a debugging module | ||
interface, and this debugging module implementation can be shared across any | ||
toolchains that use that same format. Sharing debugging module implementations | ||
is easier for toolchains than sharing DWARF/PDB interpretation logic is for | ||
developer tools, since debugging modules talk to a `WasmDebugger` interface | ||
and produce a well-known `SourceDebugger` interface that are used by all | ||
toolchains, while developer tools would be calling into internal APIs that | ||
aren't common across developer tools implementations. | ||
|
||
* Both debug info formats and standards ossify and evolve slowly. With debugging | ||
modules, we allow interface implementations to experiment and iterate | ||
separately from the standard. This allows innovations (for example more | ||
compact data representations) to develop and ship without needing to go | ||
through the slow standards process. | ||
|
||
* Static debug info formats don't provide a path forward for source-level | ||
debugging of projects like Blazor, that run interpreted code inside their | ||
Wasm. Debugging that interpreted code requires dynamic debugging APIs. | ||
|
||
### Why not a protocol instead of Wasm interface types? | ||
|
||
A wire protocol requires defining the same set of operations we want to support | ||
that we define as interface methods in this proposal, and *also* a serialization | ||
format. Defining a serialization format that is both compact and | ||
future-extensible is no small task. Additionally, nothing about source-level | ||
Review comment: I'm sure you could use an existing serialization format (Protobuf, FlatBuffers, Cap'n Proto, ...) that is already future-extensible? |
||
debugging *requires* over-the-wire communication or message passing, even if | ||
that is often a good architectural decision. Implementations are free to proxy | ||
this proposal's interface method calls across a protocol or to another | ||
process. It doesn't make sense to bake a specific wire protocol into the | ||
Review comment: I think that implementations will always need to proxy the interface method calls across a protocol, because the debugger and debuggee will always be in different processes. |
||
standard, when it can be left as an implementation detail. | ||
Review comment: That seems entirely a question of perspective. I could say, "why couple this tightly to a specific API that needs to be explicitly forwarded and implemented, when you could simply have a protocol message that is trivially forwardable, inspectable, loggable, forwards and backwards compatible, extensible and can be interfaced with in a wide range of languages with already available code generators?" :) |
||
|
||
One might be tempted to use a protocol to avoid an inter-standards | ||
dependency on Wasm interface types. A protocol requires passing the serialized | ||
data into and out of the debugging module. Passing that data in or out requires | ||
knowledge of calling conventions and memory ownership (who mallocs and who | ||
frees). This is a problem that Wasm interface types are already standardizing a | ||
solution for, and which engines already intend to support. Duplicating standards | ||
work done by another subgroup is far from ideal: it leads to more implementation | ||
work for both toolchains and engines. | ||
|
||
The final thing to consider is the code size impact that using a protocol | ||
implies. Incoming messages must be deserialized and outgoing messages must be | ||
serialized, and both those things require non-trivial amounts of code. On the | ||
other hand, with Wasm interface types most of the functionality is implemented | ||
once in the Wasm engine, and doesn't bloat every module's code size. | ||
|
||
### Can debugging modules run outside of the debuggee process? | ||
|
||
Yes! Engines are free to run debugging modules in their own process and proxy | ||
calls into `SourceDebugger`, or from `SourceDebugger` to `WasmDebugger`, across | ||
an IPC channel (or even over a wire protocol). However, this is an | ||
implementation detail for the Wasm engine and embedder, and not something that | ||
this proposal needs to standardize. | ||
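
As an illustration of such proxying, the TypeScript sketch below forwards two `SourceDebugger` calls over a string-based channel. All of it is hypothetical: the method signatures, the JSON message shape, and the synchronous `Transport` type are assumptions made for brevity, and a real IPC channel would most likely be asynchronous.

```typescript
// Hypothetical signatures; see debugging-modules.webidl for the real ones.
interface SourceDebugger {
  listSources(): string[];
  getSourceText(source: string): string;
}

// Hypothetical message shape and channel: send a request, get a reply.
interface CallMessage {
  method: string;
  args: unknown[];
}
type Transport = (message: string) => string;

// Runs next to the real debugging module (e.g. in a separate process).
function serveSourceDebugger(dbg: SourceDebugger): Transport {
  return (message: string): string => {
    const call = JSON.parse(message) as CallMessage;
    switch (call.method) {
      case "listSources":
        return JSON.stringify({ result: dbg.listSources() });
      case "getSourceText":
        return JSON.stringify({
          result: dbg.getSourceText(String(call.args[0])),
        });
      default:
        throw new Error(`unknown method: ${call.method}`);
    }
  };
}

// Runs in the developer tools; looks like a local SourceDebugger.
class RemoteSourceDebugger implements SourceDebugger {
  private send: Transport;

  constructor(send: Transport) {
    this.send = send;
  }

  listSources(): string[] {
    return this.call("listSources", []) as string[];
  }

  getSourceText(source: string): string {
    return this.call("getSourceText", [source]) as string;
  }

  private call(method: string, args: unknown[]): unknown {
    const request: CallMessage = { method, args };
    const reply = JSON.parse(this.send(JSON.stringify(request))) as {
      result: unknown;
    };
    return reply.result;
  }
}
```

Because `RemoteSourceDebugger` implements the same interface as a local debugging module, the developer tools do not need to know whether the module runs in-process or elsewhere.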
|
||
### What about Wasm that is ahead-of-time compiled into a native library or executable? | ||
|
||
This proposal does *not* intend to solve source-level debugging of a native | ||
library or executable that is derived from Wasm that is in turn derived from | ||
some high-level source language. That would most likely be best served by | ||
providing the debugging information in the format that is usually expected for | ||
native code on that platform (e.g. DWARF or PDB) so that all the usual tools for | ||
debugging and profiling native code continue to Just Work without modification. | ||
|
||
Theoretically, this proposal could be leveraged in a Source->Wasm->Native | ||
compilation pipeline for translating source-level information into Wasm-level | ||
information, which the compiler then lowers into DWARF or PDB. Even though we | ||
would not ultimately end up with a single debug format for all targets and all | ||
situations, it is expected that a rising tide will lift all boats: some | ||
debugging modules will undoubtedly use DWARF or PDB internally and having shared | ||
tools and conventions for working with Wasm and those formats will help both the | ||
AOT and runtime use cases. |
Review comment: Could the requirement that a debugging module needs to be a Wasm module be problematic? When we start implementing debugging modules for DWARF, for example, we could leverage existing code from LLDB/LLVM that gives us the ability to parse DWARF files, decode DWARF information, demangle names according to the source language specified, support multiple languages, evaluate expressions to determine the value of source-level variables, and so on.
Modifying all this existing code so that it compiles to Wasm could be a daunting task. It would be much simpler to have a 'native' debugging module and define a (websocket-based?) debugging protocol to communicate with it.
Reply: +1 to this. Probably we should just leave it to implementation detail.