-
Notifications
You must be signed in to change notification settings - Fork 0
/
index.bs
383 lines (287 loc) · 18.3 KB
/
index.bs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
<pre class='metadata'>
Title: WebAssembly Debugging Capabilities
Shortname: wasm-debugging
Level: 1
Status: LS
Group: WebAssembly Community Group
URL: https://fitzgen.github.io/wasm-debugging-capabilities/
Editor: Nick Fitzgerald, Mozilla https://mozilla.org, fitzgen@mozilla.com, http://fitzgeraldnick.com
Abstract: Capabilities for source-level debugging of generated WebAssembly code.
Markup Shorthands: markdown yes
</pre>
Is this document missing something? [Send a pull request or open an issue on
GitHub!](https://github.com/fitzgen/wasm-debugging-capabilities)
# Motivation # {#motivation}
WebAssembly is not typically written by-hand. A higher-level language—such as C,
C++, C#, or Rust—is compiled into WebAssembly. Tools that operate on the emitted
WebAssembly should present results their results in terms of the original source
language, not the generated WebAssembly code.
## Supported Tools ## {#supported-tools}
Here is an incomplete list of the kinds tools that we expect to consume or
manipulate debugging capabilities:
: <dfn>Stepping debuggers</dfn>
:: A stepping debugger allows users to step through their program's execution
line-by-line (or expression-by-expression) in their original source
language. [gdb](https://www.gnu.org/software/gdb/) is a popular stepping
debugger, and most Web browser's developer tools also include a stepping
debugger for JavaScript.
: <dfn>Logging consoles</dfn>
:: Logging consoles collect and display messages logged by the program. These
messages are not limited to simple strings; they may contain objects or even
tables. Most Web browser's developer tools include a logging console.
: <dfn>Time profilers</dfn>
:: Time profilers analyze where a running program is spending its time. They
help users improve their program's throughput or latency. They might be
implemented by periodically sampling the running program's stack or by precisely
tracing certain operations.
: <dfn>Code size profilers</dfn>
:: A code size profiler analyzes which source language functions and constructs
are responsible for which portions of a binary. They help users produce small
binaries. [Twiggy](https://github.com/rustwasm/twiggy) is a code size profiler
for WebAssembly.
: <dfn>Linkers and bundlers</dfn>
:: Linkers and bundlers combine incomplete fragments of programs (object code)
into a complete program. They typically compose debugging capabilities, rather
than consume or generate them directly. Examples include
[lld](http://lld.llvm.org/) and [webpack](https://webpack.js.org/).
: <dfn>Static analysis tools</dfn>
:: Static analysis tools analyze a program and report their results to
developers without ever running it. For example, providing a conservative
estimate about how deep the stack may grow, or visualizing why a compound type's
size is so large.
: <dfn>Instrumentation and rewriting tools</dfn>
:: It can be useful to instrument binaries with extra instructions, or to
completely rewrite blocks of code. For example,
[binaryen](https://github.com/WebAssembly/binaryen) can instrument all memory
accesses, helping users debug heap corruption, and
[wasm-snip](https://github.com/fitzgen/wasm-snip) can replace a function's body
with an `unreachable` instruction.
This list is not exhaustive, but we hope it is representative.
# Requirements # {#requirements}
This section attempts, as much as possible, to avoid embedding assumptions about
whether debugging capabilities are provided by parsing a static format (a la
[DWARF](http://dwarfstd.org/)), via a programmatic interface (a la [language
servers](https://github.com/Microsoft/language-server-protocol)), a combination
of those two, or by something else entirely. Rather than proposing a particular
solution, this is an attempt to enumerate the requirements and constraints that
*any* solution must satisfy. Therefore, I use the term "debugging capability"
rather than "debugging information" or "debugger server".
Furthermore, I've categorized these requirements as "musts" or "shoulds". A
"must" is a hard requirement for an MVP: generally something that is already
supported by source maps, or is required *now* in order to support a "should"
*later*. A "should" is something that is not a hard requirement for the MVP, but
for which we have supporting use cases.
## General ## {#general-requirements}
### Must be future extensible ### {#future-extensible}
So that we can ship incrementally, we need to be able to evolve the debugging
capabilities over time, as we add functionality. For example, we might initially
ship without support for describing packed struct types, but we must not
preclude ourselves from ever supporting them.
Note: The source map format is *not* future extensible. The number of
variable-length quantities that each mapping contains is fixed, and adding more
information to a mapping will only break old consumers of the format. On the
other hand, JSON would be future-extensible, in that one could add new fields to
an object for new debugging information. Old consumers should ignore fields they
do not recognize, while new consumers leverage the new information. Of course,
source maps used to be JSON-based, but moved to variable-length quantities to be
more [compact on disk and over the network](#compact-on-disk).
### Must be embedder agnostic ### {#embedder-agnostic}
WebAssembly is portable and embedder agnostic, and its debugging capabilities
must be as well. The debugging capabilities must not be specific to any single
kind of embedder or domain. For example, the debugging capabilities must not
assume that the embedder is a Web browser.
### Must support querying static properties without running the debuggee program ### {#debuggee-not-running}
Every query describes either a static or dynamic property of the debuggee
program. What is a static property for one binary might be a dynamic property
for another. The debugging capabilities must support querying both static and
dynamic properties, and static properties must be queryable even when the
debuggee program is not running.
For example, a CLI code size profiler must be able to consume information about
inlined functions or generic function monomorphization without running the
debuggee program. A static analysis tool that never runs its input program must
be able to report its results in terms of source locations. A linker or bundler
that is composing two wasm object files must not have to run the incomplete
programs contained in the object files in order to merge their debugging
capabilities.
### Must be embeddable within the WebAssembly ### {#embeddable-in-wasm}
For ease of development, the debugging capabilities must be embeddable within a
custom section or sections of the debuggee program's `.wasm` binary. This avoids
wrangling multiple files and associated issues like moving one file but not the
other and breaking relative references between them.
It is common to embed source maps as a data URL within `//# sourceMappingURL`
pragmas by base64 encoding them.
### Must be separable from the WebAssembly ### {#separable-from-wasm}
On the other hand, you most certainly do not want to include the debugging
capabilities within the `.wasm` binary you ship in production because of size
concerns and the associated network costs. But bugs happen in production, and
you might want to use a stepping debugger with your live production code base,
or apply debugging information to stack traces captured via telemetry and then
processed offline, a la [Sentry](https://sentry.io/welcome/).
### Must be compact on disk and over the network ### {#compact-on-disk}
There is so much information that a tool might query for, on the order of the
debuggee program itself. Much of this information is regular and
repetitious. Source maps encode only a fraction of the information that we
ultimately want to support queries for, and they are regularly many megabytes in
size. The current source map specification is in its third revision, and each
revision has focused on making the format more dense. Empirically, relying on
general compression algorithms has not been enough.
### Should be compact in memory ### {#compact-in-memory}
Ideally only portions of the debugging capabilities are necessary to load in
memory at any given time. E.g. if the debuggee program consisted of multiple
linked compilation units, then a debugger should only need to initially load the
capabilities for the current function's compilation unit. Answering a single
query (e.g. finding the original source location for a generated code location)
should not require loading the full debugging capabilities for the whole
debuggee program.
Note: DWARF, for example, has various index sections that allow for accelerated
access into other sections. Those other sections do not need to be loaded into
memory, and one can use the index to find an offset and then read just the
relevant bits. The `.debug_aranges` section allows a debugger to quickly find
the compilation unit and associated data in the `.debug_info` section for a
given address, without parsing and loading into memory all of the `.debug_info`
section. See section 6.1 of [DWARF 5](http://dwarfstd.org/doc/DWARF5.pdf) for
details.
### Should be fast to consume, generate, and manipulate ### {#fast-to-manipulate}
A fast tool provides a better user experience than a slow tool.
For example, a linker would ideally not need to eagerly apply relocs within the
debugging capabilities. Instead, the debugging capabilities consumers or
debugging capabilities themselves would lazily apply relocs when a particular
query forces it. The indexed variant of source maps provides similar lazy-reloc
functionality.
### Should support interpreted debuggees, where the interpreter is written in Wasm ### {#interpreters}
With interpreted debuggees, there are nearly zero static properties of the
debuggee that are reflected in the wasm itself. Instead of being the debuggee
program, the wasm is interpreting yet another language in which the debuggee
program is authored. This has profound impacts on the way that tools, debugging
capabilities, and the debuggee program interact:
* For a stepping debugger to set a breakpoint, it cannot simply translate source
locations into a set of wasm bytecode offsets and use an engine-specific API
to pause when those bytecodes are executed. There are no wasm bytecodes that
directly correspond to a given source location. We could be paused at the same
wasm bytecode multiple different times and logically be paused in multiple
different source locations. The interpreter's cooperation is necessary to set
a breakpoint, since it is the only entity that knows what source location is
executing at any given time.
* For a sampling profiler to capture a stack, it cannot simply save the wasm
functions and their bytecode offsets on the stack, and symbolicate and expand
inlined functions offline. In fact, the active wasm functions have almost no
correlation with the active interpreted debuggee functions. Only the
interpreter itself knows which interpreted debuggee functions are active, and
therefore its cooperation is required to correctly sample the stack.
* Etc.
## Locations ## {#locations}
### Must support querying the original source location for a given generated code location ### {#source-to-code}
This is a common query. A trap is raised in the debuggee program and a stepping
debugger would like to know in which source location it should place its
cursor. A static analysis tool found something interesting at some location in
the generated code, and would like to present this to its user in terms of the
original source location. A profiler would like to express its sampled stack
trace in terms of original source locations.
### Must support querying the generated code location(s) for a given original source location ### {#code-to-source}
Another common query. A user sets a breakpoint on some original source line, and
the debugger must use the debugging capabilities to translate that into a set of
generated code locations that it should pause execution upon reaching. An
instrumentation tool that inserts tracepoints or logging at some original source
location would translate the input source location into a generated code
location where it should insert new instructions.
### Must support enumerating all bidirectional mappings between original source and generated code locations ### {#enumerate-location-mappings}
Firefox's stepping debugger will enumerate all mappings to determine the set of
lines in the original source files that can have breakpoints set on them, and
highlight this information in the UI. Since pretty much every generated code
location ends up needing to be relocated when bundling JavaScript files,
bundlers will enumerate every mapping and apply relocs directly, rather than
maintaining a side table for these relocs.
## Source Text ## {#source-text}
### Must support embedding the original source text ### {#embedding-source-text}
Moving a source file or a generated `.wasm` binary might break the link between
the debugging capabilities and the source text, if the source text is
external. For ease of development, the debugging capabilities must support
embedding the source text within itself to render this scenario impossible.
Note: source maps support embedding the original source text into the source map
itself with the `"sourceContents"` JSON field.
Note: this is not desirable for production or with larger projects, since it
will inflate the size of the debugging capabilities and conflict with
[[#compact-on-disk]].
### Must support referencing external source text ### {#external-source-text}
For large projects, embedding the original source text within the debugging
capabilities is not worth the trade off. There may be many megabytes of library
source text that the user will not view, because they are (for example) stepping
in their business logic code. In this scenario, that library source text serves
only to slow down the initial download of the debugging capabilities.
Note: source maps support referencing external source text by URL on the Web,
and (absolute or relative) file path on Node.js.
## Inlined Functions ## {#inlined-functions}
### Should support querying the inline function frames that are logically on the stack at some generated code location ### {#query-for-inlined}
A sampling profiler should augment its sampled stack of physical frames with the
logical frames that were inlined. JavaScript profilers will do this today with
the help of the JavaScript engine, rather than using debugging capabilities.
The set of logical frames on the stack might also be a dynamic property. For
example, an interpreter written in WebAssembly should be able to expose its
interpreted language's activation frames.
### Should support enumerating the logical inlined function invocations within a physical function ### {#enumerate-all-inlined}
What code ranges are due to an inlined function invocation? Note that a single
inlined function invocation may be spread across multiple, distinct ranges due
to code motion by the compiler.
Code size profilers want to find large functions that are getting aggressively
inlined many times, and might be causing size bloat.
## Types ## {#types}
### Should support describing scalar types ### {#scalar-types}
Examples include signed and unsigned integers, floats, booleans, etc. What is
the name of the type? What is the type's size and alignment? These things are of
particular importance to stepping debuggers and anything that wishes to
understand a value in a running program's live memory.
### Should support describing compound types ### {#compound-types}
Examples include pointers, arrays, structs, packed structs, structs with
bitfields, C-style enums, C-style untagged unions, Rust-style tagged unions,
etc. What is the name of the type? At what bit or byte offset is each member
field?
In addition to stepping debuggers, this information is also useful to tools that
are not actually running the debuggee program, like
[`ddbug`](https://github.com/gimli-rs/ddbug), which lets you inspect the sizes
and layouts of a program's types, so you can optimize their representation.
### Should support type-based pretty printing ### {#pretty-printing}
A stepping debugger or logging console should be able to pretty print values
based on each value's type. For example, a string might be represented as a
pointer and length pair, but should be displayed as its characters.
## Scopes and Bindings ## {#scopes-and-bindings}
### Should support querying for the scope chain at a given generated code location ### {#query-scope-chain}
When a stepping debugger pauses at a location, it should display the scopes it
is paused within.
### Should support enumerating all bindings within a scope ### {#bindings-within-a-scope}
When a stepping debugger is paused, it should display the bindings within each
scope it is paused within. Is the bindings a formal parameter? A local variable?
Is it a mutable or constant binding?
### Should support querying a binding's type ### {#type-of-binding}
When a stepping debugger is paused, for each binding that is in scope, it should
be able to display the binding's type.
### Should support describing a method to find the location of or reconstruct a binding's value ### {#reconstruct-a-bindings-value}
When a stepping debugger is paused, for each binding it displays, it should also
display the binding's value.
"Reconstruct" because a value might not live in a single contiguous region of
memory or a single local. The compiler might have exploded it into its component
parts and passed each part by value in different parameters. It might be on the
stack during this bytecode range but in memory during that bytecode range within
the same function.
## Generic Functions and Monomorphizations ## {#generics-and-monomorphizations}
### Should support querying whether a function is a monomorphization of a generic function ### {#query-if-monomorphization-of-generic}
A code size profiler should identify generic functions that have been
monomorphized "too many" times and are leading to code bloat. They should be
able to do this regardless if the function is inlined or not.
# Specification # {#specification}
TODO: after we have flushed out and agreed upon requirements.
<pre class="biblio">
{
"WebAssembly": {
"authors": [
"WebAssembly Community Group"
],
"href": "https://webassembly.org",
"title": "WebAssembly Specification",
"status": "LS",
"publisher": "W3C",
"deliveredBy": [
"http://www.w3.org/html/wg/"
]
}
}
</pre>