Should module byte offsets be used for specifying wasm code locations? #1071

Open
dschuff opened this issue May 22, 2017 · 4 comments

dschuff (Member) commented May 22, 2017

In #1053 (and in the first version of #1064), locations of wasm instructions are specified as byte offsets within the module. This is analogous to the use of PC addresses in native code, and this style of location is usable in browsers (e.g. for error messages and stack traces), in a module's name section as proposed in #1064 (which could feed into those same browser use cases, as well as online debugging), and in offline tools such as WABT and LLVM.

However, wasm is unlike traditional native-code architectures in that the code is not actually exposed to the program itself, so any references to code locations are just conventions with no semantic meaning. All of the semantics, and (other than the cases above) all of the conventions about wasm locations, are self-consistent without reference to the binary bytes of the instructions (e.g. branch depths, imports/exports, and all the other index spaces). So it might be worth considering alternative naming/numbering schemes for referring to specific instructions in a wasm program.
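One way to see that byte offsets are an encoding convention rather than part of the semantics: wasm integer immediates (e.g. the function index of `call`, opcode `0x10`) are LEB128-encoded, and the binary format accepts padded, non-minimal encodings up to the size limit of the type (5 bytes for a u32). The sketch below uses hypothetical helpers `uleb128_encode`/`uleb128_decode` (not taken from WABT, LLVM, or any browser) to encode the same `call 5` in 2 bytes and in 6 bytes. That shifts the byte offset of every following instruction without changing what the code does.

```python
def uleb128_encode(value: int, pad_to: int = 0) -> bytes:
    """Encode a non-negative integer as unsigned LEB128.

    If pad_to exceeds the minimal length, emit a non-minimal encoding of
    exactly pad_to bytes by appending zero-valued 7-bit groups.
    """
    if value < 0:
        raise ValueError("unsigned LEB128 needs a non-negative value")
    groups = []
    while True:
        groups.append(value & 0x7F)  # low 7 bits first
        value >>= 7
        if value == 0:
            break
    while len(groups) < pad_to:
        groups.append(0)  # padding group: contributes nothing to the value
    # Every byte except the last has the continuation bit (0x80) set.
    return bytes(g | 0x80 for g in groups[:-1]) + bytes([groups[-1]])


def uleb128_decode(data: bytes, offset: int = 0) -> tuple:
    """Decode unsigned LEB128 at data[offset:]; return (value, bytes_consumed)."""
    result = 0
    shift = 0
    pos = offset
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos - offset


# The same instruction, `call 5`, in two valid encodings.
minimal = b"\x10" + uleb128_encode(5)           # 2 bytes
padded = b"\x10" + uleb128_encode(5, pad_to=5)  # 6 bytes
print(len(minimal), len(padded))  # 2 6
print(uleb128_decode(minimal, 1)[0], uleb128_decode(padded, 1)[0])  # 5 5
```

Padded encodings like this are what relocatable object files typically use so that a linker can patch an index in place, which is one practical reason the "same" code can end up at different byte offsets.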

dschuff (Member, Author) commented May 22, 2017

Previous discussion can be found at https://github.com/WebAssembly/design/pull/1064/files#r116468692 and #1053 (comment) (and the replies to those)

dschuff (Member, Author) commented May 22, 2017

One fairly fundamental question that hasn't been discussed yet is whether we want locations to be specific to a function. The byte offset is global to the module, whereas the instruction-numbering schemes discussed so far are all local to a function, so a full reference would be a tuple of (function index, instruction index). I don't imagine browsers or encoding/decoding tools would care much; it matters more for debugging tools. For source maps and other JS-centric tools, we already have a mismatch between the line:column tuple used there and a single wasm offset, so adding another field would probably be no big deal. DWARF and other systems that assume a single global address might be more interesting.
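Converting between a module-global offset and a function-relative reference is an addition in one direction and a binary search in the other. A minimal sketch, assuming the tool already knows the module byte offset at which each function body starts (`body_starts`) and each body's size (`body_sizes`); both names and the example layout are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def to_function_local(body_starts: List[int], body_sizes: List[int],
                      module_offset: int) -> Tuple[int, int]:
    """Map a module-global byte offset to (defined-function index, local offset).

    body_starts must be sorted ascending; body_sizes[i] is the byte length of
    function body i. Imported functions are not counted: to get an index in
    wasm's function index space, add the number of imported functions.
    """
    i = bisect_right(body_starts, module_offset) - 1
    if i < 0 or module_offset >= body_starts[i] + body_sizes[i]:
        raise ValueError(f"offset {module_offset} is not inside any function body")
    return i, module_offset - body_starts[i]


def to_module_global(body_starts: List[int], func_index: int, local_offset: int) -> int:
    """Inverse mapping: (defined-function index, local offset) -> module offset."""
    return body_starts[func_index] + local_offset


# Made-up layout: three bodies laid out back to back in the code section.
starts, sizes = [48, 69, 128], [21, 59, 32]
print(to_function_local(starts, sizes, 80))  # (1, 11)
print(to_module_global(starts, 1, 11))       # 80
```

The search needs only the table of body start offsets, which a decoder typically has anyway, so either form can serve as the canonical one as long as the other is cheap to derive.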

yurydelendik commented Jun 20, 2017

The purpose of a numbering scheme is to identify an instruction, and most of the schemes proposed here would work fine for that, e.g. a module bytecode offset, a (function number, instruction number) pair, etc. We need to choose the simplest one, so as to reduce the overhead of the operations performed with it, e.g. quickly locating and displaying the disassembly of a trapped instruction without building or storing any additional indices in memory. I don't think the display format of the reference will matter much to a human.

Based on the operations we already perform on instructions, any of the schemes suggested above will work just fine: a global byte offset, a (function index, local byte offset) pair, or a (function index, instruction index) pair.

I identified the following use cases related to the subject so far:

  • to define a symbolic reference for an instruction (e.g. something a person can see/copy/paste from a stack trace)
  • to locate an instruction by reference in a textual representation (e.g. a person must quickly match a symbolic reference to the instructions they see in a disassembly dump)
  • to generate a symbolic reference for an instruction seen in a textual representation (e.g. a person wants to set a breakpoint on the instruction they see)
  • to map an instruction reference to the original source code and back (most original sources span multiple files, and multiple functions within a module can refer to the same files, so there is no benefit in choosing a function-index-based scheme over a global one)
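The second and third use cases show one cost difference between the schemes. Because wasm instructions are variable-length, a (function index, instruction index) reference can only be turned into a byte position after decoding the function body and keeping a per-function table of instruction starts, i.e. exactly the kind of additional index mentioned above. A sketch, assuming a decoder has already produced the byte length of each instruction in a body (`instr_lengths`, `instruction_starts`, and `offset_to_index` are hypothetical names); the example body is `local.get 0; local.get 1; i32.add; end`.

```python
from bisect import bisect_right
from typing import List


def instruction_starts(instr_lengths: List[int]) -> List[int]:
    """Function-local byte offset at which each instruction begins."""
    starts = []
    pos = 0
    for length in instr_lengths:
        starts.append(pos)
        pos += length
    return starts


def offset_to_index(starts: List[int], body_size: int, local_offset: int) -> int:
    """Index of the instruction whose bytes contain local_offset."""
    if not 0 <= local_offset < body_size:
        raise ValueError(f"offset {local_offset} is outside the body")
    return bisect_right(starts, local_offset) - 1


# local.get 0 (0x20 0x00), local.get 1 (0x20 0x01), i32.add (0x6A), end (0x0B)
lengths = [2, 2, 1, 1]
starts = instruction_starts(lengths)
print(starts)                                     # [0, 2, 4, 5]
print(offset_to_index(starts, sum(lengths), 3))   # 1 (inside the immediate of local.get 1)
print(starts[2])                                  # 4: byte offset of instruction #2 (i32.add)
```

With a byte-offset reference, the table is only needed to snap an offset to its containing instruction; with an instruction-index reference, it is needed just to find the instruction at all.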

Are there more?

> So it might be worth considering alternative naming/numbering schemes for referring to specific instructions in a wasm program.

Knowing which schemes our tools currently use, and whether there are any (performance?) problems with them, might help us see whether what we have is an acceptable solution.

luser commented Dec 19, 2017

To get debug info without reinventing too many wheels, I suspect the easiest path would be to use DWARF. If that's the case, then doing something that fits into existing DWARF conventions would make it easier to use existing tools. Attributes that define machine code addresses (2.17 Code Addresses, Ranges and Base Addresses), such as DW_AT_low_pc (7.5.4 Attribute Encodings), which defines the starting address of a subprogram, are generally defined to be of the address class (7.5.5 Classes and Forms). That class allows either an explicitly specified address (which is what C compilers typically emit) or an index into an address table (.debug_addr). The definition of the address table (7.21 Address Range Table) does allow addresses to specify a segment, so presumably you could use segments to distinguish between functions, but that might be a little weird. These seem to be used mostly for relocatable addresses currently?

The format for source line info (.debug_line, 6.2 Line Number Information) is a sequence of opcodes that describe changes in the address and other info for a sequence of instructions. It does mention a segment_selector_size field in the header, but I can't find any indication of how you'd actually use that in the line number program.

I think using a single bytecode offset to reference instructions (equivalent to a program counter in other architectures) is likely to make for fewer headaches. It ought to be fairly straightforward to adapt DWARF for WebAssembly with that.

All the DWARF standard section references above are from the latest version of the DWARF spec (DWARF 5).
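With a single module byte offset serving as the "address", consuming line information would work much as it does for native code: the decoded line-number matrix is a list of rows sorted by address, each row applies up to the next row's address, and a PC maps to the last row at or before it. The sketch below is a heavily simplified model of that decoded matrix, not of the .debug_line opcode encoding; `LineRow`, `lookup`, and the file names and offsets are hypothetical.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class LineRow(NamedTuple):
    address: int        # module-global wasm byte offset where this row starts
    file: str
    line: int
    end_sequence: bool = False  # marks the first byte past the sequence


def lookup(rows: List[LineRow], pc: int) -> Optional[LineRow]:
    """Return the row covering pc, or None if pc is not covered.

    rows must be sorted by address; a row applies from its address up to
    the next row's address.
    """
    i = bisect_right([r.address for r in rows], pc) - 1
    if i < 0 or rows[i].end_sequence:
        return None
    return rows[i]


rows = [
    LineRow(0x30, "a.c", 10),
    LineRow(0x38, "a.c", 11),
    LineRow(0x45, "b.c", 3),
    LineRow(0x60, "b.c", 3, end_sequence=True),
]
print(lookup(rows, 0x3A).line)  # 11
print(lookup(rows, 0x60))       # None
```

A (function index, instruction index) scheme would presumably need either a separate table per function or an extra per-row field to play the role of the segment selector discussed above.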
