The WebAssembly debugging format is incompatible with its text format #17

Open
7ombie opened this issue Feb 15, 2022 · 18 comments

@7ombie

7ombie commented Feb 15, 2022

I opened an issue in yurydelendik's repo about six weeks ago, but it hasn't been addressed, and I thought I'd mention it here. Thanks.

@dschuff
Member

dschuff commented Feb 16, 2022

Thanks for posting this here. I'm going to copy the text from the other issue here for convenience:

The current DWARF for WebAssembly spec requires that instruction addresses are offsets into the Code Section. However, (extended) constant expressions contain instructions that are executed at runtime (and can contain errors), but their instructions are not encoded in the Code Section.

If I understand correctly, DWARF allows us to optionally specify which memory each instruction is stored in (by index). Presumably, that feature could be repurposed to reference section IDs instead. Given that the zero section ID is used for custom sections (so it cannot unambiguously identify an individual section), zero (in this context) could be mapped to the Code Section, so that section remains the default.
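
A minimal sketch of the addressing idea above, in Python. This is only an illustration under stated assumptions: `InstructionAddress` and `section_tag` are hypothetical names, and nothing like this exists in the current DWARF for WebAssembly spec. The only facts relied on are that custom sections use section ID 0 and that the Code Section has ID 10 in the binary format.

```python
from dataclasses import dataclass

CODE_SECTION_ID = 10  # Code Section ID in the WebAssembly binary format


@dataclass(frozen=True)
class InstructionAddress:
    """Hypothetical section-qualified instruction address (not in any spec)."""

    offset: int           # byte offset within the selected section
    section_tag: int = 0  # 0 means "default", which maps to the Code Section

    def section_id(self) -> int:
        # Section ID 0 denotes custom sections, so it can never identify one
        # specific section; this sketch reuses it as "the Code Section".
        return CODE_SECTION_ID if self.section_tag == 0 else self.section_tag


# Existing debug info (no tag) keeps pointing at the Code Section.
print(InstructionAddress(0x40).section_id())
# A tagged address could point into, say, the Data Section (ID 11).
print(InstructionAddress(0x04, section_tag=11).section_id())
```

Because the default tag resolves to the Code Section, addresses emitted by existing toolchains would keep their current meaning under this scheme.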

To be honest, I'm just beginning to learn about DWARF, so don't understand it very well yet. In any case, it should be possible to debug the text format with the debugging format.

WABT would ideally implement DWARF support in wat2wasm, and source languages operating at the same level of abstraction really need to support DWARF (if DWARF is all that browsers understand). I'm personally working on a source language that's basically WAT with whitespace and sugar, and there have been other alt-wat source languages in the past. None (except WAT) have achieved any real adoption, but it should at least be possible to debug these languages.

@dschuff
Member

dschuff commented Feb 16, 2022

The current DWARF for WebAssembly spec requires that instruction addresses are offsets into the Code Section. However, (extended) constant expressions contain instructions that are executed at runtime (and can contain errors), but their instructions are not encoded in the Code Section.

This is a good point (although I think the instructions currently in extended-const expressions can't trap... but if you just mean can have errors generically that you might want to debug, then yes, this does seem like a limitation currently). The idea of allowing debug info for instructions in other memory spaces seems like it could work in principle; in terms of support in browsers generally and Chrome's DWARF debugger specifically I'm not sure how much work would be needed to make the debugger understand this info and make the engine able to break in extended-const initializers. I think it would probably depend on the extended-const proposal, which AFAIK hasn't actually been implemented in any engine yet; I'm pretty sure MVP initializers just get directly calculated at instantiation time and don't actually get generated as code anywhere such that you'd be able to break on them. It's probably unknown at this point whether the VM would use its regular JIT and generate some pseudo-function for the initializers that you could break on, or whether it would do something else.
In any case it's not super clear to me what the value of having debug info for initializers would be? In most languages, they are not really generated from user-specified code? (e.g. I doubt we'd be able to use them in Clang; I think even C++ constexprs mostly just result in relocation expressions which are not much more complex than the current extended-const proposal, which would mean the result would be precomputed, and putting debug info on that value wouldn't be super useful. But I will admit I haven't really looked at this before).

Regarding debug info support for the text format: If the original source is the wat format itself, then browsers do a reasonable job of this today without any DWARF (i.e. the browser devtools will show a disassembly where you can break, and view the values of locals, globals, memory, etc.). I'm not totally sure if this disassembly will show things like variable names from the name section, but it seems like that could be possible to add. Given that the name section can label basically all wasm-level constructs, I'm not sure what DWARF would add to that when the source is wat.

If a language is WAT with whitespace (but where e.g. the variables and their types are all raw wasm) then a source map might be enough. You could break on a line, devtools would display the source code, and would also show you the wasm locals and globals. Names could be taken from the name section.

Another possible thing to consider could be using LLVM's wasm assembly format. Like assembly languages for other architectures, it allows you to use labels and symbolic names for code and data locations (these become relocations, and then you can link your program with LLD, and also of course link with code compiled from C or other LLVM-using languages). If your language is a bit higher-level than wat, then generating that might not be any more difficult than generating wat. If you wanted to generate your own DWARF, you could do it the way clang does; essentially you'd have to do your own encoding, but you can refer to labels in the code or data sections symbolically.

In principle you can also have wat files that include text representation of custom sections, but the wat format is missing that symbol/label capability, so encoding instruction or data locations for DWARF or any other purpose wouldn't really work without just doing the assembly ahead of time.

@7ombie
Author

7ombie commented Feb 16, 2022

Thank you, @dschuff. I'm very grateful for the detailed advice. That will be useful, cheers.

You may be right. It might make more sense to just validate and evaluate the constant expressions at compile time, and replace them with (known to be valid) const instructions. Though, this does beg the question of why constant expressions are allowed in the binary at all.

On reflection, I think mentioning Extended Constant Expressions just confused the issue. It's been a little while since I opened the issue, and I'm a bit vague now on what I was worried about :/

What happens when a constant expression evaluates to an invalid value? For example, it specifies an out-of-bounds offset for a memory segment. Maybe I've misunderstood something, but wouldn't that trap, and require the debugger to point to the expression that expresses the invalid offset? We wouldn't always know the size of the memory at compile time, so couldn't know whether or not a given offset is within bounds.

@lars-t-hansen

@dschuff

I think it would probably depend on the extended-const proposal, which AFAIK hasn't actually been implemented in any engine yet ...

Implemented and enabled in Firefox Nightly for some time now (@eqrion got bored, I guess, and implemented it).

@dschuff
Member

dschuff commented Mar 25, 2022

What happens when a constant expression evaluates to an invalid value? For example, it specifies an out-of-bounds offset for a memory segment. Maybe I've misunderstood something, but wouldn't that trap, and require the debugger to point to the expression that expresses the invalid offset? We wouldn't always know the size of the memory at compile time, so couldn't know whether or not a given offset is within bounds.

I believe this would result in an instantiation failure (and on the web, the instantiation function would throw). In principle an implementation could create a function-like construct and invoke it, and treat it like any other debuggable function (IIRC this is what @sbc100 did with Wabt). Such an implementation would also have to figure out how to surface that function in whatever debugging UI it provided. My understanding is that V8's implementation (recently done) doesn't do this but instead just interprets the expression directly, so there wouldn't really be a way to do that.
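
The failure mode discussed here can be modeled with a small Python sketch that interprets a constant expression directly and then bounds-checks a data segment offset. It is a simplified model, not how any engine is implemented: `eval_const_expr`, `check_data_segment` and `InstantiationError` are hypothetical names, and only `i32.const`, `global.get`, the three i32 arithmetic instructions from the extended-const proposal, and `end` are handled.

```python
class InstantiationError(Exception):
    """Hypothetical stand-in for whatever error an engine reports."""


def read_leb(data, pos, signed):
    """Decode one LEB128 integer starting at data[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    if signed and byte & 0x40:
        result -= 1 << shift  # sign-extend
    return result, pos


def eval_const_expr(code, imported_globals):
    """Interpret a constant expression and return its i32 result (unsigned)."""
    stack, pos = [], 0
    while True:
        op = code[pos]
        pos += 1
        if op == 0x41:    # i32.const
            value, pos = read_leb(code, pos, signed=True)
            stack.append(value & 0xFFFFFFFF)
        elif op == 0x23:  # global.get
            index, pos = read_leb(code, pos, signed=False)
            stack.append(imported_globals[index] & 0xFFFFFFFF)
        elif op in (0x6A, 0x6B, 0x6C):  # i32.add, i32.sub, i32.mul
            rhs, lhs = stack.pop(), stack.pop()
            value = lhs + rhs if op == 0x6A else lhs - rhs if op == 0x6B else lhs * rhs
            stack.append(value & 0xFFFFFFFF)
        elif op == 0x0B:  # end
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode 0x{op:02X}")


def check_data_segment(offset_expr, segment_len, memory_bytes, imported_globals):
    """Return the segment offset, or fail the way instantiation would."""
    offset = eval_const_expr(offset_expr, imported_globals)
    if offset + segment_len > memory_bytes:
        raise InstantiationError(f"data segment at {offset} is out of bounds")
    return offset


# (global.get 0) (i32.const 16) (i32.add) end
expr = bytes([0x23, 0x00, 0x41, 0x10, 0x6A, 0x0B])
print(check_data_segment(expr, 16, 65536, imported_globals=[0]))
```

With an imported global of 65520 the same expression yields an offset of 65536, which no longer fits in one 64 KiB page, so `check_data_segment` raises. The offset depends on a value that is only known at instantiation time, which is why a compiler cannot rule this error out in advance.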

@7ombie
Author

7ombie commented Mar 26, 2022

Thanks, @dschuff. That's helpful information.

I really wanted to support source-level debugging (with introspection) in DevTools (just like using C, C++ et cetera). Errors in DevTools would ideally link to the appropriate line and column in the WAT source, based on the DWARF data in the binary. It seems like DWARF for WebAssembly should make that kind of thing possible for WAT source too.

@sbc100
Member

sbc100 commented Mar 26, 2022

Thanks, @dschuff. That's helpful information.

I really wanted to support source-level debugging (with introspection) in DevTools (just like using C, C++ et cetera). Errors in DevTools would ideally link to the appropriate line and column in the WAT source, based on the DWARF data in the binary. It seems like DWARF for WebAssembly should make that kind of thing possible for WAT source too.

In theory this is possible, yes, but it would require adding DWARF-generation support to some tool that can convert wat to wasm (for example wabt or binaryen). This is likely to be a fairly large project, with relatively low impact (since wat is already almost 1-to-1 with the native wasm disassembly in a wasm debugger).

@7ombie
Author

7ombie commented Mar 26, 2022

The issue I'm personally having is that I created a novel assembly language with the Wasm ISA. It's basically WAT with significant whitespace and sugar. If DWARF for WebAssembly would work with WAT in theory, I could implement the support for my own assembler. I focussed on WAT when opening the issue, as that's more important.

A lot of people write WebAssembly by hand, and many of them really dislike WAT syntax, so even if my project doesn't take off, there will almost certainly be at least one reasonably popular language of this type.

It shouldn't be that difficult to map to code outside the Code Section, so this support would be fairly simple to implement.

@7ombie
Author

7ombie commented Mar 27, 2022

In any case, if you imagine what it'd be like to debug WAT source in DevTools, just like a first-class source language, then compare that to debugging WAT projects currently, it's clear that tools like wat2wasm would be much nicer to use, if they did support DWARF (and DWARF for WebAssembly permitted it).

@sbc100
Member

sbc100 commented Mar 27, 2022

Are you asking for wat2wasm to generate DWARF that maps back to the input wat file? If so, all that would take would be for someone to implement DWARF output in wabt.

Or are you asking for wat2wasm to understand some kind of DWARF annotations that allow mapping back to some other (higher level) input file? (this would be a lot more work)

@7ombie
Author

7ombie commented Mar 27, 2022

@sbc100 - The first one. I'd like to write WAT code, compile it using something like wat2wasm, and have the tool generate debugging information (using DWARF for WebAssembly), so I can debug the WAT source in DevTools (like I can with C code).

Currently, this is generally possible, except if there's a runtime error in a constant expression. There's no way for DWARF for WebAssembly to map to that code (as it's not in the Code Section), so source-level debugging is not possible there (DevTools would not be able to deduce which line and column in the WAT source to map back to).

If this was fixed (with WAT as the motivation), my language would just implement its own version of the same thing. In that sense, we can just focus on WAT, while being mindful that other people may want to create direct alternatives to WAT that function in more or less the same way. My project is no more (or less) important than the potential for that class of project to exist.

@sbc100
Member

sbc100 commented Mar 27, 2022

Leaving aside the fact that DWARF doesn't work for constant expressions, which seems like a separate issue, it sounds like the issue would otherwise be solved by a sufficiently motivated person adding DWARF-generation to wat2wasm. Being only vaguely familiar with DWARF, I imagine this would be a non-trivial, but relatively well defined, task. Maybe we should open an issue on the wabt repo?

@7ombie
Author

7ombie commented Mar 27, 2022

@sbc100 - I thought I'd opened a feature request for DWARF support in wat2wasm, but cannot find the issue anywhere. That would be really nice to have.

This issue is specific to DWARF for WebAssembly, and being able to map to instructions outside of the Code Section (as required for source-level debugging of runtime errors in constant expressions). Potentially adding DWARF support to tools like wat2wasm is really just a rationale for the current feature request.

I personally think WebAssembly's debugging format and its text format should be fully compatible, just because it's silly for things to be otherwise, but that's a pretty subjective argument.

@sbc100
Member

sbc100 commented Mar 27, 2022

I don't disagree that it could be useful for wat2wasm to generate DWARF info.

However, I think that saying that it's "silly" that this doesn't work this way today is a little strong. You can compare wasm to an ISA like x86 or arm, and I don't know of x86 or arm assemblers that generate debug information, do you?

@7ombie
Author

7ombie commented Mar 27, 2022

I think that saying that it's "silly" that this doesn't work this way today is a little strong.

@sbc100 - Sorry, but I only said that it seems silly to me personally that WebAssembly has a debugging standard that's not entirely compatible with WebAssembly's own source language. I was clear that this was subjective.

You can compare wasm to an ISA like x86 or arm, and I don't know of x86 or arm assemblers that generate debug information, do you?

I didn't say it was silly that wat2wasm doesn't generate debugging information. It was the fact that DWARF for WebAssembly doesn't fully support WebAssembly that I found ironic.

@7ombie
Author

7ombie commented Mar 28, 2022

The practical issue with DWARF for WebAssembly only using offsets that are relative to the Code Section is just that code already exists outside of the Code Section (and there may be more code outside of the Code Section in future, with proposals like Extended Constant Expressions).

Not being able to fully describe WAT source is an example of a limitation that arises from the implicit assumption that code lives in the Code Section.

The sections have unique integer indices (except custom sections, but that's not an issue here). DWARF has ways of using indices to specify which block of memory an instruction is in, so DWARF for WebAssembly could provide some means to optionally specify which section an offset is relative to (with the Code Section being the implicit default).
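
As a rough illustration of how a consumer might resolve such a section-relative offset, the Python sketch below walks the section headers of a module and converts a (section ID, offset) pair into an offset within the whole binary. The function names are hypothetical, and it assumes offsets are measured from the start of the section's payload (after the ID and size fields); that convention is an assumption of this sketch, not something settled in this thread.

```python
CODE_SECTION_ID = 10  # Code Section ID in the WebAssembly binary format


def read_u32_leb(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def section_payloads(module):
    """Map each non-custom section ID to (payload_start, payload_size)."""
    if module[:4] != b"\0asm":
        raise ValueError("not a WebAssembly binary")
    payloads, pos = {}, 8  # skip the 4-byte magic and 4-byte version
    while pos < len(module):
        section_id = module[pos]
        size, start = read_u32_leb(module, pos + 1)
        if section_id != 0:  # custom sections share ID 0, so they are skipped
            payloads[section_id] = (start, size)
        pos = start + size
    return payloads


def to_module_offset(module, offset, section_id=CODE_SECTION_ID):
    """Resolve a section-relative offset; the Code Section is the default."""
    start, size = section_payloads(module)[section_id]
    if not 0 <= offset < size:
        raise ValueError("offset lies outside the section payload")
    return start + offset


# Hand-built bytes: a Global Section holding (i32.const 42), then a Code
# Section with zero function bodies. Only the headers matter for this sketch.
module = (
    b"\0asm\x01\0\0\0"
    + bytes([6, 6, 1, 0x7F, 0, 0x41, 0x2A, 0x0B])
    + bytes([10, 1, 0])
)
position = to_module_offset(module, 3, section_id=6)
print(hex(module[position]))  # the i32.const opcode inside the Global Section
```

Calling `to_module_offset(module, 0)` without a section ID resolves against the Code Section, which mirrors the "implicit default" described above.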

@dschuff
Member

dschuff commented Mar 31, 2022

The sections have unique integer indices (except custom sections, but that's not an issue here). DWARF has ways of using indices to specify which block of memory an instruction is in, so DWARF for WebAssembly could provide some means to optionally specify which section an offset is relative to (with the Code Section being the implicit default).

This sounds like a good idea.

I still don't really think there will end up being a good debugging experience for constexprs (even for extended-const) on most engines, for the reasons I mentioned above. But having this extension makes sense. I'm not sure whether LLVM would be able to generate it (I guess it could in theory, since there could be e.g. global variable initializer code encoded there, but I'm not sure if LLVM knows how to encode DWARF for that even on regular platforms). But I don't have any objection in principle to adding this extension as long as existing DWARF and debugger code keeps working.

@7ombie
Author

7ombie commented Apr 1, 2022

@dschuff - Thank you. Much appreciated.

I'm not sure exactly how it should work in practice, as I'm still pretty new to DWARF generally. I only know that DWARF has a way of handling this. I'd be happy to look into it, but if you or anyone else here already knows roughly how this should work, I'd be grateful if they could advise.

Thanks again.
