DWARF for WebAssembly Target #1
There is D52634 opened. Let's outline the short-term goal here. The long-term goal is being documented (I recently updated it) in the findings document. The debug information produced by LLVM is not capable of exposing WebAssembly-specific items in DWARF expressions. I considered the idea of using register operands to express WebAssembly locals, globals, the operand stack, etc. It complicates the logic around encoding/decoding such items and it still does not provide any value: the WebAssembly-specific DWARF expressions need to be transformed or special-cased during the evaluation process. So the short-term goal is to extend DWARF expressions with WebAssembly-specific items such as locations for locals, globals and stack operands.
The dump tools, such as llvm-dwarfdump, will be extended to display such expressions in e.g. Current TODOs:
|
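For readers who want to see what a WebAssembly-specific location operator could look like on the wire, here is a minimal sketch. It assumes the encoding that the "DWARF for WebAssembly Target" doc later settled on, as far as I recall it: opcode 0xED for DW_OP_WASM_location, followed by a one-byte kind (0x00 local, 0x01 global, 0x02 operand stack) and a ULEB128 index. Those numeric values are an assumption and should be checked against the doc; the helper names are made up for illustration.

```python
# Sketch only: the opcode and kind values below are assumptions taken from
# memory of the later "DWARF for WebAssembly Target" doc, not from this thread.
DW_OP_WASM_LOCATION = 0xED
WASM_LOCAL = 0x00          # index into the function's locals
WASM_GLOBAL = 0x01         # index into the module's globals
WASM_OPERAND_STACK = 0x02  # index into the operand stack


def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("ULEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def wasm_location(kind: int, index: int) -> bytes:
    """Build the bytes of one hypothetical DW_OP_WASM_location operation."""
    return bytes([DW_OP_WASM_LOCATION, kind]) + encode_uleb128(index)


# A variable living in local 3, and one living in global 200.
local_expr = wasm_location(WASM_LOCAL, 3)
global_expr = wasm_location(WASM_GLOBAL, 200)
```

A consumer that does not know the operator still has to special-case it, which is the point made above about evaluation.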
Per the reviewers' request, replacing DW_OP_WASM_location with DW_OP_breg operators. See the changes above. |
With that doc, should the date on the sub heading be updated from "21 May 2018"? It sounds like it hasn't been changed in ~6 months. 😄 |
The "WebAssembly Debugging Capabilities" doc is 6 months old, yes. The "DWARF for WebAssembly Target" at https://yurydelendik.github.io/webassembly-dwarf/ is last modified 13 December 2018. |
Thanks @yurydelendik. 😄 |
In this proposal, I'm using a "function label index space" to refer to labels from the binary name section. I wrote a rationale for using function label indices over binary offsets that I'll paste here:
I believe parts of this rationale also apply to the code addresses in DWARF for WebAssembly; you reference instructions through a binary offset relative to the code section. Have you considered using an index into a function's instruction sequence instead? |
Yes, I did. A function index plus an instruction index creates a compound key, which is harder to maintain, as is a module-wide instruction index. Since most WebAssembly tooling has adopted bytecode offsets, there is no reason to invent something that will not be useful: e.g. optimization tools change the instructions (and their order), so instruction indices will change as well, giving no advantage over bytecode offsets for processing debug information. |
It's true that it will be harder for a transform to avoid the need to transform the DWARF sections, but there are some useful transforms that don't change the instruction sequence of a function:
|
Annotations need to be introduced into the text format, similar to LLVM's .ll approach. I don't think it will be reasonable to preserve the DWARF sections' data as-is during a round trip through the text format without annotated instructions and data segments.
The |
The reason why it's easier to round trip DWARF through the text format applies the same whether you're translating the binary DWARF data to some annotated text syntax or not. The abstract syntax that ties your binary and text format together will need some way to represent code addresses that's independent of their binary serialization. If the serialized format uses binary offsets, then serializing those code addresses will be tightly coupled to serializing the code section. For example, when you serialize the code section, you'll need to produce a map between instruction indices and binary offsets that you can use when you serialize the DWARF sections.
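To make the coupling concrete, here is a rough sketch of the map a serializer would have to produce. It assumes code addresses are byte offsets from the start of the Code section's contents (function count, then each body prefixed by its size), and it takes pre-computed byte sizes for the locals declarations and for each instruction. The function name and the input shape are hypothetical.

```python
from typing import Dict, List, Tuple


def uleb128_size(value: int) -> int:
    """Number of bytes needed to encode value as unsigned LEB128."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def build_offset_map(
    functions: List[Tuple[int, List[int]]],
) -> Dict[Tuple[int, int], int]:
    """Map (function index, instruction index) to a code section offset.

    Each entry of `functions` is (locals_size, instruction_sizes), where both
    are byte sizes already known to the serializer (an assumption of this
    sketch). Function indices here are relative to the code section.
    """
    offsets: Dict[Tuple[int, int], int] = {}
    cursor = uleb128_size(len(functions))  # vector length prefix
    for func_index, (locals_size, instr_sizes) in enumerate(functions):
        body_size = locals_size + sum(instr_sizes)
        cursor += uleb128_size(body_size)  # per-body size prefix
        cursor += locals_size              # locals declarations
        for instr_index, size in enumerate(instr_sizes):
            offsets[(func_index, instr_index)] = cursor
            cursor += size
    return offsets


# Two functions: the first has three instructions, the second has one.
offset_map = build_offset_map([(1, [2, 1, 1]), (1, [1])])
```

The DWARF serializer can then only run after (or together with) the code section serializer, because any change in an earlier instruction's encoded size shifts every later offset.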
I was thinking that the function index could be implied by the context, but I can see the DWARF format occasionally uses "code addresses" outside of the context of a function. Maybe it's possible to interpret code addresses to be function indices in some contexts (e.g. to define the code in a compilation unit) and instruction indices in other contexts. |
A round trip wasm->text->wasm without changing the text is a very narrow/insignificant use case and should not be used to decide how instructions must be identified in the DWARF format, IMHO. In most (superset) cases the text will be changed, which requires changes in the DWARF data.
The .debug_line (and .debug_frame) sections have no "contexts" and require function instructions to be uniquely identified. |
There seems to be general consensus that some flavor of DWARF is what we want for at least the LLVM-based family of languages, and there are now several interoperable implementations of this spec. Can we check something into this repo that describes the current prototype? Then it would be easier to open specific issues or PRs than to add more comments here. E.g. I want to talk some more about the topic above (section vs binary offsets vs some abstract index space), but it would be better to have separate topics. |
https://github.com/yurydelendik/webassembly-dwarf/ is a valuable go-to for folks increasingly asking about DWARF in WebAssembly. However, it is a personal project and it seems odd for a large ecosystem to rely on this as a primary source. It also puts an undue burden on the owner's personal time for things like answering issues, as I don't think it was meant to replace W3C work, but rather to stand in until something happens here. Between the time this issue was opened and now, I'm pretty sure several large projects have come to use this information in how they do DWARF in wasm. Is there any way this can become canonicalized here or in the spec repo? If not now, how many implementations need to use another site ad hoc until it becomes relevant? If there's some sort of bar to get over, I can help hunt, as I suspect we've already crossed it by now. cc @rossberg |
@codefromthecrypt, the spec repo only contains documents that have gone through the process and that the WG has officially adopted as standards. I think you mean whether dwarf support could be a repo under the WebAssembly organisation. For that, the champion would have to bring it to the CG as a proposal and ask for a vote. |
thanks for the response @rossberg! Anyone who knows can answer below if possible.
|
The champion is indeed a motivated individual (or a group of individuals) interested in pushing a feature forward. The Wasm Community group is free to join.
The phases process of standardizing a feature is described here. Most features do start with a design issue, and then get moved into the WebAssembly organization as a proposal. Here is the list of finished proposals that have been merged into the spec after progressing through the process linked.
CG is the WebAssembly community group.
I'll defer this question to @dschuff or @yurydelendik. |
I actually think it would probably make sense to just start by adding the DWARF description to a doc in https://github.com/WebAssembly/tool-conventions/ which is where we also document related things such as the wasm object file format, LLVM C ABI/calling conventions, etc. That's just a matter of putting the information in a convenient format and making sure it still matches the reality of what e.g. LLVM is generating. Currently I don't know of any other toolchain besides LLVM that generates DWARF like this. If that stays the case, then it may not make sense to go for standardization. But I would be very interested in hearing of other producers or consumers. |
Thanks for the advice @dtig and @dschuff I understand the process and also what seems to be a short-cut start
some start like this makes sense. @7ombie @tromey @Jiboo @rianhunter @pfaffe @ggreif I know you contributed to https://github.com/yurydelendik/webassembly-dwarf. I'm not sure if you are still active and have a stake. If so, do you have anything to add about what should be in the first iteration of that doc? Future iterations can follow. If DWARF+Wasm is no longer relevant to you, please unsubscribe and forgive my spamming you.
wazero is the project I work on which has no dependencies, so doesn't rely on LLVM. My stake here is to help @r8d8 implement this with the best guidance possible tetratelabs/wazero#58 If anyone else spammed here is working in a way that doesn't end up using LLVM anyway, please respond if you can. Let's get the first proposed PR with the best context! |
We sort of did that by linking to yurydelendik/webassembly-dwarf back in WebAssembly/tool-conventions#148, but moving the DWARF integration doc itself into the repo also makes sense to me, assuming there are no objections from @yurydelendik. |
FWIW for me, moving the content is best as that widens the net of folks that can help maintain it, and formalizes an understanding beyond a personal repo. If we can't move it, we should recreate something similar. |
FWIW (not that it contradicts your statement) Golang tried that too, but ran into some issues where their DWARF was not valid, and I think the author abandoned the PR afterwards. See https://go-review.googlesource.com/c/go/+/283012/ and golang/go#33503. |
I have no objections. Let me know if something needs to be done on my end. |
@mbovel do you happen to know if graal's Wasm implements DWARF and/or if that is implicitly done via LLVM? (ps nice job on your WASI tests! https://github.com/oracle/graal/tree/89e4cfc7aeea69970b60c64cd075ceb2a104e864/wasm/src/org.graalvm.wasm.test/src/test/wasi ) |
@codefromthecrypt no, GraalWasm does not support DWARF yet. (Thanks! I hope that there will be an official WASI test suite at some point 😄) |
From a functional perspective the existing WebAssembly DWARF spec + LLVM implementation works well on my end. Only issue is that re-generating the DWARF tables in-memory for JITted code is relatively slow and those in-memory tables can be unexpectedly large (e.g. gigabytes). I've seen other run-times add options to ignore the DWARF tables but it would be nice if options like that weren't necessary. Not sure what e.g. Chrome does, it's entirely possible this is an implementation issue on my end. I haven't had time to further investigate since the core part of it just works but eventually I will get around to it. I wonder if there is a way that we could structure the DWARF info in the WebAssembly binary such that it could be memcpy()'d in-place in memory and would only need some minor edits to get working with GDB + JITted code. E.g. leaving sufficient NULL bytes or No-ops in certain location descriptors so that they could be filled in later. |
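One way to picture the "leave room to fill in later" idea is a fixed-width (padded) ULEB128 field: the producer always emits the maximum width, so a run-time can overwrite the value in place without shifting any following bytes. This is only a sketch of that general technique, not something the current DWARF for WebAssembly doc specifies; the function names and the 5-byte width are assumptions.

```python
def encode_uleb128_padded(value: int, width: int = 5) -> bytes:
    """Encode value as ULEB128 using exactly `width` bytes.

    Every byte except the last carries the continuation bit, so small values
    are padded out to a fixed size and can be patched later.
    """
    if value < 0:
        raise ValueError("ULEB128 cannot encode negative values")
    out = bytearray()
    for i in range(width):
        byte = value & 0x7F
        value >>= 7
        if i < width - 1:
            byte |= 0x80  # continuation bit on all but the final byte
        out.append(byte)
    if value:
        raise ValueError("value does not fit in the requested width")
    return bytes(out)


def decode_uleb128(data: bytes, pos: int = 0) -> tuple:
    """Decode a ULEB128 value; returns (value, position after the value)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def patch_in_place(buf: bytearray, pos: int, value: int, width: int = 5) -> None:
    """Overwrite a padded field without changing the buffer length."""
    buf[pos:pos + width] = encode_uleb128_padded(value, width)


# A buffer with a placeholder field that is filled in after JIT compilation.
section = bytearray(b"\x01" + encode_uleb128_padded(0) + b"\x02")
patch_in_place(section, 1, 300)
```

The cost is a few wasted bytes per patchable field, which may or may not be acceptable given the table sizes mentioned above.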
The issue I'm having relates to using DWARF to debug WAT (I'm actually working on my own assembler, but the issue affects WAT and anything like WAT, so I may as well just focus on WAT, and my project will benefit indirectly). The Extended Constant Expressions Proposal adds a few more instructions to constant expressions (basically, You cannot map DWARF to the instructions inside constant expressions, as the offsets are relative to the Code Section, and constant expressions are not stored in the Code Section. |
@yurydelendik thanks for clearing the path for formalizing your work! So, the next step seems to be adding your content as DWARF.md to https://github.com/WebAssembly/tool-conventions/, which implies converting it to markdown and creating an images directory. I can also offer to raise the PR converting the existing content to markdown. I think getting started in markdown, even if it is a formatting step back, can help harvest the conversations here into a repo where action can be taken (e.g. an issue to hash out the valid point about constant expressions). @RReverser will Yury or I have access to raise a PR to https://github.com/WebAssembly/tool-conventions/ without a W3C account? If so, that seems like the best start, right? If there's paperwork around that, could one of you do this on his behalf? While you are at it, can you use GitHub's "move issue" feature to move this issue to the other repo, as that's where the change will occur? cc @dtig |
copying in @sbc100 as per WebAssembly/spec#1428. I'm gathering there's a sense that compatibility is a solved or nearly solved topic here. I am not trying to be problematic, but I think there's too much comfort in the status quo when useful things tend not to be defined by the W3C standard and are often punted to third-party repos or left in issue cul-de-sacs like this. When I first started in WebAssembly, it felt, due to conference promotions and such, that there was some sort of staffing to maintain the spec towards compatibility by virtue of implementing it, as opposed to by virtue of looking at many non-standard repos, phases, or subordinate repositories. I don't think people mean to create a very high barrier to entering this ecosystem, or are actively hoping there's only one viable impl. However, if specs are left abandoned or moved around to READMEs, things aren't easier. This is the last unsolicited comment I'll make on spec repos about root issues of abandonment or otherwise. If leadership desires more feedback about how compat or entrance into Wasm could be made easier, feel free to ask. |
Just to make sure I'm understanding you... If so, I totally agree with you. (actually, even if that's not what you were trying to say, I agree with that statement 😅) The good news is that we've gotten it working pretty well, and I hope to put out some better developer documentation for debugging with emscripten and Chrome soon. And I do think we can go forward with basically what we have here as a spec. |
@dschuff thanks for the consideration. Indeed, the dominant issue I've found is that sharing an implementation becomes the workaround for a gap in a spec, or even a substitute for accepting that a gap exists at all. I'm not really even fussed about whether a "spec" is governed by W3C at this point, just that there is some way to achieve portability without sharing one implementation. Will definitely look forward to reusing whatever you produce, even if limited to notes only. |
I started documenting the findings from my work on saving LLVM debug information as custom sections (see D44184 and D45118). LLVM prefers the DWARF format, and it is feasible to package the entire DWARF data into wasm custom sections and convert it into source maps for wasm binaries later. These findings can be found at https://github.com/yurydelendik/webassembly-dwarf/. Also, I attempted to match the findings with @fitzgen's WebAssembly Debugging Capabilities, just for information purposes.
This issue is an attempt to open a discussion about whether it will be valuable to continue packaging debug data as custom sections in DWARF format.
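As an illustration of what "packaging DWARF data as a custom section" amounts to at the byte level, here is a small sketch. It relies on the WebAssembly binary format's custom section layout (section id 0, a ULEB128 size, then a length-prefixed UTF-8 name followed by the payload). The `.debug_info` name and the dummy payload are placeholders for illustration, and the helper names are hypothetical.

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("ULEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def custom_section(name: str, payload: bytes) -> bytes:
    """Wrap an opaque payload in a WebAssembly custom section (id 0)."""
    name_bytes = name.encode("utf-8")
    contents = encode_uleb128(len(name_bytes)) + name_bytes + payload
    return b"\x00" + encode_uleb128(len(contents)) + contents


# Placeholder payload; a real producer would pass the raw DWARF section bytes.
section = custom_section(".debug_info", b"\x01\x02")
```

Because engines ignore custom sections they do not recognize, a module carrying such sections still validates and runs; only debuggers and tools need to understand the contents.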