Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add suggested format for wasm frames in Error.stack #814

Closed
wants to merge 3 commits into from

Conversation

lukewagner
Copy link
Member

This PR adds some should clauses for wasm frames in Error.stack strings. Based on previous conversations, I think we already agreed for wasm to use bytecode offset as line number. Happy to bikeshed the default auto-generated function name.

Copy link
Member

@jfbastien jfbastien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that we want to mandate anything at all (even under "should"). Could you elaborate on the motivation?

`wasm-function[${i}]` where `i` is the index of the function in the module's
[Code section](BinaryEncoding.md#code-section).
* The filename *should* be that of the JS caller of the originating `Module`
constructor call.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the .wasm file if there was one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only JS APIs we have atm take an ArrayBuffer of bytes so I think this is the best we can do. With ES6 module integration, we could have the same URLs as JS <script>s. With Streams integration, if you WebAssembly.compile(stream), maybe we could use the URL that was fetched to produce stream.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, right.

*should* be 1.
* The function name *should* be generated from the template string literal
`wasm-function[${i}]` where `i` is the index of the function in the module's
[Code section](BinaryEncoding.md#code-section).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasm-function is the symbol section?

"or $i if no symbol section is present"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite: I'm saying that if you have a stack frame which you'd print as foo@url:line:col (if it was JS) and if you have no Names section (so you don't have any "foo"), that a function name would be generated, with strawman "wasm-function[0]" (where "0" is the index of the function).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah gotcha. Why not have wasm-function at all then, and not just $i?

Also: I'm not sure I understand where you mention the Names section. Could you say:

The function name should be generated from a module's Names section if present, otherwise the string literal $i should be used where i is the index of the function in the module's Code section.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention was just to be able to syntactically distinguish the wasm functions from the normal JS functions since you'll see them mixed into the same callstack. I did indeed forget to explicitly mention the Names section here; I'll add that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, update looks good.

@pizlonator
Copy link
Contributor

Interesting. Regardling line/column numbers: it's possible to get some small compilation time boost if you allow yourself to fudge this a little. Example:

[1] int tmp = *p;
[2] tmp++;
[3] *p = tmp;

In this case, on x86, B3's instruction selector would emit something like:

addl $1, (p)

When this traps, B3 would have a choice of telling you that you trapped at [1], [2], or [3]. I believe we would choose either [2] or [3], depending on reasons. This saves some small amount of time and space, since it means that we don't have to think very hard about the "trap origin" of instructions. Of course none of this matters if you write the code the way you're most likely to write it:

*p++ // all on one line

What do you think of the following options for dealing with this:
A) Note in the spec that implementations may fudge the "line number" somehow. This amounts to a blurb saying that the line number is provided on a best-effort basis.
B) Promise that if provided, the "line number" is precise.

Honestly I don't know which is better. What do you think?

@lukewagner
Copy link
Member Author

lukewagner commented Oct 3, 2016

@jfbastien The motivation is just that we're going to all be generating something for Error.stack so it'd be nice for developers if we converge as much as possible to help out anyone who wants to, say, parse Error.stack "crash reports" that they send back to the server.

@pizlonator So I can definitely see the resource-saving value in allowing imprecise traps; previous discussions made me think everyone was interested in precise offsets. However, 99.9% of the time, the extra precision is probably wasted and so we could say Error.stack is imprecise and wait for some later proper stack inspection API and this would allow devs to opt-in to the extra code/metadata cost. Curious to hear other thoughts on this.

I feel like from your examples that we might be talking about different things, though. In this PR, I'm referring to the simple binary offset of a wasm instruction in the .wasm without any form of source notes to map back to the original source language. Thus there wouldn't be any provided line numbers (mentioned by (B)).

Regardless, your example points to a good question (if we reimagine lines 1-3 as the wasm instructions i32.load; i32.add; i32.store). In this particular case, though, I would assume that, since the initial i32.load dominates the i32.store, that when these three IR nodes were folded into one that the i32.load's trap information would get used for the resulting addl. But this wouldn't work if there wasn't a clear dominance relationship, e.g., if there is redundant code on both arms of an if. I'd expect the wasm producer to have eliminated most of this but in some cases (ISA-specific, bounds checks), it can't.

The more immediate overhead I think is the extra few bytes per possibly-trapping-instruction necessary (for a thunk or metadata, depending on impl) to note the bytecode offset.

@pizlonator
Copy link
Contributor

@lukewagner First of all, after thinking about it more, this is one of those things that is more a problem for converting existing compilers to have good trapping support rather than a problem for clean room implementations. B3 made a choice for how to track origin data (what line, bytecode offset, file, etc code comes from) and that choice happens to not allow for precise tracking of traps. That's more my problem than anybody else's.

But then again maybe other implementers will also benefit from this wiggle room.

So I can definitely see the resource-saving value in allowing imprecise traps; previous discussions made me think everyone was interested in precise offsets. However, 99.9% of the time, the extra precision is probably wasted and so we could say Error.stack is imprecise and wait for some later proper stack inspection API and this would allow devs to opt-in to the extra code/metadata cost. Curious to hear other thoughts on this.

It's a bit weird for the spec to mandate precision without mandating a specific error.stack format. I think that it's healthy for the spec to allow implementers to do best-effort here. The outcome for compilers that have the same situation as B3 is that there will be rare corner cases in which the reported trapping instruction is wrong.

What I was trying to say about line numbers is that although the wasm instruction offset will be wrong, it will probably point to the same C++ line number. So, if this does get used for some telemetry, then my bug won't hurt anyone.

But I think that the important question is just: since we can't mandate a specific format for error.stack, what does it even mean for the contents to be precise?

@lukewagner
Copy link
Member Author

@pizlonator Yep, good points. Given all these arguments and further discussion with the group, I think it probably does make sense to at least switch to best-effort and save precision for some explicit stack inspection future feature.

@lukewagner
Copy link
Member Author

lukewagner commented Oct 4, 2016

So that leaves bikeshedding of wasm-function[i] vs. i vs. other alternatives. Any other comments on this? @titzer what does v8 use?

@jfbastien
Copy link
Member

I'd rather spell out WebAssembly, since that's what the JavaScript object does.

@lukewagner
Copy link
Member Author

That could work for me; this particular choice in FF's impl predates Wasm->WebAssembly. That being said, wasm doesn't suffer the same Wasm vs. WASM vs. WAsm uglyfest.

@titzer
Copy link

titzer commented Oct 4, 2016

We use:

"at func_name ([func_index]+bytecode_offset)"

On Tue, Oct 4, 2016 at 7:09 PM, Luke Wagner notifications@github.com
wrote:

That could work for me; this particular choice in FF's impl predates Wasm
->WebAssembly. That being said, wasm doesn't suffer the same Wasm vs. WASM
vs. WAsm uglyfest.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#814 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ALnq1FYmxkSntc-pbBwadCtaq224Qbd5ks5qwohfgaJpZM4KM4fI
.

@titzer
Copy link

titzer commented Oct 4, 2016

@pizlonator Presumably B3's instruction selector doesn't cover more than one possibly-trapping instruction, so would it be possible to start with a range of bytecode offsets and then refine that with a post-processing of the bytecode to find the one possibly-trapping bytecode?

@lukewagner
Copy link
Member Author

lukewagner commented Oct 4, 2016

@titzer If there is no Names section I'm guessing func_name is simply absent? Is "<WASM>" a literal string? Judging by the "+", is bytecode_offset relative to the start of the function?

@pizlonator
Copy link
Contributor

@titzer B3's instruction selector definitely covers more than one trapping instruction, like in the store(add(load(p), v), p) case. You could probably mostly do such a post-processing though. I just fear that it would only barely work, because you'd have to account for all of the tricks that the instruction selector and other parts of the compiler do. If we cared about this being more than best effort then we would add deliberate support for this.

I just think that if you want this level of precision then we should give it to you as API rather than a promise that the error.stack string contains the number.

Also, if error.stack was spec'd by TC39, then my position would be different. As it stands it's a very weird place to promise to squirrel away precise information.

-Filip

On Oct 4, 2016, at 11:20, titzer notifications@github.com wrote:

@pizlonator Presumably B3's instruction selector doesn't cover more than one possibly-trapping instruction, so would it be possible to start with a range of bytecode offsets and then refine that with a post-processing of the bytecode to find the one possibly-trapping bytecode?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

@AndrewScheidecker
Copy link

AFAICT the current design doesn't provide much visibility into the reason for a trap, and so it's pretty forgiving about reordering/merging possibly-trapping instructions. The spec tests check that there is a trap, and the reason for the trap, but there aren't any tests of code that would trap for different reasons depending on execution order.

Once signal handling is added, allowing possibly-trapping instructions to be reordered would result in non-determinism. It seems practical to prohibit that; that wouldn't prohibit this specific load-add-store optimization, since the runtime could still invoke the signal handler as-if the load and store happened in two steps. Are there any optimizations that this would prevent (assuming the WASM bytecode is already optimized)?

If reordering possibly-trapping instructions is prohibited, then it does seem practical to give a precise offset to the trapping instruction. However, the call stack will also include return addresses. Is it practical to give precise offsets to the instruction to be returned to? I think it can be done for no cost (in the same sense as "zero-cost exception handling") if you resolve the offset for a trapping PC vs a return PC differently, but I'm not sure.

IMO it does seem worthwhile to work toward deterministic trap and return addresses for signal handling. But for this PR I would definitely find it useful to have a standard format for stack traces, even if they contained imprecise bytecode offsets.

@jfbastien
Copy link
Member

@AndrewScheidecker in JS you can still observe the content of the ArrayBuffer, so precision matters in that you have to preserve some of the store ordering. That means that you can't e.g. vectorize a loop to execute it 4-wide when that would reorder stores. Whether wasm is the right place for this type of optimization is another question...

@pizlonator
Copy link
Contributor

@jfbastien I wouldn't say "can't". Precision doesn't preclude optimizations, it just makes them more expensive to run. In the case of vectorization, you could do it if you version the loop - something that a vectorizer will probably do anyway.

@lukewagner
Copy link
Member Author

Ok, I think the resolution here is Error.stack isn't the right place for a wasm engine to reliably expose bytecode offsets (that'd be some new API which could be more carefully defined) nor is wasm in a position to attempt to standardize JS's stack string format.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants