-
Notifications
You must be signed in to change notification settings - Fork 757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Parser] Enable the new text parser by default #6371
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
test/lit/debug/full.wat
Outdated
@@ -20,33 +20,26 @@ | |||
;; NRML-NEXT: (drop | |||
;; NRML-NEXT: (i32.const 1) | |||
;; NRML-NEXT: ) | |||
;; NRML-NEXT: ;;@ src.cpp:3:4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like I misunderstood some of the source location stuff :)
Is it the case that a source location annotation applies to every following instruction in stack machine order until there's a new source location annotation? Do they continue applying beyond the beginnings and ends of control flow scopes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you may be referring to the issue fixed in #5906, see details there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, no, that PR seems to only affect the encoding of source map info, but I'm confused about how to properly parse source map info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the description there answers your question, though, unless I've misunderstood it. That is I think an annotation should not apply to following instructions, as the example there shows. However an annotation should apply to nested children iirc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a bug in the current parser, but currently annotations do apply to the following instructions. For instance, try running the following module through wasm-opt
:
(module
(func $f
;;@ a:1:1
(nop)
(nop)
(nop)
)
)
I get the following ouput:
(module
(type $0 (func))
(func $f
;;@ a:1:1
(nop)
;;@ a:1:1
(nop)
;;@ a:1:1
(nop)
)
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under the assumption that the old parser is working as intended, the new parser now matches that behavior as well. It's a little unclear to me whether that is the best behavior, though.
test/lit/multivalue.wast
Outdated
@@ -356,10 +356,24 @@ | |||
|
|||
;; Test multivalue control structures | |||
;; CHECK: (func $mv-return (type $0) (result i32 i64) | |||
;; CHECK-NEXT: (local $scratch i32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This return lowering can be fixed in IRBuilder.
@kripken, I think this is pretty much ready to go. Do you have opinions on how we should land this switch? Should we make the switch right away, or should we get into a place where we will be able to delete the old parser at the same time? If the former, should we have a |
I don't have a specific idea in mind for how to land this. Either deleting the old parser at the same time, or not, both of those sound acceptable to me. The only critical things I can think of are:
|
Just pushed a fix for this one :)
Cool, I'll look into that.
I can post on the Discord binaryen channel about the switch and also check in with j2wasm.
These are all due to differences in how |
In what way is our parsing of try and unreachable code different? That would also be good to document in the changelog or elsewhere. |
Check out the changed tests for examples, but in short we explicitly drop anything that was on the stack when we encounter an unreachable instruction, so we end up with more drops than the old parser. For |
check.py
Outdated
run_opt_test('split.wast') # also that our optimizer doesn't break on it | ||
result_wast_file = shared.binary_format_check('split.wast', verify_final_result=False, original_wast=wast) | ||
result_wast_file = shared.binary_format_check('split.wast', verify_final_result=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes look correct and NFC but perhaps they could land in a separate PR?
src/parser/wat-parser.cpp
Outdated
PassOptions opts; | ||
opts.validate = false; | ||
opts.validateGlobally = false; | ||
PassRunner runner(&wasm, opts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a better way to get this outcome is to do setIsNested(true)
. Nested pass runners do not run validation and do not show up on various logging, as they are internal to a pass. (This isn't a pass, but the idea would be the same - it's an internal operation we don't consider as "atomic" in the sense of having valid input and output, and there is no point to log it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a thumbs up here but the code looks unchanged - is this in progress or did you prefer the current form for some reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, graphite keeps marking this ready for review, but it's definitely still a draft. I'll make this change before this is ready for real review, but I've been focusing on fixing fuzz bugs before coming back to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think this is worth doing.
;; CHECK-TEXT-NEXT: (i32.const 0) | ||
;; CHECK-TEXT-NEXT: ) | ||
;; CHECK-TEXT-NEXT: (br_if $x | ||
;; CHECK-TEXT-NEXT: (unreachable) | ||
;; CHECK-TEXT-NEXT: (unreachable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change may have implications for our testing. Before, we were able to make a particular argument to this br_if
be unreachable, and test specifically the proper handling of that, while after, it looks like all arguments become unreachable. My concern is that unreachability of the first argument may prevent testing of handling of only the second argument being unreachable (which is possible after optimizations).
There is also a minor downside to having less precise roundtripping here when debugging.
How difficult would it be to make the parser roundtrip more accurately here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider these two instruction sequences:
i32.const 0
unreachable
i32.add
f32.const 0
unreachable
i32.add
They are both valid, and we currently parse both just fine:
(drop
(i32.const 0)
)
(i32.add
(unreachable)
(unreachable)
)
(drop
(f32.const 0)
)
(i32.add
(unreachable)
(unreachable)
)
If we tried to match existing behavior with the first sequence, we would end up with this for the second sequence:
(i32.add
(f32.const 0)
(unreachable)
)
Taking a look at our validator, it looks like this might be ok, actually? I don't know how consistent we are about letting children of unreachable nodes have any type.
(Also our validator will fail to reject (i32.add (unreachable) (f32.const 0))
, but that's a separate issue.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think that would error - the parent still requires the non-unreachable children make sense.
I agree this is an issue for the unfolded form, but it seems that it has a downside for the folded form. And we use that folded form in order to test our IR, so being able to express less there is concerning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After talking offline, the plan is to try augmenting IRBuilder
with some additional smarts around the expected types of things it pops. This will let the parser continue parsing instructions without having to worry about their nested relationships.
Today when an unreachable instruction is pushed, everything before it is automatically dropped. We will instead defer that automatic dropping behavior as far as possible. When popping the children for each newly-parsed instruction, we will attempt to pop existing instructions from the stack, even past an unreachable instruction. If we pop past an unreachable and encounter a mismatch between the expected and available type, we will undo the popping back to and including the unreachable, then fall back to the auto-drop behavior we have today. We will also auto-drop everything when finishing a scope.
I will implement this after getting fuzzing of text round-tripping working with the current implementation. I can probably extend subtype-exprs.h to pass a pointer to the child expression to noteSubtype
and build the necessary functionality off of that. We could also reuse this utility in the validator.
The test failure is due to the fact that wasm-shell still uses the old parser while wasm-opt uses the new text parser, so import-after-func.fail.wast does not fail as expected when run under wasm-shell, but then it fails unexpectedly later when run with wasm-opt. I'll try to fix it by getting wasm-shell able to run with the new text parser in a preliminary PR, since we want to do that anyway. |
src/parser/wat-parser.cpp
Outdated
PassOptions opts; | ||
opts.validate = false; | ||
opts.validateGlobally = false; | ||
PassRunner runner(&wasm, opts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think this is worth doing.
@@ -46,7 +45,6 @@ | |||
;; FULL-NEXT: (i32.const 2) | |||
;; FULL-NEXT: ) | |||
;; FULL-NEXT: ) ;; end block block | |||
;; FULL-NEXT: ;;@ src.cpp:1:2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this incorrect before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclear what behavior is "correct," but we no longer propagate debug locations to the function epilogue automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would verify that no emscripten debug info tests fail with this change. If that's ok then sgtm.
The new text parser is faster and more standards compliant than the old text parser. Enable it by default in wasm-opt and update the tests to reflect the slightly different results it produces. Besides following the spec, the new parser differs from the old parser in that it: - Does not synthesize `loop` and `try` labels unnecessarily - Synthesizes different block names in some cases - Parses exports in a different order - Parses `nop`s instead of empty blocks for empty control flow arms - Does not support parsing Poppy IR - Produces different error messages - Cannot parse `pop` except as the first instruction inside a `catch`
I am marking this as non-draft for real now. @kripken, PTAL! |
) | ||
(unreachable) | ||
(return) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a loss of the ability to test such a nested block in a return - is this fixable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The surrounding function has no return values, so the return
has no business having children. In the old parser we allowed this anyway if the children happened to be unreachable, but I actually think the new behavior makes more sense. Perhaps we should add a check to our validator that only returns in functions that have return values can have children.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I missed it had no returns from the function. This does sound better then, I agree. Validator improvement makes sense too.
;; CHECK-NEXT: (block $l (result (ref $A)) | ||
;; CHECK-NEXT: (br_on_non_null $l | ||
;; CHECK-NEXT: (block $label (result (ref $B)) | ||
;; CHECK-NEXT: (block (result (ref $B)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like now we generate two blocks here instead of one. This is probably fine as optimizations will remove the extra one, but just wanted to check this is expected? The thing I'd worry about most here is if extra blocks are added in each roundtrip here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inner block is the original function body block (which normally would not be printed) and the outer one is a wrapper block for receiving the branch. It looks like the old parser combines these into a single block, which is slightly nicer. There's no risk of roundtrips adding more blocks here since the branches no longer target the function scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks. Would it be hard to reuse the existing function body block in the new parser, as a minor optimization here? Not that important but it would be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can look into it 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
The new text parser is faster and more standards compliant than the old text
parser. Enable it by default in wasm-opt and update the tests to reflect the
slightly different results it produces. Besides following the spec, the new
parser differs from the old parser in that it:
loop
andtry
labels unnecessarilynop
s instead of empty blocks for empty control flow armspop
except as the first instruction inside acatch