Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Parser] Enable the new text parser by default #6371

Merged
merged 1 commit into from
Apr 25, 2024
Merged

[Parser] Enable the new text parser by default #6371

merged 1 commit into from
Apr 25, 2024

Conversation

tlively
Copy link
Member

@tlively tlively commented Mar 1, 2024

The new text parser is faster and more standards compliant than the old text
parser. Enable it by default in wasm-opt and update the tests to reflect the
slightly different results it produces. Besides following the spec, the new
parser differs from the old parser in that it:

  • Does not synthesize loop and try labels unnecessarily
  • Synthesizes different block names in some cases
  • Parses exports in a different order
  • Parses nops instead of empty blocks for empty control flow arms
  • Does not support parsing Poppy IR
  • Produces different error messages
  • Cannot parse pop except as the first instruction inside a catch

@tlively
Copy link
Member Author

tlively commented Mar 1, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @tlively and the rest of your teammates on Graphite Graphite

@@ -20,33 +20,26 @@
;; NRML-NEXT: (drop
;; NRML-NEXT: (i32.const 1)
;; NRML-NEXT: )
;; NRML-NEXT: ;;@ src.cpp:3:4
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like I misunderstood some of the source location stuff :)

Is it the case that a source location annotation applies to every following instruction in stack machine order until there's a new source location annotation? Do they continue applying beyond the beginnings and ends of control flow scopes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you may be referring to the issue fixed in #5906, see details there.

Copy link
Member Author

@tlively tlively Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, no, that PR seems to only affect the encoding of source map info, but I'm confused about how to properly parse source map info.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the description there answers your question, though, unless I've misunderstood it. That is I think an annotation should not apply to following instructions, as the example there shows. However an annotation should apply to nested children iirc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a bug in the current parser, but currently annotations do apply to the following instructions. For instance, try running the following module through wasm-opt:

(module
   (func $f
      ;;@ a:1:1
      (nop)
      (nop)
      (nop)
   )
)

I get the following ouput:

(module
 (type $0 (func))
 (func $f
  ;;@ a:1:1
  (nop)
  ;;@ a:1:1
  (nop)
  ;;@ a:1:1
  (nop)
 )
)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under the assumption that the old parser is working as intended, the new parser now matches that behavior as well. It's a little unclear to me whether that is the best behavior, though.

@@ -356,10 +356,24 @@

;; Test multivalue control structures
;; CHECK: (func $mv-return (type $0) (result i32 i64)
;; CHECK-NEXT: (local $scratch i32)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This return lowering can be fixed in IRBuilder.

@tlively tlively marked this pull request as ready for review March 4, 2024 23:24
Base automatically changed from parser-sourcemap-prolog-epilog to main March 4, 2024 23:50
@tlively tlively changed the base branch from main to parser-propagate-debuglocs March 5, 2024 03:06
Base automatically changed from parser-propagate-debuglocs to main March 5, 2024 20:27
@tlively tlively changed the base branch from main to irbuilder-better-return March 5, 2024 21:30
Base automatically changed from irbuilder-better-return to main March 5, 2024 22:37
@tlively
Copy link
Member Author

tlively commented Mar 5, 2024

@kripken, I think this is pretty much ready to go. Do you have opinions on how we should land this switch? Should we make the switch right away, or should we get into a place where we will be able to delete the old parser at the same time? If the former, should we have a --old-wat-parser flag for a bit to go back to using the old parser?

@kripken
Copy link
Member

kripken commented Mar 5, 2024

I don't have a specific idea in mind for how to land this. Either deleting the old parser at the same time, or not, both of those sound acceptable to me. The only critical things I can think of are:

  1. Tests pass. (Looks like there's a lit test error atm, is that expected?)
  2. Fuzzer passes. That depends on Identical function types not merged #3989 so that we can fuzz the text format again.
  3. Coordination with users.
  4. Me to review the >100 test file changes here. Before I start on that, why are there so many? I'd hope for 0 😄 but more realistically I'd expect a few in tests that check corner cases of parsing, but not >100...

@tlively
Copy link
Member Author

tlively commented Mar 5, 2024

I don't have a specific idea in mind for how to land this. Either deleting the old parser at the same time, or not, both of those sound acceptable to me. The only critical things I can think of are:

  1. Tests pass. (Looks like there's a lit test error atm, is that expected?)

Just pushed a fix for this one :)

  1. Fuzzer passes. That depends on Identical function types not merged #3989 so that we can fuzz the text format again.

Cool, I'll look into that.

  1. Coordination with users.

I can post on the Discord binaryen channel about the switch and also check in with j2wasm.

  1. Me to review the >100 test file changes here. Before I start on that, why are there so many? I'd hope for 0 😄 but more realistically I'd expect a few in tests that check corner cases of parsing, but not >100...

These are all due to differences in how try is parsed and how unreachable code is parsed. It turns out we have a lot of tests with unreachable code!

@kripken
Copy link
Member

kripken commented Mar 6, 2024

In what way is our parsing of try and unreachable code different? That would also be good to document in the changelog or elsewhere.

@tlively
Copy link
Member Author

tlively commented Mar 6, 2024

Check out the changed tests for examples, but in short we explicitly drop anything that was on the stack when we encounter an unreachable instruction, so we end up with more drops than the old parser. For try, we often emit a wrapper block with the try's label rather than attaching the label to the try itself. I could try making that behavior match the old parser better, but I don't think it's that important.

@tlively tlively changed the base branch from main to print-offset March 6, 2024 02:30
Base automatically changed from print-offset to main March 6, 2024 18:24
check.py Outdated
run_opt_test('split.wast') # also that our optimizer doesn't break on it
result_wast_file = shared.binary_format_check('split.wast', verify_final_result=False, original_wast=wast)
result_wast_file = shared.binary_format_check('split.wast', verify_final_result=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look correct and NFC but perhaps they could land in a separate PR?

PassOptions opts;
opts.validate = false;
opts.validateGlobally = false;
PassRunner runner(&wasm, opts);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a better way to get this outcome is to do setIsNested(true). Nested pass runners do not run validation and do not show up on various logging, as they are internal to a pass. (This isn't a pass, but the idea would be the same - it's an internal operation we don't consider as "atomic" in the sense of having valid input and output, and there is no point to log it.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a thumbs up here but the code looks unchanged - is this in progress or did you prefer the current form for some reason?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, graphite keeps marking this ready for review, but it's definitely still a draft. I'll make this change before this is ready for real review, but I've been focusing on fixing fuzz bugs before coming back to this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this is worth doing.

;; CHECK-TEXT-NEXT: (i32.const 0)
;; CHECK-TEXT-NEXT: )
;; CHECK-TEXT-NEXT: (br_if $x
;; CHECK-TEXT-NEXT: (unreachable)
;; CHECK-TEXT-NEXT: (unreachable)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change may have implications for our testing. Before, we were able to make a particular argument to this br_if be unreachable, and test specifically the proper handling of that, while after, it looks like all arguments become unreachable. My concern is that unreachability of the first argument may prevent testing of handling of only the second argument being unreachable (which is possible after optimizations).

There is also a minor downside to having less precise roundtripping here when debugging.

How difficult would it be to make the parser roundtrip more accurately here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider these two instruction sequences:

i32.const 0
unreachable
i32.add

f32.const 0
unreachable
i32.add

They are both valid, and we currently parse both just fine:

(drop
 (i32.const 0)
)
(i32.add
 (unreachable)
 (unreachable)
)

(drop
 (f32.const 0)
)
(i32.add
 (unreachable)
 (unreachable)
)

If we tried to match existing behavior with the first sequence, we would end up with this for the second sequence:

(i32.add
 (f32.const 0)
 (unreachable)
)

Taking a look at our validator, it looks like this might be ok, actually? I don't know how consistent we are about letting children of unreachable nodes have any type.

(Also our validator will fail to reject (i32.add (unreachable) (f32.const 0)), but that's a separate issue.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think that would error - the parent still requires the non-unreachable children make sense.

I agree this is an issue for the unfolded form, but it seems that it has a downside for the folded form. And we use that folded form in order to test our IR, so being able to express less there is concerning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After talking offline, the plan is to try augmenting IRBuilder with some additional smarts around the expected types of things it pops. This will let the parser continue parsing instructions without having to worry about their nested relationships.

Today when an unreachable instruction is pushed, everything before it is automatically dropped. We will instead defer that automatic dropping behavior as far as possible. When popping the children for each newly-parsed instruction, we will attempt to pop existing instructions from the stack, even past an unreachable instruction. If we pop past an unreachable and encounter a mismatch between the expected and available type, we will undo the popping back to and including the unreachable, then fall back to the auto-drop behavior we have today. We will also auto-drop everything when finishing a scope.

I will implement this after getting fuzzing of text round-tripping working with the current implementation. I can probably extend subtype-exprs.h to pass a pointer to the child expression to noteSubtype and build the necessary functionality off of that. We could also reuse this utility in the validator.

@tlively tlively changed the base branch from main to irbuilder-tuple-arities March 7, 2024 03:00
@tlively tlively mentioned this pull request Mar 7, 2024
Base automatically changed from child-typer to main April 16, 2024 17:43
@tlively tlively changed the base branch from main to parser-match-block-names April 16, 2024 21:59
@tlively tlively requested a review from kripken April 16, 2024 21:59
Base automatically changed from parser-match-block-names to main April 16, 2024 23:27
@tlively tlively changed the base branch from main to parser-try-labels April 17, 2024 03:15
@tlively tlively requested a review from kripken April 17, 2024 03:15
@tlively
Copy link
Member Author

tlively commented Apr 17, 2024

The test failure is due to the fact that wasm-shell still uses the old parser while wasm-opt uses the new text parser, so import-after-func.fail.wast does not fail as expected when run under wasm-shell, but then it fails unexpectedly later when run with wasm-opt.

I'll try to fix it by getting wasm-shell able to run with the new text parser in a preliminary PR, since we want to do that anyway.

PassOptions opts;
opts.validate = false;
opts.validateGlobally = false;
PassRunner runner(&wasm, opts);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this is worth doing.

@@ -46,7 +45,6 @@
;; FULL-NEXT: (i32.const 2)
;; FULL-NEXT: )
;; FULL-NEXT: ) ;; end block block
;; FULL-NEXT: ;;@ src.cpp:1:2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this incorrect before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear what behavior is "correct," but we no longer propagate debug locations to the function epilogue automatically.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would verify that no emscripten debug info tests fail with this change. If that's ok then sgtm.

Base automatically changed from parser-try-labels to main April 17, 2024 20:49
@tlively tlively marked this pull request as draft April 17, 2024 20:58
The new text parser is faster and more standards compliant than the old text
parser. Enable it by default in wasm-opt and update the tests to reflect the
slightly different results it produces. Besides following the spec, the new
parser differs from the old parser in that it:

 - Does not synthesize `loop` and `try` labels unnecessarily
 - Synthesizes different block names in some cases
 - Parses exports in a different order
 - Parses `nop`s instead of empty blocks for empty control flow arms
 - Does not support parsing Poppy IR
 - Produces different error messages
 - Cannot parse `pop` except as the first instruction inside a `catch`
@tlively tlively marked this pull request as ready for review April 25, 2024 04:07
@tlively tlively changed the title [Draft][Parser] Enable the new text parser by default [Parser] Enable the new text parser by default Apr 25, 2024
@tlively
Copy link
Member Author

tlively commented Apr 25, 2024

I am marking this as non-draft for real now. @kripken, PTAL!

)
(unreachable)
(return)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a loss of the ability to test such a nested block in a return - is this fixable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The surrounding function has no return values, so the return has no business having children. In the old parser we allowed this anyway if the children happened to be unreachable, but I actually think the new behavior makes more sense. Perhaps we should add a check to our validator that only returns in functions that have return values can have children.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I missed it had no returns from the function. This does sound better then, I agree. Validator improvement makes sense too.

;; CHECK-NEXT: (block $l (result (ref $A))
;; CHECK-NEXT: (br_on_non_null $l
;; CHECK-NEXT: (block $label (result (ref $B))
;; CHECK-NEXT: (block (result (ref $B))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like now we generate two blocks here instead of one. This is probably fine as optimizations will remove the extra one, but just wanted to check this is expected? The thing I'd worry about most here is if extra blocks are added in each roundtrip here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inner block is the original function body block (which normally would not be printed) and the outer one is a wrapper block for receiving the branch. It looks like the old parser combines these into a single block, which is slightly nicer. There's no risk of roundtrips adding more blocks here since the branches no longer target the function scope.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks. Would it be hard to reuse the existing function body block in the new parser, as a minor optimization here? Not that important but it would be nice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can look into it 👍

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@tlively tlively merged commit 956d2d8 into main Apr 25, 2024
26 checks passed
@tlively tlively deleted the parser-on branch April 25, 2024 21:55
@gkdn gkdn mentioned this pull request Aug 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants