[Parser] Enable the new text parser by default #6371

tlively · 2024-03-01T06:05:25Z

The new text parser is faster and more standards compliant than the old text
parser. Enable it by default in wasm-opt and update the tests to reflect the
slightly different results it produces. Besides following the spec, the new
parser differs from the old parser in that it:

Does not synthesize loop and try labels unnecessarily
Synthesizes different block names in some cases
Parses exports in a different order
Parses nops instead of empty blocks for empty control flow arms
Does not support parsing Poppy IR
Produces different error messages
Cannot parse pop except as the first instruction inside a catch

tlively · 2024-03-01T06:05:39Z

[Parser] Enable the new text parser by default #6371 👈
[Parser] Preserve try labels #6505
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @tlively and the rest of your teammates on Graphite

tlively · 2024-03-01T06:11:56Z

test/lit/debug/full.wat

@@ -20,33 +20,26 @@
  ;; NRML-NEXT:   (drop
  ;; NRML-NEXT:    (i32.const 1)
  ;; NRML-NEXT:   )
-  ;; NRML-NEXT:   ;;@ src.cpp:3:4


Looks like I misunderstood some of the source location stuff :)

Is it the case that a source location annotation applies to every following instruction in stack machine order until there's a new source location annotation? Do they continue applying beyond the beginnings and ends of control flow scopes?

I think you may be referring to the issue fixed in #5906, see details there.

Hmm, no, that PR seems to only affect the encoding of source map info, but I'm confused about how to properly parse source map info.

I think the description there answers your question, though, unless I've misunderstood it. That is I think an annotation should not apply to following instructions, as the example there shows. However an annotation should apply to nested children iirc.

This might be a bug in the current parser, but currently annotations do apply to the following instructions. For instance, try running the following module through wasm-opt:

(module (func $f ;;@ a:1:1 (nop) (nop) (nop) ) )

I get the following ouput:

(module (type $0 (func)) (func $f ;;@ a:1:1 (nop) ;;@ a:1:1 (nop) ;;@ a:1:1 (nop) ) )

Under the assumption that the old parser is working as intended, the new parser now matches that behavior as well. It's a little unclear to me whether that is the best behavior, though.

tlively · 2024-03-01T06:14:11Z

test/lit/multivalue.wast

@@ -356,10 +356,24 @@

 ;; Test multivalue control structures
 ;; CHECK:      (func $mv-return (type $0) (result i32 i64)
+ ;; CHECK-NEXT:  (local $scratch i32)


This return lowering can be fixed in IRBuilder.

tlively · 2024-03-05T22:39:23Z

@kripken, I think this is pretty much ready to go. Do you have opinions on how we should land this switch? Should we make the switch right away, or should we get into a place where we will be able to delete the old parser at the same time? If the former, should we have a --old-wat-parser flag for a bit to go back to using the old parser?

kripken · 2024-03-05T23:01:55Z

I don't have a specific idea in mind for how to land this. Either deleting the old parser at the same time, or not, both of those sound acceptable to me. The only critical things I can think of are:

Tests pass. (Looks like there's a lit test error atm, is that expected?)
Fuzzer passes. That depends on Identical function types not merged #3989 so that we can fuzz the text format again.
Coordination with users.
Me to review the >100 test file changes here. Before I start on that, why are there so many? I'd hope for 0 😄 but more realistically I'd expect a few in tests that check corner cases of parsing, but not >100...

tlively · 2024-03-05T23:47:22Z

I don't have a specific idea in mind for how to land this. Either deleting the old parser at the same time, or not, both of those sound acceptable to me. The only critical things I can think of are:

Tests pass. (Looks like there's a lit test error atm, is that expected?)

Just pushed a fix for this one :)

Fuzzer passes. That depends on Identical function types not merged #3989 so that we can fuzz the text format again.

Cool, I'll look into that.

Coordination with users.

I can post on the Discord binaryen channel about the switch and also check in with j2wasm.

Me to review the >100 test file changes here. Before I start on that, why are there so many? I'd hope for 0 😄 but more realistically I'd expect a few in tests that check corner cases of parsing, but not >100...

These are all due to differences in how try is parsed and how unreachable code is parsed. It turns out we have a lot of tests with unreachable code!

kripken · 2024-03-06T00:20:22Z

In what way is our parsing of try and unreachable code different? That would also be good to document in the changelog or elsewhere.

tlively · 2024-03-06T01:33:49Z

Check out the changed tests for examples, but in short we explicitly drop anything that was on the stack when we encounter an unreachable instruction, so we end up with more drops than the old parser. For try, we often emit a wrapper block with the try's label rather than attaching the label to the try itself. I could try making that behavior match the old parser better, but I don't think it's that important.

kripken · 2024-03-06T18:46:38Z

check.py

                    run_opt_test('split.wast')    # also that our optimizer doesn't break on it
-                    result_wast_file = shared.binary_format_check('split.wast', verify_final_result=False, original_wast=wast)
+                    result_wast_file = shared.binary_format_check('split.wast', verify_final_result=False)


These changes look correct and NFC but perhaps they could land in a separate PR?

kripken · 2024-03-06T18:49:08Z

src/parser/wat-parser.cpp

+  PassOptions opts;
+  opts.validate = false;
+  opts.validateGlobally = false;
+  PassRunner runner(&wasm, opts);


Maybe a better way to get this outcome is to do setIsNested(true). Nested pass runners do not run validation and do not show up on various logging, as they are internal to a pass. (This isn't a pass, but the idea would be the same - it's an internal operation we don't consider as "atomic" in the sense of having valid input and output, and there is no point to log it.)

I see a thumbs up here but the code looks unchanged - is this in progress or did you prefer the current form for some reason?

Sorry, graphite keeps marking this ready for review, but it's definitely still a draft. I'll make this change before this is ready for real review, but I've been focusing on fixing fuzz bugs before coming back to this.

I still think this is worth doing.

kripken · 2024-03-06T18:55:19Z

test/lit/basic/polymorphic_stack.wast

  ;; CHECK-TEXT-NEXT:     (i32.const 0)
+  ;; CHECK-TEXT-NEXT:    )
+  ;; CHECK-TEXT-NEXT:    (br_if $x
+  ;; CHECK-TEXT-NEXT:     (unreachable)
  ;; CHECK-TEXT-NEXT:     (unreachable)


This change may have implications for our testing. Before, we were able to make a particular argument to this br_if be unreachable, and test specifically the proper handling of that, while after, it looks like all arguments become unreachable. My concern is that unreachability of the first argument may prevent testing of handling of only the second argument being unreachable (which is possible after optimizations).

There is also a minor downside to having less precise roundtripping here when debugging.

How difficult would it be to make the parser roundtrip more accurately here?

Consider these two instruction sequences:

i32.const 0 unreachable i32.add f32.const 0 unreachable i32.add

They are both valid, and we currently parse both just fine:

(drop (i32.const 0) ) (i32.add (unreachable) (unreachable) ) (drop (f32.const 0) ) (i32.add (unreachable) (unreachable) )

If we tried to match existing behavior with the first sequence, we would end up with this for the second sequence:

(i32.add (f32.const 0) (unreachable) )

Taking a look at our validator, it looks like this might be ok, actually? I don't know how consistent we are about letting children of unreachable nodes have any type.

(Also our validator will fail to reject (i32.add (unreachable) (f32.const 0)), but that's a separate issue.)

Hmm, I think that would error - the parent still requires the non-unreachable children make sense.

I agree this is an issue for the unfolded form, but it seems that it has a downside for the folded form. And we use that folded form in order to test our IR, so being able to express less there is concerning.

After talking offline, the plan is to try augmenting IRBuilder with some additional smarts around the expected types of things it pops. This will let the parser continue parsing instructions without having to worry about their nested relationships.

Today when an unreachable instruction is pushed, everything before it is automatically dropped. We will instead defer that automatic dropping behavior as far as possible. When popping the children for each newly-parsed instruction, we will attempt to pop existing instructions from the stack, even past an unreachable instruction. If we pop past an unreachable and encounter a mismatch between the expected and available type, we will undo the popping back to and including the unreachable, then fall back to the auto-drop behavior we have today. We will also auto-drop everything when finishing a scope.

I will implement this after getting fuzzing of text round-tripping working with the current implementation. I can probably extend subtype-exprs.h to pass a pointer to the child expression to noteSubtype and build the necessary functionality off of that. We could also reuse this utility in the validator.

test/lit/passes/global-effects.wast

tlively · 2024-04-17T05:33:53Z

The test failure is due to the fact that wasm-shell still uses the old parser while wasm-opt uses the new text parser, so import-after-func.fail.wast does not fail as expected when run under wasm-shell, but then it fails unexpectedly later when run with wasm-opt.

I'll try to fix it by getting wasm-shell able to run with the new text parser in a preliminary PR, since we want to do that anyway.

kripken · 2024-04-17T15:22:25Z

src/parser/wat-parser.cpp

+  PassOptions opts;
+  opts.validate = false;
+  opts.validateGlobally = false;
+  PassRunner runner(&wasm, opts);


I still think this is worth doing.

kripken · 2024-04-17T15:27:15Z

test/lit/debug/full.wat

@@ -46,7 +45,6 @@
  ;; FULL-NEXT:    (i32.const 2)
  ;; FULL-NEXT:   )
  ;; FULL-NEXT:  ) ;; end block block
-  ;; FULL-NEXT:  ;;@ src.cpp:1:2


Was this incorrect before?

Unclear what behavior is "correct," but we no longer propagate debug locations to the function epilogue automatically.

I would verify that no emscripten debug info tests fail with this change. If that's ok then sgtm.

The new text parser is faster and more standards compliant than the old text parser. Enable it by default in wasm-opt and update the tests to reflect the slightly different results it produces. Besides following the spec, the new parser differs from the old parser in that it: - Does not synthesize `loop` and `try` labels unnecessarily - Synthesizes different block names in some cases - Parses exports in a different order - Parses `nop`s instead of empty blocks for empty control flow arms - Does not support parsing Poppy IR - Produces different error messages - Cannot parse `pop` except as the first instruction inside a `catch`

tlively · 2024-04-25T04:09:27Z

I am marking this as non-draft for real now. @kripken, PTAL!

kripken · 2024-04-25T15:55:51Z

test/passes/remove-unused-names_merge-blocks_all-features.txt

  )
+  (unreachable)
+  (return)


This looks like a loss of the ability to test such a nested block in a return - is this fixable?

The surrounding function has no return values, so the return has no business having children. In the old parser we allowed this anyway if the children happened to be unreachable, but I actually think the new behavior makes more sense. Perhaps we should add a check to our validator that only returns in functions that have return values can have children.

Thanks, I missed it had no returns from the function. This does sound better then, I agree. Validator improvement makes sense too.

kripken · 2024-04-25T16:29:57Z

test/lit/passes/type-merging.wast

-  ;; CHECK-NEXT:    (block $l (result (ref $A))
-  ;; CHECK-NEXT:     (br_on_non_null $l
+  ;; CHECK-NEXT:  (block $label (result (ref $B))
+  ;; CHECK-NEXT:   (block (result (ref $B))


It looks like now we generate two blocks here instead of one. This is probably fine as optimizations will remove the extra one, but just wanted to check this is expected? The thing I'd worry about most here is if extra blocks are added in each roundtrip here.

The inner block is the original function body block (which normally would not be printed) and the outer one is a wrapper block for receiving the branch. It looks like the old parser combines these into a single block, which is slightly nicer. There's no risk of roundtrips adding more blocks here since the branches no longer target the function scope.

I see, thanks. Would it be hard to reuse the existing function body block in the new parser, as a minor optimization here? Not that important but it would be nice.

I can look into it 👍

test/lit/passes/catch-pop-fixup-eh-old.wast

kripken

Great!

tlively mentioned this pull request Mar 1, 2024

[Parser] Support prologue and epilogue sourcemap annotations #6370

Merged

tlively commented Mar 1, 2024

View reviewed changes

tlively marked this pull request as ready for review March 4, 2024 23:24

tlively force-pushed the parser-on branch from 1849a39 to fc8506d Compare March 4, 2024 23:24

Base automatically changed from parser-sourcemap-prolog-epilog to main March 4, 2024 23:50

tlively changed the base branch from main to parser-propagate-debuglocs March 5, 2024 03:06

tlively force-pushed the parser-on branch from fc8506d to 361db99 Compare March 5, 2024 03:06

tlively mentioned this pull request Mar 5, 2024

[Parser] Propagate debug locations like the old parser #6377

Merged

tlively force-pushed the parser-on branch from 361db99 to c263ea1 Compare March 5, 2024 20:01

Base automatically changed from parser-propagate-debuglocs to main March 5, 2024 20:27

tlively force-pushed the parser-on branch from c263ea1 to caa5bb1 Compare March 5, 2024 21:28

tlively changed the base branch from main to irbuilder-better-return March 5, 2024 21:30

tlively force-pushed the parser-on branch from caa5bb1 to 7250bdd Compare March 5, 2024 21:30

tlively mentioned this pull request Mar 5, 2024

[Parser] Improve parsed IR for multivalue returns #6378

Merged

Base automatically changed from irbuilder-better-return to main March 5, 2024 22:37

tlively force-pushed the parser-on branch from 7250bdd to 6abb4ef Compare March 5, 2024 23:48

tlively changed the base branch from main to print-offset March 6, 2024 02:30

tlively force-pushed the parser-on branch from 6abb4ef to 759493c Compare March 6, 2024 02:30

tlively mentioned this pull request Mar 6, 2024

Print '(offset ...)` in data and element segments #6379

Merged

Base automatically changed from print-offset to main March 6, 2024 18:24

kripken reviewed Mar 6, 2024

View reviewed changes

tlively changed the base branch from main to irbuilder-tuple-arities March 7, 2024 03:00

tlively force-pushed the parser-on branch from 759493c to d8423fb Compare March 7, 2024 03:00

tlively mentioned this pull request Mar 7, 2024

Fix EH fuzz bugs #6381

Merged

tlively force-pushed the parser-on branch from c258c9a to feb1d9d Compare April 12, 2024 03:42

tlively force-pushed the child-typer branch from 1587839 to 37032a2 Compare April 16, 2024 16:46

Base automatically changed from child-typer to main April 16, 2024 17:43

tlively force-pushed the parser-on branch from feb1d9d to 763e626 Compare April 16, 2024 21:59

tlively changed the base branch from main to parser-match-block-names April 16, 2024 21:59

tlively requested a review from kripken April 16, 2024 21:59

tlively mentioned this pull request Apr 16, 2024

[Parser] Match legacy parser block naming #6504

Merged

kripken reviewed Apr 16, 2024

View reviewed changes

test/lit/passes/global-effects.wast Show resolved Hide resolved

Base automatically changed from parser-match-block-names to main April 16, 2024 23:27

tlively changed the base branch from main to parser-try-labels April 17, 2024 03:15

tlively force-pushed the parser-on branch from 763e626 to 1a32550 Compare April 17, 2024 03:15

tlively requested a review from kripken April 17, 2024 03:15

tlively mentioned this pull request Apr 17, 2024

[Parser] Preserve try labels #6505

Merged

kripken reviewed Apr 17, 2024

View reviewed changes

Base automatically changed from parser-try-labels to main April 17, 2024 20:49

tlively marked this pull request as draft April 17, 2024 20:58

tlively marked this pull request as ready for review April 25, 2024 04:07

tlively force-pushed the parser-on branch from 1a32550 to feff548 Compare April 25, 2024 04:07

tlively changed the title ~~[Draft][Parser] Enable the new text parser by default~~ [Parser] Enable the new text parser by default Apr 25, 2024

kripken reviewed Apr 25, 2024

View reviewed changes

test/lit/passes/catch-pop-fixup-eh-old.wast Show resolved Hide resolved

kripken approved these changes Apr 25, 2024

View reviewed changes

tlively merged commit 956d2d8 into main Apr 25, 2024
26 checks passed

tlively deleted the parser-on branch April 25, 2024 21:55

tlively mentioned this pull request May 20, 2024

[Parser] Issues to resolve before switching to the new wat parser #6208

Closed

18 tasks

gkdn mentioned this pull request Aug 31, 2024

stringconsts gkdn/binaryen#1

Closed

[Parser] Enable the new text parser by default #6371

[Parser] Enable the new text parser by default #6371

Conversation

tlively commented Mar 1, 2024 • edited Loading

tlively commented Mar 1, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tlively Mar 4, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tlively commented Mar 5, 2024

kripken commented Mar 5, 2024

tlively commented Mar 5, 2024

kripken commented Mar 6, 2024

tlively commented Mar 6, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tlively commented Apr 17, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tlively commented Apr 25, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kripken left a comment

Choose a reason for hiding this comment

tlively commented Mar 1, 2024 •

edited

Loading

tlively commented Mar 1, 2024 •

edited

Loading

tlively Mar 4, 2024 •

edited

Loading