[wasm] Optimize out redundant null checks in the jiterpreter #81811

kg · 2023-02-08T02:53:34Z

This PR optimizes out some redundant null checks in the jiterpreter, in scenarios where we're certain that a local hasn't changed since we last checked whether it was null. For code that does lots of field accesses on a single object this should significantly improve performance and also make traces smaller.

Because we don't get any additional information from the interpreter like how big a local is, invalidation is basically a big question mark for ldloca. We also don't have liveness ranges, so we have to throw out all our knowledge when we branch.

A configuration flag in the source is added that can be used to generate runtime assertions when a null check is eliminated - you can use this if trying to diagnose a crash that appears related to this optimization. You can also disable the optimization entirely with a runtime flag or by changing options-def.h.

This PR also fixes a bug where in some cases after we exited a trace, the interpreter would run the last opcode of the trace.

The changes in this PR appear to recover most of the performance lost in parts of the CreateAddAndClear benchmark suite. In the long run most or all of this optimization will move up into the interpreter so that null check elimination can happen during tiered code generation.

ghost · 2023-02-08T02:53:42Z

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Issue Details

This is an experimental attempt to optimize out redundant null checks in the jiterpreter, in scenarios where we're certain that a local hasn't changed since we last checked whether it was null. For code that does lots of field accesses on a single object this should significantly improve performance and also make traces smaller.

Because we don't get any additional information from the interpreter like how big a local is, invalidation is basically a big question mark for ldloca. We also don't have liveness ranges, so we have to throw out all our knowledge when we branch.

Creating a draft PR to see what fails on CI.

Author:	kg
Assignees:	-
Labels:	`NO-MERGE`, `arch-wasm`
Milestone:	-

kg · 2023-02-08T18:49:49Z

Current measurements for the LinkedList regression and its neighbors:

Method	null check opt	baseline jiterp	no jiterp
List	31.34 us	30.95 us	38.43 us
LinkedList	111.44 us	116.48 us	109.77 us
HashSet	208.64 us	206.25 us	314.94 us
Dictionary	225.47 us	221.72 us	307.41 us
SortedList	2,658.94 us	2,718.35 us	3,170.98 us
SortedSet	2,593.55 us	2,597.49 us	3,091.02 us
SortedDictionary	2,959.92 us	3,021.93 us	3,337.97 us
ConcurrentDictionary	702.62 us	699.36 us	691.09 us
Stack	33.38 us	37.40 us	46.29 us
ConcurrentStack	99.11 us	96.33 us	101.71 us
Queue	59.19 us	67.42 us	66.08 us
ConcurrentQueue	100.88 us	100.14 us	116.77 us
ConcurrentBag	170.32 us	153.49 us	186.33 us
ImmutableArray	586.99 us	588.38 us	606.31 us
ImmutableList	1,784.92 us	1,627.88 us	1,780.60 us
ImmutableDictionary	3,971.08 us	3,931.74 us	4,539.17 us
ImmutableHashSet	4,081.95 us	4,040.37 us	4,349.96 us
ImmutableSortedDictionary	5,106.88 us	5,099.18 us	5,866.34 us
ImmutableSortedSet	5,140.73 us	4,812.98 us	5,433.37 us
ImmutableStack	38.02 us	34.87 us	38.38 us
ImmutableQueue	85.92 us	81.49 us	83.10 us
Array	17.16 us	16.86 us	18.64 us
Span	17.17 us	16.74 us	12.18 us
ICollection	44.58 us	46.65 us	61.00 us
IDictionary	224.83 us	226.64 us	382.38 us

So this optimization appears to get rid of most of the regression caused by using the jiterpreter for the add method. The rest of the problem may just be the overhead involved in entering traces, which suggests we might want to raise the minimum trace length across the board.

kg · 2023-02-09T00:44:58Z

Sample optimized code (reformatted wabt disassembly) for the problem method in the CreateAddAndClear benchmark, alongside the opcode stream the jiterpreter is operating on:

void System.Collections.Generic.LinkedList`1<string>:InternalInsertNodeBefore (System.Collections.Generic.LinkedListNode`1<string>,System.Collections.Generic.LinkedListNode`1<string>)
stfld.o
ldfld.o
stfld.o
ldfld.o
stfld.o
stfld.o
ldfld.i4
add1.i4
stfld.i4
ldfld.i4
add1.i4
stfld.i4

int InternalInsertNodeBefore_0(int a, int *pLocals) {
  int *cknull_ptr;
  if (!i_stfld_o(pLocals, 12, 16, 8))
    return 6;

  cknull_ptr = pLocals[2];
  if (!cknull_ptr)
    return 14;

  pLocals[8] = cknull_ptr[16];
  if (!i_stfld_o(pLocals, 16, 16, 32))
    return 22;

  pLocals[8] = cknull_ptr[16];
  if (!i_stfld_o(pLocals, 12, 32, 16))
    return 38;

  i_stfld_o(pLocals, 16, 8, 16);
  cknull_ptr = pLocals[0];
  if (!cknull_ptr)
    return 54;

  pLocals[8] = cknull_ptr[20];
  pLocals[8] = pLocals[8] + 1;
  cknull_ptr[20] = pLocals[8];
  pLocals[8] = cknull_ptr[16];
  pLocals[8] = pLocals[8] + 1;
  cknull_ptr[16] = pLocals[8];

  return 98;
}

You can see a few of the direct field stores don't have a null check, and the return value of stfld_o isn't checked for one of the stores since we know it can't fail.

…ll check was correct to eliminate

Fix bug where sometimes last opcode in a trace would execute twice

kg added NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) arch-wasm WebAssembly architecture labels Feb 8, 2023

dotnet-issue-labeler bot added the area-Codegen-Interpreter-mono label Feb 8, 2023

ghost assigned kg Feb 8, 2023

kg force-pushed the wasm-jiterpreter-null-check-optimization branch from 1eb67de to 3dbed8d Compare February 8, 2023 03:58

This was referenced Feb 8, 2023

Roslyn source generator crash on mono/linux/arm64 #81123

Closed

[wasm][debugger] Multiple tests failing on rolling builds on Windows #81792

Closed

This was referenced Feb 8, 2023

Infra improvements for Helix #68176

Closed

Methodical_others test JIT/Methodical/Coverage/copy_prop_byref_to_native_int crashing #69832

Open

kg removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Feb 9, 2023

kg marked this pull request as ready for review February 9, 2023 00:45

kg requested review from lewing, pavelsavara, vargaz and lambdageek as code owners February 9, 2023 00:45

kg added 8 commits February 9, 2023 07:18

Checkpoint: Duplicate null check optimization

5f8a55d

Checkpoint

348ebbd

Remove eslint disable comment

8ad7729

Be more precise about address-taken invalidation

e23e8c8

Checkpoint

4ea8c11

Add option for null check elimination and report it in stats

8744143

Add null check validation mode that verifies whether an eliminated nu…

9d3e465

…ll check was correct to eliminate

Turn null check validation back off

3d79e28

Fix bug where sometimes last opcode in a trace would execute twice

kg force-pushed the wasm-jiterpreter-null-check-optimization branch from 3106778 to 3d79e28 Compare February 9, 2023 15:18

kg requested a review from BrzVlad February 9, 2023 19:29

vargaz approved these changes Feb 9, 2023

View reviewed changes

kg merged commit 2b70123 into dotnet:main Feb 9, 2023

ghost locked as resolved and limited conversation to collaborators Mar 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[wasm] Optimize out redundant null checks in the jiterpreter #81811

[wasm] Optimize out redundant null checks in the jiterpreter #81811

kg commented Feb 8, 2023 •

edited

Loading

ghost commented Feb 8, 2023

kg commented Feb 8, 2023

kg commented Feb 9, 2023

[wasm] Optimize out redundant null checks in the jiterpreter #81811

[wasm] Optimize out redundant null checks in the jiterpreter #81811

Conversation

kg commented Feb 8, 2023 • edited Loading

ghost commented Feb 8, 2023

kg commented Feb 8, 2023

kg commented Feb 9, 2023

kg commented Feb 8, 2023 •

edited

Loading