Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wasm] Optimize out redundant null checks in the jiterpreter #81811

Merged
merged 8 commits into from
Feb 9, 2023

Conversation

kg
Copy link
Member

@kg kg commented Feb 8, 2023

This PR optimizes out some redundant null checks in the jiterpreter, in scenarios where we're certain that a local hasn't changed since we last checked whether it was null. For code that does lots of field accesses on a single object this should significantly improve performance and also make traces smaller.

Because we don't get any additional information from the interpreter like how big a local is, invalidation is basically a big question mark for ldloca. We also don't have liveness ranges, so we have to throw out all our knowledge when we branch.

A configuration flag in the source is added that can be used to generate runtime assertions when a null check is eliminated - you can use this if trying to diagnose a crash that appears related to this optimization. You can also disable the optimization entirely with a runtime flag or by changing options-def.h.

This PR also fixes a bug where in some cases after we exited a trace, the interpreter would run the last opcode of the trace.

The changes in this PR appear to recover most of the performance lost in parts of the CreateAddAndClear benchmark suite. In the long run most or all of this optimization will move up into the interpreter so that null check elimination can happen during tiered code generation.

@kg kg added NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) arch-wasm WebAssembly architecture labels Feb 8, 2023
@ghost ghost assigned kg Feb 8, 2023
@ghost
Copy link

ghost commented Feb 8, 2023

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Issue Details

This is an experimental attempt to optimize out redundant null checks in the jiterpreter, in scenarios where we're certain that a local hasn't changed since we last checked whether it was null. For code that does lots of field accesses on a single object this should significantly improve performance and also make traces smaller.

Because we don't get any additional information from the interpreter like how big a local is, invalidation is basically a big question mark for ldloca. We also don't have liveness ranges, so we have to throw out all our knowledge when we branch.

Creating a draft PR to see what fails on CI.

Author: kg
Assignees: -
Labels:

NO-MERGE, arch-wasm

Milestone: -

@kg
Copy link
Member Author

kg commented Feb 8, 2023

Current measurements for the LinkedList regression and its neighbors:

Method null check opt baseline jiterp no jiterp
List 31.34 us 30.95 us 38.43 us
LinkedList 111.44 us 116.48 us 109.77 us
HashSet 208.64 us 206.25 us 314.94 us
Dictionary 225.47 us 221.72 us 307.41 us
SortedList 2,658.94 us 2,718.35 us 3,170.98 us
SortedSet 2,593.55 us 2,597.49 us 3,091.02 us
SortedDictionary 2,959.92 us 3,021.93 us 3,337.97 us
ConcurrentDictionary 702.62 us 699.36 us 691.09 us
Stack 33.38 us 37.40 us 46.29 us
ConcurrentStack 99.11 us 96.33 us 101.71 us
Queue 59.19 us 67.42 us 66.08 us
ConcurrentQueue 100.88 us 100.14 us 116.77 us
ConcurrentBag 170.32 us 153.49 us 186.33 us
ImmutableArray 586.99 us 588.38 us 606.31 us
ImmutableList 1,784.92 us 1,627.88 us 1,780.60 us
ImmutableDictionary 3,971.08 us 3,931.74 us 4,539.17 us
ImmutableHashSet 4,081.95 us 4,040.37 us 4,349.96 us
ImmutableSortedDictionary 5,106.88 us 5,099.18 us 5,866.34 us
ImmutableSortedSet 5,140.73 us 4,812.98 us 5,433.37 us
ImmutableStack 38.02 us 34.87 us 38.38 us
ImmutableQueue 85.92 us 81.49 us 83.10 us
Array 17.16 us 16.86 us 18.64 us
Span 17.17 us 16.74 us 12.18 us
ICollection 44.58 us 46.65 us 61.00 us
IDictionary 224.83 us 226.64 us 382.38 us

So this optimization appears to get rid of most of the regression caused by using the jiterpreter for the add method. The rest of the problem may just be the overhead involved in entering traces, which suggests we might want to raise the minimum trace length across the board.

@kg kg removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Feb 9, 2023
@kg
Copy link
Member Author

kg commented Feb 9, 2023

Sample optimized code (reformatted wabt disassembly) for the problem method in the CreateAddAndClear benchmark, alongside the opcode stream the jiterpreter is operating on:

void System.Collections.Generic.LinkedList`1<string>:InternalInsertNodeBefore (System.Collections.Generic.LinkedListNode`1<string>,System.Collections.Generic.LinkedListNode`1<string>)
stfld.o
ldfld.o
stfld.o
ldfld.o
stfld.o
stfld.o
ldfld.i4
add1.i4
stfld.i4
ldfld.i4
add1.i4
stfld.i4
int InternalInsertNodeBefore_0(int a, int *pLocals) {
  int *cknull_ptr;
  if (!i_stfld_o(pLocals, 12, 16, 8))
    return 6;

  cknull_ptr = pLocals[2];
  if (!cknull_ptr)
    return 14;

  pLocals[8] = cknull_ptr[16];
  if (!i_stfld_o(pLocals, 16, 16, 32))
    return 22;

  pLocals[8] = cknull_ptr[16];
  if (!i_stfld_o(pLocals, 12, 32, 16))
    return 38;

  i_stfld_o(pLocals, 16, 8, 16);
  cknull_ptr = pLocals[0];
  if (!cknull_ptr)
    return 54;

  pLocals[8] = cknull_ptr[20];
  pLocals[8] = pLocals[8] + 1;
  cknull_ptr[20] = pLocals[8];
  pLocals[8] = cknull_ptr[16];
  pLocals[8] = pLocals[8] + 1;
  cknull_ptr[16] = pLocals[8];

  return 98;
}

You can see a few of the direct field stores don't have a null check, and the return value of stfld_o isn't checked for one of the stores since we know it can't fail.

@kg kg marked this pull request as ready for review February 9, 2023 00:45
@kg kg force-pushed the wasm-jiterpreter-null-check-optimization branch from 3106778 to 3d79e28 Compare February 9, 2023 15:18
@kg kg requested a review from BrzVlad February 9, 2023 19:29
@kg kg merged commit 2b70123 into dotnet:main Feb 9, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Mar 12, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants