[wasm] Optimize out redundant null checks in the jiterpreter #81811
This is an experimental attempt to optimize out redundant null checks in the jiterpreter, in cases where we're certain that a local hasn't changed since we last checked whether it was null. For code that performs many field accesses on a single object, this should significantly improve performance and also make traces smaller. Because the interpreter doesn't give us additional information such as how big a local is, invalidation after ldloca is largely guesswork. We also don't have liveness ranges, so we have to throw out all our knowledge whenever we branch. Creating a draft PR to see what fails on CI.
Current measurements for the LinkedList regression and its neighbors:
So this optimization appears to eliminate most of the regression caused by using the jiterpreter for the add method. The rest of the problem may simply be the overhead of entering traces, which suggests we might want to raise the minimum trace length across the board.
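A deliberately simplified cost model (an assumption for illustration, not something measured in this PR) shows why entry overhead argues for a longer minimum trace. Let $n$ be the number of opcodes a trace executes per entry, $c_i$ and $c_j$ the average per-opcode cost in the interpreter and in the trace, and $c_e$ the fixed cost of entering and leaving the trace. Entering the trace only pays off when

$$
n\,c_i > c_e + n\,c_j \quad\Longleftrightarrow\quad n > \frac{c_e}{c_i - c_j} \qquad (c_i > c_j).
$$

A larger entry overhead $c_e$, or a smaller per-opcode saving $c_i - c_j$, pushes the break-even length up, so short traces can end up slower than staying in the interpreter.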
Sample optimized code (reformatted wabt disassembly) for the problem method in the CreateAddAndClear benchmark, alongside the opcode stream the jiterpreter is operating on:
```c
int InternalInsertNodeBefore_0(int a, int *pLocals) {
    int *cknull_ptr;
    if (!i_stfld_o(pLocals, 12, 16, 8))
        return 6;
    cknull_ptr = pLocals[2];
    if (!cknull_ptr)
        return 14;
    pLocals[8] = cknull_ptr[16];
    if (!i_stfld_o(pLocals, 16, 16, 32))
        return 22;
    // No null check: cknull_ptr still holds the same, already-checked local.
    pLocals[8] = cknull_ptr[16];
    if (!i_stfld_o(pLocals, 12, 32, 16))
        return 38;
    // Return value not checked: this store is known not to fail.
    i_stfld_o(pLocals, 16, 8, 16);
    cknull_ptr = pLocals[0];
    if (!cknull_ptr)
        return 54;
    pLocals[8] = cknull_ptr[20];
    pLocals[8] = pLocals[8] + 1;
    // The remaining field accesses reuse the single check on pLocals[0] above.
    cknull_ptr[20] = pLocals[8];
    pLocals[8] = cknull_ptr[16];
    pLocals[8] = pLocals[8] + 1;
    cknull_ptr[16] = pLocals[8];
    return 98;
}
```

A few of the direct field accesses don't get their own null check, and the return value of `i_stfld_o` isn't checked for one of the stores, since we know it can't fail.
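To make the pattern in that output concrete, here is a minimal sketch of an emitter that caches the most recently null-checked local in a single `cknull_ptr` and only re-emits the load-and-check when a different local is needed or the cache was invalidated. It assumes locals are identified by `pLocals` index and that the emitter sees every write to a local; `Emitter`, `emit_ldfld`, `emitter_local_written` and `emitter_branch` are hypothetical names, not the jiterpreter's actual code generator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: a trace emitter that keeps the last null-checked
 * local in one cached pointer (cknull_ptr) and produces C-like text shaped
 * like the disassembly above. Not the real jiterpreter code generator. */
typedef struct {
    int cached_local;  /* pLocals index currently held in cknull_ptr, or -1 */
    char out[2048];    /* emitted code, one statement per line */
} Emitter;

static void emitter_init(Emitter *e) {
    e->cached_local = -1;
    e->out[0] = '\0';
}

static void emit_line(Emitter *e, const char *line) {
    size_t used = strlen(e->out);
    /* The sketch assumes short traces; refuse to truncate silently. */
    assert(used + strlen(line) + 2 <= sizeof e->out);
    strcat(e->out, line);
    strcat(e->out, "\n");
}

/* Any write to a local invalidates the cache if it hits the cached local. */
static void emitter_local_written(Emitter *e, int local) {
    if (local == e->cached_local)
        e->cached_local = -1;
}

/* Without liveness ranges, a branch discards everything we know. */
static void emitter_branch(Emitter *e) {
    e->cached_local = -1;
}

/* Field load: pLocals[dest] = obj->field; return bailout_ip if obj is null. */
static void emit_ldfld(Emitter *e, int obj, int field, int dest, int bailout_ip) {
    char buf[128];
    if (e->cached_local != obj) {
        /* Not known to be non-null: load it into cknull_ptr and check it. */
        snprintf(buf, sizeof buf, "cknull_ptr = pLocals[%d];", obj);
        emit_line(e, buf);
        snprintf(buf, sizeof buf, "if (!cknull_ptr) return %d;", bailout_ip);
        emit_line(e, buf);
        e->cached_local = obj;
    }
    snprintf(buf, sizeof buf, "pLocals[%d] = cknull_ptr[%d];", dest, field);
    emit_line(e, buf);
    /* The load itself writes dest, which may be the cached local. */
    emitter_local_written(e, dest);
}
```

Two consecutive `emit_ldfld(&e, 2, 16, 8, ...)` calls produce one `cknull_ptr = pLocals[2]` plus check followed by two loads, mirroring the first two field loads in the disassembly. A single cached pointer keeps the generated code small, at the cost of forgetting earlier checks as soon as a different object is accessed.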
This PR optimizes out some redundant null checks in the jiterpreter, in scenarios where we're certain that a local hasn't changed since we last checked whether it was null. For code that does lots of field accesses on a single object this should significantly improve performance and also make traces smaller.
Because the interpreter doesn't give us additional information such as how big a local is, invalidation is largely guesswork for ldloca: once a local's address has been taken, we can't tell how much memory a write through that address might cover. We also don't have liveness ranges, so we have to throw out all our knowledge whenever we branch.
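The invalidation rules above can be pictured as conservative bookkeeping over the trace's locals. This is a minimal sketch under stated assumptions: locals are addressed by byte offset into `pLocals`, an object reference occupies 4 bytes (wasm32), and knowledge is kept as one flag per offset. `NullCheckState` and the `ncs_*` functions are hypothetical names, not the jiterpreter's real data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch of conservative null-check tracking within one trace.
 * Assumptions: locals are byte offsets into pLocals, references are 4 bytes
 * (wasm32), and MAX_LOCAL_BYTES is an arbitrary limit for the sketch. */
#define MAX_LOCAL_BYTES 256
#define REF_SIZE 4

typedef struct {
    bool known_non_null[MAX_LOCAL_BYTES]; /* indexed by local offset */
    int emitted;                          /* null checks we had to emit */
    int eliminated;                       /* null checks we skipped */
} NullCheckState;

static void ncs_init(NullCheckState *s) {
    memset(s, 0, sizeof *s);
}

/* Forget everything: used at branches (no liveness ranges) and for any
 * opcode whose effect on locals is unknown, such as ldloca. */
static void ncs_forget_all(NullCheckState *s) {
    memset(s->known_non_null, 0, sizeof s->known_non_null);
}

/* An opcode needs the local at `offset` to be non-null. Returns true if a
 * runtime check must be emitted. If the check is emitted and passes, the
 * trace continues knowing the local is non-null. */
static bool ncs_need_check(NullCheckState *s, int offset) {
    assert(offset >= 0 && offset < MAX_LOCAL_BYTES);
    if (s->known_non_null[offset]) {
        s->eliminated++;
        return false;
    }
    s->known_non_null[offset] = true;
    s->emitted++;
    return true;
}

/* A store of `size` bytes at `offset` invalidates every tracked reference
 * it overlaps: a reference at o covers [o, o + REF_SIZE). */
static void ncs_on_store(NullCheckState *s, int offset, int size) {
    assert(offset >= 0 && size > 0 && offset + size <= MAX_LOCAL_BYTES);
    int start = offset - (REF_SIZE - 1);
    if (start < 0)
        start = 0;
    for (int o = start; o < offset + size; o++)
        s->known_non_null[o] = false;
}
```

The overlap test in `ncs_on_store` is the part that depends on knowing sizes. An address produced by ldloca can later be written through with an unknown extent, so forgetting everything at the ldloca itself only covers part of the problem; every later write through that address would need the same treatment, which is why ldloca is the hard case.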
This PR adds a configuration flag in the source that generates runtime assertions wherever a null check is eliminated; use it if you are trying to diagnose a crash that appears related to this optimization. You can also disable the optimization entirely with a runtime flag or by changing options-def.h.
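A minimal sketch of how a compile-time diagnostic switch and a runtime on/off option might combine when the emitter decides what to generate for a null check. `HYPOTHETICAL_ASSERT_ELIMINATED_CHECKS`, `choose_null_check` and `assert_eliminated_check_valid` are illustrative names only; the real flag names are the ones in the jiterpreter source and options-def.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical compile-time switch: when nonzero, eliminated checks become
 * assertions instead of disappearing entirely. Not the runtime's real name. */
#ifndef HYPOTHETICAL_ASSERT_ELIMINATED_CHECKS
#define HYPOTHETICAL_ASSERT_ELIMINATED_CHECKS 1
#endif

typedef enum {
    EMIT_NULL_CHECK,      /* normal check: bail out of the trace if null */
    EMIT_NOTHING,         /* check eliminated, trust the analysis */
    EMIT_ASSERT_NON_NULL  /* check eliminated, but trap if the analysis was wrong */
} NullCheckAction;

/* optimization_enabled models the runtime flag; known_non_null is the
 * analysis result for the local being dereferenced. */
static NullCheckAction choose_null_check(bool optimization_enabled, bool known_non_null) {
    if (!optimization_enabled || !known_non_null)
        return EMIT_NULL_CHECK;
    return HYPOTHETICAL_ASSERT_ELIMINATED_CHECKS ? EMIT_ASSERT_NON_NULL : EMIT_NOTHING;
}

/* What generated code might call in assertion mode (a no-op under NDEBUG). */
static void assert_eliminated_check_valid(const void *obj) {
    assert(obj != NULL && "null check was eliminated but the local was null");
}
```

The point of the assertion mode is that a wrong elimination fails loudly at the exact dereference, instead of surfacing later as an unrelated-looking crash.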
This PR also fixes a bug where, in some cases, the interpreter would run the last opcode of a trace a second time after exiting the trace.
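The invariant this fix restores can be illustrated with a toy interpreter loop. This is a minimal sketch, not the actual cause or fix in the PR: it assumes a trace covers opcodes `[start, end)` and returns the index of the first opcode it did not execute, so returning `end - 1` instead would make the interpreter run the trace's last opcode again. `Machine`, `run_trace` and `interpret` are hypothetical.

```c
#include <assert.h>

/* Hypothetical sketch of the trace/interpreter hand-off contract. */
#define MAX_OPCODES 16

typedef struct {
    int executed[MAX_OPCODES]; /* how many times each opcode ran */
} Machine;

static void exec_opcode(Machine *m, int ip) {
    assert(ip >= 0 && ip < MAX_OPCODES);
    m->executed[ip]++;
}

/* Runs opcodes [start, end) and returns the resume point: the opcode after
 * the last one executed. Returning end - 1 here would double-execute it. */
static int run_trace(Machine *m, int start, int end) {
    for (int ip = start; ip < end; ip++)
        exec_opcode(m, ip);
    return end;
}

/* Interpreter loop over `count` opcodes, where [trace_start, trace_end)
 * is covered by a compiled trace. */
static void interpret(Machine *m, int count, int trace_start, int trace_end) {
    int ip = 0;
    while (ip < count) {
        if (ip == trace_start)
            ip = run_trace(m, trace_start, trace_end);
        else
            exec_opcode(m, ip++);
    }
}
```

With the correct resume point, every opcode in the method runs exactly once whether it is executed by the trace or by the interpreter.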
The changes in this PR appear to recover most of the performance lost in parts of the CreateAddAndClear benchmark suite. In the long run, most or all of this optimization will move up into the interpreter, so that null check elimination can happen during tiered code generation.