[wasm] Re-enable null check optimization for mid-method traces #84058
Conversation
Tagging subscribers to 'arch-wasm': @lewing

Issue details: Right now, traces in the middle of methods don't support null check optimizations, since we don't know whether an ldloca happened earlier in the method. This PR adds a quick scan of the method leading up to the trace for ldlocas, so we have the information we need for (partial) null check optimization.
cc @BrzVlad if you have time to look: does this make sense to you?
Interp promotes address-taken vars to be global in transform.c. This is also fine if you prefer to do it later.
That is perfect, thanks for pointing out where that happens. I've previously experimented with flowing variable data through, with mixed success, but I think next time I'll be able to get it in good shape. It's already on the todo list in the tracking issue; I just haven't had time to redo this stuff from the ground up and flow the information through. A "this method has ldloca" flag sounds like a really simple starting point that would be valuable.
Force-pushed from d566fb6 to 808ac5b with these commits:

- …spaces in the stack have had their address taken
- Use the interpreter bitset for jiterpreter null check optimization
- Enable null check optimization for all traces
Reworked, inspired by @BrzVlad's feedback: we now build a bitset when inserting trace(s) into a method, where each bit indicates whether the 8 bytes in that slot ever have their address taken during the method. In my testing this matches the old analysis 99%+ of the time, so it shouldn't cause any performance regressions, but it's also more correct, since it handles the case Vlad pointed out (ldlocas after the current instruction that we haven't seen yet, in methods with loops). It also handles the hypothetical case where we null-check-optimize a field of a stack-allocated struct and then someone ldlocas the whole struct, because we can now ensure all the slots occupied by the struct have the address-taken flag set. (The jiterpreter doesn't currently know the type or size of locals.) The memory cost of the bitset feels potentially bad, but I'm not sure how I could do much better here. Maybe a packed list of offsets and sizes instead?
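The bitset approach described above can be sketched roughly as follows. This is an illustrative model, not the actual mono/jiterpreter code: names like `mark_address_taken` and `SLOT_SIZE` are hypothetical, and it assumes one bit per 8-byte interpreter stack slot, with a multi-slot local (e.g. a struct) setting every slot it overlaps.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: one bit per 8-byte interp stack slot; a set bit
 * means some local occupying that slot has its address taken (ldloca). */
#define SLOT_SIZE 8

static void
mark_address_taken (uint8_t *bitset, uint32_t offset, uint32_t size)
{
    /* Mark every slot the local overlaps, so taking the address of a
     * whole struct also flags the slots holding its fields. */
    uint32_t first = offset / SLOT_SIZE;
    uint32_t last = (offset + size - 1) / SLOT_SIZE;
    for (uint32_t slot = first; slot <= last; slot++)
        bitset[slot / 8] |= (uint8_t)(1u << (slot % 8));
}

static int
is_address_taken (const uint8_t *bitset, uint32_t offset)
{
    uint32_t slot = offset / SLOT_SIZE;
    return (bitset[slot / 8] >> (slot % 8)) & 1;
}
```

Because the flag is per-slot rather than per-local, the consumer doesn't need to know local types or sizes, which matches the constraint noted above.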
```diff
@@ -942,6 +942,30 @@ trace_info_alloc () {
 	return index;
 }

+static void
+build_address_taken_bitset (TransformData *td, InterpBasicBlock *bb, guint32 bitset_size)
```
If you want, you can further optimize this a bit. For example, you can add a new field to TransformData, something like max_indirect_offset, that you set where address-taken vars are flagged (runtime/src/mono/mono/mini/interp/transform.c, line 10316 in d0913fc):

```c
td->locals [var].flags |= INTERP_LOCAL_FLAG_GLOBAL;
```

instead of sizing by td->total_locals_size. You can then run this pass only if this field is non-zero, and you can also use less memory for the bitset array (note you would need to do a range check in jiterp for overflowing offsets).
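The reviewer's suggestion could look roughly like this. Everything here is a hypothetical sketch, not the actual TransformData layout: the idea is to record a high-water mark of address-taken offsets, size the bitset by that watermark instead of total_locals_size, and skip the pass entirely when no address is ever taken.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_SIZE 8

/* Illustrative stand-in for the real TransformData. */
typedef struct {
    uint32_t max_indirect_offset; /* hypothetical new field */
    uint32_t total_locals_size;
} TransformDataSketch;

/* Called at the site where a var's address is taken (where
 * INTERP_LOCAL_FLAG_GLOBAL gets set in transform.c). */
static void
note_address_taken (TransformDataSketch *td, uint32_t offset, uint32_t size)
{
    uint32_t end = offset + size;
    if (end > td->max_indirect_offset)
        td->max_indirect_offset = end;
}

/* Bitset only needs to cover offsets below the watermark; offsets at or
 * past it would need the range check in jiterp mentioned above. */
static uint32_t
bitset_bytes_needed (const TransformDataSketch *td)
{
    if (td->max_indirect_offset == 0)
        return 0; /* no ldloca anywhere in the method: skip the pass */
    uint32_t slots = (td->max_indirect_offset + SLOT_SIZE - 1) / SLOT_SIZE;
    return (slots + 7) / 8;
}
```

For a method with a large frame but only one small address-taken local near the start, this shrinks the bitset from covering the whole frame down to a byte or two.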
That sounds great, but I'd prefer to do it in a follow-up PR. Is that OK?
sure, also if you consider this to be worth it
This PR moves the ldloca analysis from the jiterpreter trace generator to the pass that inserts entry points, producing a bitset with a 1 for each interpreter stack slot that has its address taken. As a result we're able to remove the 'trace must start at the beginning of the method' restriction for null check optimization, and we pick up some small correctness improvements along with it.
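The consumer side of the summary above can be sketched as a single guard. This is an assumed shape, not the jiterpreter's actual API: before the trace compiler reuses a cached "this local is non-null" fact, it checks the address-taken bitset, because a slot whose address has escaped via ldloca could be overwritten through a pointer between the check and the reuse.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_SIZE 8

/* Hypothetical guard: null check elimination is only safe for a local
 * whose stack slot never has its address taken during the method. */
static int
can_skip_null_check (const uint8_t *bitset, uint32_t local_offset)
{
    uint32_t slot = local_offset / SLOT_SIZE;
    int address_taken = (bitset[slot / 8] >> (slot % 8)) & 1;
    return !address_taken;
}
```

Since the bitset is built once per method when entry points are inserted, mid-method traces get the same answer as traces starting at the method entry, which is what lifts the old restriction.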