-
Notifications
You must be signed in to change notification settings - Fork 2.7k
WIP: Rematerialize constants in LSRA. #12250
Conversation
Rather than reloading spilled constants LSRA, rematerialize them at the point of use. This avoids the need to insert spill and reload code. This change also tweaks the allocator's heuristics slightly: - constant intervals are only allocated a register at the point of def if there is an appropriate register available; otherwise, they are spilled and rematerialized at the point of use - ref positions that are freely rematerializable are assigned the lowest possible weight.
@CarolEidt PTAL. I do not expect that this is the final form that this will or should take: for example, the decision point for rematerialization needs to use better abstractions (currently it literally checks for spilled constant intervals). I think that this is a reasonable starting point, however. From jit-diff x86/Release/Windows:
Representative diffs: G_M34994_IG05:
mov dword ptr [ebp-78H], eax
push eax
push ebx
- xor edx, edx
- mov dword ptr [ebp-FCH], edx
mov ecx, dword ptr [ebp-28H]
mov dword ptr [ebp-94H], ecx
- mov edx, dword ptr [ebp-FCH]
+ xor edx, edx
mov dword ptr [ebp-98H], edx
push edx
push dword ptr [ebp-94H]
call [CORINFO_HELP_LMUL] jae G_M13864_IG10
mov edi, dword ptr [eax+4*ecx+4]
- mov dword ptr [ebp-A8H], edi
- xor edi, edi
- mov dword ptr [ebp-ACH], edi
- mov edi, dword ptr [ebp-A8H]
mov dword ptr [ebp-50H], edi
- mov edi, dword ptr [ebp-ACH]
+ xor edi, edi
mov dword ptr [ebp-54H], edi
lea edi, [ecx-2]
cmp edi, esi
jae G_M13864_IG10 I also ran internal diffs on x64/Windows; these showed about a 30k improvement with 2 minor regressions. |
Is it me or this can move zeroes forward? That |
It can pull constants forward, yes. These are all constant nodes, though (e.g. |
Well, we do not model condition flags. If you have |
Hmm, that is an interesting point--thanks for clarifying. I'll have to think about that a bit. |
Yeah, it looks like this might be a non-starter without modeling the flags in LSRA: a relatively simple approach (i.e. using |
I don't think it would be all that difficult to model the flags in LSRA in a very limited way for this scenario, since it is in fact linear. One possible approach would be that in |
This is roughly the approach I was considering. We would also probably need to tag the nodes that consume previously-produced flag values. |
What happens if the flags producer and consumer(s) are in different blocks? |
Good point. we would presumably have to also tag blocks where the flags are live-in or live-out. |
I happened to discuss flags modeling in the IR with @russellhadley a bit recently. We were kicking around the idea that we might treat the flags register as a special tracked lclVar (maybe not literally, if only to avoid kicking out other tracked locals). That would give us most of what we need here, though it may be a bit heavyweight for now. Tagging blocks and nodes as flags live in/out and def'd/used/consumed (respectively) seems like it should be good enough. |
I thought about that but it seems problematic if you want to reuse the flags set by a non-compare instruction (ADD, OR etc.). Even if you don't need to the actual value of the node (if you need it then you're dealing with a node that produces multiple values) you still need to distinguish between the actual value and the flags value. I'm quite interested in this flags issue due to the various experiments I have in #10861 |
Can you clarify why this seems like it makes a lclVar-style solution problematic? We would simply end up with nodes that are modeled as producing an SDSU temp as well as writing to the flags "lclVar". |
Is writing to that special lclvar implicit rather than explicit? |
If by implicit you mean "not represented by a |
(without some probably-invasive IR changes, that is) |
Yes, thanks. In that case it's not clear what a lclvar would buy us. |
It's unclear to me what a lclVar buys you. And it adds complexity to handle a lclVar that has neither a stack location nor a register (unless we go down the even more complex path of saving and restoring flag values. |
The flags register has lclVar-like semantics in that it is a multi-def, multi-use location that is read/written by certain operators. Representing it as an actual lclVar might obviate the need to add special-case code in the various places in which we will otherwise need to (especially liveness and side-effect analysis). This might just move these special cases to lclVars in general, but that could be beneficial if we were to decide e.g. to represent physical registers as lclVars as well. |
Isn't there already some mechanism to track physical register liveness? EG instructions that write specific or multiple registers, ABI kill sets, etc? Seems like this could possibly be extended to cover flag liveness instead of some virtualized resource. Also note there are some cases where flags are partially written so it is perhaps more than read/write that needs modelling. |
Yes, we could get more granular w.r.t. which flags are written. As a first pass I hope that we don't need to, but it's certainly worth considering. |
I've been following along and wanted to layout what I think are the requirements/consequences of using execution order in LIR - which these first simple recalculation opts are hitting.
We're going to keep hitting this, and likely more often, now that we're going to start pushing for more perf in 2.1. New issue opened to track this #12275 |
What is status of this PR? 2+ months without any update ... |
In progress. Some prerequisites landed recently. |
Would it make sense to close it until it is ready for further work? |
Closing for now, feel free to reopen when you're ready to work on it again ... |
Rather than reloading spilled constants LSRA, rematerialize them at the
point of use. This avoids the need to insert spill and reload code.
This change also tweaks the allocator's heuristics slightly:
if there is an appropriate register available; otherwise, they are
spilled and rematerialized at the point of use
possible weight.