This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

WIP: Rematerialize constants in LSRA. #12250

Closed

wants to merge 1 commit

Conversation


@pgavlin pgavlin commented Jun 13, 2017

Rather than reloading spilled constants in LSRA, rematerialize them at the
point of use. This avoids the need to insert spill and reload code.

This change also tweaks the allocator's heuristics slightly:

  • constant intervals are only allocated a register at the point of def
    if there is an appropriate register available; otherwise, they are
    spilled and rematerialized at the point of use
  • ref positions that are freely rematerializable are assigned the lowest
    possible weight.
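The two heuristic tweaks above can be sketched roughly as follows. This is a hypothetical model, not the JIT's actual LSRA code: the `interval` dictionary, `allocate_constant_def`, and `code_at_use` names are invented for illustration.

```python
# Sketch: a constant interval only gets a register at its def if one is free;
# otherwise it is marked for rematerialization and the immediate is simply
# re-emitted at each use, so no spill/reload code is ever inserted.

def allocate_constant_def(interval, free_registers):
    """Return the register assigned at the def, or None to rematerialize."""
    if free_registers:
        reg = free_registers.pop()        # an appropriate register is available
        interval["reg"] = reg
        return reg
    interval["rematerialize"] = True      # cheapest possible "spill"
    return None

def code_at_use(interval):
    """Emit either a register read or a fresh constant load at the use."""
    if interval.get("rematerialize"):
        return f"mov r, {interval['value']}"   # reload the immediate directly
    return f"use {interval['reg']}"

interval = {"value": 0}
allocate_constant_def(interval, set())    # no registers free -> rematerialize
print(code_at_use(interval))              # -> "mov r, 0"
```

Because a rematerializable constant costs nothing to "spill", such ref positions can safely be given the lowest possible weight when competing for registers.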

@pgavlin
Author

pgavlin commented Jun 13, 2017

@CarolEidt PTAL.

I do not expect this to be the final form that this change will or should take: for example, the decision point for rematerialization needs to use better abstractions (currently it literally checks for spilled constant intervals). I think that this is a reasonable starting point, however.

From jit-diff x86/Release/Windows:

Summary:
(Note: Lower is better)

Total bytes of diff: -1657 (-0.02 % of base)
    diff is an improvement.

Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes

Top file regressions by size (bytes):
           4 : System.Runtime.Extensions.dasm (0.00 % of base)

Top file improvements by size (bytes):
        -524 : System.Private.CoreLib.dasm (-0.02 % of base)
        -241 : System.Runtime.Numerics.dasm (-0.42 % of base)
        -219 : System.Reflection.Metadata.dasm (-0.27 % of base)
        -187 : Microsoft.CodeAnalysis.CSharp.dasm (-0.01 % of base)
        -119 : Microsoft.CodeAnalysis.dasm (-0.02 % of base)

22 total files with size differences (21 improved, 1 regressed).

Top method regressions by size (bytes):
          19 : System.Reflection.Metadata.dasm - AssemblyTableReader:GetVersion():ref:this
          12 : System.Reflection.Metadata.dasm - AssemblyRefTableReader:GetVersion(int):ref:this
          12 : System.Runtime.Extensions.dasm - WebUtility:UrlEncodeToBytes(ref,int,int):ref
           5 : System.Reflection.Metadata.dasm - MemoryBlock:BinarySearchForSlot(int,int,int,int,bool):int:this
           4 : Microsoft.CodeAnalysis.VisualBasic.dasm - Scanner:RescanTrailingColonAsToken(byref,byref):this

Top method improvements by size (bytes):
        -147 : System.Runtime.Numerics.dasm - BigIntegerCalculator:ExtractDigits(byref,byref,byref,byref)
         -84 : Microsoft.CodeAnalysis.dasm - Compilation:ConstructModuleSerializationProperties(ref,ref,struct):ref:this
         -72 : System.Reflection.Metadata.dasm - MetadataReader:InitializeTableReaders(struct,ubyte,ref,ref):this
         -72 : System.Runtime.Numerics.dasm - BigIntegerCalculator:LehmerCore(byref,byref,long,long,long,long)
         -71 : System.Reflection.Metadata.dasm - EventDefinition:GetAccessors():struct:this

95 total methods with size differences (85 improved, 10 regressed).

Representative diffs:

 G_M34994_IG05:
        mov      dword ptr [ebp-78H], eax
        push     eax
        push     ebx
-       xor      edx, edx
-       mov      dword ptr [ebp-FCH], edx
        mov      ecx, dword ptr [ebp-28H]
        mov      dword ptr [ebp-94H], ecx
-       mov      edx, dword ptr [ebp-FCH]
+       xor      edx, edx
        mov      dword ptr [ebp-98H], edx
        push     edx
        push     dword ptr [ebp-94H]
        call     [CORINFO_HELP_LMUL]
        jae      G_M13864_IG10
        mov      edi, dword ptr [eax+4*ecx+4]
-       mov      dword ptr [ebp-A8H], edi
-       xor      edi, edi
-       mov      dword ptr [ebp-ACH], edi
-       mov      edi, dword ptr [ebp-A8H]
        mov      dword ptr [ebp-50H], edi
-       mov      edi, dword ptr [ebp-ACH]
+       xor      edi, edi
        mov      dword ptr [ebp-54H], edi
        lea      edi, [ecx-2]
        cmp      edi, esi
        jae      G_M13864_IG10

I also ran internal diffs on x64/Windows; these showed about a 30k improvement with 2 minor regressions.

@mikedn

mikedn commented Jun 13, 2017

Is it me or can this move zeroes forward? That xor edi, edi in your example changes condition flags and the IR produced by decomposition and lowering may not expect that to happen at that point.

@pgavlin
Author

pgavlin commented Jun 13, 2017

Is it me or can this move zeroes forward? That xor edi, edi in your example changes condition flags and the IR produced by decomposition and lowering may not expect that to happen at that point.

It can pull constants forward, yes. These are all constant nodes, though (e.g. GT_CNS_INT), which should be side-effect-free and therefore always safe to move. It would be news to me if the backend expects these nodes to have side effects besides writing the destination register.

@mikedn

mikedn commented Jun 13, 2017

which should be side-effect-free and therefore always safe to move

Well, we do not model condition flags. If you have GT_SUB_LO followed by GT_SUB_HI and a GT_CNS_INT(0) lands between them, we have a problem.
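The hazard described here can be made concrete with a small simulation. This is an illustrative model only (a single carry flag, 32-bit halves; the `Machine` class and function names are invented): on x86, GT_SUB_LO lowers to a SUB that sets the carry flag, GT_SUB_HI lowers to an SBB that consumes it, and `xor reg, reg` clears carry.

```python
# Model of a decomposed 64-bit subtraction, showing that a rematerialized
# "xor reg, reg" landing between the lo and hi halves clobbers the borrow.

MASK = 0xFFFFFFFF

class Machine:
    def __init__(self):
        self.carry = 0

    def sub_lo(self, a, b):            # like SUB: sets carry on borrow
        self.carry = 1 if (a & MASK) < (b & MASK) else 0
        return (a - b) & MASK

    def sub_hi(self, a, b):            # like SBB: consumes the carry
        return (a - b - self.carry) & MASK

    def xor_zero(self):                # like XOR reg, reg: clears carry!
        self.carry = 0
        return 0

def sub64(a, b, remat_between=False):
    m = Machine()
    lo = m.sub_lo(a & MASK, b & MASK)
    if remat_between:
        m.xor_zero()                   # constant zero landing mid-sequence
    hi = m.sub_hi(a >> 32, b >> 32)
    return (hi << 32) | lo

a, b = 1 << 32, 1                      # borrow propagates into the high half
print(hex(sub64(a, b)))                        # -> 0xffffffff (correct)
print(hex(sub64(a, b, remat_between=True)))    # -> 0x1ffffffff (wrong)
```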

@pgavlin
Author

pgavlin commented Jun 13, 2017

Well, we do not model condition flags. If you have GT_SUB_LO followed by GT_SUB_HI and a GT_CNS_INT(0) lands between them, we have a problem.

Hmm, that is an interesting point--thanks for clarifying. I'll have to think about that a bit.

@pgavlin
Author

pgavlin commented Jun 13, 2017

Yeah, it looks like this might be a non-starter without modeling the flags in LSRA: a relatively simple approach (i.e. using mov reg, 0 instead of xor reg, reg for rematerialized constant zeros) turns this into a regression. I'm trying something a bit more sophisticated now, but I'm not terribly hopeful.

@CarolEidt

Yeah, it looks like this might be a non-starter without modeling the flags in LSRA

I don't think it would be all that difficult to model the flags in LSRA in a very limited way for this scenario, since it is in fact linear. One possible approach would be that in Lowering we could tag all the nodes that produce flag values that are subsequently consumed. Then, during RefPosition building, any such node that is encountered would invalidate the rematerializability of any constant whose consuming node hasn't yet been encountered.
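The linear invalidation scheme proposed here can be sketched as follows. All names (`mark_rematerializable`, the node dictionaries) are hypothetical, not the JIT's actual RefPosition-building code: nodes tagged in Lowering as producing consumed flags invalidate the rematerializability of any constant whose consumer has not yet been reached.

```python
# Sketch: walk the block in linear order; a flags-producing node makes every
# still-pending constant (def seen, use not yet seen) non-rematerializable,
# since rematerializing it at the use could land between the flags def/use.

def mark_rematerializable(nodes):
    """nodes: list of dicts with 'kind' in ('const', 'flags_def', 'use');
    uses carry 'of', naming the constant they consume."""
    pending = {}                        # const name -> node dict
    for node in nodes:
        if node["kind"] == "const":
            node["remat"] = True
            pending[node["name"]] = node
        elif node["kind"] == "flags_def":
            # flags would be live across any pending constant's remat point
            for const in pending.values():
                const["remat"] = False
        elif node["kind"] == "use":
            pending.pop(node["of"], None)
    return nodes

block = [
    {"kind": "const", "name": "c0"},
    {"kind": "use", "of": "c0"},        # consumed before any flags def: ok
    {"kind": "const", "name": "c1"},
    {"kind": "flags_def"},              # e.g. a GT_SUB_LO producing a carry
    {"kind": "use", "of": "c1"},        # remat here would clobber the flags
]
mark_rematerializable(block)
print(block[0]["remat"], block[2]["remat"])   # -> True False
```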

@pgavlin
Author

pgavlin commented Jun 14, 2017

One possible approach would be that in Lowering we could tag all the nodes that produce flag values that are subsequently consumed. Then, during RefPosition building, any such node that is encountered would invalidate the rematerializability of any constant whose consuming node hasn't yet been encountered.

This is roughly the approach I was considering. We would also probably need to tag the nodes that consume previously-produced flag values.

@mikedn

mikedn commented Jun 14, 2017

One possible approach would be that in Lowering we could tag all the nodes that produce flag values that are subsequently consumed. Then, during RefPosition building, any such node that is encountered would invalidate the rematerializability of any constant whose consuming node hasn't yet been encountered.

What happens if the flags producer and consumer(s) are in different blocks?

@CarolEidt

What happens if the flags producer and consumer(s) are in different blocks?

Good point. We would presumably have to also tag blocks where the flags are live-in or live-out.

@pgavlin
Author

pgavlin commented Jun 14, 2017

Good point. We would presumably have to also tag blocks where the flags are live-in or live-out.

I happened to discuss flags modeling in the IR with @russellhadley a bit recently. We were kicking around the idea that we might treat the flags register as a special tracked lclVar (maybe not literally, if only to avoid kicking out other tracked locals). That would give us most of what we need here, though it may be a bit heavyweight for now. Tagging blocks and nodes as flags live in/out and def'd/used/consumed (respectively) seems like it should be good enough.

@mikedn

mikedn commented Jun 14, 2017

We were kicking around the idea that we might treat the flags register as a special tracked lclVar (maybe not literally, if only to avoid kicking out other tracked locals).

I thought about that but it seems problematic if you want to reuse the flags set by a non-compare instruction (ADD, OR etc.). Even if you don't need the actual value of the node (if you need it then you're dealing with a node that produces multiple values), you still need to distinguish between the actual value and the flags value.

I'm quite interested in this flags issue due to the various experiments I have in #10861

@pgavlin
Author

pgavlin commented Jun 14, 2017

I thought about that but it seems problematic if you want to reuse the flags set by a non-compare instruction (ADD, OR etc.). Even if you don't need the actual value of the node (if you need it then you're dealing with a node that produces multiple values), you still need to distinguish between the actual value and the flags value.

Can you clarify why this seems like it makes a lclVar-style solution problematic? We would simply end up with nodes that are modeled as producing an SDSU temp as well as writing to the flags "lclVar".

@mikedn

mikedn commented Jun 14, 2017

as well as writing to the flags "lclVar"

Is writing to that special lclvar implicit rather than explicit?

@pgavlin
Author

pgavlin commented Jun 14, 2017

Is writing to that special lclvar implicit rather than explicit?

If by implicit you mean "not represented by a STORE_LCL_VAR node", then yes, it would have to be.

@pgavlin
Author

pgavlin commented Jun 14, 2017

(without some probably-invasive IR changes, that is)

@mikedn

mikedn commented Jun 14, 2017

If by implicit you mean "not represented by a STORE_LCL_VAR node"

Yes, thanks. In that case it's not clear what a lclvar would buy us.

@CarolEidt

We were kicking around the idea that we might treat the flags register as a special tracked lclVar

It's unclear to me what a lclVar buys you. And it adds complexity to handle a lclVar that has neither a stack location nor a register (unless we go down the even more complex path of saving and restoring flag values).

@pgavlin
Author

pgavlin commented Jun 14, 2017

Yes, thanks. In that case it's not clear what a lclvar would buy us.

It's unclear to me what a lclVar buys you. And it adds complexity to handle a lclVar that has neither a stack location nor a register (unless we go down the even more complex path of saving and restoring flag values).

The flags register has lclVar-like semantics in that it is a multi-def, multi-use location that is read/written by certain operators. Representing it as an actual lclVar might obviate the need to add special-case code in the various places in which we will otherwise need to (especially liveness and side-effect analysis). This might just move these special cases to lclVars in general, but that could be beneficial if we were to decide e.g. to represent physical registers as lclVars as well.

@AndyAyersMS
Member

Isn't there already some mechanism to track physical register liveness? E.g. instructions that write specific or multiple registers, ABI kill sets, etc.?

Seems like this could possibly be extended to cover flag liveness instead of some virtualized resource.

Also note there are some cases where flags are partially written so it is perhaps more than read/write that needs modelling.

@pgavlin
Author

pgavlin commented Jun 14, 2017

Also note there are some cases where flags are partially written so it is perhaps more than read/write that needs modelling.

Yes, we could get more granular w.r.t. which flags are written. As a first pass I hope that we don't need to, but it's certainly worth considering.

@russellhadley

I've been following along and wanted to lay out what I think are the requirements/consequences of using execution order in LIR - which these first simple recalculation opts are hitting.

  • We will need a general mechanism for tracking flags/physical resource liveness so that reordering of operations can be safely performed. This could be as simple as an on-demand conservative liveness calculation utility (anticipated def/upward-exposed use), or a first-order construct that participates in liveness.
  • While it would be great to avoid it, some of these lifetimes will cross block boundaries - flags in particular for decomposition and branch optimizations. We will need some way to capture this in interblock liveness.

We're going to keep hitting this, and likely more often, now that we're going to start pushing for more perf in 2.1. New issue opened to track this: #12275
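The interblock piece of the first requirement above (conservative flags liveness with anticipated-def/upward-exposed-use facts) can be sketched as a standard backward dataflow over blocks. This is a hypothetical sketch, not CoreCLR code; all names are invented.

```python
# Sketch: tag blocks where flags are live-in/live-out, given per-block facts:
# 'use_before_def' (an upward-exposed flags use) and 'defs' (block kills flags).

def flags_liveness(blocks, succs):
    """blocks: name -> {'use_before_def': bool, 'defs': bool};
    succs: name -> list of successor names. Returns (live_in, live_out)."""
    live_in = {b: False for b in blocks}
    live_out = {b: False for b in blocks}
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for b in blocks:
            out = any(live_in[s] for s in succs.get(b, []))
            inn = blocks[b]["use_before_def"] or (out and not blocks[b]["defs"])
            if out != live_out[b] or inn != live_in[b]:
                live_out[b], live_in[b] = out, inn
                changed = True
    return live_in, live_out

# A compare in B1 whose flags are consumed by a branch pattern in B2:
blocks = {
    "B1": {"use_before_def": False, "defs": True},
    "B2": {"use_before_def": True, "defs": False},
}
live_in, live_out = flags_liveness(blocks, {"B1": ["B2"]})
print(live_out["B1"], live_in["B2"])    # -> True True
```

Any block tagged live-out (like B1 here) would have to suppress constant rematerialization, or at least remat forms like xor that write the flags.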

@karelz
Member

karelz commented Aug 25, 2017

What is the status of this PR? 2+ months without any update ...

@pgavlin
Author

pgavlin commented Aug 25, 2017

In progress. Some prerequisites landed recently.

@karelz
Member

karelz commented Aug 25, 2017

Would it make sense to close it until it is ready for further work?

@karelz
Member

karelz commented Aug 30, 2017

Closing for now, feel free to reopen when you're ready to work on it again ...
