Model physical resource liveness in LIR #8353

russellhadley · 2017-06-14T19:55:55Z

To allow for more general optimizations of the instruction stream in LIR (like dotnet/coreclr#12250 and #8035) we will need be able to deal with the liveness of flags, as well as physical registers in the IR. Not having this blocks reordering optimizations and makes changes in transformations more error prone. ("oh darn I forgot about the flags case" bugs)

category:design
theme:ir
skill-level:expert
cost:large
impact:medium

pgavlin · 2017-06-20T15:34:05Z

Regarding flags modelling in particular, here's a sketch of what I've been thinking:

Add GenTree::WritesFlags and GenTree::ReadsFlags APIs with the obvious semantics and change platform-specific lowering to properly set whatever bit(s) back these APIs
Add BasicBlock::FlagsLiveIn and BasicBlock::FlagsLiveOut APIs with the obvious semantics and change liveness to compute these values
Update side-effect analysis code to take flag reads/writes into account when determining interference
In LSRA, add a non-allocatable physreg record to track reads and writes of the flags register

@CarolEidt @BruceForstall @mikedn thoughts? This should all be pretty simple and should serve our purposes well enough.

CarolEidt · 2017-06-20T15:52:36Z

That makes sense. Presumably the liveness and side-effect analyses would only be done when in LIR.

mikedn · 2017-06-20T16:03:44Z

Sounds reasonable for a start and maybe for more. Some notes:

Add GenTree::WritesFlags and GenTree::ReadsFlags APIs with the obvious semantics and change platform-specific lowering to properly set whatever bit(s) back these APIs

We already have something like that but it's a bit messy - GTF_SET_FLAGS and gtSetFlags(). Lowering (actually TreeNodeInfoInit right now, should be moved to lowering) uses that to prevent codegen from doing some bad transforms (e.g. ADD -> LEA).

Add BasicBlock::FlagsLiveIn and BasicBlock::FlagsLiveOut APIs with the obvious semantics and change liveness to compute these values

I'm not sure that's needed right now. The only cross block use of flags we had disappeared with my recent long relop lowering simplification. That said, some future optimizations might need this - eliminating a redundant compare from code like:

if (x < y) {
}
else if (x > y) {
}

pgavlin · 2017-06-20T16:07:27Z

Presumably the liveness and side-effect analyses would only be done when in LIR.

Correct.

We already have something like that but it's a bit messy - GTF_SET_FLAGS and gtSetFlags().

I think the semantics of this bit are a bit different than they seem. IIUC this bit is used to indicate that the ARM backend should generate an instruction that sets the CCR; it does not appear to have any meaning on other platforms. W.R.T. an implementation of the proposed design, I would suggest that we replace the APIs you've mentioned with GenTree::WritesFlags (possibly only when targeting the RyuJIT backend).

Add BasicBlock::FlagsLiveIn and BasicBlock::FlagsLiveOut APIs with the obvious semantics and change liveness to compute these values

I'm not sure that's needed right now. The only cross block use of flags we had disappeared with my > recent long relop lowering simplification. That said, some future optimizations might need this

I don't think that it hurts to do the work now (unless doing so has an undue adverse impact on JIT throughput).

russellhadley · 2017-06-20T16:17:36Z

Does this generalize to other physical resources? GP registers, sub flags, system registers a la arm64? This seems like a place where we'd want a plan that covers the broader category of hardware resources in LIR even if we only need to stage a piece of it today.

mikedn · 2017-06-20T16:21:31Z

it does not appear to have any meaning on other platforms.

It does, see the ADD->LEA case I mentioned. The whole thing is rather convoluted, it originated in the legacy JIT, got reused in RyuJIT at some point, it's set in TreeNodeInfoInit instead of lowering and so on. And more importantly, the flag is set only on certain nodes (ADD, SUB etc.) and not on others (relop, CMP). It probably has to be set consistently if LSRA is to use it.

I would suggest that we replace the APIs you've mentioned with GenTree::WritesFlags (possibly only when targeting the RyuJIT backend).

Yes, that would be good.

pgavlin · 2017-06-20T16:24:53Z

It does, see the ADD->LEA case I mentioned. The whole thing is rather convoluted, it originated in the legacy JIT, got reused in RyuJIT at some point, it's set in TreeNodeInfoInit instead of lowering and so on.

Yuck, I see that use now. My quick skim of git grep folded that use in codegenxarch with those in codegenlegacy :(

It probably has to be set consistently if LSRA is to use it.

Yes, definitely. We want any code that reasons about side-effects to be able to depend on this, so it needs to be set consistently and accurately. Until it is, we may need to be extremely pessimistic (e.g. assume all nodes will generate code that sets flags unless they have explicitly indicated otherwise).

pgavlin · 2017-06-20T16:32:02Z

Does this generalize to other physical resources? GP registers, sub flags, system registers a la arm64? This seems like a place where we'd want a plan that covers the broader category of hardware resources in LIR even if we only need to stage a piece of it today.

It does not. If we all agree on the aforementioned design for an interim solution for flags, I will open a separate issue to track that and we can discuss a more complete design here. I do not want to block making incremental progress (i.e. flags modeling) on a design for a feature for which we have no short-term use case.

russellhadley · 2017-06-20T16:43:21Z

I'd like to make progress too but I'm concerned that interim solutions evolve into defacto designs hamstring medium term work - can we at least surface what the divergent concerns are that are in the way of us proposing a more consistent design?

CarolEidt · 2017-06-20T16:45:02Z

I'm concerned that interim solutions evolve into defacto designs hamstring medium term work

I think it is a greater danger to design for scenarios that we are not currently addressing. Chances are that the design will not be quite what we want down the road. I am a strong proponent of designing for scenarios at hand.

pgavlin · 2017-06-20T16:51:07Z

can we at least surface what the divergent concerns are that are in the way of us proposing a more consistent design?

Here are some of the issues that I think a more complete solution would need to address:

How do we name physical resources in the IR?
How do we deal with nodes that write to/read from multiple physical resources?
How do we reconcile physical and logical resources (e.g. do we replace references to lclVars with references to the appropriate registers/stack slots after register allocation)?

At first glance, it seems that unifying physical resources with lclVars is a reasonable approach in a broad sense: both are multi-def, multi-use resources that want to participate in liveness, and both must be considered by the RA. The practicalities of implementing such a design may be prohibitive, however, given the various assumptions made throughout the code about lclVars and how they relate to physical resources.

russellhadley · 2017-06-20T17:01:28Z

I believe the goal of optimizing LIR is near enough at hand that having a design that covers the more general is both practical and profitable. In concrete terms this just allows us to better answer two related questions we already ask today:

What nodes maybe reordered while preserving program side effects.
What resources are live at what point in the flow graph.

My view is that these are fundamental and immediate enough that I don't think this is premature design.

AndyAyersMS · 2017-06-20T17:17:40Z

@mikedn There are unordered FP compares that do back to back flags tests... eg

//    BNE.UN(a,b)      ucomis[s|d] a, b        From the above table, PF=1 or ZF=0 implies unordered or a!=b
//                     jpe <true label>
//                     jne <true label>

So technically the flags are live cross block here.

But perhaps this does this not really count as cross-block live to the jit, if it allows 2 jump instructions in a single BB if they have the same target?

pgavlin · 2017-06-20T17:25:34Z

But perhaps this does this not really count as cross-block live to the jit,

Yeah, I think this would be represented in the IR as a single GT_JTRUE node fed by a compare, in which case the flags are not live cross-block in a sense that the JIT models.

mikedn · 2017-06-20T17:46:58Z

But perhaps this does this not really count as cross-block live to the jit

Yes, these aren't lowered, codegen handles them as already mentioned by @pgavlin. I preserved this approach even in my relop lowering experiment, it doesn't seem to be worth changing.

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

jkotas added tenet-reliability Reliability/stability related issue (stress, load problems, etc.) and removed reliability labels Feb 4, 2020

BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020

BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Jan 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Model physical resource liveness in LIR #8353

Model physical resource liveness in LIR #8353

russellhadley commented Jun 14, 2017 •

edited by BruceForstall

Loading

pgavlin commented Jun 20, 2017

CarolEidt commented Jun 20, 2017

mikedn commented Jun 20, 2017

pgavlin commented Jun 20, 2017

russellhadley commented Jun 20, 2017

mikedn commented Jun 20, 2017

pgavlin commented Jun 20, 2017

pgavlin commented Jun 20, 2017

russellhadley commented Jun 20, 2017

CarolEidt commented Jun 20, 2017

pgavlin commented Jun 20, 2017

russellhadley commented Jun 20, 2017 •

edited

Loading

AndyAyersMS commented Jun 20, 2017

pgavlin commented Jun 20, 2017

mikedn commented Jun 20, 2017

Model physical resource liveness in LIR #8353

Model physical resource liveness in LIR #8353

Comments

russellhadley commented Jun 14, 2017 • edited by BruceForstall Loading

pgavlin commented Jun 20, 2017

CarolEidt commented Jun 20, 2017

mikedn commented Jun 20, 2017

pgavlin commented Jun 20, 2017

russellhadley commented Jun 20, 2017

mikedn commented Jun 20, 2017

pgavlin commented Jun 20, 2017

pgavlin commented Jun 20, 2017

russellhadley commented Jun 20, 2017

CarolEidt commented Jun 20, 2017

pgavlin commented Jun 20, 2017

russellhadley commented Jun 20, 2017 • edited Loading

AndyAyersMS commented Jun 20, 2017

pgavlin commented Jun 20, 2017

mikedn commented Jun 20, 2017

russellhadley commented Jun 14, 2017 •

edited by BruceForstall

Loading

russellhadley commented Jun 20, 2017 •

edited

Loading