Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model physical resource liveness in LIR #8353

Open
russellhadley opened this issue Jun 14, 2017 · 15 comments
Open

Model physical resource liveness in LIR #8353

russellhadley opened this issue Jun 14, 2017 · 15 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization tenet-performance Performance related issue tenet-reliability Reliability/stability related issue (stress, load problems, etc.)
Milestone

Comments

@russellhadley
Copy link
Contributor

russellhadley commented Jun 14, 2017

To allow for more general optimizations of the instruction stream in LIR (like dotnet/coreclr#12250 and #8035) we will need be able to deal with the liveness of flags, as well as physical registers in the IR. Not having this blocks reordering optimizations and makes changes in transformations more error prone. ("oh darn I forgot about the flags case" bugs)

category:design
theme:ir
skill-level:expert
cost:large
impact:medium

@pgavlin
Copy link
Contributor

pgavlin commented Jun 20, 2017

Regarding flags modelling in particular, here's a sketch of what I've been thinking:

  • Add GenTree::WritesFlags and GenTree::ReadsFlags APIs with the obvious semantics and change platform-specific lowering to properly set whatever bit(s) back these APIs
  • Add BasicBlock::FlagsLiveIn and BasicBlock::FlagsLiveOut APIs with the obvious semantics and change liveness to compute these values
  • Update side-effect analysis code to take flag reads/writes into account when determining interference
  • In LSRA, add a non-allocatable physreg record to track reads and writes of the flags register

@CarolEidt @BruceForstall @mikedn thoughts? This should all be pretty simple and should serve our purposes well enough.

@CarolEidt
Copy link
Contributor

That makes sense. Presumably the liveness and side-effect analyses would only be done when in LIR.

@mikedn
Copy link
Contributor

mikedn commented Jun 20, 2017

Sounds reasonable for a start and maybe for more. Some notes:

Add GenTree::WritesFlags and GenTree::ReadsFlags APIs with the obvious semantics and change platform-specific lowering to properly set whatever bit(s) back these APIs

We already have something like that but it's a bit messy - GTF_SET_FLAGS and gtSetFlags(). Lowering (actually TreeNodeInfoInit right now, should be moved to lowering) uses that to prevent codegen from doing some bad transforms (e.g. ADD -> LEA).

Add BasicBlock::FlagsLiveIn and BasicBlock::FlagsLiveOut APIs with the obvious semantics and change liveness to compute these values

I'm not sure that's needed right now. The only cross block use of flags we had disappeared with my recent long relop lowering simplification. That said, some future optimizations might need this - eliminating a redundant compare from code like:

if (x < y) {
}
else if (x > y) {
}

@pgavlin
Copy link
Contributor

pgavlin commented Jun 20, 2017

Presumably the liveness and side-effect analyses would only be done when in LIR.

Correct.

We already have something like that but it's a bit messy - GTF_SET_FLAGS and gtSetFlags().

I think the semantics of this bit are a bit different than they seem. IIUC this bit is used to indicate that the ARM backend should generate an instruction that sets the CCR; it does not appear to have any meaning on other platforms. W.R.T. an implementation of the proposed design, I would suggest that we replace the APIs you've mentioned with GenTree::WritesFlags (possibly only when targeting the RyuJIT backend).

Add BasicBlock::FlagsLiveIn and BasicBlock::FlagsLiveOut APIs with the obvious semantics and change liveness to compute these values

I'm not sure that's needed right now. The only cross block use of flags we had disappeared with my > recent long relop lowering simplification. That said, some future optimizations might need this

I don't think that it hurts to do the work now (unless doing so has an undue adverse impact on JIT throughput).

@russellhadley
Copy link
Contributor Author

Does this generalize to other physical resources? GP registers, sub flags, system registers a la arm64? This seems like a place where we'd want a plan that covers the broader category of hardware resources in LIR even if we only need to stage a piece of it today.

@mikedn
Copy link
Contributor

mikedn commented Jun 20, 2017

it does not appear to have any meaning on other platforms.

It does, see the ADD->LEA case I mentioned. The whole thing is rather convoluted, it originated in the legacy JIT, got reused in RyuJIT at some point, it's set in TreeNodeInfoInit instead of lowering and so on. And more importantly, the flag is set only on certain nodes (ADD, SUB etc.) and not on others (relop, CMP). It probably has to be set consistently if LSRA is to use it.

I would suggest that we replace the APIs you've mentioned with GenTree::WritesFlags (possibly only when targeting the RyuJIT backend).

Yes, that would be good.

@pgavlin
Copy link
Contributor

pgavlin commented Jun 20, 2017

It does, see the ADD->LEA case I mentioned. The whole thing is rather convoluted, it originated in the legacy JIT, got reused in RyuJIT at some point, it's set in TreeNodeInfoInit instead of lowering and so on.

Yuck, I see that use now. My quick skim of git grep folded that use in codegenxarch with those in codegenlegacy :(

It probably has to be set consistently if LSRA is to use it.

Yes, definitely. We want any code that reasons about side-effects to be able to depend on this, so it needs to be set consistently and accurately. Until it is, we may need to be extremely pessimistic (e.g. assume all nodes will generate code that sets flags unless they have explicitly indicated otherwise).

@pgavlin
Copy link
Contributor

pgavlin commented Jun 20, 2017

Does this generalize to other physical resources? GP registers, sub flags, system registers a la arm64? This seems like a place where we'd want a plan that covers the broader category of hardware resources in LIR even if we only need to stage a piece of it today.

It does not. If we all agree on the aforementioned design for an interim solution for flags, I will open a separate issue to track that and we can discuss a more complete design here. I do not want to block making incremental progress (i.e. flags modeling) on a design for a feature for which we have no short-term use case.

@russellhadley
Copy link
Contributor Author

I'd like to make progress too but I'm concerned that interim solutions evolve into defacto designs hamstring medium term work - can we at least surface what the divergent concerns are that are in the way of us proposing a more consistent design?

@CarolEidt
Copy link
Contributor

I'm concerned that interim solutions evolve into defacto designs hamstring medium term work

I think it is a greater danger to design for scenarios that we are not currently addressing. Chances are that the design will not be quite what we want down the road. I am a strong proponent of designing for scenarios at hand.

@pgavlin
Copy link
Contributor

pgavlin commented Jun 20, 2017

can we at least surface what the divergent concerns are that are in the way of us proposing a more consistent design?

Here are some of the issues that I think a more complete solution would need to address:

  • How do we name physical resources in the IR?
  • How do we deal with nodes that write to/read from multiple physical resources?
  • How do we reconcile physical and logical resources (e.g. do we replace references to lclVars with references to the appropriate registers/stack slots after register allocation)?

At first glance, it seems that unifying physical resources with lclVars is a reasonable approach in a broad sense: both are multi-def, multi-use resources that want to participate in liveness, and both must be considered by the RA. The practicalities of implementing such a design may be prohibitive, however, given the various assumptions made throughout the code about lclVars and how they relate to physical resources.

@russellhadley
Copy link
Contributor Author

russellhadley commented Jun 20, 2017

I believe the goal of optimizing LIR is near enough at hand that having a design that covers the more general is both practical and profitable. In concrete terms this just allows us to better answer two related questions we already ask today:

  • What nodes maybe reordered while preserving program side effects.
  • What resources are live at what point in the flow graph.

My view is that these are fundamental and immediate enough that I don't think this is premature design.

@AndyAyersMS
Copy link
Member

@mikedn There are unordered FP compares that do back to back flags tests... eg

//    BNE.UN(a,b)      ucomis[s|d] a, b        From the above table, PF=1 or ZF=0 implies unordered or a!=b
//                     jpe <true label>
//                     jne <true label>

So technically the flags are live cross block here.

But perhaps this does this not really count as cross-block live to the jit, if it allows 2 jump instructions in a single BB if they have the same target?

@pgavlin
Copy link
Contributor

pgavlin commented Jun 20, 2017

But perhaps this does this not really count as cross-block live to the jit,

Yeah, I think this would be represented in the IR as a single GT_JTRUE node fed by a compare, in which case the flags are not live cross-block in a sense that the JIT models.

@mikedn
Copy link
Contributor

mikedn commented Jun 20, 2017

But perhaps this does this not really count as cross-block live to the jit

Yes, these aren't lowered, codegen handles them as already mentioned by @pgavlin. I preserved this approach even in my relop lowering experiment, it doesn't seem to be worth changing.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@jkotas jkotas added tenet-reliability Reliability/stability related issue (stress, load problems, etc.) and removed reliability labels Feb 4, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Jan 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization tenet-performance Performance related issue tenet-reliability Reliability/stability related issue (stress, load problems, etc.)
Projects
None yet
Development

No branches or pull requests

8 participants