JIT: Use linear block order for MinOpts in LSRA #108147
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
I looked at the JitDisasm summary, and there are only a handful of …
Per discussion here, I'm going to try using the optimized layout all the time to see if we can get around some of the pitfalls of using an RPO sequence on its own.
Sorry for getting back to you so late on this, but I finally got Pin working thanks to Jakob. The bulk of the TP regression for the above method is coming from …
I took another look at the TP regression noted above, and it had to do with …

```diff
- ldr q16, [fp, #0xB8] // [V12 tmp2]
- str q16, [fp, #0xD1FFAB1E] // [V03 loc1]
- ldr q16, [fp, #0xC8] // [V12 tmp2+0x10]
- str q16, [fp, #0xD1FFAB1E] // [V04 loc2]
- ldr q16, [fp, #0xD1FFAB1E] // [V03 loc1]
- ldr q17, [@RWD16]
- tbl v16.16b, {v16.16b}, v17.16b
- str q16, [fp, #0xD1FFAB1E] // [V03 loc1]
+ ldr q0, [fp, #0xB8] // [V12 tmp2]
+ str q0, [fp, #0xD1FFAB1E] // [V03 loc1]
+ ldr q0, [fp, #0xC8] // [V12 tmp2+0x10]
+ str q0, [fp, #0xD1FFAB1E] // [V04 loc2]
+ ldr q0, [fp, #0xD1FFAB1E] // [V03 loc1]
+ ldr q1, [@RWD16]
+ tbl v0.16b, {v0.16b}, v1.16b
+ str q0, [fp, #0xD1FFAB1E] // [V03 loc1]
```

I also noticed that we were calling …

@kunalspathak could you PTAL? Thanks!
Changes LGTM, and it was a good catch regarding `findPredBlockForLiveIn()`. I was hoping to see asm diffs for the MinOpts scenario with this change, but how is it a zero diff?
If we don't have registers live across blocks, then LSRA will never need to use a block's "ideal" predecessor to identify incoming live registers, so skipping `findPredBlockForLiveIn()` doesn't change the resulting allocation.
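To make that concrete, here is a minimal sketch of the idea, not the literal diff in this PR; the guard shown (`enregisterLocalVars`, which is false in MinOpts) and the elided debug-only arguments are assumptions about how the skip could look:

```cpp
// Illustrative sketch only: in MinOpts nothing stays in a register across a
// block boundary, so there is no need to search for an "ideal" predecessor
// to seed a block's incoming live registers.
BasicBlock* predBlock = nullptr;
if (enregisterLocalVars) // false in MinOpts, so the search is skipped entirely
{
    // Only FullOpts needs a predecessor whose outgoing register state can be
    // reused as this block's incoming state.
    predBlock = findPredBlockForLiveIn(block, prevBlock /* debug-only args elided */);
}
// With no predecessor chosen, the block starts with no register-resident
// variables, which is already the MinOpts behavior -- hence zero diffs.
```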
```cpp
BasicBlock* prevBlock = nullptr;

// Initialize currentLiveVars to the empty set. We will set it to the current
// live-in at the entry to each block (this will include the incoming args on
// the first block).
VarSetOps::AssignNoCopy(compiler, currentLiveVars, VarSetOps::MakeEmpty(compiler));
const bool floatingPointUsed = compiler->compFloatingPointUsed;
```
As I understand it, `compFloatingPointUsed` is meant to be set before we get to LSRA. What is the case where it is not set and then gets set later? I think it would be better to have some `blockUsedFloatingPoint` variable that is deterministically set for each block and that LSRA can use for decisions that involve only the particular block.
Ideally yes, and we should audit the cases where that is not happening. I remember running into this a while back and had to rely on it only after `buildIntervals()`, where it was guaranteed to be set correctly.
In `LinearScan::BuildDef`, we have a check that sets `compFloatingPointUsed` to true if the tree we're building the definition for can't use integer registers. I believe your understanding of `compFloatingPointUsed` is correct -- LSRA doesn't seem to be using it all that judiciously, hence this PR. I'll try your suggestion in a follow-up.
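For reference, a paraphrased sketch of the kind of check being described; the condition and placement are inferred from the comment above, not copied from the runtime source:

```cpp
// Paraphrased sketch: while building a definition's RefPosition, if the
// node's value cannot live in an integer register, record that FP/SIMD
// registers are in use for this method.
if (varTypeUsesFloatReg(tree->TypeGet()))
{
    // Set late, during buildIntervals(), which is why the flag is only
    // guaranteed to be correct after interval building has finished.
    compiler->compFloatingPointUsed = true;
}
```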
> In LinearScan::BuildDef, we have a check that sets compFloatingPointUsed to true if the tree we're building the definition for can't use integer registers.

We duplicate this check in `LinearScan::identifyCandidates`, so we have a few spots in LSRA to clean up.
The problem with this is that it is going to come with TP regressions when someone decides to set `compFloatingPointUsed` correctly in the future. That's why I would like it to be explicit if we are changing LSRA to make "block-local" decisions in some cases where that is ok.
It looks like the only place is when we create kills for calls. So likely we could just have some bool `needsToKillFloatRegs` that LSRA uses for this purpose, and that is block-local in MinOpts and global (using `compFloatingPointUsed`) in FullOpts. It would probably come with even better MinOpts TP improvements.
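A sketch of how that suggestion could look; `needsToKillFloatRegs` and `blockUsedFloatingPoint` are hypothetical names from this discussion, not necessarily what the follow-up PR uses:

```cpp
// Hypothetical sketch of the suggestion above.
bool needsToKillFloatRegs;
if (compiler->opts.OptimizationDisabled())
{
    // MinOpts: decide per block, since nothing is live across block boundaries.
    needsToKillFloatRegs = blockUsedFloatingPoint; // hypothetical per-block flag
}
else
{
    // FullOpts: rely on the method-wide flag.
    needsToKillFloatRegs = compiler->compFloatingPointUsed;
}
// Only add FP/SIMD registers to the call's kill set when this is true.
```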
> The problem with this is that it is going to come with TP regressions when someone decides to set compFloatingPointUsed correctly in the future. That's why I would like it to be explicit if we are changing LSRA to make "block-local" decisions in some cases where that is ok.

That's valid; I'll open a follow-up PR for this.
I meant the change in …
If registers aren't live across blocks, then the order in which blocks are allocated shouldn't affect codegen, right?
* main: (27 commits)
  * Fold null checks against known non-null values (dotnet#109164)
  * JIT: Always track the context for late devirt (dotnet#112396)
  * JIT: array allocation fixes (dotnet#112676)
  * [H/3] Fix test closing connection too fast (dotnet#112691)
  * Fix LINQ handling of iterator.Take(...).Last(...) (dotnet#112680)
  * [browser][MT] move wasm MT CI legs to extra-platforms (dotnet#112690)
  * JIT: Don't use `Compiler::compFloatingPointUsed` to check if FP kills are needed (dotnet#112668)
  * [LoongArch64] Fix a typo within PR#112166. (dotnet#112672)
  * Fix new EH hang on DebugBreak (dotnet#112640)
  * Use encode callback instead of renting a buffer to write to in DSAKeyFormatHelper
  * Move some links to other doc (dotnet#112574)
  * Reflection-based XmlSerializer - Deserialize empty collections and allow for sub-types in collection items. (dotnet#111723)
  * JIT: Use `fgCalledCount` for OSR method entry weight (dotnet#112662)
  * Use Avx10.2 Instructions in Floating Point Conversions (dotnet#111775)
  * Expose StressLog via CDAC and port StressLogAnalyzer to managed code (dotnet#104999)
  * JIT: Use linear block order for MinOpts in LSRA (dotnet#108147)
  * Update dependencies from https://github.com/dotnet/arcade build 20250213.2 (dotnet#112625)
  * JIT: Clean up and optimize call arg lowering (dotnet#112639)
  * Update dependencies from https://github.com/dotnet/emsdk build 20250217.1 (dotnet#112645)
  * JIT: Support `FIELD_LIST` for returns (dotnet#112308)
  * ...
Follow-up from #108086 (comment). Since LSRA shouldn't create cross-block live registers in MinOpts, LSRA's block ordering shouldn't matter, so just use the lexical order we already have on hand.
cc @dotnet/jit-contrib
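As a rough illustration of the idea (not the exact diff in this PR), the MinOpts path can simply walk the blocks in the order they already appear, while FullOpts keeps its optimized, RPO-derived sequence; the loop body below is a placeholder:

```cpp
// Conceptual sketch only; the real change lives in LSRA's block sequencing.
if (compiler->opts.OptimizationDisabled())
{
    // MinOpts: nothing is live in registers across block boundaries, so the
    // lexical (bbNext) order is as good as any and is free to compute.
    for (BasicBlock* block = compiler->fgFirstBB; block != nullptr; block = block->Next())
    {
        // ... allocate registers within this block only ...
    }
}
else
{
    // FullOpts: keep the optimized block sequence so that cross-block
    // liveness decisions see a sensible predecessor order.
}
```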