Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Add an "init BB" invariant #110404

Merged
merged 12 commits into from
Dec 13, 2024
Merged

JIT: Add an "init BB" invariant #110404

merged 12 commits into from
Dec 13, 2024

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Dec 4, 2024

This adds an invariant that there always exists an "init BB" throughout the JIT's phases. The init BB has the following properties:

  • It is only executed once, so it does not have any predecessors
  • It is not inside a try region, hence it dominates all other blocks in the function
  • (For debug code only) it is an internal block (not corresponding to any user IL)

There are no further requirements on the BB. The init BB does not have to be BBJ_ALWAYS (unlike the old "scratch BB" concept). This is mainly because it philosophically does not make sense to insert IR at the end of the init BB, since earlier phases can have inserted arbitrary IR in them.

This adds an invariant that there always exists an "init BB" throughout
the JIT's phases. The init BB has the following properties:
- It is only executed once, so it does not have any predecessors
- It is not inside a try region, hence it dominates all other blocks in
  the function

There are no further requirements on the BB. The init BB does not have
to be `BBJ_ALWAYS` (unlike the old "scratch BB" concept). This is mainly
because it philosophically does not make sense to insert IR at the end
of the init BB, since earlier phases can have inserted arbitrary IR in
them.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 4, 2024
@AndyAyersMS
Copy link
Member

@jakobbotsch turns out I also need something like this so I can compute dominators early.

@jakobbotsch
Copy link
Member Author

jakobbotsch commented Dec 10, 2024

I really want to get this in too, and have been investigating the diffs on and off. A large number of them come from loop inversion that now matches the pattern in more cases -- seems like the loop being the entry is blocking inversion.

Even with loop inversion disabled in both the base and diff there are a fair amount of diffs. Almost all of these are in OSR functions. If I filter out diffs in OSR functions then the diffs of this PR look like https://gist.github.com/jakobbotsch/6f5907231d7999edb7de2598df36fe05.

There are a large amount of zero sized diffs. Likely that's GC info changes. I need to look into that and ensure we aren't inflating the size of GC info unnecessarily. Could also be debug info changes.

One of the causes of diffs is this code in LSRA:

// This is the case when we have a conditional branch where one target has already
// been visited. It would be best to use the same incoming regs as that block,
// so that we have less likelihood of having to move registers.
// For example, in determining the block to use for the starting register locations for
// "block" in the following example, we'd like to use the same predecessor for "block"
// as for "otherBlock", so that both successors of predBlock have the same locations, reducing
// the likelihood of needing a split block on a backedge:
//
// otherPred
// |
// otherBlock <-+
// . . . |
// |
// predBlock----+
// |
// block
//
if (blockInfo[otherBlock->bbNum].hasEHBoundaryIn)
{
return nullptr;
}
else
{
for (BasicBlock* const otherPred : otherBlock->PredBlocks())
{
if (otherPred->bbNum == blockInfo[otherBlock->bbNum].predBBNum)
{
predBlock = otherPred;
break;
}
}
}

With the init block, we sometimes end up picking that init block as the pred of some loop exits. Previously we would just pick the loop pred as the pred.

Maybe an acceptable solution is to just remove the init block in the backend, if it is empty at that point.

@AndyAyersMS
Copy link
Member

OSR diffs are likely less interesting, so I think its ok to focus on the others.

That stretch of LSRA logic always has seemed a bit fishy to me. Maybe it's just the inherent clunkiness of the linear world view. But seems like there might be better heuristics, esp with PGO.

@jakobbotsch
Copy link
Member Author

The zero diffs had to do with reusing the first non-internal BB as the init BB; that would mean insertion of JIT code (like calls to CORINFO_HELP_DBG_IS_JUST_MY_CODE) in a BB marked with an IL range, which would change some IL/var mappings reported for debug code.
I've added another requirement to the init BB: when generating debug-friendly code, it must be an internal block.

I think the codegen diffs are acceptable now to take as is. From spot checking the main causes of the diffs seem to be:

  • In OSR functions we frequently hit the case I described above, where LSRA picks another pred for some loop exits. I looked a bit further at this and the code seems well-intentioned and is meant to reduce number of split backedges, so even with larger code size this might actually be beneficial
  • Loop inversion and early jump-threading now kicks in more often (again, larger code size here is not necessarily bad)
  • We sometimes see different block layouts

Here are diffs with loop inversion disabled, and with diffs in OSR functions filtered out.

I'm going to kick off some stress legs.

@jakobbotsch
Copy link
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch
Copy link
Member Author

Looks like tests aren't completely happy:

Assert failure(PID 5932 [0x0000172c], Thread: 6412 [0x190c]): Assertion failed '!block->HasFlag(BBF_PATCHPOINT)' in 'System.AppContext:Setup(ulong,ulong,int)' during 'Expand patchpoints' (IL size 86; hash 0x16ec513f; Instrumented Tier0)

{
compiler->fgEnsureFirstBBisScratch();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should keep this or its moral equivalent, Tier0 instr needs to insert some execute once code at method entry

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would've thought the init BB would be sufficient. It should have been created before this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the issue is that the bbNum "is backedge" approximation is placing patchpoints in the init BB. Seems like we can teach the logic to never place a patchpoint in the init BB (or, more ideally, switch to a DFS tree based approach – but that's maybe too expensive for tier0).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The importer places the patchpoint in the init BB because of stress.

What is the point of the assert? Is it simply used as an attempt to verify "is not part of a cycle"? Seems like the assert should just be removed if so, given the stress mode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up removing the assert and also changing code below to make sure to insert the initialization of the patchpoint at the beginning of the init BB.

@jakobbotsch
Copy link
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr pgo, runtime-coreclr pgostress

Copy link

Azure Pipelines successfully started running 4 pipeline(s).

@jakobbotsch
Copy link
Member Author

/azp run runtime-coreclr superpmi-diffs

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jakobbotsch
Copy link
Member Author

/azp run runtime-coreclr superpmi-replay

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jakobbotsch jakobbotsch marked this pull request as ready for review December 12, 2024 17:56
@jakobbotsch
Copy link
Member Author

cc @dotnet/jit-contrib PTAL @AndyAyersMS

Diffs. See above for some notes on these. Stress failures look unrelated.

{
firstILBlock = firstILBlock->Next();
}
firstILBlock = fgGetFirstILBlock();
Copy link
Member Author

@jakobbotsch jakobbotsch Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to submit a change to compute fgCalledCount earlier before I merge this PR. fgGetFirstILBlock added by this PR will follow BBJ_ALWAYS to find the first IL block, but it seems dangerous to have a requirement that we can find it that way this late (after morph).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to revert this back to what it was to get this PR in quicker. Can look at cleaning this up separately.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this, it is something I've thought about doing for a while but never got around to it.

// Create a block for the start of the try region, where the monitor enter call
// will go.
BasicBlock* const tryBegBB = fgSplitBlockAtEnd(fgFirstBB);
BasicBlock* const tryBegBB = fgSplitBlockAtBeginning(fgFirstBB);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we split at the beginning, we may put already jit-inserted IR into the try. I suppose that's ok as none of it should ever cause an exception.

Copy link
Member Author

@jakobbotsch jakobbotsch Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree it looks a bit odd, but I think it's preferable over a solution that tries to differentiate between IR created because of the user IL and IR created due to the JIT. I really want us to be able to freely move IR and merge/duplicate blocks without breaking subtle invariants about what kind of IR can go where.

So far it seems like most of these cases come down to careful consideration of phase ordering. For example, this try-finally insertion is very much like IR created because of IL, so it would make sense to do it in (or close to) import. Another example is reverse pinvoke enter/exit – most of the JIT will naturally assume inserted IR is executed in cooperative mode, so this one makes sense to insert as the last thing in the JIT. Currently it's inserted quite early.

@jakobbotsch jakobbotsch merged commit 5a9362f into dotnet:main Dec 13, 2024
103 of 108 checks passed
@jakobbotsch jakobbotsch deleted the init-bb branch December 13, 2024 13:28
hez2010 pushed a commit to hez2010/runtime that referenced this pull request Dec 14, 2024
This adds an invariant that there always exists an "init BB" throughout
the JIT's phases. The init BB has the following properties:
- It is only executed once, so it does not have any predecessors
- It is not inside a try region, hence it dominates all other blocks in
  the function

There are no further requirements on the BB. The init BB does not have
to be `BBJ_ALWAYS` (unlike the old "scratch BB" concept). This is mainly
because it philosophically does not make sense to insert IR at the end
of the init BB, since earlier phases can have inserted arbitrary IR in
them.
@github-actions github-actions bot locked and limited conversation to collaborators Jan 13, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants