Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: stop computing dom start nodes for morph RPO #94497

Merged
merged 5 commits into from
Nov 10, 2023

Conversation

AndyAyersMS
Copy link
Member

@AndyAyersMS AndyAyersMS commented Nov 7, 2023

Remove the up-front computation of enter blocks and dom
start nodes from the DFS computations used for RPO.

Also lay the groundwork for removing unreachable blocks, by tracking
the postorder number of the last reachable block.

We don't need quite this broad of a start node set, as the DFS will
find unreachable blocks on its own.

Also lay the groundwork for removing unreachable blocks, by tracking
the postorder number of the last reachable block.

Contributes to dotnet#93246.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 7, 2023
@ghost ghost assigned AndyAyersMS Nov 7, 2023
@ghost
Copy link

ghost commented Nov 7, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

We don't need quite this broad of a start node set, as the DFS will find unreachable blocks on its own.

Also lay the groundwork for removing unreachable blocks, by tracking the postorder number of the last reachable block.

Contributes to #93246.

Author: AndyAyersMS
Assignees: AndyAyersMS
Labels:

area-CodeGen-coreclr

Milestone: -

@AndyAyersMS
Copy link
Member Author

@jakobbotsch PTAL
cc @dotnet/jit-contrib

No diffs, modest (~0.05%) TP win (enabling RPO cost about 0.25%, so this gets back a chunk of that).

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks fine to me, but why not improve fgDfsReversePostorder for everyone? It confuses me a little that morph wants different behavior from everyone else.

@AndyAyersMS
Copy link
Member Author

It looks fine to me, but why not improve fgDfsReversePostorder for everyone? It confuses me a little that morph wants different behavior from everyone else.

We use this for synthesis, reachability, and dominance. The first two probably can change over.

For dominance it may make sense to try and pick additional start blocks that are clearly dom tree roots—but let me see what happens if we don't.

@jakobbotsch
Copy link
Member

jakobbotsch commented Nov 8, 2023

We use this for synthesis, reachability, and dominance. The first two probably can change over.

For dominance it may make sense to try and pick additional start blocks that are clearly dom tree roots—but let me see what happens if we don't.

Well, given that we already loop over all the basic blocks in fgDfsReversePostorderCore, it seems very easy to get the same behavior for everyone at practically no cost. Can't we just check for block->bbRefs == 0 and launch another DFS in those cases? I don't see why that doesn't cover what fgDomFindStartBlocks is trying to accomplish.

To get the most possible benefits, I would rewrite the entire function like this:

fgDfsReversePostorderHelper(fgFirstBB, ...);
for (EHblkDsc* clause : EHClauses(this))
{
  // visit handler
}

fgDfsLastReachablePostorderNumber = postorderIndex;
if (preorderIndex != fgBBcount + 1)
{
  // rare cases with unreachable blocks and isolated cycles
  for (BasicBlock* bb : Blocks())
  {
    if (bb->bbRefs == 0)
      // visit unreachable enter block
  }
  for (BasicBlock* bb : Blocks())
  {
    if (!visited(bb))
      // visit isolated cycle
  }
}

IIUC it should have the same benefit as this PR, maintain the same behavior as previously for everyone, and avoid the full basic block scan in common cases.

(Thinking about it a little more, I'm not even sure the separate loop checking bbRefs == 0 is necessary.)

@AndyAyersMS
Copy link
Member Author

Yeah, that makes sense. Let me see what not finding the "optimal" start point for the unvisited blocks does for dominance... hopefully it does not matter.

It would also be nice if we could use bit vectors (or their complements) for worklists, then we wouldn't have to search the entire block list for unvisited blocks, but I don't think we're set up to do that quite yet. We'd also need a map from bbNum to Block*.

@AndyAyersMS
Copy link
Member Author

@jakobbotsch looks like indeed none of it was needed, and now this more or less erases the TP cost from the RPO.

Enter blocks is still used in a few places (reachability and dominance) so we now have a couple similar looking computations between that, the "inlined" version here in the DFS, and another one in the dead blocks code -- it might be nice to figure out how to have just one that can meet all needs. But I'll save that for another day.

@AndyAyersMS
Copy link
Member Author

Still something broken with a few arm tests

No Diffs otherwise, and a TP win.

CI compiler seems to have been updated.

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Seems like there's an arm32 failure.

Looks like there is a single diff in x86. Presumably that's because we may start the DFS from handler blocks in a different order than before.

@AndyAyersMS AndyAyersMS merged commit 1ff715f into dotnet:main Nov 10, 2023
126 of 129 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Dec 10, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants