Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: fix case where RBO leads to an invalid CSE #88159

Merged
merged 3 commits into from
Jun 29, 2023

Conversation

AndyAyersMS
Copy link
Member

If phi-based RBO bypasses a block with a memory PHI, it is possible for CSE to find invalid memory-based CSEs. An example of this is seen in the attached test case.

Ideally perhaps CSE would kill availability of these CSEs at the point where memory can change, but that seems difficult to arrange. Instead, we mark the bypased block as one that will not propagate any incoming CSEs, as the failures we know of require CSEs to flow back through this block.

Fixes #88091.

If phi-based RBO bypasses a block with a memory PHI, it is possible for CSE to find
invalid memory-based CSEs. An example of this is seen in the attached test case.

Ideally perhaps CSE would kill availability of these CSEs at the point where memory
can change, but that seems difficult to arrange. Instead, we mark the bypased block
as one that will not propagate any incoming CSEs, as the failures we know of require
CSEs to flow back through this block.

Fixes dotnet#88091.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 28, 2023
@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Jun 28, 2023

@dotnet/jit-contrib PTAL

Consider this as an interim fix, I would like to get something plausible in now and perhaps find a better fix down the road.

Not sure who to best tag for a review, so am going to arbitrarily pick @jakobbotsch, but happy for anone else to weigh in too.

Should be fairly minimal diffs, either code size increases from lack of costly CSEs, or code size decreases from lack of cheap CSEs leading to less prolog/epilog code.

Diffs

Small size increase on arm64, smaller decrease on x64.

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a reasonable surgical fix to me.

@ghost ghost assigned AndyAyersMS Jun 28, 2023
@ghost
Copy link

ghost commented Jun 28, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

If phi-based RBO bypasses a block with a memory PHI, it is possible for CSE to find invalid memory-based CSEs. An example of this is seen in the attached test case.

Ideally perhaps CSE would kill availability of these CSEs at the point where memory can change, but that seems difficult to arrange. Instead, we mark the bypased block as one that will not propagate any incoming CSEs, as the failures we know of require CSEs to flow back through this block.

Fixes #88091.

Author: AndyAyersMS
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

Comment on lines 21 to 35
static int Main()
{
int result = 0;
try
{
Problem(data);
result = 100;
}
catch (Exception e)
{
Console.WriteLine($"Failed: {e.Message}");
result = -1;
}
return result;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this not Main (convention has been TestEntryPoint but it isn't semantically meaningful) and mark it with [Fact] (using Xunit;)

Optionally (not recommended here), instead make Problem a [Theory] and figure out ClassData for the input.

Optionally (recommend), make the entire method body Problem(data). The infrastructure will handle exceptions and the 100 isn't required at that point (can just return void)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like so?

using XUnit;

class Runtime_88091
{
    [Fact] static int Test() => Problem(data);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but (I missed this the first time) also make it (the method -and- class) public.

To verify type "88091" in the test tab of the azdo job for the PR and see if it finds the test. It won't if it isn't public, doesn't have [Fact], etc. There are a few checks in msbuild and the analyzers, but there are incomplete and I'm hoping to improve them so that you get direct feedback instead of the current situation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The infrastructure will handle exceptions

As far as I've seen in #88006, it does mark the test as failed but it doesn't print the exception in the log in such case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markples azdo still doesn't show the test has been run. Anything obviously wrong to you? If not I will probably merge to get the fix in and we can sort it out later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll see if I can fix the exception reporting.

The test is actually running. There seems to be something strange with the azdo test filter where it doesn't show up for a while. I don't know if it is time-based or if it's based on earlier queries. (I also didn't see it but then after looking for "JIT", "Regression", etc., it started showing up. I also see it in the helix log.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I see it now. Thanks!

image

@@ -1588,6 +1595,29 @@ bool Compiler::optJumpThreadCore(JumpThreadInfo& jti)
}
}

// If this is a phi-based threading, and the block we're bypassing has
// a memory phi, and the new successors do not, mark the block with BBF_ALTERED_MEMORY_PHI
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing BBF_ALTERED_MEMORY_PHI was an earlier name of BBF_NO_CSE_IN?

Which I suppose ties in to my other question: technically this only needs to kill memory CSEs, is that correct? Presumably this is just simpler for the interim fix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah., thanks for spotting that.

Yes, it could just kill memory CSEs, but it is actually tricky to figure out if a VN is memory dependent. If we could do that then we could instead kill memory CSEs in blocks with "memory havoc" and fix the bug a bit more surgically.

@AndyAyersMS AndyAyersMS merged commit bba7a9c into dotnet:main Jun 29, 2023
132 of 135 checks passed
@AndyAyersMS AndyAyersMS added this to the 8.0.0 milestone Jul 18, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Aug 18, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Program throws ArgumentOutOfRangeException in Release
4 participants