Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forbid fgMorphCommutative during optValnumCSE_phase #57160

Merged
merged 6 commits into from
Aug 17, 2021

Conversation

echesakov
Copy link
Contributor

@echesakov echesakov commented Aug 10, 2021

During fgMorphCommutative we don't update a VN of modified trees properly and, as a result, in #56935 the JIT made wrong decision during optVNAssertionPropCurStmt. (You can find out what exactly happens in the comments for the regression test).

The fix is to not allow running fgMorphCommutative during optValnumCSE_phase (as we do for similar optimizations).
The change is conservative and results in some regressions (I will post the results later), but there nothing major as far as I can tell, so I am open for suggestions whether we can do anything that is better and still safe.

@dotnet/jit-contrib PTAL

Fixes #56935

@echesakov echesakov added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 10, 2021
@echesakov echesakov added this to the 6.0.0 milestone Aug 10, 2021
@echesakov echesakov self-assigned this Aug 10, 2021
@echesakov echesakov changed the title Fixes #56935 Forbid fgMorphCommutative during optValnumCSE_phase Aug 11, 2021
@echesakov echesakov marked this pull request as ready for review August 11, 2021 03:31
@EgorBo
Copy link
Member

EgorBo commented Aug 11, 2021

I am open for suggestions whether we can do anything that is better and still safe.

A better fix is to check nodes for gtIsActiveCSE_Candidate before folding them I guess.

@SingleAccretion
Copy link
Contributor

SingleAccretion commented Aug 11, 2021

I believe it is an invariant in the optimizer that the morphing functions must always preserve VNs, otherwise we risk getting very hard to catch bugs in (currently, only) RangeCheck.

I think this transform can actually preserve VNs without much difficulty, if it is restructured a little.

Right now it does this:

//      tree                op1
//     /    \            /       \
//   op1   cns2  -->   op   (cns1, cns2)
//  /   \                  bashed from cns1
// op  cns1

Which leaves VNs of both op1 and the "bashed by hands" cns1 incorrect.

It should be possible to make it do this:

//      tree               tree
//     /    \            /      \
//   op1   cns2  -->   op  (cns1, cns2)
//  /   \                 bashed from op1
// op  cns1

Bashing op1 via gtFoldExprConst, which already recomputes the VN for the tree it produces (and appends the field sequences too). This will leave the tree's VN intact and avoid creating a node just to throw it away immediately.

Note: the current code fails to account for cns1, cns2 or tree being CSE candidates, so I think not running it during CSE is still a good idea, however, if you want to avoid regressions (there are some due to the usual morph ordering issues, there is also one unfortunate case where the lea multiplication decomposition interferes) gtIsActiveCSE_Candidate would indeed be the mechanism to use (however much I would like that not to be the case)...

@echesakov
Copy link
Contributor Author

@SingleAccretion Thank you for detailed response!
I was originally planning to do what @EgorBo suggested - make sure that none of the trees that are under transformation IsActiveCSE_Candidate, but I am gonna try to preserve VNs during this optimization.

@JulieLeeMSFT JulieLeeMSFT requested a review from EgorBo August 16, 2021 04:15
@JulieLeeMSFT
Copy link
Member

@EgorBo please review the code to merge this before RC1.

@EgorBo
Copy link
Member

EgorBo commented Aug 16, 2021

@EgorBo please review the code to merge this before RC1.

From my understanding @echesakovMSFT is working on a less conservative fix (?)

1. Maintain VNs of op1 and cns1
2. Use less conservative check for gtIsActiveCSE_Candidate() in fgMorphCommutative()
@echesakov
Copy link
Contributor Author

This is now "no-diffs" change.

Note that with the new change folding of (ADD (CNS_INT 1) (CNS_INT 1)) in the regression test case (that corresponds to (oper cns1 cns2) in morph.cpp) is allowed, but the JIT tries to maintain VNs of the modified trees:

Before transformation:

N013 ( 34, 29) [000011] -A-XG-------              *  ASG       int    $VN.Void
N002 (  6, 14) [000010] n---G--N----              +--*  IND       int    $42
N001 (  3, 12) [000020] H-----------              |  \--*  CNS_INT(h) long   0x7ffb1e19909c static Fseq[clsFld] $c0
N012 ( 27, 14) [000009] ---X--------              \--*  ADD       int    $42
N010 ( 25, 11) [000029] ---X--------                 +--*  ADD       int    $40
N008 ( 23,  8) [000032] ---X--------                 |  +--*  NEG       int    $41
N007 ( 22,  7) [000008] ---X--------                 |  |  \--*  DIV       int    $42
N003 (  1,  2) [000004] ------------                 |  |     +--*  CNS_INT   int    1 $42
N006 (  1,  2) [000024] ------------                 |  |     \--*  COMMA     int    $42
N004 (  0,  0) [000022] ------------                 |  |        +--*  NOP       void   $100
N005 (  1,  2) [000023] ------------                 |  |        \--*  CNS_INT   int    1 $42
N009 (  1,  2) [000028] ------------                 |  \--*  CNS_INT   int    1 $42
N011 (  1,  2) [000003] ------------                 \--*  CNS_INT   int    1 $42

After transformation:

Folding operator with constant nodes into a constant:
                            [000038] ------------              *  ADD       int   
N009 (  1,  2)              [000028] ------------              +--*  CNS_INT   int    1 $42
N011 (  1,  2)              [000003] ------------              \--*  CNS_INT   int    1 $42
Bashed to int constant:
                            [000038] ------------              *  CNS_INT   int    2 $45
optValnumCSE morphed tree:
N015 ( 33, 27)              [000011] -A-XG-------              *  ASG       int    $VN.Void
N006 (  7, 15)              [000010] nA--G--N----              +--*  IND       int    $42
N005 (  4, 13)              [000037] -A----------              |  \--*  COMMA     long   $c0
N003 (  3, 12)              [000035] -A------R---              |     +--*  ASG       long   $VN.Void
N002 (  1,  1)              [000034] D------N----              |     |  +--*  LCL_VAR   long   V03 cse0         d:1 $c0
N001 (  3, 12)              [000020] H-----------              |     |  \--*  CNS_INT(h) long   0x7ffb1e19909c static Fseq[clsFld] $c0
N004 (  1,  1)              [000036] ------------              |     \--*  LCL_VAR   long   V03 cse0         u:1 $c0
N014 ( 25, 11)              [000029] ---X--------              \--*  ADD       int    $42
N012 ( 23,  8)              [000032] ---X--------                 +--*  NEG       int    $41
N011 ( 22,  7)              [000008] ---X--------                 |  \--*  DIV       int    $42
N007 (  1,  2)              [000004] ------------                 |     +--*  CNS_INT   int    1 $42
N010 (  1,  2)              [000024] ------------                 |     \--*  COMMA     int    $42
N008 (  0,  0)              [000022] ------------                 |        +--*  NOP       void   $100
N009 (  1,  2)              [000023] ------------                 |        \--*  CNS_INT   int    1 $42
N013 (  1,  2)              [000028] ------------                 \--*  CNS_INT   int    2 $45

@SingleAccretion
Copy link
Contributor

As much as I would like to say that the change looks good, there are still problems.

  1. tree needs to be !gtIsActiveCSE_Candidate as well.
  2. All the COMMAs on the left side need to be not candidates as well (but see better suggestion below).
  3. VNs will not be updated correctly for the left side in the presence of COMMAs, only the top-level node will be updated:
tree $1             COMMA $1
  COMMA $2            COMMA $2
    COMMA $2            op1 $2
      op1 $2    -->       op $3         
        op $3             cns1 $4+$5
        cns1 $4     
  cns2 $5
     
// But we need:
COMMA $1
  COMMA $1
    op1 $1
      op $3     
      cns1 $4+$5

Or so one would think - what we actually need is complicated by the fact that COMMA's have side effects and thus add exception sets. So what actually has to happen is that we need to update the "normal" parts of the VN for the COMMA's, all the while op1's VN will have to exclude the exceptions coming only from the COMMA nodes (they may come from children too). Which is totally not worth the complexity, so I suggest disabling this transform if we're not in the global morphing phase and COMMA's are involved.

Sorry for not bringing up the COMMA issues earlier, it is not very easy to reason about morphing and VN for me at times (and easy to forget about things like exception sets).

@EgorBo
Copy link
Member

EgorBo commented Aug 16, 2021

  1. tree needs to be !gtIsActiveCSE_Candidate as well.

I agree, but not 100% sure it's necessary if we return something with original tree's VN in the end.

Good points, @SingleAccretion. gtEffectiveVal skips GT_COMMA (can be more than one) so those most likely have to checked against CSECandidate too (or, like you suggested - if (tree->gtGetOp1()->OperIs(GT_COMMA) && !fgGlobalMorph) return nullptr; in case if it doesn't produce regressions)

Also, notes about op1->SetVNsFromNode(tree); being not COMMA friendly look reasonable...

@SingleAccretion
Copy link
Contributor

SingleAccretion commented Aug 16, 2021

I agree, but not 100% sure it's necessary if we return something with original tree's VN in the end.

Well, if we are going to delete the tree from IR, we need to make sure it is not a CSE candidate, otherwise all the familiar asserts about CSE failing to find the node will fail (or rather wait for someone to find a test case which exposes them...). CSE caches GenTree* pointers after all.

Edit: I also note that the concerns about CSE and preserving VNs are somewhat orthogonal. It is obvious we cannot change VNs of CSE candidates, or delete them. Effects of failing to update VNs, including the exception sets, (which must be done always regardless of CSE) are much more subtle by nature and are rooted in the implicit assumptions in the optimizer that the morphing functions do update the VNs properly.

@briansull
Copy link
Contributor

Shouldn't we just go with the original fix:

The fix is to not allow running fgMorphCommutative during optValnumCSE_phase (as we do for similar optimizations).

When tree.op1 is complex - don't run the optimization outside of global morph.
@echesakov
Copy link
Contributor Author

@SingleAccretion @EgorBo Thank you for your feedback! I agree, introducing COMMA handling would complicate the optimization too much. I pushed a second improvement where we check for complexity of tree.op1 and disallow the optimization for complex tree outside of global morph.

@echesakov
Copy link
Contributor Author

Shouldn't we just go with the original fix:

The fix is to not allow running fgMorphCommutative during optValnumCSE_phase (as we do for similar optimizations).

@briansull I could go with the original solution a week ago (and we still can do it now if we think it is a better choice) but I've been trying to avoid introducing new regressions by stopping doing the optimization abruptly during VN/CSE phase.

@EgorBo
Copy link
Member

EgorBo commented Aug 17, 2021

LGTM assuming CI failures are unrelated

Copy link
Contributor

@SingleAccretion SingleAccretion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Two small nits, feel free to ignore.

src/coreclr/jit/morph.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/morph.cpp Show resolved Hide resolved
src/coreclr/jit/morph.cpp Show resolved Hide resolved
src/coreclr/jit/morph.cpp Show resolved Hide resolved
@echesakov
Copy link
Contributor Author

Thanks for review! I applied the first nit, the second one, unfortunately, would cause regressions (I tried this last week) - I added comment explaining why we need this handling of fieldSeq.

@echesakov
Copy link
Contributor Author

echesakov commented Aug 17, 2021

runtime (Libraries Test Run checked coreclr windows x86 Release) failure is due to #57471

@echesakov
Copy link
Contributor Author

echesakov commented Aug 17, 2021

I am going to ignore runtime (Mono minijit Pri0 Runtime Tests Run OSX x64 release) and runtime (Mono monointerpreter Pri0 Runtime Tests Run OSX x64 release) since they are triggered only because I added a test and were passing on a previous attempt.

EDIT: All the test legs except runtime (Libraries Test Run checked coreclr windows x86 Release) passed on a second attempt (see #20210817.55)

@echesakov echesakov merged commit 52ae8c3 into dotnet:main Aug 17, 2021
@echesakov echesakov deleted the Runtime_56935 branch August 17, 2021 18:41
@ghost ghost locked as resolved and limited conversation to collaborators Sep 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ARM64 incorrect result when expression involves modulo by 1
5 participants