JIT: Generalize assignment decomposition in physical promotion #85323

Merged: 43 commits into dotnet:main on May 6, 2023

Conversation

@jakobbotsch (Member) commented Apr 25, 2023

Generalize assignment decomposition to handle arbitrary combinations of physically and regularly promoted structs. Do this by introducing a DecompositionPlan class that keeps track of the copies involving replacement fields that need to be performed. The first step is to fill out this plan. In the general case, where both the source and destination are physically promoted, this involves iterating the replacements of both locals in lockstep. For promotions that map exactly, a direct copy between the replacement locals is queued into the plan; in other cases (e.g. partial overlap) it may involve writing the source replacement back to its struct local.
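
For illustration, here is a minimal, self-contained sketch of that lockstep walk. The Replacement struct and PlanCopies function below are simplified stand-ins, not the pass's actual data structures or the DecompositionPlan API, and partial overlaps are skipped to keep the sketch short:

    #include <cstdio>
    #include <vector>

    // Simplified stand-in for a physical promotion replacement: a field at
    // [offset, offset + size) that lives in its own local V<lclNum>.
    struct Replacement
    {
        unsigned offset;
        unsigned size;
        unsigned lclNum;
    };

    // Walk the destination and source replacement lists (sorted by offset) in
    // lockstep and print the copies a plan would queue. Partial overlaps are
    // ignored here; the real pass handles them by going through the struct
    // locals.
    void PlanCopies(const std::vector<Replacement>& dstReps, const std::vector<Replacement>& srcReps)
    {
        size_t di = 0, si = 0;
        while ((di < dstReps.size()) || (si < srcReps.size()))
        {
            bool haveDst = di < dstReps.size();
            bool haveSrc = si < srcReps.size();

            if (haveDst && haveSrc && (dstReps[di].offset == srcReps[si].offset) &&
                (dstReps[di].size == srcReps[si].size))
            {
                // Replacements match exactly: direct copy between the locals.
                printf("  V%02u <- V%02u\n", dstReps[di].lclNum, srcReps[si].lclNum);
                di++;
                si++;
            }
            else if (haveDst && (!haveSrc || (dstReps[di].offset < srcReps[si].offset)))
            {
                // No matching source replacement: read the destination field
                // directly out of the source struct at the same offset.
                printf("  V%02u <- src+%03u\n", dstReps[di].lclNum, dstReps[di].offset);
                di++;
            }
            else
            {
                // No matching destination replacement: write the source field
                // back into the destination struct at the same offset.
                printf("  dst+%03u <- V%02u\n", srcReps[si].offset, srcReps[si].lclNum);
                si++;
            }
        }
    }

    int main()
    {
        // Mirrors the example below: destination V04 has V06 at [8, 12),
        // source V03 has V05 at [16, 20).
        PlanCopies({{8, 4, 6}}, {{16, 4, 5}});
        // Prints:
        //   V06 <- src+008
        //   dst+016 <- V05
    }

Running this on the promotions from the example below produces the same two entries that show up in the dump.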

The plan is used to generate the IR and to compute the remainder that still has to be copied after all replacements have been handled. Example:

V03 promoted with 1 replacements
  [016..020) promoted as int V05
...
V04 promoted with 1 replacements
  [008..012) promoted as int V06
...
STMT00011 ( 0x00C[--] ... ??? )
               [000052] -A---------ASG       struct (copy)
               [000051] D------N---                         ├──▌  LCL_VAR   struct<S, 24> V04 tmp2         
               [000008] -----------                         └──▌  LCL_VAR   struct<S, 24> V03 tmp1          (last use)
Processing block operation [000052] that involves replacements
  V06 (V04.[008..012)) <- src+008
  dst+016 <- V05 (V03.[016..020))
  Remainder: [000..008) [012..016) [020..024)
  => remainder strategy: retain a full block op

Local V03 should not be enregistered because: was accessed as a local field

Local V04 should not be enregistered because: was accessed as a local field
New statement:
STMT00011 ( 0x00C[--] ... ??? )
               [000064] -A---------COMMA     void  
               [000056] -A---------                         ├──▌  ASG       struct (copy)
               [000051] D------N---                         │  ├──▌  LCL_VAR   struct<S, 24> V04 tmp2         
               [000008] -----------                         │  └──▌  LCL_VAR   struct<S, 24> V03 tmp1         
               [000063] -A---------                         └──▌  COMMA     void  
               [000059] -A---------                            ├──▌  ASG       int   
               [000057] D------N---                            │  ├──▌  LCL_VAR   int    V06 tmp4         
               [000058] -----------                            │  └──▌  LCL_FLD   int    V03 tmp1         [+8]
               [000062] -A---------                            └──▌  ASG       int   
               [000060] U------N---                               ├──▌  LCL_FLD   int    V04 tmp2         [+16]
               [000061] -----------                               └──▌  LCL_VAR   int    V05 tmp3         
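
The remainder printed above ([000..008) [012..016) [020..024)) is what is left of the 24-byte layout once the ranges covered by replacement copies are carved out; if nothing remains the original struct copy can be omitted entirely, otherwise a remainder strategy is chosen (here, retaining a full block op). A small, self-contained sketch of that interval-subtraction idea (illustrative only; the pass has its own segment tracking):

    #include <cstdio>
    #include <vector>

    struct Segment
    {
        unsigned start;
        unsigned end; // exclusive
    };

    // Subtract the covered ranges (assumed sorted and non-overlapping) from
    // [0, size). Whatever is left is the remainder that still needs to be
    // copied; an empty result means the struct copy can be dropped.
    std::vector<Segment> ComputeRemainder(unsigned size, const std::vector<Segment>& covered)
    {
        std::vector<Segment> remainder;
        unsigned pos = 0;
        for (const Segment& seg : covered)
        {
            if (seg.start > pos)
            {
                remainder.push_back({pos, seg.start});
            }
            pos = seg.end;
        }
        if (pos < size)
        {
            remainder.push_back({pos, size});
        }
        return remainder;
    }

    int main()
    {
        // Mirrors the dump above: 24-byte struct, copies covering [8, 12) and [16, 20).
        for (const Segment& seg : ComputeRemainder(24, {{8, 12}, {16, 20}}))
        {
            printf("[%03u..%03u) ", seg.start, seg.end);
        }
        printf("\n");
        // Prints: [000..008) [012..016) [020..024)
    }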

(Also unnest StatementList and rename it to DecompositionStatementList as the old name conflicted with another class.)

Generalize assignment decomposition to handle arbitrary combinations of
physically promoted structs. Do this by introducing a DecompositionPlan
class that keeps track of the copies involving replacement fields that
need to be performed. The first step is to fill out this plan. In the
general case, where both the source and destination are physically
promoted, this involves iterating the replacements in lockstep. For
promotions that map exactly, a direct copy between their locals is
queued into the plan; in other cases (e.g. partial overlap) it may
involve writing the source back to the struct local.

The plan is used to generate the IR and to figure out whether the writes
of all the replacement fields cover the destination such that the
original struct copy can be omitted.

Also unnest StatementList and rename it to DecompositionStatementList as
the old name conflicted with another class.
The dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Apr 25, 2023
ghost assigned jakobbotsch Apr 25, 2023
@ghost commented Apr 25, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@jakobbotsch (Member, Author)
/azp run runtime-jit-experimental, runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

@azure-pipelines
Azure Pipelines successfully started running 3 pipeline(s).

@jakobbotsch (Member, Author)
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

@azure-pipelines
Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch (Member, Author)
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

@azure-pipelines
Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch (Member, Author)
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

@azure-pipelines
Azure Pipelines successfully started running 2 pipeline(s).

The JIT has a flag GTF_ICON_INITCLASS indicating that accesses off that
address constant are cctor dependent. Hoisting uses this to avoid
hoisting cctor-dependent indirections unless all cctors are also
hoisted. However, local constant prop and VN-based constant prop do not
handle this flag, so we could run into cases where addresses with
GTF_ICON_INITCLASS were propagated and then subsequently hoisted
incorrectly.

This change moves the flag to an OperIsIndir() flag instead of keeping
it as a flag on the constant. After some digging, I found that the
original reason the flag was not an indir flag was simply that there
were no more indir flags available, but we do have available flags
today. This fix is much simpler than the alternative, which would be to
teach VN/local copy prop to propagate the GTF_ICON_INITCLASS flag.

Also remove GTF_FLD_INITCLASS, which is never set.
@jakobbotsch (Member, Author)
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-jit-experimental

@azure-pipelines
Azure Pipelines successfully started running 3 pipeline(s).

@azure-pipelines
Azure Pipelines successfully started running 3 pipeline(s).

@jakobbotsch (Member, Author)
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-jit-experimental

@azure-pipelines
Azure Pipelines successfully started running 3 pipeline(s).

Comment on lines +1024 to +1025
if (m_compiler->lvaGetDesc(entry.ToLclNum)->lvIsStructField)
UpdateEarlyRefCount(m_compiler, dst);
@jakobbotsch (Member, Author):

This looks a bit odd right now because I'm trying to avoid adding any ref counts to the replacement locals created by this pass. So this logic tracks them correctly only for existing locals and relies on forward sub to give up on the physical promotion locals that have 0 ref count.

I will submit a separate PR after this one to correctly track the ref counts much more uniformly for all the IR created by this pass.

@@ -24,13 +25,1181 @@ class DecompositionStatementList

for (GenTree* cur = m_head->gtNext; cur != nullptr; cur = cur->gtNext)
{
- tree = comp->gtNewOperNode(GT_COMMA, tree->TypeGet(), cur, tree);
+ tree = comp->gtNewOperNode(GT_COMMA, TYP_VOID, cur, tree);
@jakobbotsch (Member, Author):

Physical promotion counterpart to #85320.

@jakobbotsch (Member, Author)

The failure is #83658.

cc @dotnet/jit-contrib PTAL @AndyAyersMS... if some of the structure looks a bit unfamiliar it's because I refactored it in #85728.

Diffs with physical promotion enabled.

Diffs with physical promotion enabled and old promotion disabled. I analyzed the regressions in #85323 (comment); there are still quite a lot of potential improvements to be had.

jakobbotsch marked this pull request as ready for review May 4, 2023 14:49
@AndyAyersMS (Member) left a comment

LGTM overall

primitiveType = TYP_INT;
break;
#endif
case TARGET_POINTER_SIZE:
@AndyAyersMS (Member):

I wonder if you should peel this case out and check for it up front, then just do power of 2 sized case analysis beyond that.

Does alignment matter if there are no GC refs?

@jakobbotsch (Member, Author):

> I wonder if you should peel this case out and check for it up front, then just do power of 2 sized case analysis beyond that.

Sure, I can do that.

> Does alignment matter if there are no GC refs?

AFAIK no, as of recently we do the same trick in block lowering for odd sizes by copying [start..start+8) and [end-8..end).
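
A minimal, self-contained illustration of that overlapping-copy trick, just the memcpy shape rather than the JIT's lowering code; it assumes 8 <= size <= 16, no GC refs, and non-aliasing source and destination:

    #include <cstdint>
    #include <cstring>

    // Copy 'size' bytes (8 <= size <= 16) using two possibly-overlapping
    // 8-byte copies: [start..start+8) and [end-8..end). The middle bytes may
    // simply be copied twice.
    void CopyOddSize(uint8_t* dst, const uint8_t* src, size_t size)
    {
        memcpy(dst, src, 8);
        memcpy(dst + size - 8, src + size - 8, 8);
    }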

// If the remainder is a full block and is going to incur write barrier
// then avoid incurring multiple write barriers for each source
// replacement that is a GC pointer -- write them back to the struct
// first instead.
@AndyAyersMS (Member):

I think some of these strategy choices might benefit from having pseudo-IR "pictures" showing what we're trying to avoid/create.

Either here in comments or perhaps in some companion write-up.

@jakobbotsch (Member, Author):

Will add a comment on this specific optimization. For completeness:

        // If the remainder is a full block and is going to incur write barrier
        // then avoid incurring multiple write barriers for each source
        // replacement that is a GC pointer -- write them back to the struct
        // first instead. That is, instead of:
        //
        //   ▌  COMMA     void  
        //   ├──▌  ASG       struct (copy)                      <- write barrier
        //   │  ├──▌  BLK       struct<Program+S, 32>
        //   │  │  └──▌  LCL_VAR   byref  V01 arg1         
        //   │  └──▌  LCL_VAR   struct<Program+S, 32> V00 arg0         
        //   └──▌  COMMA     void  
        //      ├──▌  ASG       ref                             <- write barrier
        //      │  ├──▌  IND       ref   
        //      │  │  └──▌  ADD       byref 
        //      │  │     ├──▌  LCL_VAR   byref  V01 arg1         
        //      │  │     └──▌  CNS_INT   long   8
        //      │  └──▌  LCL_VAR   ref    V05 tmp3         
        //      └──▌  ASG       ref                             <- write barrier
        //         ├──▌  IND       ref   
        //         │  └──▌  ADD       byref 
        //         │     ├──▌  LCL_VAR   byref  V01 arg1         
        //         │     └──▌  CNS_INT   long   24
        //         └──▌  LCL_VAR   ref    V06 tmp4         
        //
        // Produce:
        // 
        //   ▌  COMMA     void  
        //   ├──▌  ASG       ref                                <- no write barrier
        //   │  ├──▌  LCL_FLD   ref    V00 arg0         [+8]
        //   │  └──▌  LCL_VAR   ref    V05 tmp3         
        //   └──▌  COMMA     void  
        //      ├──▌  ASG       ref                             <- no write barrier
        //      │  ├──▌  LCL_FLD   ref    V00 arg0         [+24] 
        //      │  └──▌  LCL_VAR   ref    V06 tmp4         
        //      └──▌  ASG       struct (copy)                   <- write barrier
        //         ├──▌  BLK       struct<Program+S, 32>
        //         │  └──▌  LCL_VAR   byref  V01 arg1          (last use)
        //         └──▌  LCL_VAR   struct<Program+S, 32> V00 arg0         
        //

}
}

if (remainderStrategy.Type != RemainderStrategy::NoRemainder)
@AndyAyersMS (Member):

Wonder if you should represent the number of additional address uses as part of the remainder strategy?

Say someday you decided to support two primitives... I suppose there are enough other changes needed that perhaps this one would not be overlooked.

@jakobbotsch (Member, Author):

Hmm, it would probably be reasonable to try to make this more polymorphic, but let me leave it as is for now and see how (if) it ends up abstracting/generalizing in the future to more sophisticated strategies.

}

//------------------------------------------------------------------------
// CreateInitValue:
@AndyAyersMS (Member):

Seems like this belongs over in gentree.cpp

@jakobbotsch (Member, Author):

Added a Compiler::gtNewConWithPattern and made regular block morphing use it too.
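
For context, a "constant with pattern" here is a scalar whose bytes all repeat a given fill byte, the kind of value an init block produces for each decomposed field. The real Compiler::gtNewConWithPattern builds a GenTree constant inside the compiler; the self-contained function below only illustrates the underlying value computation and is not the helper's actual signature:

    #include <cstdint>
    #include <cstdio>

    // Replicate a fill byte across an integer of the given width in bytes.
    // E.g. pattern 0xCD over 4 bytes yields 0xCDCDCDCD.
    uint64_t ValueWithPattern(unsigned sizeInBytes, uint8_t pattern)
    {
        uint64_t value = 0;
        for (unsigned i = 0; i < sizeInBytes; i++)
        {
            value |= (uint64_t)pattern << (8 * i);
        }
        return value;
    }

    int main()
    {
        printf("0x%llx\n", (unsigned long long)ValueWithPattern(4, 0xCD)); // 0xcdcdcdcd
    }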

Comment on lines +923 to +924
if (addr->OperIsLocal() && (!m_dst->OperIs(GT_LCL_VAR, GT_LCL_FLD) ||
(addr->AsLclVarCommon()->GetLclNum() != m_dst->AsLclVarCommon()->GetLclNum())))
@jakobbotsch (Member, Author):

I think this is missing an address-exposed check, but it seems to be preexisting, so let me fix it in a follow-up.

jakobbotsch merged commit 40cad3c into dotnet:main May 6, 2023
jakobbotsch deleted the more-general-decomposition branch May 6, 2023 10:06
ghost locked as resolved and limited conversation to collaborators Jun 5, 2023