Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add IR support for modeling multiple SSA definitions as one store #76139

Conversation

SingleAccretion
Copy link
Contributor

@SingleAccretion SingleAccretion commented Sep 24, 2022

Up until this change, it has been the case that an SSA ref had to have 1-to-1 relationship with the node it was attached to (with the exception of CanBeReplacedWithItsField structs). This was a problem for modeling assignments to promoted structs that are not decomposable, i. e. made by calls (multi-reg or not). The solution has been to exclude the promoted fields from SSA and VN, which is a CQ problem.

This change solves this by allowing for encoding the SSA number that would represent multiple field definitions at once. The code comment describing the format is quite nice, I will copy it here:

// Encapsulates the SSA info carried by local nodes. Most local nodes have simple 1-to-1
// relationships with their SSA refs. However, defs of promoted structs can represent
// many SSA defs at the same time, and we need to efficiently encode that.
//
class SsaNumInfo final
{
    // This can be in one of four states:
    //  1. Single SSA name: > RESERVED_SSA_NUM (0).
    //  2. RESERVED_SSA_NUM (0)
    //  3. "Inline composite name": packed SSA numbers of field locals (each could be RESERVED):
    //     [byte 3]: [top bit][ssa num 3] (7 bits)
    //     [byte 2]: [ssa num 2] (8 bits)
    //     [byte 1]: [compact encoding bit][ssa num 1] (7 bits)
    //     [byte 0]: [ssa num 0] (8 bits)
    //     We expect this encoding to cover the 99%+ case of composite names: locals with more
    //     than 127 defs, maximum for this encoding, are rare, and the current limit on the count
    //     of promoted fields is 4.
    //  4. "Outlined composite name": index into the "composite SSA nums" table. The table itself
    //     will have the very simple format of N (the total number of fields / simple names) slots
    //     with full SSA numbers, starting at the encoded index. Notably, the table entries will
    //     include "empty" slots (for untracked fields), as we don't expect to use the table in
    //     the common case, and in the pathological cases, the space overhead should be mitigated
    //     by the cap on the number of tracked locals.
    //

The rest of changes are fairly mechanical: adding support for "composite" definitions where necessary: SSA, VN, copy propagation.

While the encoding does support creating not just composite defs, but composite uses, I do not expect us to take advantage of this, due to the complexity tradeoff not being there. Fortunately, unlike with defs, "missing" a use is not a correctness problem.

No code diffs, small (< 0.1%) TP regression, with a bit more substantial memory regression (~0.2%).

What's up with the TP improvements on x86?

Apparently, DefinesLocal was particularly expensive there:

Base: 1475618025, Diff: 1472228001, -0.2297%

SsaBuilder::RenamePushDef                                        : 1132213  : NA       : 8.97%  : +0.0767%
`Compiler::optCopyPropPushDef'::`2'::<lambda_1>::operator()      : 1085980  : NA       : 8.61%  : +0.0736%
`Compiler::fgValueNumberLocalStore'::`2'::<lambda_1>::operator() : 720279   : NA       : 5.71%  : +0.0488%
Compiler::optAssertionGen                                        : 443159   : +4.30%   : 3.51%  : +0.0300%
Compiler::fgAssignSetVarDef                                      : 426327   : +89.46%  : 3.38%  : +0.0289%
Compiler::lvaLclExactSize                                        : 275118   : +111.85% : 2.18%  : +0.0186%
Compiler::fgValueNumberLocalStore                                : -308577  : -43.11%  : 2.45%  : -0.0209%
GenTree::IsPartialLclFld                                         : -348430  : -95.72%  : 2.76%  : -0.0236%
Compiler::optBlockCopyProp                                       : -471489  : -7.09%   : 3.74%  : -0.0320%
Compiler::GetSsaNumForLocalVarDef                                : -514808  : -100.00% : 4.08%  : -0.0349%
Compiler::optCopyPropPushDef                                     : -621491  : -47.61%  : 4.93%  : -0.0421%
Compiler::optIsSsaLocal                                          : -669148  : -100.00% : 5.30%  : -0.0453%
SsaBuilder::RenameDef                                            : -699833  : -34.37%  : 5.55%  : -0.0474%
GenTree::DefinesLocal                                            : -3752039 : -80.98%  : 29.73% : -0.2543%

The benefit is we get to model multi-reg assignments and get rid of the CanBeReplacedWithItsField special case. Expected improvements are in the -9K range for the benchmarks collection on ARM64, which seems worth the TP cost.

The complexity cost is high, but not "terribly" high. I am open to the opinion we shouldn't pursue this altogether, though.

Closes #41242.

Store them in the SSA descriptor instead of a map.

Memory (x64): +0.18%
PIN    (x64): -0.04%

The memory regression is not small...
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 24, 2022
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Sep 24, 2022
@ghost
Copy link

ghost commented Sep 24, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

No code diffs, small (< 0.1%) TP regression, with a bit more substantial memory regression (~0.2%).

The benefit is we get to model multi-reg assignments and get rid of the CanBeReplacedWithItsField special case. Expected improvements are in the -9K range for the benchmarks collection on ARM64, which seems worth the TP cost.

The complexity cost is high, but not "terribly" high. I am open to the opinion we shouldn't pursue this altogether, though.

Closes #41242.

Author: SingleAccretion
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@SingleAccretion SingleAccretion changed the title Add IR support for modeling multple SSA definitions as one store Add IR support for modeling multiple SSA definitions as one store Sep 24, 2022
@SingleAccretion SingleAccretion force-pushed the Dependent-Fields-In-SSA-Zero-Diffs-Upstream branch 2 times, most recently from 2d036ac to 27dd411 Compare September 25, 2022 13:13
@SingleAccretion SingleAccretion marked this pull request as ready for review September 25, 2022 13:16
@SingleAccretion
Copy link
Contributor Author

@dotnet/jit-contrib, @AndyAyersMS, @jakobbotsch

For now, just the "CanBeReplacedWithItsField" case.
This enables some nice simplifications, even as the
general case gets more complex.

Two quirks have been added to attain zero diffs.
Gets us back 0.05% on the PIN counter. Hard to believe but true.
Another 0.025%.
@SingleAccretion SingleAccretion force-pushed the Dependent-Fields-In-SSA-Zero-Diffs-Upstream branch from 27dd411 to 7cd3df8 Compare September 25, 2022 13:32
@JulieLeeMSFT JulieLeeMSFT added this to the 8.0.0 milestone Oct 17, 2022
@JulieLeeMSFT
Copy link
Member

@AndyAyersMS please review this community PR. cc @dotnet/jit-contrib.

@AndyAyersMS
Copy link
Member

I skimmed through this and overall, it looks good. Will go back through in detail soon.

It will intersect with #77055 and #76706.

@AndyAyersMS
Copy link
Member

Fortunately, unlike with defs, "missing" a use is not a correctness problem.

I am not sure I understand this -- soon enough we will want to make sure our SSA graphs are complete.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issues in the review notes can be handled as follow-ups.

Are we in position to relax some of this?

if (varDsc->lvIsStructField &&
(m_pCompiler->lvaGetParentPromotionType(lclNum) != Compiler::PROMOTION_TYPE_INDEPENDENT))
{
// SSA must exclude struct fields that are not independent
// - because we don't model the struct assignment properly when multiple fields can be assigned by one struct
// assignment.
// - SSA doesn't allow a single node to contain multiple SSA definitions.
// - and PROMOTION_TYPE_DEPENDEDNT fields are never candidates for a register.
//
return false;
}
else if (varDsc->lvIsStructField && m_pCompiler->lvaGetDesc(varDsc->lvParentLcl)->lvIsMultiRegRet)
{
return false;
}

src/coreclr/jit/gentree.h Show resolved Hide resolved
src/coreclr/jit/gentree.h Show resolved Hide resolved
GenTreeOp* m_asg;
GenTreeOp* m_asg = nullptr;
// The SSA number associated with the previous definition for partial (GTF_USEASG) defs.
unsigned m_useDefSsaNum = SsaConfig::RESERVED_SSA_NUM;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this could also be handled as part of the "def" ssa num, via another variant of composite/outline encoding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, but it'd be quite complex I think, and slower for the common case in terms of instruction counts at least.

@SingleAccretion
Copy link
Contributor Author

I am not sure I understand this

Well, it's exactly that -- currently, it is not a problem for us to not know all uses of SSA variables.

With #77055, that would of course change, which is somewhat unfortunate, because morph doesn't let us know whether dependent promotion became so because of a non-decomposed use (most often a local field), or because of a non-decomposed def, so the choice would become to either enhance morph to differentiate between the two, or do the accounting work necessary to support uses.

@SingleAccretion
Copy link
Contributor Author

the accounting work necessary to support uses

To elaborate, this would involve renaming composite uses, and supporting cloning composite references (one can notice the current code doesn't handle this because it doesn't need to - SSA definitions cannot be cloned).

@SingleAccretion
Copy link
Contributor Author

Are we in position to relax some of this?

Yes, removing that dependent promotion check is the primary motivation for the change. As-is, we could remove the check after this change is merged, with some light amount of related bug fixes.

With the support for use accounting, it would require more expansive changes / thought.

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Oct 19, 2022

I am not sure I understand this

Well, it's exactly that -- currently, it is not a problem for us to not know all uses of SSA variables.

With #77055, that would of course change, which is somewhat unfortunate, because morph doesn't let us know whether dependent promotion became so because of a non-decomposed use (most often a local field), or because of a non-decomposed def, so the choice would become to either enhance morph to differentiate between the two, or do the accounting work necessary to support uses.

Probably it would be enough in the short run to know that for this local/def there were no uses that got left out of SSA, and the checker could be likewise "enhanced" to not flag errors for such locals. I expect most of the benefit to accrue from simple cases where there is just a block-local lifetime.

Down the road I am interested in trying to rename some the SSA webs; I wonder if this would be a relatively cheap way to make LSRA's life easier.

@AndyAyersMS
Copy link
Member

Let's take this as is and sort through the rest over time.

@AndyAyersMS AndyAyersMS merged commit 2b61381 into dotnet:main Oct 19, 2022
@AndyAyersMS
Copy link
Member

Thanks again -- this is a very nice enhancement.

Copy link
Contributor

@sandreenko sandreenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is great to see such improvement!

static const int BITS_PER_SIMPLE_NUM = 8;
static const int MAX_SIMPLE_NUM = (1 << (BITS_PER_SIMPLE_NUM - 1)) - 1;
static const int SIMPLE_NUM_MASK = MAX_SIMPLE_NUM;
static const int SIMPLE_NUM_COUNT = sizeof(int) / BITS_PER_SIMPLE_NUM;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do I miss something or is it always 0? What does it mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, that is definitely a mistake, it should be (sizeof(int) * 8) / BITS_PER_SIMPLE_NUM, i. e. the maximum number of fields that can be encoded compactly.

Will be fixed in #77238.

// [byte 1]: [compact encoding bit][ssa num 1] (7 bits)
// [byte 0]: [ssa num 0] (8 bits)
// We expect this encoding to cover the 99%+ case of composite names: locals with more
// than 127 defs, maximum for this encoding, are rare, and the current limit on the count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is 127 a limit for number of fields for the local or for number of all locals in the method? I expect that if we have 200 locals we would have some SsaNum > 200 and the compact encoding won't work or am I wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is 127 a limit for number of fields for the local or for number of all locals in the method?

It is the number of SSA definitions, per a field local. So, to not fit into this limit would require code like below:

s.FieldOne = 1;
...
s.FieldOne = 2;
...
// 125 more definitions
...
s.FieldOne = 128;

(Or complex flow, because of PHIs)

// This can be in one of four states:
// 1. Single SSA name: > RESERVED_SSA_NUM (0).
// 2. RESERVED_SSA_NUM (0)
// 3. "Inline composite name": packed SSA numbers of field locals (each could be RESERVED):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: < 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would look a bit awkward because both 3 and 4 are < 0, so I left it out.

// 2. RESERVED_SSA_NUM (0)
// 3. "Inline composite name": packed SSA numbers of field locals (each could be RESERVED):
// [byte 3]: [top bit][ssa num 3] (7 bits)
// [byte 2]: [ssa num 2] (8 bits)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have 2 spare bits in byte2 and byte0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Since 127 definitions is already a lot, I decided being more precise was not worth it.

@SingleAccretion SingleAccretion deleted the Dependent-Fields-In-SSA-Zero-Diffs-Upstream branch October 20, 2022 16:59
@ghost ghost locked as resolved and limited conversation to collaborators Nov 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clean-up VN for promoted fields assigned using the whole parent.
4 participants