-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add IR support for modeling multiple SSA definitions as one store #76139
Add IR support for modeling multiple SSA definitions as one store #76139
Conversation
Store them in the SSA descriptor instead of a map. Memory (x64): +0.18% PIN (x64): -0.04% The memory regression is not small...
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsNo code diffs, small (< The benefit is we get to model multi-reg assignments and get rid of the The complexity cost is high, but not "terribly" high. I am open to the opinion we shouldn't pursue this altogether, though. Closes #41242.
|
2d036ac
to
27dd411
Compare
@dotnet/jit-contrib, @AndyAyersMS, @jakobbotsch |
For now, just the "CanBeReplacedWithItsField" case. This enables some nice simplifications, even as the general case gets more complex. Two quirks have been added to attain zero diffs.
Another 0.025%.
27dd411
to
7cd3df8
Compare
@AndyAyersMS please review this community PR. cc @dotnet/jit-contrib. |
I am not sure I understand this -- soon enough we will want to make sure our SSA graphs are complete. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issues in the review notes can be handled as follow-ups.
Are we in position to relax some of this?
runtime/src/coreclr/jit/ssabuilder.cpp
Lines 1647 to 1661 in 6c949f5
if (varDsc->lvIsStructField && | |
(m_pCompiler->lvaGetParentPromotionType(lclNum) != Compiler::PROMOTION_TYPE_INDEPENDENT)) | |
{ | |
// SSA must exclude struct fields that are not independent | |
// - because we don't model the struct assignment properly when multiple fields can be assigned by one struct | |
// assignment. | |
// - SSA doesn't allow a single node to contain multiple SSA definitions. | |
// - and PROMOTION_TYPE_DEPENDEDNT fields are never candidates for a register. | |
// | |
return false; | |
} | |
else if (varDsc->lvIsStructField && m_pCompiler->lvaGetDesc(varDsc->lvParentLcl)->lvIsMultiRegRet) | |
{ | |
return false; | |
} |
GenTreeOp* m_asg; | ||
GenTreeOp* m_asg = nullptr; | ||
// The SSA number associated with the previous definition for partial (GTF_USEASG) defs. | ||
unsigned m_useDefSsaNum = SsaConfig::RESERVED_SSA_NUM; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this could also be handled as part of the "def" ssa num, via another variant of composite/outline encoding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, but it'd be quite complex I think, and slower for the common case in terms of instruction counts at least.
Well, it's exactly that -- currently, it is not a problem for us to not know all uses of SSA variables. With #77055, that would of course change, which is somewhat unfortunate, because morph doesn't let us know whether dependent promotion became so because of a non-decomposed use (most often a local field), or because of a non-decomposed def, so the choice would become to either enhance morph to differentiate between the two, or do the accounting work necessary to support uses. |
To elaborate, this would involve renaming composite uses, and supporting cloning composite references (one can notice the current code doesn't handle this because it doesn't need to - SSA definitions cannot be cloned). |
Yes, removing that dependent promotion check is the primary motivation for the change. As-is, we could remove the check after this change is merged, with some light amount of related bug fixes. With the support for use accounting, it would require more expansive changes / thought. |
Probably it would be enough in the short run to know that for this local/def there were no uses that got left out of SSA, and the checker could be likewise "enhanced" to not flag errors for such locals. I expect most of the benefit to accrue from simple cases where there is just a block-local lifetime. Down the road I am interested in trying to rename some the SSA webs; I wonder if this would be a relatively cheap way to make LSRA's life easier. |
Let's take this as is and sort through the rest over time. |
Thanks again -- this is a very nice enhancement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is great to see such improvement!
static const int BITS_PER_SIMPLE_NUM = 8; | ||
static const int MAX_SIMPLE_NUM = (1 << (BITS_PER_SIMPLE_NUM - 1)) - 1; | ||
static const int SIMPLE_NUM_MASK = MAX_SIMPLE_NUM; | ||
static const int SIMPLE_NUM_COUNT = sizeof(int) / BITS_PER_SIMPLE_NUM; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do I miss something or is it always 0? What does it mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, that is definitely a mistake, it should be (sizeof(int) * 8) / BITS_PER_SIMPLE_NUM
, i. e. the maximum number of fields that can be encoded compactly.
Will be fixed in #77238.
// [byte 1]: [compact encoding bit][ssa num 1] (7 bits) | ||
// [byte 0]: [ssa num 0] (8 bits) | ||
// We expect this encoding to cover the 99%+ case of composite names: locals with more | ||
// than 127 defs, maximum for this encoding, are rare, and the current limit on the count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is 127 a limit for number of fields for the local or for number of all locals in the method? I expect that if we have 200 locals we would have some SsaNum > 200 and the compact encoding won't work or am I wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is 127 a limit for number of fields for the local or for number of all locals in the method?
It is the number of SSA definitions, per a field local. So, to not fit into this limit would require code like below:
s.FieldOne = 1;
...
s.FieldOne = 2;
...
// 125 more definitions
...
s.FieldOne = 128;
(Or complex flow, because of PHIs)
// This can be in one of four states: | ||
// 1. Single SSA name: > RESERVED_SSA_NUM (0). | ||
// 2. RESERVED_SSA_NUM (0) | ||
// 3. "Inline composite name": packed SSA numbers of field locals (each could be RESERVED): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: < 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would look a bit awkward because both 3
and 4
are < 0
, so I left it out.
// 2. RESERVED_SSA_NUM (0) | ||
// 3. "Inline composite name": packed SSA numbers of field locals (each could be RESERVED): | ||
// [byte 3]: [top bit][ssa num 3] (7 bits) | ||
// [byte 2]: [ssa num 2] (8 bits) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have 2 spare bits in byte2 and byte0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Since 127 definitions is already a lot, I decided being more precise was not worth it.
Up until this change, it has been the case that an SSA ref had to have 1-to-1 relationship with the node it was attached to (with the exception of
CanBeReplacedWithItsField
structs). This was a problem for modeling assignments to promoted structs that are not decomposable, i. e. made by calls (multi-reg or not). The solution has been to exclude the promoted fields from SSA and VN, which is a CQ problem.This change solves this by allowing for encoding the SSA number that would represent multiple field definitions at once. The code comment describing the format is quite nice, I will copy it here:
The rest of changes are fairly mechanical: adding support for "composite" definitions where necessary: SSA, VN, copy propagation.
While the encoding does support creating not just composite defs, but composite uses, I do not expect us to take advantage of this, due to the complexity tradeoff not being there. Fortunately, unlike with defs, "missing" a use is not a correctness problem.
No code diffs, small (<
0.1%
) TP regression, with a bit more substantial memory regression (~0.2%
).What's up with the TP improvements on x86?
Apparently,
DefinesLocal
was particularly expensive there:The benefit is we get to model multi-reg assignments and get rid of the
CanBeReplacedWithItsField
special case. Expected improvements are in the-9K
range for the benchmarks collection on ARM64, which seems worth the TP cost.The complexity cost is high, but not "terribly" high. I am open to the opinion we shouldn't pursue this altogether, though.
Closes #41242.