Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not fold mismatched OBJ(ADDR(LCL_VAR)) #69271

Merged

Conversation

SingleAccretion
Copy link
Contributor

@SingleAccretion SingleAccretion commented May 12, 2022

The transforms in question could fold, e. g., OBJ<8>(ADDR(LCL_VAR<16>)) to just the LCL_VAR, necessitating workarounds in block morphing that had to rematerialize the block node to restore IR's validity.

It is better to simply not create questionable IR in the first place.

We're expecting a small number of diffs due to using OBJs instead of INDs in more cases.

Fixes #69232.

What was the problem in #69232?

It was a problem introduced by #68712 and exposed with #68739.

Minimal reproduction:

[StructLayout(LayoutKind.Sequential, Size = 16)]
struct FakeVtor
{

}

[MethodImpl(MethodImplOptions.NoInlining)]
private static FakeVtor Problem()
{
    FakeVtor GetFakeVtor(Vector128<int> v) => *(FakeVtor*)&v;

    Vector128<int> b = default;
    FakeVtor a = GetFakeVtor(b);
    Use(&a);

    return a;
}

Importation, through the magic of skipping ADDRs in gtNewCpObjNode and gtNewBlkOpNode created the following type-mismatched tree:

               [000025] -A---------                         *  ASG       struct (copy)
               [000009] D----+-N---                         +--*  LCL_VAR   struct<RyuJitReproduction.Program+FakeVtor, 16> V00 loc0         
               [000004] -----+-----                         \--*  LCL_VAR   simd16<System.Runtime.Intrinsics.Vector128`1[System.Int32]> V01 loc1         

This on its own should be "fine" since block morphing is supposed to compensate for this mismatch, by wrapping the local in an OBJ(ADDR) here:

if (needsIndirection)
{
if ((indirTree != nullptr) && (indirTree->OperIsBlk() || !isBlkReqd))
{
effectiveVal->gtType = asgType;
}
else
{
GenTree* newTree;
GenTree* addr = gtNewOperNode(GT_ADDR, TYP_BYREF, effectiveVal);
if (isBlkReqd)
{
CORINFO_CLASS_HANDLE clsHnd = gtGetStructHandleIfPresent(effectiveVal);
if (clsHnd == NO_CLASS_HANDLE)
{
newTree = new (this, GT_BLK) GenTreeBlk(GT_BLK, TYP_STRUCT, addr, typGetBlkLayout(blockWidth));
}
else
{
newTree = gtNewObjNode(clsHnd, addr);
gtSetObjGcInfo(newTree->AsObj());
}
gtUpdateNodeSideEffects(newTree);
}
else
{
newTree = gtNewIndir(asgType, addr);
}
effectiveVal = newTree;
}
}

However, it fails, because it creates an OBJ node of TYP_SIMD16 (using V01's handle). Before #68712, it would have created a handle-less IND node and not run into this problem.

I decided to fix the source of the problem instead of extending the workaround in morph because: a) it was simpler, b) I wanted to delete this questionable folding for quite some time as it was causing similar problems elsewhere.

The transforms in question could fold, e. g. "OBJ<8>(ADDR(LCL_VAR<16>))"
to just the "LCL_VAR", necessitating workarounds in block morphing that
had to rematerialize the block node to restore IR's validity.

It is better to simply not create questionable IR in the first place.
@jakobbotsch
Copy link
Member

@SingleAccretion is this ready?

@SingleAccretion
Copy link
Contributor Author

Unfortunately not yet, the updated SPMI collections have uncovered a couple regressions that have to be solved.

@SingleAccretion SingleAccretion force-pushed the Fixing-Vtor-Mismatch-Upstream branch from 92fcb08 to d6d9966 Compare May 13, 2022 22:01
By using layout compatibility in "gtNewStructVal" instead of handle equality.

This is safe to do because "gtNewStructVal" is only used in contexts of block
copies (as opposed to call arguments).
@SingleAccretion
Copy link
Contributor Author

SingleAccretion commented May 14, 2022

The diffs look more reasonable with the fix: SPMI in CI.

There are still regressions, caused by the fix itself, where we forward-substitute a multi-reg call and the new local defined by it is marked as multi-reg, excluding it from SSA-based optimizations. I think the low number of said regressions (and the fact there are some improvements too) makes the change overall acceptable.

@dotnet/jit-contrib

@SingleAccretion SingleAccretion marked this pull request as ready for review May 14, 2022 13:46
Comment on lines +38 to +39
// The normalized type to use in IR for block nodes with this layout.
const var_types m_type;
Copy link
Contributor Author

@SingleAccretion SingleAccretion May 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This increases the size of ClassLayout by one pointer. I measured the overall impact on memory consumption when CG-ing x64 CoreLib, it looked insignificant:

./diff-mem x64
Relative difference is: 0.007%

(This is a memory-CPU tradeoff: impNormStructType is not a cheap function)

@jakobbotsch jakobbotsch merged commit 6134ee9 into dotnet:main May 14, 2022
@SingleAccretion SingleAccretion deleted the Fixing-Vtor-Mismatch-Upstream branch May 14, 2022 19:46
@ghost ghost locked as resolved and limited conversation to collaborators Jun 14, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test failure JIT\\HardwareIntrinsics\\General\\NotSupported\\NotSupported_r\\NotSupported_r.cmd
2 participants