Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GC hole with STOREIND of LCL_VAR_ADDR/LCL_FLD_ADDR #45818

Merged
merged 3 commits into from
Dec 11, 2020

Conversation

BruceForstall
Copy link
Member

We were updating the GC liveness of the ADDR node, but
genUpdateLife() expects to get the parent node, so no
liveness was ever updated.

There were 4 SPMI GC info diffs in the libraries, all
related to uses of System.Collections.Immutable.ImmutableArray
where we struct promote fields who are themselves single-element
gc ref structs that are kept on the stack and not in registers.
In all cases, the liveness of the stack local was not reflected
in codegen's GC sets, but it was reflected in the emitter's GC
sets, so it was marked as a GC lifetime. However, that lifetime
would get cut short if we hit a call site before the last use,
as calls (sometimes) carry the full set of live variables across
the call. So, variables not in this set (including the
"accidental" emitter-created GC lifetimes here) would get killed,
leaving a hole between the intermediate call and actual stack
local last use.

Fixes #45557

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 9, 2020
@BruceForstall
Copy link
Member Author

BruceForstall commented Dec 9, 2020

See #45557 for more detailed IL + code analysis.

Additional work:

  • See if this problem exists in 5.0 (presumably yes) and 3.1
  • Try to create a reduce regression test
  • Check other platforms for similar code

@BruceForstall
Copy link
Member Author

cc @dotnet/jit-contrib

@BruceForstall
Copy link
Member Author

/azp run runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@BruceForstall
Copy link
Member Author

It seems like the same change should be made in emitInsLoadInd(), but when I make that change, I don't see any diffs, so I don't really want to touch it.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we assert over in UpdateLifeVar if we see addresses?

Will be interesting to determine whether this is a long-standing issue that somehow just surfaced now. From what I can tell the emitter/codegen parts haven't changed in a few years.

@BruceForstall
Copy link
Member Author

Should we assert over in UpdateLifeVar if we see addresses?

This code in TreeLifeUpdater<ForCodeGen>::UpdateLife() filters out the addresses so they never get to UpdateLifeVar:

    if (!tree->OperIsNonPhiLocal() && compiler->fgIsIndirOfAddrOfLocal(tree) == nullptr)
    {
        return;
    }

so an assert wouldn't have helped in this case; not sure it would help in any other case.

Will be interesting to determine whether this is a long-standing issue that somehow just surfaced now. From what I can tell the emitter/codegen parts haven't changed in a few years.

Yes, these lines of code have been there a long time. My guess is that we never saw STOREIND with ref types until recently (maybe some of the struct work?), as they would have been block ops or STORE_OBJ instead.

@AndyAyersMS
Copy link
Member

an assert wouldn't have helped in this case; not sure it would help in any other case.

How unfortunate. Seems like we have a fairly fragile contract here...

Copy link
Contributor

@sandreenko sandreenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the detailed analysis.

We were updating the GC liveness of the ADDR node, but
`genUpdateLife()` expects to get the parent node, so no
liveness was ever updated.

There were 4 SPMI GC info diffs in the libraries, all
related to uses of `System.Collections.Immutable.ImmutableArray`
where we struct promote fields who are themselves single-element
gc ref structs that are kept on the stack and not in registers.
In all cases, the liveness of the stack local was not reflected
in codegen's GC sets, but it was reflected in the emitter's GC
sets, so it was marked as a GC lifetime. However, that lifetime
would get cut short if we hit a call site before the last use,
as calls (sometimes) carry the full set of live variables across
the call. So, variables not in this set (including the
"accidental" emitter-created GC lifetimes here) would get killed,
leaving a hole between the intermediate call and actual stack
local last use.

Fixes dotnet#45557
@BruceForstall BruceForstall merged commit 08322e8 into dotnet:master Dec 11, 2020
@BruceForstall BruceForstall deleted the Fix45557 branch December 11, 2020 07:44
BruceForstall added a commit to BruceForstall/runtime that referenced this pull request Dec 11, 2020
Backport of (part of) dotnet#45818 to release/5.0

Fixes: dotnet#45557

Reported by Roslyn. Bad GC info can lead to an unexplained crash that can't easily be found or worked around.

Manual, new unit test, CLR outerloop, SuperPMI asm diffs.

Low
@AndyAyersMS
Copy link
Member

Did you determine if this was a regression in 5.0?

@BruceForstall
Copy link
Member Author

Did you determine if this was a regression in 5.0?

Yes, this looks like a regression in 5.0, apparently due to keeping struct types around instead of "unwrapping" or bashing the type to the single-element struct ref type.

mmitche pushed a commit that referenced this pull request Dec 11, 2020
…45947)

Backport of (part of) #45818 to release/5.0

Fixes: #45557

Reported by Roslyn. Bad GC info can lead to an unexplained crash that can't easily be found or worked around.

Manual, new unit test, CLR outerloop, SuperPMI asm diffs.

Low
@ghost ghost locked as resolved and limited conversation to collaborators Jan 13, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Process crash with "Internal CLR error" (0x80131506) in Roslyn CI
4 participants