Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/5.0] Fix silent bad codegen VSD possible tailcall converted to normal call #49552

Merged
merged 1 commit into from
Apr 6, 2021

Conversation

BruceForstall
Copy link
Member

@BruceForstall BruceForstall commented Mar 12, 2021

Customer Impact

In the situation described below, the JIT generates incorrect (silent bad) code. Generated code will misbehave, leading to a crash, data corruption, or similar. It is customer reported (see #49078). It's likely a very rare situation, but customers have no way of knowing if they've hit this situation. Basic code would be "return Foo(args)" for some generic interface call Foo returning a struct, plus a number of additional conditions.

Regression?

Yes.

Risk (see taxonomy)

Low.

Is there a packaging impact?

No.

Full description

The problem was when a VSD interface call returning a multi-byte struct
in registers was initially considered to be a tailcall, but the tailcall
was abandoned in morph due to not enough stack space to store outgoing
arguments, in which case we create a new call return local to store the
return struct, and re-morph the call. In doing so, we forget that we had
already added VSD non-standard args, and re-add them, leaving the originally
added arg as a "normal" arg that shouldn't be there.

So, in summary, for a call A->B, to see this failure, we need:

  1. The call is considered a potential tailcall (by the importer)
  2. The call requires non-standard arguments that add call argument IR in
    fgInitArgInfo() (e.g., VSD call -- in this case, a generic interface call)
  3. We reject the tailcall in fgMorphPotentialTailCall() (e.g., not enough
    incoming arg stack space in A to store B's outgoing args), in this case
    because the first arg is a large struct. We can't reject it earlier,
    due to things like address exposed locals -- we must get far enough
    through the checks to have called fgInitArgInfo() to add the extra
    non-standard arg.
  4. B returns a struct in multiple registers (e.g., a 16-byte struct in
    Linux x64 ABI)

The fix is to remove the previously-added non-standard VSD argument IR when
we are preparing to re-morph a call.

There was one other location that was already doing this. I'm a little
worried that there are other places where the non-standard added IR is
not being considered when we clear out the arg info before remorphing.
It seems like there is some risk here. Probably, we should consider a
better way of handling the non-standard arg IR given the need in some
cases to re-morph calls.

I commented out a few cases of:

assert(!fgCanFastTailCall(call, nullptr));

because this function can call fgInitArgInfo which can alter the IR.
That seems dangerous in an assert, which should have any side-effects
like modifying the IR.

Fixes #49078

No SPMI asm diffs.

The problem was when a VSD interface call returning a multi-byte struct
in registers was initially considered to be a tailcall, but the tailcall
was abandoned in morph due to not enough stack space to store outgoing
arguments, in which case we create a new call return local to store the
return struct, and re-morph the call. In doing so, we forget that we had
already added VSD non-standard args, and re-add them, leaving the originally
added arg as a "normal" arg that shouldn't be there.

So, in summary, for a call A->B, to see this failure, we need:
1. The call is considered a potential tailcall (by the importer)
2. The call requires non-standard arguments that add call argument IR in
   fgInitArgInfo() (e.g., VSD call -- in this case, a generic interface call)
3. We reject the tailcall in fgMorphPotentialTailCall() (e.g., not enough
   incoming arg stack space in A to store B's outgoing args), in this case
   because the first arg is a large struct. We can't reject it earlier,
   due to things like address exposed locals -- we must get far enough
   through the checks to have called fgInitArgInfo() to add the extra
   non-standard arg.
4. B returns a struct in multiple registers (e.g., a 16-byte struct in
   Linux x64 ABI)

The fix is to remove the previously-added non-standard VSD argument IR when
we are preparing to re-morph a call.

There was one other location that was already doing this. I'm a little
worried that there are other places where the non-standard added IR is
not being considered when we clear out the arg info before remorphing.
It seems like there is some risk here. Probably, we should consider a
better way of handling the non-standard arg IR given the need in some
cases to re-morph calls.

I commented out a few cases of:
```
assert(!fgCanFastTailCall(call, nullptr));
```
because this function can call `fgInitArgInfo` which can alter the IR.
That seems dangerous in an assert, which should have any side-effects
like modifying the IR.

Fixes dotnet#49078

No SPMI asm diffs.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 12, 2021
@BruceForstall
Copy link
Member Author

@sandreenko @dotnet/jit-contrib PTAL

@BruceForstall BruceForstall changed the title Fix silent bad codegen VSD possible tailcall converted to normal call [release/5.0] Fix silent bad codegen VSD possible tailcall converted to normal call Mar 13, 2021
Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. @JulieLeeMSFT can you take this issue for consideration in 5.0.x?

@JulieLeeMSFT
Copy link
Member

Approved. @JulieLeeMSFT can you take this issue for consideration in 5.0.x?

Will do.

@JulieLeeMSFT JulieLeeMSFT added the Servicing-consider Issue for next servicing release review label Mar 15, 2021
@JulieLeeMSFT JulieLeeMSFT added this to the 5.0.x milestone Mar 16, 2021
@BruceForstall
Copy link
Member Author

@JulieLeeMSFT I've lost track of the servicing process. When should I expect this will get approved and merged?

@jeffschwMSFT
Copy link
Member

@BruceForstall we will take it this week for consideration. If approved it will be merged for us once the branch is open for changes.

@leecow leecow added the Servicing-approved Approved for servicing release label Mar 23, 2021
@leecow leecow modified the milestones: 5.0.x, 5.0.6 Mar 23, 2021
@JulieLeeMSFT JulieLeeMSFT removed the Servicing-consider Issue for next servicing release review label Mar 23, 2021
@Anipik
Copy link
Contributor

Anipik commented Apr 6, 2021

@BruceForstall are the outerloop failures expected ?

@BruceForstall
Copy link
Member Author

@Anipik The outerloop failures are unrelated to my change. I had triggered that pipeline to get more testing on this change. However, it appears that the pipeline is not maintained properly in the release/5.0 branch, so, unfortunately, I don't think there is currently any value there. That's a problem, IMO, but it shouldn't block this change.

@Anipik
Copy link
Contributor

Anipik commented Apr 6, 2021

known failure #48091

@Anipik Anipik merged commit 7eb058b into dotnet:release/5.0 Apr 6, 2021
@BruceForstall BruceForstall deleted the Fix49078_net5 branch April 6, 2021 19:53
@ghost ghost locked as resolved and limited conversation to collaborators May 6, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants