Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT crash/assert while compiling Grpc.core codebase #57535

Closed
davidwrighton opened this issue Aug 17, 2021 · 16 comments · Fixed by #57602
Closed

JIT crash/assert while compiling Grpc.core codebase #57535

davidwrighton opened this issue Aug 17, 2021 · 16 comments · Fixed by #57602
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@davidwrighton
Copy link
Member

Repro instructions:
Download https://microsoft-my.sharepoint.com/:u:/p/davidwr/ETDI552L1RtCugwcniHGv5QBVsNi3PDCpXt6y-tWayhvow?e=f4wH4F and unzip it to a directory.

Using that directory as the working directory, run crossgen2 passing the rsp file in the zip as a response file. For example:

c:\git\runtime\dotnet c:\git\runtime\artifacts\bin\coreclr\windows.x64.Debug\crossgen2\crossgen2.dll @crossgen2repro.rsp

If run against a Debug crossgen2, it will produce an assert

C:\git\runtime\src\coreclr\jit\loopcloning.cpp:183
Assertion failed 'genTypeSize(genActualType(op1Tree->TypeGet())) == genTypeSize(genActualType(op2Tree->TypeGet()))' in 'Grpc.Core.Internal.SliceBufferSafeHandle:ToByteArray():System.Byte[]:this' during 'Clone loops' (IL size 215)

If run against a release crossgen2 it will produce an AV

Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Repeat 2 times:
--------------------------------
   at Internal.JitInterface.CorInfoImpl.JitCompileMethod(IntPtr ByRef, IntPtr, IntPtr, IntPtr, Internal.JitInterface.CORINFO_METHOD_INFO ByRef, UInt32, IntPtr ByRef, UInt32 ByRef)
--------------------------------
   at Internal.JitInterface.CorInfoImpl.CompileMethodInternal(ILCompiler.DependencyAnalysis.IMethodNode, Internal.IL.MethodIL)
   at Internal.JitInterface.CorInfoImpl.CompileMethod(ILCompiler.DependencyAnalysis.ReadyToRun.MethodWithGCInfo, ILCompiler.Logger)
   at ILCompiler.ReadyToRunCodegenCompilation.<ComputeDependencyNodeDependencies>b__36_0(ILCompiler.DependencyAnalysisFramework.DependencyNodeCore`1<ILCompiler.DependencyAnalysis.NodeFactory>)
   at System.Threading.Tasks.Parallel+<>c__DisplayClass33_0`2[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].<ForEachWorker>b__0(Int32)
   at System.Threading.Tasks.Parallel+<>c__DisplayClass19_0`1[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].<ForWorker>b__1(System.Threading.Tasks.RangeWorker ByRef, Int32, Boolean ByRef)
   at System.Threading.Tasks.TaskReplicator+Replica.Execute()
   at System.Threading.Tasks.TaskReplicator+Replica+<>c.<.ctor>b__4_0(System.Object)
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task+<>c.<.cctor>b__271_0(System.Object)
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread)
   at System.Threading.Tasks.Task.ExecuteEntryUnsafe(System.Threading.Thread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
   at System.Threading.Thread.StartCallback()
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Aug 17, 2021
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@davidwrighton davidwrighton added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 17, 2021
@ghost
Copy link

ghost commented Aug 17, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Repro instructions:
Download https://microsoft-my.sharepoint.com/:u:/p/davidwr/ETDI552L1RtCugwcniHGv5QBVsNi3PDCpXt6y-tWayhvow?e=f4wH4F and unzip it to a directory.

Using that directory as the working directory, run crossgen2 passing the rsp file in the zip as a response file. For example:

c:\git\runtime\dotnet c:\git\runtime\artifacts\bin\coreclr\windows.x64.Debug\crossgen2\crossgen2.dll @crossgen2repro.rsp

If run against a Debug crossgen2, it will produce an assert

C:\git\runtime\src\coreclr\jit\loopcloning.cpp:183
Assertion failed 'genTypeSize(genActualType(op1Tree->TypeGet())) == genTypeSize(genActualType(op2Tree->TypeGet()))' in 'Grpc.Core.Internal.SliceBufferSafeHandle:ToByteArray():System.Byte[]:this' during 'Clone loops' (IL size 215)

If run against a release crossgen2 it will produce an AV

Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Repeat 2 times:
--------------------------------
   at Internal.JitInterface.CorInfoImpl.JitCompileMethod(IntPtr ByRef, IntPtr, IntPtr, IntPtr, Internal.JitInterface.CORINFO_METHOD_INFO ByRef, UInt32, IntPtr ByRef, UInt32 ByRef)
--------------------------------
   at Internal.JitInterface.CorInfoImpl.CompileMethodInternal(ILCompiler.DependencyAnalysis.IMethodNode, Internal.IL.MethodIL)
   at Internal.JitInterface.CorInfoImpl.CompileMethod(ILCompiler.DependencyAnalysis.ReadyToRun.MethodWithGCInfo, ILCompiler.Logger)
   at ILCompiler.ReadyToRunCodegenCompilation.<ComputeDependencyNodeDependencies>b__36_0(ILCompiler.DependencyAnalysisFramework.DependencyNodeCore`1<ILCompiler.DependencyAnalysis.NodeFactory>)
   at System.Threading.Tasks.Parallel+<>c__DisplayClass33_0`2[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].<ForEachWorker>b__0(Int32)
   at System.Threading.Tasks.Parallel+<>c__DisplayClass19_0`1[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].<ForWorker>b__1(System.Threading.Tasks.RangeWorker ByRef, Int32, Boolean ByRef)
   at System.Threading.Tasks.TaskReplicator+Replica.Execute()
   at System.Threading.Tasks.TaskReplicator+Replica+<>c.<.ctor>b__4_0(System.Object)
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task+<>c.<.cctor>b__271_0(System.Object)
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread)
   at System.Threading.Tasks.Task.ExecuteEntryUnsafe(System.Threading.Thread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
   at System.Threading.Thread.StartCallback()
Author: davidwrighton
Assignees: -
Labels:

area-CodeGen-coreclr, untriaged

Milestone: -

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Aug 17, 2021
@JulieLeeMSFT JulieLeeMSFT added this to the 6.0.0 milestone Aug 17, 2021
@JulieLeeMSFT JulieLeeMSFT added the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Aug 17, 2021
@AndyAyersMS
Copy link
Member

You can also repro this by grabbing Grpc.Core and Grpc.Core.Api 2.38 from nuget, extracting the DLLs, and running PMI:

corerun.exe c:\repos\jitutils\bin\pmi.dll prepall-quiet C:\bugs\r57535\Grpc.Core.dll

DllNotFoundException Grpc.Core.Internal.CommonPlatformDetection::uname (Unable to load DLL 'libc' or one of its dependencies: The specified module could not be found. (0x8007007E))

Assert failure(PID 32692 [0x00007fb4], Thread: 33432 [0x8298]): Assertion failed 'genTypeSize(genActualType(op1Tree->TypeGet())) == genTypeSize(genActualType(op2Tree->TypeGet()))' in 'Grpc.Core.Internal.SliceBufferSafeHandle:ToByteArray():System.Byte[]:this' during 'Clone loops' (IL size 215)

    File: C:\repos\runtime0\src\coreclr\jit\loopcloning.cpp Line: 183
    Image: c:\repos\runtime0\artifacts\tests\coreclr\Windows.x64.Checked\Tests\Core_Root\corerun.exe

@AndyAyersMS
Copy link
Member

Issue is that LC_Condition::ToGenTree is trying to set a TYP_LONG variable to a TYP_INT zero.

@AndyAyersMS
Copy link
Member

Suspect cloning gets confused because V01 is a long, but morph optimizes a cast to int-sized:

fgMorphTree BB34, STMT00022 (before)
               [000110] ------------              *  JTRUE     void  
               [000109] ------------              \--*  LT        int   
               [000106] ------------                 +--*  LCL_VAR   int    V10 loc9         
               [000108] ------------                 \--*  CAST      int <- long
               [000107] ------------                    \--*  LCL_VAR   long   V01 loc0         

fgMorphTree BB34, STMT00022 (after)
               [000110] -----+------              *  JTRUE     void  
               [000109] J----+-N----              \--*  LT        int   
               [000106] -----+------                 +--*  LCL_VAR   int    V10 loc9         
               [000107] C----+------                 \--*  LCL_VAR   int    V01 loc0         

and so likely cloning thinks V01 is an int?

@AndyAyersMS
Copy link
Member

Not clear yet how to fix this...

EgorBo added a commit to EgorBo/runtime-1 that referenced this issue Aug 17, 2021
@EgorBo
Copy link
Member

EgorBo commented Aug 17, 2021

I wonder if a naive fix would help EgorBo@cde5dd5 (well, it does, but how legal it is...)

@AndyAyersMS
Copy link
Member

Simple repro:

using System;
using System.Runtime.CompilerServices;

class X
{
    static long z;

    public static int Main()
    {
        z = 10;
        int[] a = F();
        long zz = z;
        int result = 0;
        for (int i = 0; i < (int) zz; i++)        result += a[i];
        return result;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static int[] F()
    {
       int[] result = new int[100];
       result[3] = 100;
       return result;
    }
}

w/default behavior:

 Assertion failed 'genTypeSize(genActualType(op1Tree->TypeGet())) == genTypeSize(genActualType(op2Tree->TypeGet()))' in 'X:Main():int' during 'Clone loops' (IL size 43)

@AndyAyersMS
Copy link
Member

but how legal it is..

The other approach would be to abandon cloning in a case like this, that seems a bit safer perhaps. Testing this now.

@AndyAyersMS AndyAyersMS removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Aug 17, 2021
@AndyAyersMS
Copy link
Member

FYI @JulieLeeMSFT -- we need to fix this for .NET 6 but it's not going to make the RC1 deadline. Should have a fix ready by tomorrow.

@SingleAccretion
Copy link
Contributor

SingleAccretion commented Aug 17, 2021

The other approach would be to abandon cloning in a case like this, that seems a bit safer perhaps. Testing this now.

Or give up on labeling loops LPFLG_VAR_LIMIT (same for the iterator flag I guess) when the "limit" has GTF_VAR_CAST on it. Such a "local" is effectively CAST(int <- LCL_VAR(long)), it seems unlikely the current machinery would handle it correctly.

(Seems to not produce SPMI diffs too which is nice)

@AndyAyersMS
Copy link
Member

Bailing out in cloning

@ -968,7 +968,14 @@ bool Compiler::optDeriveLoopCloningConditions(unsigned loopNum, LoopCloneContext
         else if (loop->lpFlags & LPFLG_VAR_INIT)
         {
             // initVar >= 0
-            LC_Condition geZero(GT_GE, LC_Expr(LC_Ident(loop->lpVarInit, LC_Ident::Var)),
+            const unsigned initLcl = loop->lpVarInit;
+            if (lvaGetDesc(initLcl)->lvType != TYP_INT)
+            {
+                JITDUMP("> Init var V%02u not TYP_INT\n", initLcl);
+                return false;
+            }
+
+            LC_Condition geZero(GT_GE, LC_Expr(LC_Ident(initLcl, LC_Ident::Var)),
                                 LC_Expr(LC_Ident(0, LC_Ident::Const)));
             context->EnsureConditions(loopNum)->Push(geZero);
         }
@@ -992,9 +999,15 @@ bool Compiler::optDeriveLoopCloningConditions(unsigned loopNum, LoopCloneContext
         }
         else if (loop->lpFlags & LPFLG_VAR_LIMIT)
         {
-            unsigned limitLcl = loop->lpVarLimit();
-            ident             = LC_Ident(limitLcl, LC_Ident::Var);
+            const unsigned limitLcl = loop->lpVarLimit();
+
+            if (lvaGetDesc(limitLcl)->lvType != TYP_INT)
+            {
+                JITDUMP("> Limit var V%02u not TYP_INT\n", limitLcl);
+                return false;
+            }

+            ident = LC_Ident(limitLcl, LC_Ident::Var);
             LC_Condition geZero(GT_GE, LC_Expr(ident), LC_Expr(LC_Ident(0, LC_Ident::Const)));

             context->EnsureConditions(loopNum)->Push(geZero);

produces 2 diffs

         -57 (-12.26% of base) : 114034.dasm - ILCompiler.Win32Resources.ResourceData:DoResourceDirectoryRead(System.Reflection.Metadata.BlobReader,int,System.Action`3[System.Object, System.UInt32, System.Boolean])
         -97 (-20.86% of base) : 75579.dasm - Microsoft.Cci.MetadataWriter:SerializeParameterInformation(Microsoft.Cci.IParameterTypeInformation,Microsoft.Cci.BlobBuilder):this

@AndyAyersMS
Copy link
Member

In the 75579 case the limit var is a ushort and the current jit produces this mixed type tree, which we evidently can tolerate:

Adding loop L00 cloning conditions to BB14
    V03 GE 0 & V03 LE V11.Length
Loop cloning condition tree before morphing:
               [000278] ---X--------              *  JTRUE     void  
               [000277] ---X--------              \--*  EQ        int   
               [000275] ---X--------                 +--*  AND       int   
               [000270] ------------                 |  +--*  GE        int   
               [000268] ------------                 |  |  +--*  LCL_VAR   ushort V03 loc0         
               [000269] ------------                 |  |  \--*  CNS_INT   int    0
               [000274] ---X--------                 |  \--*  LE        int   
               [000271] ------------                 |     +--*  LCL_VAR   ushort V03 loc0         
               [000273] ---X--------                 |     \--*  ARR_LENGTH int   
               [000272] ------------                 |        \--*  LCL_VAR   ref    V11 tmp4         
               [000276] ------------                 \--*  CNS_INT   int    0

@AndyAyersMS
Copy link
Member

Similar for 114034, a ushort:

Adding loop L01 cloning conditions to BB16
    V12 GE 0 & V12 LE V13.Length
Loop cloning condition tree before morphing:
               [000294] ---X--------              *  JTRUE     void  
               [000293] ---X--------              \--*  EQ        int   
               [000291] ---X--------                 +--*  AND       int   
               [000286] ------------                 |  +--*  GE        int   
               [000284] ------------                 |  |  +--*  LCL_VAR   ushort V12 loc9         
               [000285] ------------                 |  |  \--*  CNS_INT   int    0
               [000290] ---X--------                 |  \--*  LE        int   
               [000287] ------------                 |     +--*  LCL_VAR   ushort V12 loc9         
               [000289] ---X--------                 |     \--*  ARR_LENGTH int   
               [000288] ------------                 |        \--*  LCL_VAR   ref    V13 loc10        
               [000292] ------------                 \--*  CNS_INT   int    0

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Aug 17, 2021
The loop cloner assumes all computations it introduces are compatible
with TYP_INT, so don't allow cloning when the initial or final value
are variables with incompatible types.

Fixes dotnet#57535.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Aug 17, 2021
@JulieLeeMSFT
Copy link
Member

FYI @JulieLeeMSFT -- we need to fix this for .NET 6 but it's not going to make the RC1 deadline. Should have a fix ready by tomorrow.

We can backport to RC1 as servicing.
CC @jeffschwMSFT

@jeffschwMSFT
Copy link
Member

@JulieLeeMSFT go ahead and port a fix to release/6.0 and we can then consider for RC1

AndyAyersMS added a commit that referenced this issue Aug 18, 2021
The loop cloner assumes all computations it introduces are compatible
with TYP_INT, so don't allow cloning when the initial or final value
are variables with incompatible types.

Fixes #57535.
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Aug 18, 2021
github-actions bot pushed a commit that referenced this issue Aug 18, 2021
The loop cloner assumes all computations it introduces are compatible
with TYP_INT, so don't allow cloning when the initial or final value
are variables with incompatible types.

Fixes #57535.
github-actions bot pushed a commit that referenced this issue Aug 18, 2021
The loop cloner assumes all computations it introduces are compatible
with TYP_INT, so don't allow cloning when the initial or final value
are variables with incompatible types.

Fixes #57535.
jeffschwMSFT pushed a commit that referenced this issue Aug 19, 2021
…cal (#57685)

* JIT: don't clone loops where init or limit is a cast local

The loop cloner assumes all computations it introduces are compatible
with TYP_INT, so don't allow cloning when the initial or final value
are variables with incompatible types.

Fixes #57535.

* Apply suggestions from code review

Co-authored-by: SingleAccretion <62474226+SingleAccretion@users.noreply.github.com>

Co-authored-by: Andy Ayers <andya@microsoft.com>
Co-authored-by: SingleAccretion <62474226+SingleAccretion@users.noreply.github.com>
jeffschwMSFT pushed a commit that referenced this issue Aug 19, 2021
…t local (#57690)

* JIT: don't clone loops where init or limit is a cast local

The loop cloner assumes all computations it introduces are compatible
with TYP_INT, so don't allow cloning when the initial or final value
are variables with incompatible types.

Fixes #57535.

* Apply suggestions from code review

Co-authored-by: SingleAccretion <62474226+SingleAccretion@users.noreply.github.com>

Co-authored-by: Andy Ayers <andya@microsoft.com>
Co-authored-by: SingleAccretion <62474226+SingleAccretion@users.noreply.github.com>
@ghost ghost locked as resolved and limited conversation to collaborators Sep 17, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants