Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AccessViolationException in MicroBenchmarks.Serializers.Json_FromString<MyEventsListerViewModel>.Jil_ benchmark #64657

Closed
janvorli opened this issue Feb 2, 2022 · 9 comments · Fixed by #64683
Assignees
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI bug os-mac-os-x macOS aka OSX
Milestone

Comments

@janvorli
Copy link
Member

janvorli commented Feb 2, 2022

When running the dotnet/performance benchmarks with 7.0 preview 1 bits on my mac M1 ARM64 device, I've found that the MicroBenchmarks.Serializers.Json_FromString<MyEventsListerViewModel>.Jil_ test is consistently crashing with AccessViolationException. I've debugged it and it seems to be a JIT bug. JIT generates an instruction with wrong scaling mode like this:

ldr     wzr, [x3, w1, uxtw #2]

The x3 points to the first character of a string and w1 is an index into it. This instruction scales the index by 4 instead of just 2, getting out of the string and hitting an unmapped memory, which results in the reported AccessViolationException.

I have captured the disassembly of the method with embedded IL here: https://gist.github.com/janvorli/643f0e3b3acacb083af3178b5a232bc2

The crash happens at 00000002842f832c. The x3 is 0x113F64044, the w1 is 69626. The memory address accessed during the crash is 0x113fa802c. It is equal to 0x113F64044+4*69626.

The whole string is this one:

(lldb) dumpobj 0x0000000113f64038
Name:        System.String
MethodTable: 0000000280ede248
EEClass:     0000000280ec9270
Size:        143692(0x2314c) bytes
File:        /Users/janvorli/git/performance/tools/dotnet/arm64/shared/Microsoft.NETCore.App/7.0.0-preview.1.22076.8/System.Private.CoreLib.dll
String:
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
000000028011dfc8  40002e8        8         System.Int32  1 instance            71835 _stringLength
0000000280113558  40002e9        c          System.Char  1 instance               7b _firstChar
0000000280ede248  40002e7       d0        System.String  0   static 0000000103f643a0 Empty

I believe this is the source code at the failure point (the Read method seems to be inlined):
https://github.com/kevin-montrose/Jil/blob/9276eaa6a0aaea41d280ff01abeed56214f782df/Jil/Deserialize/Methods.ThunkReader.cs#L4453

Here is the stack trace of the exception:

[2022/02/02 01:56:10][INFO] Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
[2022/02/02 01:56:10][INFO]    at Jil.Deserialize.Methods._ReadISO8601DateWithOffsetThunkReader(Jil.Deserialize.ThunkReader ByRef, Char[])
[2022/02/02 01:56:10][INFO]    at Jil.Deserialize.Methods._ReadISO8601DateWithOffsetWithCharArrayThunkReader(Jil.Deserialize.ThunkReader ByRef, Char[] ByRef)
[2022/02/02 01:56:10][INFO]    at DynamicClass._DynamicMethod3(Jil.Deserialize.ThunkReader ByRef, Int32)
[2022/02/02 01:56:10][INFO]    at DynamicClass._DynamicMethod9(Jil.Deserialize.ThunkReader ByRef, Int32)
[2022/02/02 01:56:10][INFO]    at Jil.JSON.Deserialize[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.Buffers.ReadOnlySequence`1<Char>, Jil.Options)
[2022/02/02 01:56:10][INFO]    at Jil.JSON.Deserialize[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.String, Jil.Options)
[2022/02/02 01:56:10][INFO]    at MicroBenchmarks.Serializers.Json_FromString`1[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].Jil_()
[2022/02/02 01:56:10][INFO]    at BenchmarkDotNet.Autogenerated.Runnable_0.WorkloadActionNoUnroll(Int64)
[2022/02/02 01:56:10][INFO]    at BenchmarkDotNet.Engines.Engine.RunIteration(BenchmarkDotNet.Engines.IterationData)
[2022/02/02 01:56:10][INFO]    at BenchmarkDotNet.Engines.EngineFactory.Jit(BenchmarkDotNet.Engines.Engine, Int32, Int32, Int32)
[2022/02/02 01:56:10][INFO]    at BenchmarkDotNet.Engines.EngineFactory.CreateReadyToRun(BenchmarkDotNet.Engines.EngineParameters)
[2022/02/02 01:56:10][INFO]    at BenchmarkDotNet.Autogenerated.Runnable_0.Run(BenchmarkDotNet.Engines.IHost, System.String)
[2022/02/02 01:56:10][INFO]    at System.RuntimeMethodHandle.InvokeMethod(System.Object, System.Span`1<System.Object> ByRef, System.Signature, Boolean, Boolean)
[2022/02/02 01:56:10][INFO]    at System.Reflection.RuntimeMethodInfo.Invoke(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo)
[2022/02/02 01:56:10][INFO]    at BenchmarkDotNet.Autogenerated.UniqueProgramName.AfterAssemblyLoadingAttached(System.String[])
[2022/02/02 01:56:10][INFO]    at BenchmarkDotNet.Autogenerated.UniqueProgramName.Main(System.String[])
[2022/02/02 01:56:19][INFO] // Benchmark Process 97044 has exited with code 134.
[2022/02/02 01:56:19][INFO] Unhandled exception. System.InvalidOperationException: Sequence contains no matching element
[2022/02/02 01:56:19][INFO]    at System.Linq.ThrowHelper.ThrowNoMatchException()
[2022/02/02 01:56:19][INFO]    at BenchmarkDotNet.Running.BenchmarkRunnerClean.Execute(ILogger logger, BenchmarkCase benchmarkCase, BenchmarkId benchmarkId, IToolchain toolchain, BuildResult buildResult, IResolver resolver)
[2022/02/02 01:56:19][INFO]    at BenchmarkDotNet.Running.BenchmarkRunnerClean.RunCore(BenchmarkCase benchmarkCase, BenchmarkId benchmarkId, ILogger logger, IResolver resolver, BuildResult buildResult)
[2022/02/02 01:56:19][INFO]    at BenchmarkDotNet.Running.BenchmarkRunnerClean.Run(BenchmarkRunInfo benchmarkRunInfo, Dictionary`2 buildResults, IResolver resolver, ILogger logger, List`1 artifactsToCleanup, String resultsFolderPath, String logFilePath, StartedClock& runChronometer)
[2022/02/02 01:56:19][INFO]    at BenchmarkDotNet.Running.BenchmarkRunnerClean.Run(BenchmarkRunInfo[] benchmarkRunInfos)
[2022/02/02 01:56:19][INFO]    at BenchmarkDotNet.Running.BenchmarkSwitcher.RunWithDirtyAssemblyResolveHelper(String[] args, IConfig config, Boolean askUserForInput)
[2022/02/02 01:56:19][INFO]    at BenchmarkDotNet.Running.BenchmarkSwitcher.Run(String[] args, IConfig config)
[2022/02/02 01:56:19][INFO]    at MicroBenchmarks.Program.Main(String[] args) in /Users/janvorli/git/performance/src/benchmarks/micro/Program.cs:line 42

Unfortunately, the runtime was a release build, so I couldn't get a JIT dump. However, the benchmark scripts allow to use a self-built runtime, so it should not be difficult to get it repro even with a checked build of the runtime.

Here are the repro steps:

  1. Unless you already have it, install python3 and add it to your $PATH
  2. git clone https://github.com/dotnet/performance.git
  3. python3 ./performance/scripts/benchmarks_ci.py --architecture arm64 -f net7.0 --dotnet-versions 7.0.100-preview.1.22077.12  --filter 'MicroBenchmarks.Serializers.Json_FromString<MyEventsListerViewModel>.Jil_'
    

The test run takes quite some time, but it always crashes at that place for me.

@janvorli janvorli added bug arch-arm64 os-mac-os-x macOS aka OSX area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Feb 2, 2022
@ghost
Copy link

ghost commented Feb 2, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

When running the dotnet/performance benchmarks with 7.0 preview 1 bits on my mac M1 ARM64 device, I've found that the MicroBenchmarks.Serializers.Json_FromString<MyEventsListerViewModel>.Jil_ test is consistently crashing with AccessViolationException. I've debugged it and it seems to be a JIT bug. JIT generates an instruction with wrong scaling mode like this:

ldr     wzr, [x3, w1, uxtw #2]

The x3 points to the first character of a string and w1 is an index into it. This instruction scales the index by 4 instead of just 2, getting out of the string and hitting an unmapped memory, which results in the reported AccessViolationException.

I have captured the disassembly of the method with embedded IL here: https://gist.github.com/janvorli/643f0e3b3acacb083af3178b5a232bc2

The crash happens at 00000002842f832c. The x3 is 0x113F64044, the w1 is 69626. The memory address accessed during the crash is 0x113fa802c. It is equal to 0x113F64044+4*69626.

The whole string is this one:

(lldb) dumpobj 0x0000000113f64038
Name:        System.String
MethodTable: 0000000280ede248
EEClass:     0000000280ec9270
Size:        143692(0x2314c) bytes
File:        /Users/janvorli/git/performance/tools/dotnet/arm64/shared/Microsoft.NETCore.App/7.0.0-preview.1.22076.8/System.Private.CoreLib.dll
String:
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
000000028011dfc8  40002e8        8         System.Int32  1 instance            71835 _stringLength
0000000280113558  40002e9        c          System.Char  1 instance               7b _firstChar
0000000280ede248  40002e7       d0        System.String  0   static 0000000103f643a0 Empty

I believe this is the source code at the failure point (the Read method seems to be inlined):
https://github.com/kevin-montrose/Jil/blob/9276eaa6a0aaea41d280ff01abeed56214f782df/Jil/Deserialize/Methods.ThunkReader.cs#L4453

Here is the stack trace of the exception:

[2022/02/02 01:56:10][INFO] Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
[2022/02/02 01:56:10][INFO]    at Jil.Deserialize.Methods._ReadISO8601DateWithOffsetThunkReader(Jil.Deserialize.ThunkReader ByRef, Char[])
[2022/02/02 01:56:10][INFO]    at Jil.Deserialize.Methods._ReadISO8601DateWithOffsetWithCharArrayThunkReader(Jil.Deserialize.ThunkReader ByRef, Char[] ByRef)
[2022/02/02 01:56:10][INFO]    at DynamicClass._DynamicMethod3(Jil.Deserialize.ThunkReader ByRef, Int32)
[2022/02/02 01:56:10][INFO]    at DynamicClass._DynamicMethod9(Jil.Deserialize.ThunkReader ByRef, Int32)
[2022/02/02 01:56:10][INFO]    at Jil.JSON.Deserialize[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.Buffers.ReadOnlySequence`1<Char>, Jil.Options)
[2022/02/02 01:56:10][INFO]    at Jil.JSON.Deserialize[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.String, Jil.Options)
[2022/02/02 01:56:10][INFO]    at MicroBenchmarks.Serializers.Json_FromString`1[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].Jil_()
[2022/02/02 01:56:10][INFO]    at BenchmarkDotNet.Autogenerated.Runnable_0.WorkloadActionNoUnroll(Int64)
[2022/02/02 01:56:10][INFO]    at BenchmarkDotNet.Engines.Engine.RunIteration(BenchmarkDotNet.Engines.IterationData)
[2022/02/02 01:56:10][INFO]    at BenchmarkDotNet.Engines.EngineFactory.Jit(BenchmarkDotNet.Engines.Engine, Int32, Int32, Int32)
[2022/02/02 01:56:10][INFO]    at BenchmarkDotNet.Engines.EngineFactory.CreateReadyToRun(BenchmarkDotNet.Engines.EngineParameters)
[2022/02/02 01:56:10][INFO]    at BenchmarkDotNet.Autogenerated.Runnable_0.Run(BenchmarkDotNet.Engines.IHost, System.String)
[2022/02/02 01:56:10][INFO]    at System.RuntimeMethodHandle.InvokeMethod(System.Object, System.Span`1<System.Object> ByRef, System.Signature, Boolean, Boolean)
[2022/02/02 01:56:10][INFO]    at System.Reflection.RuntimeMethodInfo.Invoke(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo)
[2022/02/02 01:56:10][INFO]    at BenchmarkDotNet.Autogenerated.UniqueProgramName.AfterAssemblyLoadingAttached(System.String[])
[2022/02/02 01:56:10][INFO]    at BenchmarkDotNet.Autogenerated.UniqueProgramName.Main(System.String[])
[2022/02/02 01:56:19][INFO] // Benchmark Process 97044 has exited with code 134.
[2022/02/02 01:56:19][INFO] Unhandled exception. System.InvalidOperationException: Sequence contains no matching element
[2022/02/02 01:56:19][INFO]    at System.Linq.ThrowHelper.ThrowNoMatchException()
[2022/02/02 01:56:19][INFO]    at BenchmarkDotNet.Running.BenchmarkRunnerClean.Execute(ILogger logger, BenchmarkCase benchmarkCase, BenchmarkId benchmarkId, IToolchain toolchain, BuildResult buildResult, IResolver resolver)
[2022/02/02 01:56:19][INFO]    at BenchmarkDotNet.Running.BenchmarkRunnerClean.RunCore(BenchmarkCase benchmarkCase, BenchmarkId benchmarkId, ILogger logger, IResolver resolver, BuildResult buildResult)
[2022/02/02 01:56:19][INFO]    at BenchmarkDotNet.Running.BenchmarkRunnerClean.Run(BenchmarkRunInfo benchmarkRunInfo, Dictionary`2 buildResults, IResolver resolver, ILogger logger, List`1 artifactsToCleanup, String resultsFolderPath, String logFilePath, StartedClock& runChronometer)
[2022/02/02 01:56:19][INFO]    at BenchmarkDotNet.Running.BenchmarkRunnerClean.Run(BenchmarkRunInfo[] benchmarkRunInfos)
[2022/02/02 01:56:19][INFO]    at BenchmarkDotNet.Running.BenchmarkSwitcher.RunWithDirtyAssemblyResolveHelper(String[] args, IConfig config, Boolean askUserForInput)
[2022/02/02 01:56:19][INFO]    at BenchmarkDotNet.Running.BenchmarkSwitcher.Run(String[] args, IConfig config)
[2022/02/02 01:56:19][INFO]    at MicroBenchmarks.Program.Main(String[] args) in /Users/janvorli/git/performance/src/benchmarks/micro/Program.cs:line 42

Unfortunately, the runtime was a release build, so I couldn't get a JIT dump. However, the benchmark scripts allow to use a self-built runtime, so it should not be difficult to get it repro even with a checked build of the runtime.

Here are the repro steps:

  1. Unless you already have it, install python3 and add it to your $PATH
  2. git clone https://github.com/dotnet/performance.git
  3. python3 ./performance/scripts/benchmarks_ci.py --architecture arm64 -f net7.0 --dotnet-versions 7.0.100-preview.1.22077.12  --filter 'MicroBenchmarks.Serializers.Json_FromString<MyEventsListerViewModel>.Jil_'
    

The test run takes quite some time, but it always crashes at that place for me.

Author: janvorli
Assignees: -
Labels:

bug, arch-arm64, os-mac-os-x, area-CodeGen-coreclr

Milestone: -

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Feb 2, 2022
@janvorli
Copy link
Member Author

janvorli commented Feb 2, 2022

cc @dotnet/jit-contrib

@janvorli
Copy link
Member Author

janvorli commented Feb 2, 2022

cc: @adamsitnik - this is the issue I have mentioned to you.

@EgorBo
Copy link
Member

EgorBo commented Feb 2, 2022

I'll take a look, looks like an incorrect scaling mode for implicit nullchecks

@EgorBo EgorBo self-assigned this Feb 2, 2022
@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Feb 2, 2022
@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone Feb 2, 2022
@EgorBo
Copy link
Member

EgorBo commented Feb 2, 2022

@janvorli thanks for the detailed analysis!! it seems it's a general bug in JIT and not arm64 specific. Basically the same as #59315 (comment) (thanks to @jakobbotsch for pointing to this)

Repro:

using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public unsafe class Prog
{

    [DllImport("kernel32")]
    public static extern byte* VirtualAlloc(IntPtr lpAddress, nuint dwSize, uint flAllocationType, uint flProtect);


    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Validate(char* c, int x)
    {
        // on x64 it does 
        //
        //   4863C2 movsxd   rax, edx
        //   8B0441 mov      eax, dword ptr[rcx + 2 * rax]
        //
        // instead of loading just word
        
        char implicit_nullcheck = c[x];
    }

    static void Main()
    {
        byte* page = VirtualAlloc(IntPtr.Zero, 4096, 0x1000 | 0x2000, 0x04);

        Validate((char*)(page + 4096 - 4), 0); // OK
        Console.WriteLine("1");

        Validate((char*)(page + 4096 - 2), 0); // AVE
        Console.WriteLine("2");
    }
}

@EgorBo
Copy link
Member

EgorBo commented Feb 2, 2022

I wonder if we can somehow change all ReadOnlyChar<byte> xx => new byte[] {} to use virtualloc in debug build 🤔

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Feb 2, 2022
@danmoseley
Copy link
Member

Just curious, in what circumstances does the JIT do run-time null checks? I had assumed that it allows execution to hit an AV in the normal way, which the runtime then maps to NullReferenceException.

@EgorBo
Copy link
Member

EgorBo commented Feb 9, 2022

Just curious, in what circumstances does the JIT do run-time null checks? I had assumed that it allows execution to hit an AV in the normal way, which the runtime then maps to NullReferenceException.

Right, for such explicit nullchecks VM catches attempts to dereference 0. In this specific case it doesn't do it, in fact the object we run nullcheck for is not even null. But when we check it for null we slightly cross page boundary and read more than needed

static void Validate(ref char c, int x)
{
    char implicit_nullcheck = c[x];
}

image
we only need to load this green area as a nullcheck (load and throw away) two bytes (char). But nullcheck always loads 4 bytes so we can cross the page boundary in some rare cases

@janvorli
Copy link
Member Author

janvorli commented Feb 10, 2022 via email

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Feb 10, 2022
adamsitnik added a commit to adamsitnik/performance that referenced this issue Mar 1, 2022
adamsitnik added a commit to dotnet/performance that referenced this issue Mar 1, 2022
* define NET7_0_PREVIEW2_OR_GREATER const for .NET 7 Preview2+ builds

* don't use Regex.Count API for older versions of .NET 7 (used as baseline for monthly perf runs)

* don't try to run AddRemoveFromDifferentThreads.ConcurrentStack benchmark on ARM64 as it hangs

details: dotnet/runtime#64980

* don't try to run Json_FromString.Jil_ benchmark on ARM64 as it throws AccessViolationException

details: dotnet/runtime#64657
@ghost ghost locked as resolved and limited conversation to collaborators Mar 12, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI bug os-mac-os-x macOS aka OSX
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants