Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RyuJIT] Fix rel32 for calls/data #49549

Merged
merged 3 commits into from
Mar 15, 2021

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Mar 12, 2021

JIT supports relative addresses for calls/data (see jump-stubs.md), it allows it to produce smaller codegen in some cases. Consider the following example:

using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

class Program
{
    [DllImport("kernel32.dll"), SuppressGCTransition]
    static extern int GetLogicalDrives();

    static void Main()
    {
        // var _ = GetLogicalDrives();
        Console.WriteLine(GetField());
    }

    static int s_Field;

    [MethodImpl(MethodImplOptions.NoInlining)]
    static int GetField() => s_Field;
}

(ignore that pinvoke for now)

Let's take a look at the GetField codegen:

; Assembly listing for method ConsoleApp18.Program:GetField():int
       8B0586201000         mov      eax, dword ptr [reloc classVar[0x5c2801b8]]
       C3                   ret      
; Total bytes of code 7,

Cool, it uses (reloc)! Now, let's uncomment that "unrelated" GetLogicalDrives call, which should not affect GetField codegen anyhow and run the app again (and ask for GetField codegen):

       48B8C4671452F97F0000 mov      rax, 0x7FF9521467C4
       8B00                 mov      eax, dword ptr [rax]
       C3                   ret      
; Total bytes of code 13

Boom! rel32 is disabled globally for the whole app and won't be emitted anywhere (so e.g. all static fields, all PGO counters won't benefit from it).

Why it happens

For pinvokes + SuppressGCTransition we emit direct calls to those pinvokes (since we don't have to switch coop to preemptive and back) with the absolute address of the external target (in our case - kernel32's GetLogicalDrives) here.
Then here emitter tries to use rel32 for this call and asks VM to validate and record the relocation here. So we hit this code-path and basically ask VM to recompile the whole method and disable fAllowRel32 globally for everybody (see here). My fix just sets gtControExpr for such calls so the emitter will just emit:

mov reg, (pc/zero-relative 64bit address of kernel32::GetLogicalDrives)
call reg

instead of trying to use rel32 (jump-stubs) and fail.

Jit-diff (jit-utils --pmi -f)

PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 53698233
Total bytes of diff: 53037307
Total bytes of delta: -660926 (-1.23% of base)
    diff is an improvement.


Top file improvements (bytes):
      -55386 : System.Linq.dasm (-5.08% of base)
      -52783 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-1.62% of base)
      -41959 : System.Private.CoreLib.dasm (-0.90% of base)
      -38216 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.67% of base)
      -35046 : System.Private.Xml.dasm (-0.99% of base)
      -33859 : Microsoft.CodeAnalysis.dasm (-1.84% of base)
      -33306 : Microsoft.CodeAnalysis.CSharp.dasm (-0.76% of base)
      -27097 : FSharp.Core.dasm (-0.73% of base)
      -17311 : Newtonsoft.Json.dasm (-1.96% of base)
      -17155 : System.Collections.Immutable.dasm (-1.31% of base)
      -13727 : System.Threading.Tasks.Dataflow.dasm (-1.42% of base)
      -13029 : System.Data.Common.dasm (-0.85% of base)
      -12081 : System.Linq.Parallel.dasm (-0.59% of base)
      -11299 : System.Linq.Expressions.dasm (-1.43% of base)
      -10602 : System.Linq.Queryable.dasm (-3.22% of base)
       -9854 : System.Collections.dasm (-1.69% of base)
       -9848 : System.CodeDom.dasm (-5.28% of base)
       -9578 : System.ComponentModel.Composition.dasm (-2.76% of base)
       -9538 : System.Configuration.ConfigurationManager.dasm (-2.75% of base)
       -8478 : System.Speech.dasm (-1.97% of base)

190 total files with Code Size differences (190 improved, 0 regressed), 81 unchanged.

Top method regressions (bytes):
         188 ( 0.86% of base) : System.Management.dasm - ManagementClassGenerator:GenerateMethods():this
          11 ( 0.90% of base) : Microsoft.CodeAnalysis.dasm - Parser:GetMatchingMethods(String,byref,List`1,String,int,Compilation,List`1)
           4 ( 0.28% of base) : System.Console.dasm - ConsolePal:SetWindowSize(int,int)
           4 ( 1.42% of base) : Microsoft.CodeAnalysis.dasm - PdbMetadataWrapper:Microsoft.Cci.IMetaDataImport.GetMethodProps(int,byref,long,int,byref,long,long,long,long):int:this
           2 ( 1.07% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - QPCTime:.ctor():this
           2 ( 0.51% of base) : System.Console.dasm - ConsolePal:GetStandardFile(int,int,bool):Stream (2 methods)
           2 ( 0.38% of base) : System.Console.dasm - ConsolePal:get_KeyAvailable():bool
           2 ( 0.55% of base) : System.Console.dasm - ConsolePal:set_CursorSize(int)
           2 ( 0.88% of base) : System.Console.dasm - ConsolePal:set_CursorVisible(bool)
           1 ( 1.96% of base) : Microsoft.Extensions.Http.dasm - ValueStopwatch:StartNew():ValueStopwatch
           1 ( 0.08% of base) : runincontext.dasm - TestRunner:ExecuteAndUnload(List`1,byref,byref):int:this
           1 ( 1.61% of base) : System.Drawing.Primitives.dasm - KnownColorTable:GetSystemColorArgb(int):int
           1 ( 1.96% of base) : System.Net.Http.dasm - ValueStopwatch:StartNew():ValueStopwatch
           1 ( 0.21% of base) : System.Private.CoreLib.dasm - DateTime:FromFileTimeLeapSecondsAware(long):DateTime
           1 ( 0.27% of base) : System.Private.CoreLib.dasm - DateTime:ToFileTimeLeapSecondsAware(long):long
           1 ( 0.47% of base) : System.Private.CoreLib.dasm - ComWrappers:RegisterForTrackerSupport(ComWrappers)
           1 ( 0.47% of base) : System.Private.CoreLib.dasm - ComWrappers:RegisterForMarshalling(ComWrappers)
           1 ( 1.33% of base) : System.Private.CoreLib.dasm - Stopwatch:Start():this
           1 ( 0.81% of base) : System.Private.CoreLib.dasm - Stopwatch:StartNew():Stopwatch
           1 ( 1.04% of base) : System.Private.CoreLib.dasm - Stopwatch:Stop():this

Top method improvements (bytes):
       -2984 (-8.94% of base) : System.Linq.dasm - Enumerable:Max(IEnumerable`1,Func`2):Nullable`1 (48 methods)
       -2508 (-8.49% of base) : System.Linq.dasm - Enumerable:Min(IEnumerable`1,Func`2):Nullable`1 (48 methods)
       -2364 (-9.06% of base) : System.Linq.dasm - Lookup`2:Create(IEnumerable`1,Func`2,Func`2,IEqualityComparer`1):Lookup`2 (64 methods)
       -2070 (-8.17% of base) : System.Linq.dasm - Enumerable:Average(IEnumerable`1,Func`2):Nullable`1 (40 methods)
       -1872 (-2.84% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - CtfTraceEventSource:InitEventMap():Dictionary`2
       -1284 (-8.54% of base) : System.Linq.dasm - Enumerable:Sum(IEnumerable`1,Func`2):Nullable`1 (40 methods)
       -1148 (-15.15% of base) : System.Private.Xml.dasm - XsltLoader:.ctor():this
       -1105 (-10.12% of base) : System.Linq.dasm - Enumerable:Average(IEnumerable`1,Func`2):double (24 methods)
       -1072 (-9.44% of base) : System.Threading.Tasks.Dataflow.dasm - TransformManyBlock`2:StoreOutputItemsNonReorderedWithIteration(IEnumerable`1):this (8 methods)
       -1053 (-7.49% of base) : System.Linq.dasm - <TakeRangeFromEndIterator>d__222`1:MoveNext():bool:this (8 methods)
        -896 (-11.45% of base) : System.ComponentModel.Composition.dasm - ExportProvider:GetExportsCore(String):IEnumerable`1:this (16 methods)
        -878 (-8.23% of base) : System.Linq.dasm - Enumerable:SequenceEqual(IEnumerable`1,IEnumerable`1,IEqualityComparer`1):bool (8 methods)
        -759 (-8.14% of base) : ILCompiler.Reflection.ReadyToRun.dasm - PgoProcessor:EncodePgoData(IEnumerable`1,IPgoEncodedValueEmitter`1,bool) (8 methods)
        -756 (-14.59% of base) : Microsoft.CodeAnalysis.dasm - MetadataVisitor:Visit(IEnumerable`1):this (21 methods)
        -678 (-8.22% of base) : System.ComponentModel.Composition.dasm - CompositionContainer:ReleaseExports(IEnumerable`1):this (17 methods)
        -672 (-6.33% of base) : CommandLine.dasm - <RenderUsageTextAsLines>d__56`1:MoveNext():bool:this (8 methods)
        -667 (-5.61% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - RegisteredTraceEventParser:GetManifestForRegisteredProvider(Guid):String
        -660 (-11.49% of base) : Microsoft.CodeAnalysis.dasm - DeltaMetadataWriter:AddRange(IReadOnlyDictionary`2,IReadOnlyDictionary`2,bool):IReadOnlyDictionary`2 (8 methods)
        -646 (-1.79% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - VisualBasicCommandLineParser:Parse(IEnumerable`1,String,String,String):VisualBasicCommandLineArguments:this
        -636 (-6.38% of base) : System.Private.CoreLib.dasm - TlsOverPerCoreLockedStacksArrayPool`1:Trim():bool:this (10 methods)

Top method regressions (percentages):
           1 ( 2.22% of base) : System.Private.CoreLib.dasm - Console:.cctor()
           1 ( 2.17% of base) : System.Private.CoreLib.dasm - Stopwatch:GetTimestamp():long
           1 ( 2.08% of base) : System.Console.dasm - ConsolePal:IsInputRedirectedCore():bool
           1 ( 2.08% of base) : System.Console.dasm - ConsolePal:IsOutputRedirectedCore():bool
           1 ( 2.08% of base) : System.Console.dasm - ConsolePal:IsErrorRedirectedCore():bool
           1 ( 2.04% of base) : System.Console.dasm - ConsolePal:get_InputHandle():long
           1 ( 2.04% of base) : System.Console.dasm - ConsolePal:get_OutputHandle():long
           1 ( 2.04% of base) : System.Console.dasm - ConsolePal:get_ErrorHandle():long
           1 ( 1.96% of base) : Microsoft.Extensions.Http.dasm - ValueStopwatch:StartNew():ValueStopwatch
           1 ( 1.96% of base) : System.Net.Http.dasm - ValueStopwatch:StartNew():ValueStopwatch
           1 ( 1.96% of base) : System.Private.CoreLib.dasm - Stopwatch:QueryPerformanceCounter():long
           1 ( 1.96% of base) : System.Net.NameResolution.dasm - ValueStopwatch:StartNew():ValueStopwatch
           1 ( 1.96% of base) : System.Net.Security.dasm - ValueStopwatch:StartNew():ValueStopwatch
           1 ( 1.64% of base) : System.Console.dasm - Console:get_LargestWindowWidth():int
           1 ( 1.64% of base) : System.Console.dasm - Console:get_LargestWindowHeight():int
           1 ( 1.61% of base) : System.Drawing.Primitives.dasm - KnownColorTable:GetSystemColorArgb(int):int
           1 ( 1.52% of base) : System.Console.dasm - ConsolePal:get_LargestWindowWidth():int
           1 ( 1.52% of base) : System.Console.dasm - ConsolePal:get_LargestWindowHeight():int
           4 ( 1.42% of base) : Microsoft.CodeAnalysis.dasm - PdbMetadataWrapper:Microsoft.Cci.IMetaDataImport.GetMethodProps(int,byref,long,int,byref,long,long,long,long):int:this
           1 ( 1.41% of base) : xunit.performance.execution.dasm - BenchmarkEventSource:GetTimestamp():double:this

Top method improvements (percentages):
        -516 (-50.10% of base) : System.Private.CoreLib.dasm - DateTime:get_UtcNow():DateTime (2 base, 1 diff methods)
          -6 (-42.86% of base) : System.Transactions.Local.dasm - EnterpriseServices:get_CreatedServiceDomain():bool
          -6 (-42.86% of base) : System.Security.Permissions.dasm - SecurityManager:get_CheckExecutionRights():bool
          -6 (-42.86% of base) : System.Security.Permissions.dasm - SecurityManager:get_SecurityEnabled():bool
         -24 (-41.38% of base) : Microsoft.CodeAnalysis.dasm - ObjectReaderWriterBase:.cctor()
         -30 (-40.54% of base) : System.DirectoryServices.AccountManagement.dasm - AUTHZ_RM_FLAG:.cctor()
         -30 (-40.54% of base) : System.Net.Sockets.dasm - UnixDomainSocketEndPoint:.cctor()
         -78 (-38.61% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TraceEventNativeMethods:.cctor()
          -6 (-37.50% of base) : Microsoft.Extensions.FileSystemGlobbing.dasm - <EnumerateFileSystemInfos>d__12:System.Collections.IEnumerable.GetEnumerator():IEnumerator:this
          -6 (-37.50% of base) : Microsoft.Extensions.FileSystemGlobbing.dasm - <EnumerateFileSystemInfos>d__4:System.Collections.IEnumerable.GetEnumerator():IEnumerator:this
          -6 (-37.50% of base) : Microsoft.Extensions.Options.dasm - <FindConfigurationServices>d__8:System.Collections.IEnumerable.GetEnumerator():IEnumerator:this
         -48 (-37.50% of base) : System.Collections.dasm - <Reverse>d__84:System.Collections.IEnumerable.GetEnumerator():IEnumerator:this (8 methods)
          -6 (-37.50% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:System.Collections.Generic.IDictionary<TKey,TValue>.get_Item(__Canon):Nullable`1:this
          -6 (-37.50% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:System.Collections.Generic.IDictionary<TKey,TValue>.get_Item(int):Nullable`1:this
          -6 (-37.50% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:System.Collections.Generic.IDictionary<TKey,TValue>.get_Item(Vector`1):Nullable`1:this
          -6 (-37.50% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:System.Collections.Generic.IDictionary<TKey,TValue>.get_Item(long):Nullable`1:this
          -6 (-37.50% of base) : System.Collections.Immutable.dasm - Builder:System.Collections.Immutable.IOrderedCollection<T>.get_Item(int):__Canon:this
          -6 (-37.50% of base) : System.Collections.Immutable.dasm - Builder:System.Collections.Immutable.IOrderedCollection<T>.get_Item(int):ubyte:this
          -6 (-37.50% of base) : System.Collections.Immutable.dasm - Builder:System.Collections.Immutable.IOrderedCollection<T>.get_Item(int):short:this
          -6 (-37.50% of base) : System.Collections.Immutable.dasm - Builder:System.Collections.Immutable.IOrderedCollection<T>.get_Item(int):int:this

34476 total methods with Code Size differences (34416 improved, 60 regressed), 233720 unchanged.

--------------------------------------------------------------------------------
Completed analysis in 38.89s

Quite a huge diff, but from my understanding, it happens because of recompilation, e.g. here is the diff result for DateTime.UtcNow:
base: https://gist.github.com/EgorBo/106aede86c783f48e5966a0c81f11884
diff: https://gist.github.com/EgorBo/e37bc359c6c654a0bea27858e1724576

As you can see, in the base it's presented twice (the first one got rejected by VM).
Here is the diff between the second version of base and the diff: https://www.diffchecker.com/UgPKqFpn
as you can see gc-polls use reloc only in the "diff" (this PR) version. So, the actual codegen diff is -16 bytes rather than -530 as of reported.

In the real-word such recompilations happen only once (once we hit any pinvoke+suppressgc anywhere, BCL has plenty of them, e.g. in the IO subsystem - I guess we always hit one pretty early) and then fAllowRel32 is disabled for future compilation. But jit-diff compiles each method separately so the diff is full of failed duplicates.

/cc @dotnet/jit-contrib @jkotas

PS: we don't support rel32 for Linux/macOS x64 at least for static fields, but it'd be nice to have, especially for PGO where each counter would benefit from it (inc dword ptr [reloc]) #8545.

Fixes #49476

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 12, 2021
@jkotas
Copy link
Member

jkotas commented Mar 12, 2021

PS: we don't support rel32 for Linux/macOS x64 at all at the moment,

This does not sound right. We have done work in .NET Core 1.0 to make it work - by allocating codeheap close to libcoreclr.so.

src/coreclr/jit/lower.cpp Outdated Show resolved Hide resolved
@EgorBo
Copy link
Member Author

EgorBo commented Mar 12, 2021

PS: we don't support rel32 for Linux/macOS x64 at all at the moment,

This does not sound right. We have done work in .NET Core 1.0 to make it work - by allocating codeheap close to libcoreclr.so.

Here is a repro:

using System.Runtime.CompilerServices;

class Program
{
    static void Main() => Test();

    static int s_Field;

    [MethodImpl(MethodImplOptions.NoInlining)]
    static int Test() => s_Field;
}

here is the codegen for Test on macOS-x64:

       48B8142EB71601000000 mov      rax, 0x116B72E14
       8B00                 mov      eax, dword ptr [rax]
       C3                   ret

; Total bytes of code 13       ;; (should be 7 bytes with rel32)

From my understanding, this PR - dotnet/coreclr#12831 could fix it.

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
@jkotas
Copy link
Member

jkotas commented Mar 13, 2021

Here is a repro:

This should be covered by MEM_RESERVE_EXECUTABLE and g_executableMemoryAllocator. Is the memory for statics not going through this path for some reason?

src/coreclr/jit/lower.cpp Outdated Show resolved Hide resolved
@EgorBo
Copy link
Member Author

EgorBo commented Mar 13, 2021

Here is a repro:

This should be covered by MEM_RESERVE_EXECUTABLE and g_executableMemoryAllocator. Is the memory for statics not going through this path for some reason?

That specific rerpro relies on VM's getRelocTypeHint result which is always (DWORD)-1 on non-Windows because of this piece

#if USE_UPPER_ADDRESS
if (s_CodeMinAddr <= (BYTE *)p && (BYTE *)p < s_CodeMaxAddr)
return TRUE;
#endif

USE_UPPER_ADDRESS is always 0 on *nix (this PR wanted to enable it back https://github.com/dotnet/coreclr/pull/12831/files)
and
if (IsPreferredExecutableRange(target))
return IMAGE_REL_BASED_REL32;
}
#endif // TARGET_AMD64
// No hints
return (WORD)-1;

@jkotas
Copy link
Member

jkotas commented Mar 13, 2021

That specific rerpro relies on VM's getRelocTypeHint result which is always (DWORD)-1

I guess getRelocTypeHint should call into PAL on Linux to consult limits of g_executableMemoryAllocator.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Copy link
Contributor

@sandreenko sandreenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM,

turn fAllowRel32 off globally.

it sounds pretty drastically, do you know if Mono has such deoptimizations that affect all methods after the current? Do we now if we hit fAllowRel32 = false when running benchmarks?

@EgorBo
Copy link
Member Author

EgorBo commented Mar 15, 2021

LGTM,

turn fAllowRel32 off globally.

it sounds pretty drastically, do you know if Mono has such deoptimizations that affect all methods after the current?

Just checked - Mono doesn't support it in JIT mode (emits zero-relative addresses). I guess disabling it globally makes sense for large apps so you won't have to recompile each method twice at some point.

Do we now if we hit fAllowRel32 = false when running benchmarks?

Yeah we do, we have plenty of pinvokes+suppressgc in the corelib which are triggered during start up of any more or less non-hello world app. E.g. I noticed a tiny improvement for a simple benchmark where I just benchmark DateTime.UtcNow.

@EgorBo EgorBo merged commit bc001f6 into dotnet:main Mar 15, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Apr 14, 2021
@JulieLeeMSFT JulieLeeMSFT added this to the 6.0.0 milestone Apr 15, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Methods with [SuppressGCTransition] are compiled twice
4 participants