Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use preferred region from PAL for JIT reloc hints #60747

Merged
merged 4 commits into from
Oct 26, 2021

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Oct 21, 2021

We currently have two schemes that try to ensure jitted code and statics
end up close to each other and close to coreclr. In Windows, we have
USE_UPPER_ADDRESS that lazily will reserve memory for loader heaps that
is close to coreclr. For PAL, we eagerly reserve a large chunk of memory
nearby during start up. However for PAL we were not using this region to
report back to the JIT that addresses in this range can use rip-relative
addressing. Add this support.

I have also cleaned up some of the code around USE_UPPER_ADDRESS: I have
renamed it to USE_LAZY_PREFERRED_RANGE and removed some dead code.

@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@jakobbotsch

This comment has been minimized.

@EgorBo
Copy link
Member

EgorBo commented Oct 22, 2021

I think it's not the first time someone tries to enable USE_UPPER_ADDRESS for Linux, let's see how it goes this time 🙂

@jakobbotsch
Copy link
Member Author

I think it's not the first time someone tries to enable USE_UPPER_ADDRESS for Linux, let's see how it goes this time 🙂

Hmm right, after combing through it some more it does seem we already have an existing, separate mechanism right now to accomplish this on Linux. The main difference seems to be that there we eagerly reserve the memory when PAL is initialized.
But this memory is also nearby coreclr.dll and uses similar circular allocation as the USE_UPPER_ADDRESS scheme, so I think the primary thing we need to do is just to set it up to report hints back similarly to Windows, with no changes in the allocation parts.

@EgorBo
Copy link
Member

EgorBo commented Oct 22, 2021

Related: dotnet/coreclr#12831

We currently have two schemes that try to ensure jitted code and statics
end up close to each other and close to coreclr. In Windows, we have
USE_UPPER_ADDRESS that lazily will reserve memory for loader heaps that
is close to coreclr. For PAL, we eagerly reserve a large chunk of memory
nearby during start up. However for PAL we were not using this region to
report back to the JIT that addresses in this range can use rip-relative
addressing. Add this support.

I have also cleaned up some of the code around USE_UPPER_ADDRESS: I have
renamed it to USE_LAZY_PREFERRED_RANGE and removed some dead code.
@jakobbotsch jakobbotsch force-pushed the support-preferred-range-on-unix branch from d9c0ded to 961e085 Compare October 22, 2021 13:54
@jakobbotsch jakobbotsch changed the title Rename USE_UPPER_ADDRESS -> USE_PREFERRED_RANGE, support it on Unix platforms Use preferred region from PAL for JIT reloc hints Oct 22, 2021
@jakobbotsch
Copy link
Member Author

I have reworked this; it now does not touch anything related to allocation. Instead it sets up the preferred range for Unix platforms based on the area that is reserved for code and statics by PAL on start up.

@janvorli
Copy link
Member

TL;DR - handling the ranges in the runtime is kind of useless on Unix, all the executable memory allocated is within the ranges that would ever be requested unless we exceed the virtual memory amount reserved at process start for executable allocator in PAL or we fail to reserve such memory. And if any of those happen, we don't have a mean to get memory in a specific range.

Full details:

Virtual memory allocation works differently on Unix and on Windows. On Unix, there is no VirtualQuery - like function, so except for allocations that were previously made by our emulation of VirtualAlloc, we have no idea which block of memory is available and how large it is. We could possibly implement it by parsing the textual /proc/self/maps file on every request, possibly with some caching, but that would be very slow.

That was the reason why we have introduced the executable memory allocator in PAL that tries to reserve a large sequential range of virtual memory address space close to the libcoreclr.so at process startup time (see ExecutableMemoryAllocator::TryReserveInitialMemory). So when runtime tries to reserve memory in a range, the only thing we check is that the next block that we can get from that allocator is within the requested range.

In other words, on Unix, if we fail to reserve virtual memory range close to the libcoreclr.so at the startup time, we will never be able to allocate memory that is within a requested range. This could happen, e.g. with some security features of SELinux enabled, the mmap will never honor a requested base address and always allocate a random one.
And vice versa, if that initial reservation succeeds, we will always get memory in the requested range, even for allocations where no range was requested (unless the total code size exceeds the 2GB that we use as the reservation).

On Windows, things are different. We loop over the range, use VirtualQuery to find a free range that intersects with the requested range and if we find one, we reserve memory from that range.

So that's the reason why we didn't have the preferred ranges stuff enabled for Unix.

@janvorli
Copy link
Member

So it seems that the only thing that we really need to update is the ExecutableAllocator::IsPreferredExecutableRange so that it can check that the address is within the reserved memory or coreclr.

@jakobbotsch jakobbotsch marked this pull request as ready for review October 22, 2021 16:35
@jakobbotsch
Copy link
Member Author

jakobbotsch commented Oct 22, 2021

Select microbenchmark from performance repo:

https://github.com/dotnet/performance/blob/a453235176943a222f126762b4b5436f6687d62e/src/benchmarks/micro/runtime/perflab/LowLevelPerf.cs#L320-L325

BenchmarkDotNet=v0.13.1.1611-nightly, OS=ubuntu 20.04
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=7.0.100-alpha.1.21518.14
  [Host]     : .NET 7.0.0 (7.0.21.48001), X64 RyuJIT
  Job-QOOXAR : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT
  Job-PECDSW : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable,-bl:benchmarkdotnet.binlog  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1  
Method Job Toolchain Mean Error StdDev Median Min Max Ratio Allocated
GenericClassGenericStaticField Job-QOOXAR /main/corerun 42.54 μs 0.235 μs 0.208 μs 42.44 μs 42.35 μs 42.98 μs 1.00 -
GenericClassGenericStaticField Job-PECDSW /this_pr/corerun 20.97 μs 0.047 μs 0.041 μs 20.96 μs 20.91 μs 21.06 μs 0.49 -

PMI diffs over frameworks:

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 61609864
Total bytes of diff: 59773331
Total bytes of delta: -1836533 (-2.98 % of base)
Total relative delta: NaN
    diff is an improvement.
    relative diff is a regression.


Top file improvements (bytes):
     -188422 : Microsoft.CodeAnalysis.VisualBasic.dasm (-2.65 % of base)
     -149985 : FSharp.Core.dasm (-3.88 % of base)
     -141299 : System.Private.CoreLib.dasm (-2.33 % of base)
     -133373 : Microsoft.CodeAnalysis.CSharp.dasm (-2.42 % of base)
     -102909 : System.Private.Xml.dasm (-2.52 % of base)
      -81365 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-2.09 % of base)
      -68013 : System.Linq.Parallel.dasm (-3.10 % of base)
      -56686 : Microsoft.CodeAnalysis.dasm (-2.78 % of base)
      -47984 : System.Collections.Immutable.dasm (-2.89 % of base)
      -47455 : System.Linq.dasm (-4.19 % of base)
      -45082 : System.Data.Common.dasm (-2.63 % of base)
      -36915 : Newtonsoft.Json.dasm (-3.86 % of base)
      -34585 : System.Threading.Tasks.Dataflow.dasm (-3.09 % of base)
      -32954 : System.Linq.Queryable.dasm (-8.50 % of base)
      -32354 : System.Linq.Expressions.dasm (-3.76 % of base)
      -30159 : System.Text.Json.dasm (-2.71 % of base)
      -26444 : CommandLine.dasm (-5.06 % of base)
      -25761 : System.Collections.dasm (-4.37 % of base)
      -24537 : System.Private.DataContractSerialization.dasm (-2.77 % of base)
      -19582 : System.ComponentModel.Composition.dasm (-5.31 % of base)

197 total files with Code Size differences (197 improved, 0 regressed), 75 unchanged.

Top method regressions (bytes):
          55 (1.93 % of base) : System.Data.Common.dasm - System.Data.Common.SqlDoubleStorage:Aggregate(System.Int32[],int):System.Object:this
          40 (1.42 % of base) : System.Data.Common.dasm - System.Data.Common.SqlInt64Storage:Aggregate(System.Int32[],int):System.Object:this
          20 (1.60 % of base) : System.Data.Common.dasm - System.Data.Common.SByteStorage:Aggregate(System.Int32[],int):System.Object:this
          16 (0.66 % of base) : System.Data.Common.dasm - System.Data.Common.SqlByteStorage:Aggregate(System.Int32[],int):System.Object:this
          15 (1.10 % of base) : System.Data.Common.dasm - System.Data.Common.UInt16Storage:Aggregate(System.Int32[],int):System.Object:this
          15 (1.10 % of base) : System.Data.Common.dasm - System.Data.Common.UInt32Storage:Aggregate(System.Int32[],int):System.Object:this
          13 (1.03 % of base) : System.Threading.Tasks.Dataflow.dasm - BatchBlockTargetCore[Vector`1][System.Numerics.Vector`1[System.Single]]:ConsumeReservedMessagesNonGreedy():this
          12 (5.97 % of base) : System.Private.CoreLib.dasm - System.Array:CreateInstance(System.Type,System.Int32[]):System.Array
          11 (0.79 % of base) : System.Data.Common.dasm - System.Data.Common.Int16Storage:Aggregate(System.Int32[],int):System.Object:this
           9 (15.25 % of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator:IsThisReceiver(Microsoft.CodeAnalysis.CSharp.BoundExpression):bool:this
           7 (10.61 % of base) : Microsoft.CSharp.dasm - Microsoft.CSharp.RuntimeBinder.Semantics.EXPRExtensions:GetSeqVal(Microsoft.CSharp.RuntimeBinder.Semantics.Expr):Microsoft.CSharp.RuntimeBinder.Semantics.Expr
           6 (0.74 % of base) : System.Linq.dasm - AppendPrependN`1[Vector`1][System.Numerics.Vector`1[System.Single]]:LazyToArray():System.Numerics.Vector`1[System.Single][]:this
           4 (0.98 % of base) : ILCompiler.TypeSystem.ReadyToRun.dasm - Internal.TypeSystem.InstantiatedMethod:get_Signature():Internal.TypeSystem.MethodSignature:this
           4 (0.13 % of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceMemberContainerTypeSymbol:ForceComplete(Microsoft.CodeAnalysis.SourceLocation,System.Threading.CancellationToken):this
           4 (0.93 % of base) : FSharp.Core.dasm - Microsoft.FSharp.Collections.ArrayModule:chunkBySize$cont@796(int,System.Int16[],int,Microsoft.FSharp.Core.Unit):System.Int16[][]
           4 (1.14 % of base) : System.Private.CoreLib.dasm - System.Guid:TryParseHex(System.ReadOnlySpan`1[Char],byref,byref):bool (2 methods)
           4 (0.64 % of base) : System.Private.CoreLib.dasm - System.Number:TryParseUInt32HexNumberStyle(System.ReadOnlySpan`1[Char],int,byref):int
           4 (0.62 % of base) : System.Private.CoreLib.dasm - System.Number:TryParseUInt64HexNumberStyle(System.ReadOnlySpan`1[Char],int,byref):int
           3 (0.29 % of base) : System.Threading.Tasks.Dataflow.dasm - BatchBlockTargetCore[Byte][System.Byte]:ConsumeReservedMessagesNonGreedy():this
           3 (0.29 % of base) : System.Threading.Tasks.Dataflow.dasm - BatchBlockTargetCore[Int32][System.Int32]:ConsumeReservedMessagesNonGreedy():this

Top method improvements (bytes):
       -2040 (-7.66 % of base) : System.Private.Xml.dasm - System.Xml.Schema.XsdBuilder:.cctor()
       -2025 (-7.92 % of base) : System.Net.Security.dasm - System.Net.Security.TlsCipherSuiteData:.cctor()
       -1875 (-3.57 % of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.CtfTraceEventSource:InitEventMap():System.Collections.Generic.Dictionary`2[[System.String, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[Microsoft.Diagnostics.Tracing.ETWMapping, Microsoft.Diagnostics.Tracing.TraceEvent, Version=2.0.65.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]]
       -1395 (-1.57 % of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.ApplicationServerTraceEventParser:EnumerateTemplates(System.Func`3[[System.String, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.String, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[Microsoft.Diagnostics.Tracing.EventFilterResponse, Microsoft.Diagnostics.Tracing.TraceEvent, Version=2.0.65.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]],System.Action`1[[Microsoft.Diagnostics.Tracing.TraceEvent, Microsoft.Diagnostics.Tracing.TraceEvent, Version=2.0.65.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]]):this
       -1350 (-4.57 % of base) : System.Private.Xml.dasm - System.Xml.Xsl.IlGen.XmlILMethods:.cctor()
       -1146 (-8.44 % of base) : FSharp.Core.dasm - HashCompare:GenericEqualityObj$cont@1336(bool,System.Collections.IEqualityComparer,System.Object,System.Object,System.Array,Microsoft.FSharp.Core.Unit):bool
       -1032 (-4.29 % of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - _Closure$__:.cctor() (344 methods)
        -984 (-2.95 % of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.VisualBasicCommandLineParser:Parse(System.Collections.Generic.IEnumerable`1[[System.String, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]],System.String,System.String,System.String):Microsoft.CodeAnalysis.VisualBasic.VisualBasicCommandLineArguments:this
        -893 (-7.48 % of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.RegisteredTraceEventParser:GetManifestForRegisteredProvider(System.Guid):System.String
        -876 (-10.04 % of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.Syntax.InternalSyntax.SyntaxFactory:GetNodeTypes():System.Collections.Generic.IEnumerable`1[[System.Object, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]
        -876 (-10.76 % of base) : System.Private.Xml.dasm - System.Xml.Xsl.Xslt.XsltLoader:.ctor():this
        -855 (-4.29 % of base) : Microsoft.CodeAnalysis.CSharp.dasm - <>c:.cctor() (285 methods)
        -753 (-6.26 % of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - _Closure$__:_Lambda$__0-0(Roslyn.Utilities.ObjectReader):System.Object:this (251 methods)
        -747 (-6.64 % of base) : System.Private.DataContractSerialization.dasm - System.Runtime.Serialization.ReflectionReader:GetCollectionSetItemDelegate(System.Runtime.Serialization.CollectionDataContract,System.Object,bool):CollectionSetItemDelegate:this (8 methods)
        -741 (-5.50 % of base) : Microsoft.CSharp.dasm - Microsoft.CSharp.RuntimeBinder.Semantics.PredefinedMembers:.cctor()
        -693 (-4.35 % of base) : FSharp.Core.dasm - <StartupCode$FSharp-Core>.$Linq:.cctor()
        -660 (-8.80 % of base) : System.Private.Xml.dasm - System.Xml.Schema.XdrBuilder:.cctor()
        -656 (-4.53 % of base) : FSharp.Core.dasm - Microsoft.FSharp.Quotations.FSharpExpr:GetLayout(bool):Microsoft.FSharp.Text.StructuredPrintfImpl.Layout:this
        -654 (-4.65 % of base) : System.Linq.Expressions.dasm - System.Linq.Expressions.Interpreter.CallInstruction:FastCreate(System.Reflection.MethodInfo,System.Reflection.ParameterInfo[]):System.Linq.Expressions.Interpreter.CallInstruction (17 methods)
        -643 (-2.55 % of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.Binder:ReportOverloadResolutionFailureForASingleCandidate(Microsoft.CodeAnalysis.VisualBasic.VisualBasicSyntaxNode,Microsoft.CodeAnalysis.Location,int,byref,System.Collections.Immutable.ImmutableArray`1[[Microsoft.CodeAnalysis.VisualBasic.BoundExpression, Microsoft.CodeAnalysis.VisualBasic, Version=1.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]],System.Collections.Immutable.ImmutableArray`1[[System.String, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]],bool,bool,bool,bool,Microsoft.CodeAnalysis.DiagnosticBag,Microsoft.CodeAnalysis.VisualBasic.Symbol,bool,Microsoft.CodeAnalysis.VisualBasic.VisualBasicSyntaxNode,Microsoft.CodeAnalysis.VisualBasic.Symbol):this

Top method regressions (percentages):
           9 (15.25 % of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.CodeGen.CodeGenerator:IsThisReceiver(Microsoft.CodeAnalysis.CSharp.BoundExpression):bool:this
           7 (10.61 % of base) : Microsoft.CSharp.dasm - Microsoft.CSharp.RuntimeBinder.Semantics.EXPRExtensions:GetSeqVal(Microsoft.CSharp.RuntimeBinder.Semantics.Expr):Microsoft.CSharp.RuntimeBinder.Semantics.Expr
          12 (5.97 % of base) : System.Private.CoreLib.dasm - System.Array:CreateInstance(System.Type,System.Int32[]):System.Array
           3 (3.53 % of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.Symbols.PointerTypeSymbol:GetHashCode():int:this
           3 (2.07 % of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Clr.AppDomainAssemblyResolveHandlerInvokedTraceData:PayloadValue(int):System.Object:this
          55 (1.93 % of base) : System.Data.Common.dasm - System.Data.Common.SqlDoubleStorage:Aggregate(System.Int32[],int):System.Object:this
           3 (1.78 % of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.ApplicationServer.Multidata27TemplateHATraceData:PayloadValue(int):System.Object:this
           3 (1.78 % of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.ApplicationServer.Multidata28TemplateHATraceData:PayloadValue(int):System.Object:this
          20 (1.60 % of base) : System.Data.Common.dasm - System.Data.Common.SByteStorage:Aggregate(System.Int32[],int):System.Object:this
          40 (1.42 % of base) : System.Data.Common.dasm - System.Data.Common.SqlInt64Storage:Aggregate(System.Int32[],int):System.Object:this
           4 (1.14 % of base) : System.Private.CoreLib.dasm - System.Guid:TryParseHex(System.ReadOnlySpan`1[Char],byref,byref):bool (2 methods)
          15 (1.10 % of base) : System.Data.Common.dasm - System.Data.Common.UInt16Storage:Aggregate(System.Int32[],int):System.Object:this
          15 (1.10 % of base) : System.Data.Common.dasm - System.Data.Common.UInt32Storage:Aggregate(System.Int32[],int):System.Object:this
          13 (1.03 % of base) : System.Threading.Tasks.Dataflow.dasm - BatchBlockTargetCore[Vector`1][System.Numerics.Vector`1[System.Single]]:ConsumeReservedMessagesNonGreedy():this
           4 (0.98 % of base) : ILCompiler.TypeSystem.ReadyToRun.dasm - Internal.TypeSystem.InstantiatedMethod:get_Signature():Internal.TypeSystem.MethodSignature:this
           4 (0.93 % of base) : FSharp.Core.dasm - Microsoft.FSharp.Collections.ArrayModule:chunkBySize$cont@796(int,System.Int16[],int,Microsoft.FSharp.Core.Unit):System.Int16[][]
          11 (0.79 % of base) : System.Data.Common.dasm - System.Data.Common.Int16Storage:Aggregate(System.Int32[],int):System.Object:this
           6 (0.74 % of base) : System.Linq.dasm - AppendPrependN`1[Vector`1][System.Numerics.Vector`1[System.Single]]:LazyToArray():System.Numerics.Vector`1[System.Single][]:this
          16 (0.66 % of base) : System.Data.Common.dasm - System.Data.Common.SqlByteStorage:Aggregate(System.Int32[],int):System.Object:this
           4 (0.64 % of base) : System.Private.CoreLib.dasm - System.Number:TryParseUInt32HexNumberStyle(System.ReadOnlySpan`1[Char],int,byref):int

Top method improvements (percentages):
          -7 (-53.85 % of base) : System.Text.RegularExpressions.dasm - <<CreateDerivative>g__MkDerivativesOfElems|24_0>d[__Canon][System.__Canon]:System.Collections.IEnumerable.GetEnumerator():System.Collections.IEnumerator:this
          -7 (-53.85 % of base) : System.Text.RegularExpressions.dasm - <<CreateDerivative>g__MkDerivativesOfElems|24_0>d[Byte][System.Byte]:System.Collections.IEnumerable.GetEnumerator():System.Collections.IEnumerator:this
          -7 (-53.85 % of base) : System.Text.RegularExpressions.dasm - <<CreateDerivative>g__MkDerivativesOfElems|24_0>d[Double][System.Double]:System.Collections.IEnumerable.GetEnumerator():System.Collections.IEnumerator:this
          -7 (-53.85 % of base) : System.Text.RegularExpressions.dasm - <<CreateDerivative>g__MkDerivativesOfElems|24_0>d[Int16][System.Int16]:System.Collections.IEnumerable.GetEnumerator():System.Collections.IEnumerator:this
          -7 (-53.85 % of base) : System.Text.RegularExpressions.dasm - <<CreateDerivative>g__MkDerivativesOfElems|24_0>d[Int32][System.Int32]:System.Collections.IEnumerable.GetEnumerator():System.Collections.IEnumerator:this
          -7 (-53.85 % of base) : System.Text.RegularExpressions.dasm - <<CreateDerivative>g__MkDerivativesOfElems|24_0>d[Int64][System.Int64]:System.Collections.IEnumerable.GetEnumerator():System.Collections.IEnumerator:this
          -7 (-53.85 % of base) : System.Text.RegularExpressions.dasm - <<CreateDerivative>g__MkDerivativesOfElems|24_0>d[Nullable`1][System.Nullable`1[System.Int32]]:System.Collections.IEnumerable.GetEnumerator():System.Collections.IEnumerator:this
          -7 (-53.85 % of base) : System.Text.RegularExpressions.dasm - <<CreateDerivative>g__MkDerivativesOfElems|24_0>d[Vector`1][System.Numerics.Vector`1[System.Single]]:System.Collections.IEnumerable.GetEnumerator():System.Collections.IEnumerator:this
          -7 (-53.85 % of base) : System.Diagnostics.DiagnosticSource.dasm - <<get_Baggage>g__Iterate|80_0>d:System.Collections.IEnumerable.GetEnumerator():System.Collections.IEnumerator:this
          -7 (-53.85 % of base) : System.Text.RegularExpressions.dasm - <<Reverse>g__ReverseElements|27_0>d[__Canon][System.__Canon]:System.Collections.IEnumerable.GetEnumerator():System.Collections.IEnumerator:this
          -7 (-53.85 % of base) : System.Text.RegularExpressions.dasm - <<Reverse>g__ReverseElements|27_0>d[Byte][System.Byte]:System.Collections.IEnumerable.GetEnumerator():System.Collections.IEnumerator:this
          -7 (-53.85 % of base) : System.Text.RegularExpressions.dasm - <<Reverse>g__ReverseElements|27_0>d[Double][System.Double]:System.Collections.IEnumerable.GetEnumerator():System.Collections.IEnumerator:this
          -7 (-53.85 % of base) : System.Text.RegularExpressions.dasm - <<Reverse>g__ReverseElements|27_0>d[Int16][System.Int16]:System.Collections.IEnumerable.GetEnumerator():System.Collections.IEnumerator:this
          -7 (-53.85 % of base) : System.Text.RegularExpressions.dasm - <<Reverse>g__ReverseElements|27_0>d[Int32][System.Int32]:System.Collections.IEnumerable.GetEnumerator():System.Collections.IEnumerator:this
          -7 (-53.85 % of base) : System.Text.RegularExpressions.dasm - <<Reverse>g__ReverseElements|27_0>d[Int64][System.Int64]:System.Collections.IEnumerable.GetEnumerator():System.Collections.IEnumerator:this
          -7 (-53.85 % of base) : System.Text.RegularExpressions.dasm - <<Reverse>g__ReverseElements|27_0>d[Nullable`1][System.Nullable`1[System.Int32]]:System.Collections.IEnumerable.GetEnumerator():System.Collections.IEnumerator:this
          -7 (-53.85 % of base) : System.Text.RegularExpressions.dasm - <<Reverse>g__ReverseElements|27_0>d[Vector`1][System.Numerics.Vector`1[System.Single]]:System.Collections.IEnumerable.GetEnumerator():System.Collections.IEnumerator:this
          -7 (-53.85 % of base) : System.Text.RegularExpressions.dasm - <<Transform>g__TransformElements|25_0>d`1[__Canon,Nullable`1][System.__Canon,System.Nullable`1[System.Int32]]:System.Collections.IEnumerable.GetEnumerator():System.Collections.IEnumerator:this
          -7 (-53.85 % of base) : System.Text.RegularExpressions.dasm - <<Transform>g__TransformElements|25_0>d`1[Byte,Nullable`1][System.Byte,System.Nullable`1[System.Int32]]:System.Collections.IEnumerable.GetEnumerator():System.Collections.IEnumerator:this
          -7 (-53.85 % of base) : System.Text.RegularExpressions.dasm - <<Transform>g__TransformElements|25_0>d`1[Double,Nullable`1][System.Double,System.Nullable`1[System.Int32]]:System.Collections.IEnumerable.GetEnumerator():System.Collections.IEnumerator:this

161660 total methods with Code Size differences (161631 improved, 29 regressed), 207139 unchanged.

--------------------------------------------------------------------------------

cc @dotnet/jit-contrib

@janvorli
Copy link
Member

Nice! I cannot wait seeing asp.net benchmark results with this change.

#if !defined(HOST_UNIX)
#define USE_UPPER_ADDRESS 1
#if defined(HOST_UNIX)
// In PAL we have a smechanism that reserves memory on start up that is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// In PAL we have a smechanism that reserves memory on start up that is
// In PAL we have a mechanism that reserves memory on start up that is

OUT LPVOID *start,
OUT LPVOID *end)
{
g_executableMemoryAllocator.GetReservedRange(start, end);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we combine it with the libcoreclr.so range (using real base and the guesstimate CoreClrLibrarySize as a size)? In case there was a space in between, we could pretend it is not there. I think that without it, calls to helpers in it would not be optimized, or would they?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always assume we can do rip-relative calls and the runtime will generate jump stubs if necessary, so calls to helpers already work out well with the current scheme. The question is if we ever bake in static addresses in coreclr.dll as part of jitted code -- those could not be contained without what you suggest. I can try to make this change tomorrow or Monday and see if there are any additional diffs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That further helps, we sometime call helpers through indirections of static variables stored in coreclr.dll:

// To avoid using a jump stub we always call certain helpers using an indirect call.
// Because when using a direct call and the target is father away than 2^31 bytes,
// the direct call instead goes to a jump stub which jumps to the jit helper.
// However in this process the jump stub will corrupt RAX.
//
// The set of helpers for which RAX must be preserved are the profiler probes
// and the STOP_FOR_GC helper which maps to JIT_RareDisableHelper.
// In the case of the STOP_FOR_GC helper RAX can be holding a function return value.
//
if (dynamicFtnNum == DYNAMIC_CORINFO_HELP_STOP_FOR_GC ||
dynamicFtnNum == DYNAMIC_CORINFO_HELP_PROF_FCN_ENTER ||
dynamicFtnNum == DYNAMIC_CORINFO_HELP_PROF_FCN_LEAVE ||
dynamicFtnNum == DYNAMIC_CORINFO_HELP_PROF_FCN_TAILCALL)
{
_ASSERTE(ppIndirection != NULL);
*ppIndirection = &hlpDynamicFuncTable[dynamicFtnNum].pfnHelper;
return NULL;
}

In particular CORINFO_HELP_STOP_FOR_GC is one that has quite a few hits. That furthermore is conditional based on a static variable, so each site changes like this:

 G_M40601_IG08:
        mov      byte  ptr [rbx+12], 1
-       mov      rdi, 0xD1FFAB1E
-       cmp      dword ptr [rdi], 0
+       cmp      dword ptr [(reloc)], 0
        je       SHORT G_M40601_IG09
-       mov      rsi, 0xD1FFAB1E
-       call     [rsi]CORINFO_HELP_STOP_FOR_GC
-                                               ;; bbWeight=0.50 PerfScore 4.25
+       call     [CORINFO_HELP_STOP_FOR_GC]
+                                               ;; bbWeight=0.50 PerfScore 4.00

The additional diffs are:

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 59803609
Total bytes of diff: 59772903
Total bytes of delta: -30706 (-0.05 % of base)
Total relative delta: NaN
    diff is an improvement.
    relative diff is a regression.


Top file improvements (bytes):
       -7410 : System.Private.CoreLib.dasm (-0.13 % of base)
       -4853 : System.Drawing.Common.dasm (-1.26 % of base)
       -1946 : System.Security.Cryptography.Algorithms.dasm (-0.49 % of base)
       -1170 : System.Security.Cryptography.X509Certificates.dasm (-0.32 % of base)
       -1026 : System.Net.Security.dasm (-0.57 % of base)
        -895 : System.Net.Sockets.dasm (-0.40 % of base)
        -876 : System.Diagnostics.Process.dasm (-0.98 % of base)
        -852 : System.DirectoryServices.Protocols.dasm (-0.72 % of base)
        -772 : System.Security.Cryptography.OpenSsl.dasm (-0.69 % of base)
        -713 : System.Console.dasm (-0.92 % of base)
        -686 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.02 % of base)
        -618 : System.Private.Xml.dasm (-0.02 % of base)
        -540 : System.Data.Odbc.dasm (-0.25 % of base)
        -528 : System.IO.Ports.dasm (-1.13 % of base)
        -474 : System.IO.Pipes.dasm (-1.22 % of base)
        -450 : System.Reflection.MetadataLoadContext.dasm (-0.20 % of base)
        -449 : System.IO.MemoryMappedFiles.dasm (-2.12 % of base)
        -415 : System.Net.Primitives.dasm (-0.55 % of base)
        -402 : System.Net.Mail.dasm (-0.21 % of base)
        -378 : System.Net.Http.dasm (-0.05 % of base)

74 total files with Code Size differences (74 improved, 0 regressed), 198 unchanged.

Top method regressions (bytes):
           9 (0.10 % of base) : System.Private.Xml.dasm - <GetTokenAsync>d__173:MoveNext():this

Top method improvements (bytes):
        -300 (-6.46 % of base) : System.Drawing.Common.dasm - ILStubClass:IL_STUB_PInvoke(System.Runtime.InteropServices.HandleRef,byref):int (25 methods)
        -258 (-12.48 % of base) : System.Reflection.MetadataLoadContext.dasm - System.Reflection.TypeLoading.CoreTypeHelpers:GetFullName(int,byref,byref)
        -192 (-6.90 % of base) : System.Drawing.Common.dasm - ILStubClass:IL_STUB_PInvoke(System.Runtime.InteropServices.HandleRef,int):int (16 methods)
        -144 (-9.59 % of base) : System.Security.Cryptography.Algorithms.dasm - Internal.Cryptography.AesImplementation:GetAlgorithm(int,int,int):long
        -132 (-8.67 % of base) : System.Drawing.Common.dasm - System.Drawing.MacSupport:GetCGContextForView(long):System.Drawing.CarbonContext
        -120 (-6.78 % of base) : System.Drawing.Common.dasm - ILStubClass:IL_STUB_PInvoke(long,byref):int (10 methods)
        -100 (-4.54 % of base) : System.Drawing.Common.dasm - System.Drawing.Graphics:CopyFromScreenX11(int,int,int,int,System.Drawing.Size,int):this
         -72 (-8.14 % of base) : System.Private.CoreLib.dasm - ILStubClass:IL_STUB_PInvoke():long (7 methods)
         -72 (-7.19 % of base) : System.Private.CoreLib.dasm - ILStubClass:IL_STUB_PInvoke(long,long):int (6 methods)
         -72 (-7.87 % of base) : System.Private.CoreLib.dasm - ILStubClass:IL_STUB_PInvoke(long) (7 methods)
         -72 (-3.43 % of base) : System.DirectoryServices.Protocols.dasm - ILStubClass:IL_STUB_PInvoke(System.DirectoryServices.Protocols.ConnectionHandle,int,byref):int (6 methods)
         -72 (-5.97 % of base) : System.Drawing.Common.dasm - ILStubClass:IL_STUB_PInvoke(System.Runtime.InteropServices.HandleRef,byref,byref):int (6 methods)
         -72 (-6.78 % of base) : System.Drawing.Common.dasm - ILStubClass:IL_STUB_PInvoke(System.Runtime.InteropServices.HandleRef,long,int):int (6 methods)
         -66 (-7.39 % of base) : System.Net.Sockets.dasm - ILStubClass:IL_STUB_PInvoke(long,long):int (6 methods)
         -60 (-7.19 % of base) : System.Private.CoreLib.dasm - ILStubClass:IL_STUB_PInvoke(long,int):int (5 methods)
         -60 (-7.19 % of base) : System.Security.Cryptography.X509Certificates.dasm - ILStubClass:IL_STUB_PInvoke(long,long):int (5 methods)
         -60 (-7.45 % of base) : System.Private.CoreLib.dasm - ILStubClass:IL_STUB_PInvoke(long):long (5 methods)
         -60 (-5.91 % of base) : System.Drawing.Common.dasm - ILStubClass:IL_STUB_PInvoke(System.Runtime.InteropServices.HandleRef,System.Runtime.InteropServices.HandleRef,byref):int (5 methods)
         -60 (-7.87 % of base) : System.Drawing.Common.dasm - System.Drawing.MacSupport:GetCGContextForNSView(long):System.Drawing.CocoaContext
         -54 (-8.06 % of base) : System.Private.CoreLib.dasm - ILStubClass:IL_STUB_PInvoke():int (5 methods)

Top method regressions (percentages):
           9 (0.10 % of base) : System.Private.Xml.dasm - <GetTokenAsync>d__173:MoveNext():this

Top method improvements (percentages):
          -9 (-22.50 % of base) : System.Private.CoreLib.dasm - System.Runtime.InteropServices.Marshal:SetLastSystemError(int)
          -9 (-21.95 % of base) : System.Private.CoreLib.dasm - System.Diagnostics.Stopwatch:GetTimestamp():long
          -9 (-21.95 % of base) : System.Private.CoreLib.dasm - System.Diagnostics.Stopwatch:QueryPerformanceCounter():long
          -9 (-21.95 % of base) : System.Private.CoreLib.dasm - System.Runtime.InteropServices.Marshal:GetLastSystemError():int
          -9 (-20.93 % of base) : System.Net.NameResolution.dasm - Microsoft.Extensions.Internal.ValueStopwatch:StartNew():Microsoft.Extensions.Internal.ValueStopwatch
          -9 (-20.93 % of base) : System.Net.Http.dasm - Microsoft.Extensions.Internal.ValueStopwatch:StartNew():Microsoft.Extensions.Internal.ValueStopwatch
          -9 (-20.93 % of base) : System.Net.Security.dasm - Microsoft.Extensions.Internal.ValueStopwatch:StartNew():Microsoft.Extensions.Internal.ValueStopwatch
          -9 (-20.93 % of base) : Microsoft.Extensions.Http.dasm - Microsoft.Extensions.Internal.ValueStopwatch:StartNew():Microsoft.Extensions.Internal.ValueStopwatch
          -9 (-19.15 % of base) : System.Diagnostics.Process.dasm - System.Diagnostics.Process:SetDelayedSigChildConsoleConfigurationHandler()
          -3 (-18.75 % of base) : System.Net.Http.dasm - Http2Stream:get_StatusHeaderName():System.ReadOnlySpan`1[Byte]
          -3 (-18.75 % of base) : Microsoft.Extensions.DependencyModel.dasm - Microsoft.Extensions.DependencyModel.DependencyContextJsonReader:get_Utf8Bom():System.ReadOnlySpan`1[Byte]
          -9 (-18.75 % of base) : System.IO.FileSystem.Watcher.dasm - Sys:GetLastError():int
          -9 (-18.75 % of base) : System.Private.CoreLib.dasm - Sys:GetLastError():int
          -9 (-18.75 % of base) : System.Diagnostics.Process.dasm - Sys:GetLastError():int
          -9 (-18.75 % of base) : System.Net.Sockets.dasm - Sys:GetLastError():int
          -3 (-18.75 % of base) : System.Memory.dasm - System.Buffers.Text.Base64:get_DecodingMap():System.ReadOnlySpan`1[SByte]
          -3 (-18.75 % of base) : System.Memory.dasm - System.Buffers.Text.Base64:get_EncodingMap():System.ReadOnlySpan`1[Byte]
          -3 (-18.75 % of base) : System.Drawing.Common.dasm - System.Drawing.ImageConverter:get_BMBytes():System.ReadOnlySpan`1[Byte]
          -3 (-18.75 % of base) : System.Drawing.Common.dasm - System.Drawing.ImageConverter:get_PBrush():System.ReadOnlySpan`1[Byte]
          -3 (-18.75 % of base) : System.Drawing.Primitives.dasm - System.Drawing.KnownColorTable:get_ColorKindTable():System.ReadOnlySpan`1[Byte]

2330 total methods with Code Size differences (2329 improved, 1 regressed), 366469 unchanged.

--------------------------------------------------------------------------------

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed the change, can you take another look please?

Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@jakobbotsch jakobbotsch merged commit 7084d44 into dotnet:main Oct 26, 2021
@jakobbotsch jakobbotsch deleted the support-preferred-range-on-unix branch October 26, 2021 09:03
@EgorBo
Copy link
Member

EgorBo commented Nov 2, 2021

ubuntu-x64 improvements dotnet/perf-autofiling-issues#2088 (or PGO? d2ca8e1...7084d44)

@EgorBo
Copy link
Member

EgorBo commented Nov 2, 2021

alpine-x64 improvements (GenericClassWithIntGenericInstanceField benchmark)

@ghost ghost locked as resolved and limited conversation to collaborators Dec 2, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants