Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: save generics context for late devirtualization #63420

Merged
merged 3 commits into from
Jan 6, 2022

Conversation

AndyAyersMS
Copy link
Member

If we have a non-inline candidate call with generics context, save
the context so it's available for late devirtualization.

Fixes a missing devirtualization reported in #63283.

I am deliberately leaving LateDevirtualizationInfo more general than
necessary as it may serve as a jumping-off point for enabling late
inlining.

If we have a non-inline candidate call with generics context, save
the context so it's available for late devirtualization.

Fixes a missing devirtualization reported in dotnet#63283.

I am deliberately leaving `LateDevirtualizationInfo` more general than
necessary as it may serve as a jumping-off point for enabling late
inlining.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 5, 2022
@ghost ghost assigned AndyAyersMS Jan 5, 2022
@ghost
Copy link

ghost commented Jan 5, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

If we have a non-inline candidate call with generics context, save
the context so it's available for late devirtualization.

Fixes a missing devirtualization reported in #63283.

I am deliberately leaving LateDevirtualizationInfo more general than
necessary as it may serve as a jumping-off point for enabling late
inlining.

Author: AndyAyersMS
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@AndyAyersMS
Copy link
Member Author

cc @dotnet/jit-contrib

SPMI diffs not interesting as any method that would benefit from new behavior gets a missing info error (~176K of them all told).

PMI diffs show a few hundred improvements. Regressions are from extra CSEs/spills.

PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for x64 default jit

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 62817061
Total bytes of diff: 62813898
Total bytes of delta: -3163 (-0.01 % of base)
Total relative delta: -5.06
    diff is an improvement.
    relative diff is an improvement.


Top file improvements (bytes):
        -841 : System.Linq.Parallel.dasm (-0.04% of base)
        -534 : Microsoft.CodeAnalysis.dasm (-0.03% of base)
        -246 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.01% of base)
        -200 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.00% of base)
        -194 : Microsoft.CodeAnalysis.CSharp.dasm (-0.00% of base)
        -130 : System.Private.Xml.dasm (-0.00% of base)
         -98 : System.Private.Xml.Linq.dasm (-0.05% of base)
         -92 : System.ComponentModel.Composition.dasm (-0.02% of base)
         -78 : Newtonsoft.Json.dasm (-0.01% of base)
         -70 : System.Private.CoreLib.dasm (-0.00% of base)
         -56 : System.Text.RegularExpressions.dasm (-0.01% of base)
         -54 : System.Security.Principal.Windows.dasm (-0.09% of base)
         -49 : System.Threading.Tasks.Dataflow.dasm (-0.00% of base)
         -35 : xunit.execution.dotnet.dasm (-0.01% of base)
         -28 : xunit.assert.dasm (-0.02% of base)
         -28 : System.Net.Http.dasm (-0.00% of base)
         -27 : System.Speech.dasm (-0.01% of base)
         -25 : Microsoft.Extensions.Caching.Memory.dasm (-0.11% of base)
         -21 : Microsoft.Extensions.DependencyModel.dasm (-0.03% of base)
         -21 : xunit.console.dasm (-0.02% of base)

50 total files with Code Size differences (50 improved, 0 regressed), 223 unchanged.

Top method regressions (bytes):
          57 ( 3.78% of base) : System.Private.Xml.dasm - XmlILVisitor:Function(QilFunction):this
           8 ( 0.14% of base) : System.Private.Xml.dasm - QilGenerator:PrecompileProtoTemplatesHeaders():this
           4 ( 0.33% of base) : System.Private.Xml.dasm - QilGenerator:GetNsVar(QilList):QilIterator:this
           2 ( 0.12% of base) : System.Private.Xml.dasm - XmlILVisitor:VisitSort(QilLoop):QilNode:this
           1 ( 0.03% of base) : System.Private.Xml.dasm - XmlILVisitor:VisitGlobalValues(QilList):this

Top method improvements (bytes):
         -91 (-13.70% of base) : System.Linq.Parallel.dasm - NullableDoubleMinMaxAggregationOperator:InternalAggregate(byref):Nullable`1:this
         -56 (-2.50% of base) : System.Linq.Parallel.dasm - AnyAllSearchOperator`1:Aggregate():bool:this (8 methods)
         -56 (-2.61% of base) : System.Linq.Parallel.dasm - ContainsSearchOperator`1:Aggregate():bool:this (8 methods)
         -56 (-2.90% of base) : System.Linq.Parallel.dasm - CountAggregationOperator`1:InternalAggregate(byref):int:this (8 methods)
         -56 (-2.88% of base) : System.Linq.Parallel.dasm - LongCountAggregationOperator`1:InternalAggregate(byref):long:this (8 methods)
         -49 (-1.03% of base) : System.Linq.Parallel.dasm - ElementAtQueryOperator`1:Aggregate(byref,bool):bool:this (8 methods)
         -49 (-1.55% of base) : Microsoft.CodeAnalysis.dasm - EnumerableExtensions:ToDictionary(IEnumerable`1,Func`2,IEqualityComparer`1):Dictionary`2 (8 methods)
         -49 (-7.72% of base) : Microsoft.CodeAnalysis.dasm - ImmutableSetWithInsertionOrder`1:GetEnumerator():IEnumerator`1:this (8 methods)
         -49 (-7.72% of base) : Microsoft.CodeAnalysis.dasm - ImmutableSetWithInsertionOrder`1:System.Collections.IEnumerable.GetEnumerator():IEnumerator:this (8 methods)
         -49 (-2.04% of base) : System.Threading.Tasks.Dataflow.dasm - SingleProducerSingleConsumerQueue_DebugView:get_Items():ref:this (8 methods)
         -49 (-2.21% of base) : Microsoft.CodeAnalysis.dasm - UnionCollection`1:GetEnumerator():IEnumerator`1:this (8 methods)
         -42 (-0.24% of base) : System.Text.RegularExpressions.dasm - <EnumeratePaths>d__28:MoveNext():bool:this (8 methods)
         -42 (-0.72% of base) : Microsoft.CodeAnalysis.dasm - AnalyzerDriver`1:get_CodeBlockStartActionsByAnalyzer():ImmutableDictionary`2:this (6 methods)
         -42 (-0.70% of base) : Microsoft.CodeAnalysis.dasm - AnalyzerDriver`1:get_NodeActionsByAnalyzerAndKind():ImmutableDictionary`2:this (6 methods)
         -42 (-0.90% of base) : Microsoft.CodeAnalysis.dasm - AnalyzerDriver`1:GetCodeBlockActionsByAnalyzer(byref,ImmutableArray`1):ImmutableDictionary`2 (6 methods)
         -40 (-0.96% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - StateMachineRewriter`1:CreateNonReusableLocalProxies(Result,byref):this (8 methods)
         -40 (-2.84% of base) : System.Private.Xml.dasm - XmlILVisitor:VisitStrConcat(QilStrConcat):QilNode:this
         -31 (-2.40% of base) : Microsoft.CodeAnalysis.CSharp.dasm - ClsComplianceChecker:CheckForAttributeWithArrayArgumentInternal(ImmutableArray`1):this
         -29 (-5.68% of base) : System.Linq.Parallel.dasm - NullableLongMinMaxAggregationOperator:InternalAggregate(byref):Nullable`1:this
         -26 (-3.24% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MethodCompiler:CompileSynthesizedMethods(PrivateImplementationDetails):this

Top method regressions (percentages):
          57 ( 3.78% of base) : System.Private.Xml.dasm - XmlILVisitor:Function(QilFunction):this
           4 ( 0.33% of base) : System.Private.Xml.dasm - QilGenerator:GetNsVar(QilList):QilIterator:this
           8 ( 0.14% of base) : System.Private.Xml.dasm - QilGenerator:PrecompileProtoTemplatesHeaders():this
           2 ( 0.12% of base) : System.Private.Xml.dasm - XmlILVisitor:VisitSort(QilLoop):QilNode:this
           1 ( 0.03% of base) : System.Private.Xml.dasm - XmlILVisitor:VisitGlobalValues(QilList):this

Top method improvements (percentages):
         -91 (-13.70% of base) : System.Linq.Parallel.dasm - NullableDoubleMinMaxAggregationOperator:InternalAggregate(byref):Nullable`1:this
          -7 (-9.21% of base) : System.Private.CoreLib.dasm - CounterPayload:GetEnumerator():IEnumerator`1:this
          -7 (-9.21% of base) : System.Private.CoreLib.dasm - CounterPayload:System.Collections.IEnumerable.GetEnumerator():IEnumerator:this
          -7 (-9.21% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Histogram:GetEnumerator():IEnumerator`1:this
          -7 (-9.21% of base) : System.Private.CoreLib.dasm - IncrementingCounterPayload:GetEnumerator():IEnumerator`1:this
          -7 (-9.21% of base) : System.Private.CoreLib.dasm - IncrementingCounterPayload:System.Collections.IEnumerable.GetEnumerator():IEnumerator:this
         -49 (-7.72% of base) : Microsoft.CodeAnalysis.dasm - ImmutableSetWithInsertionOrder`1:GetEnumerator():IEnumerator`1:this (8 methods)
         -49 (-7.72% of base) : Microsoft.CodeAnalysis.dasm - ImmutableSetWithInsertionOrder`1:System.Collections.IEnumerable.GetEnumerator():IEnumerator:this (8 methods)
         -25 (-6.33% of base) : Microsoft.CodeAnalysis.CSharp.dasm - MethodCompiler:CompileSynthesizedMethods(PrivateImplementationDetails,DiagnosticBag):this
         -29 (-5.68% of base) : System.Linq.Parallel.dasm - NullableLongMinMaxAggregationOperator:InternalAggregate(byref):Nullable`1:this
         -11 (-5.21% of base) : System.Reflection.MetadataLoadContext.dasm - MetadataLoadContext:Dispose(bool):this
         -21 (-5.10% of base) : System.Linq.Parallel.dasm - NullableFloatAverageAggregationOperator:InternalAggregate(byref):Nullable`1:this
         -21 (-5.02% of base) : System.Linq.Parallel.dasm - IntMinMaxAggregationOperator:InternalAggregate(byref):int:this
         -21 (-4.98% of base) : System.Linq.Parallel.dasm - NullableDoubleAverageAggregationOperator:InternalAggregate(byref):Nullable`1:this
         -21 (-4.94% of base) : System.Linq.Parallel.dasm - LongMinMaxAggregationOperator:InternalAggregate(byref):long:this
         -21 (-4.93% of base) : System.Linq.Parallel.dasm - NullableIntAverageAggregationOperator:InternalAggregate(byref):Nullable`1:this
         -21 (-4.93% of base) : System.Linq.Parallel.dasm - NullableLongAverageAggregationOperator:InternalAggregate(byref):Nullable`1:this
         -21 (-4.85% of base) : System.Linq.Parallel.dasm - NullableIntMinMaxAggregationOperator:InternalAggregate(byref):Nullable`1:this
         -11 (-4.56% of base) : System.ComponentModel.Composition.dasm - AggregateCatalog:GetEnumerator():IEnumerator`1:this
         -11 (-4.56% of base) : System.ComponentModel.Composition.dasm - DirectoryCatalog:GetEnumerator():IEnumerator`1:this

299 total methods with Code Size differences (294 improved, 5 regressed), 278692 unchanged.

Codegen for Cmp from #63283 is now

; Assembly listing for method C:Cmp(System.String):int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  3,  3   )     ref  ->  rcx         class-hnd single-def
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [rsp+00H]   "OutgoingArgSpace"
;
; Lcl frame size = 0

G_M25227_IG01:              ;; offset=0000H
						;; bbWeight=1    PerfScore 0.00
G_M25227_IG02:              ;; offset=0000H
       48BA2030009811020000 mov      rdx, 0x21198003020      ; ""
       488B12               mov      rdx, gword ptr [rdx]
       3909                 cmp      dword ptr [rcx], ecx
						;; bbWeight=1    PerfScore 5.25
G_M25227_IG03:              ;; offset=000FH
       FF25B3A0ACFF         tail.jmp [System.String:CompareTo(System.String):int:this]
						;; bbWeight=1    PerfScore 2.00

@AndyAyersMS
Copy link
Member Author

Seeing some failures where GTF_CALL_M_LATE_DEVIRT is set but the LateDevirtualizationInfo is null. Debugging now.

@AndyAyersMS
Copy link
Member Author

Updated to copy the union field --we can clone GTF_CALL_M_LATE_DEVIRT calls and the data they refer to is immutable.

We can't safely clone inline or gdv candidates this way, but we don't do that now.

@AndyAyersMS
Copy link
Member Author

@EgorBo can you take a look?

{
JITDUMP("\nSaving context %p for call [%06u]\n", exactContextHnd, dspTreeID(origCall));
origCall->gtCallMoreFlags |= GTF_CALL_M_LATE_DEVIRT;
LateDevirtualizationInfo* const info = new (this, CMK_Inlining) LateDevirtualizationInfo;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning to add more fields to LateDevirtualizationInfo? because for now it looks like allocation is not needed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe? That's what I meant by

I am deliberately leaving LateDevirtualizationInfo more general than
necessary as it may serve as a jumping-off point for enabling late inlining.

const bool isLateDevirtualization = true;
bool explicitTailCall = (call->AsCall()->gtCallMoreFlags & GTF_CALL_M_EXPLICIT_TAILCALL) != 0;
const bool explicitTailCall = (call->AsCall()->gtCallMoreFlags & GTF_CALL_M_EXPLICIT_TAILCALL) != 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a nit: there is IsTailPrefixedCall()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, will change it over. Also, we already know call is GenTreeCall*.

@EgorBo
Copy link
Member

EgorBo commented Jan 20, 2022

Improvement on win-arm64 dotnet/perf-autofiling-issues#2978

@ghost ghost locked as resolved and limited conversation to collaborators Feb 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants