Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Support object stack allocation #11192

Closed
15 of 20 tasks
erozenfeld opened this issue Oct 3, 2018 · 21 comments
Closed
15 of 20 tasks

JIT: Support object stack allocation #11192

erozenfeld opened this issue Oct 3, 2018 · 21 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization
Milestone

Comments

@erozenfeld
Copy link
Member

erozenfeld commented Oct 3, 2018

This issue will track work on supporting object stack allocation in the jit. See dotnet/coreclr#20251 for the document describing the work.

The initial goal is to be able to remove heap allocation in a simple example like this:

class Foo
{
    public int f1;
    public int f2;
    public Foo(int f1, int f2)
    {
        this.f1 = f1;
        this.f2 = f2;
    }
}

class Test
{
    static int f1;
    static int f2;
    public static int Main()
    {
         Foo foo = new Foo(f1, f2);
         return foo.f1 + foo.f2;
    }
}

and then in a similar example where the class has gc fields.

Proposed initial steps are:

I will be modifying and extending this list as the work progresses.

cc @dotnet/jit-contrib

category:cq
theme:object-stack-allocation
skill-level:expert
cost:extra-large
impact:large

@erozenfeld erozenfeld self-assigned this Oct 3, 2018
@jkotas
Copy link
Member

jkotas commented Oct 4, 2018

add getObjHeaderSize jit interface method and its implementations

Why is this needed?

@erozenfeld
Copy link
Member Author

We'll have to allocate space for ObjHeader before the stack-allocated object unless we can prove that it won't be needed for synchronization or hash code storage.

@jkotas
Copy link
Member

jkotas commented Oct 4, 2018

needed for synchronization or hash code storage

I think using sychronization, etc. should prohibit stack allocation, for the initial iteration at least. Otherwise, you would also need a helper call to clear these objects from the synchronization tables and teach the synchronization tables to allow references to objects that do not live on the GC heap.

@mikedn
Copy link
Contributor

mikedn commented Oct 4, 2018

modify lvaSetStruct to allow creating locals corresponding to stack-allocated classes
modify gc writebarrier logic to apply appropriate barriers when assigning to fields of (possibly) stack-allocated objects

How would stack allocated object fields would be accessed? Would existing FIELD/IND nodes be replaced with LCL_FLD/LCL_VAR nodes? Are writer barriers still needed if the object is allocated on stack?

@Suchiman
Copy link
Contributor

Suchiman commented Oct 4, 2018

What would be the point of trying to synchronize on a stackallocated object? If it's considered for stack allocation then it means it doesn't escape which means no other thread can attempt to synchronize on it so any attempts to do so should be a no-op. The hash code could be derived from it's stack address as the object won't move until its deallocated.

@erozenfeld
Copy link
Member Author

We'll need to detect calls to RuntimeHelpers.GetHashCode on the stackallocated object. They will be problematic if we don't allocate ObjHeader before the object.

@erozenfeld
Copy link
Member Author

modify lvaSetStruct to allow creating locals corresponding to stack-allocated classes
modify gc writebarrier logic to apply appropriate barriers when assigning to fields of (possibly) stack-allocated objects

How would stack allocated object fields would be accessed? Would existing FIELD/IND nodes be replaced with LCL_FLD/LCL_VAR nodes? Are writer barriers still needed if the object is allocated on stack?

Stack allocated object will be treated like a struct and its fields may be promoted. No write barriers are needed if we are assigning to a field of a stack-allocated object.

@erozenfeld
Copy link
Member Author

We will have cases where we are assigning to a field of an object that may live on the stack or on the heap (e.g., at joins or when passing a stack allocated object to another method). We'll have to detect these cases and use checked write barriers.

@erozenfeld
Copy link
Member Author

I suppose calls to RuntimeHelpers.GetHashCode will look like calls to native code so we will consider the object escaping and won't stack-allocate, so we don't need ObjHeader for that.

@erozenfeld
Copy link
Member Author

dotnet/coreclr#20814 implemented an initial version of object stack allocation. I updated the items in the description of this issue to mark what's done and added more items to the road map.

@erozenfeld
Copy link
Member Author

dotnet/coreclr#21950 added several improvements:
objects with gc fields can be stack allocated;
optimization is enabled on x86;
gc pointer reporting is less conservative when the method has stack-allocated objects.

@benaadams
Copy link
Member

What would be the point of trying to synchronize on a stackallocated object? If it's considered for stack allocation then it means it doesn't escape which means no other thread can attempt to synchronize on it so any attempts to do so should be a no-op.

Call it a "lock elision" optimization and highlight it as a feature 😉 I think that's what Java does, unless I misunderstand what they do.

Equally could drop interlocked in objects that don't escape (where they operate on the object)?

@mikedn
Copy link
Contributor

mikedn commented Feb 27, 2019

Call it a "lock elision" optimization and highlight it as a feature 😉 I think that's what Java does, unless I misunderstand what they do.

AFAIR the first Java collection classes were synchronized. Of course, they found out that this isn't such a great idea and their implementation of escape analysis was also trying to help by removing synchronization when the collection object wasn't escaping the stack.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@erozenfeld erozenfeld removed their assignment Oct 12, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall added optimization and removed JitUntriaged CLR JIT issues needing additional triage labels Nov 25, 2020
@hez2010
Copy link
Contributor

hez2010 commented Aug 17, 2021

I've got very promising result in my simple test code, but it still has a gap from structs (majorly because the lack of work item enable promotion of fields of stack-allocated objects):

[Benchmark]
public int PointClassTest()
{
    var p1 = new PointClass(4, 5);
    var p2 = new PointClass(3, 7);
    var result = AddClass(p1, p2);
    return result.X + result.Y;
}

[Benchmark]
public int PointStructTest()
{
    var p1 = new PointStruct(4, 5);
    var p2 = new PointStruct(3, 7);
    var result = AddStruct(p1, p2);
    return result.X + result.Y;
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
PointClass AddClass(PointClass x, PointClass y)
{
    return new PointClass(x.X + y.X, x.Y + y.Y);
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
PointStruct AddStruct(PointStruct x, PointStruct y)
{
    return new PointStruct(x.X + y.X, x.Y + y.Y);
}

record class PointClass(int X, int Y);
record struct PointStruct(int X, int Y);
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
Intel Core i7-7660U CPU 2.50GHz (Kaby Lake), 1 CPU, 4 logical and 2 physical cores
.NET SDK=6.0.100-preview.7.21379.14
  [Host]      : .NET 6.0.0 (6.0.21.37719), X64 RyuJIT
  PGO + EA    : .NET 6.0.0 (6.0.21.37719), X64 RyuJIT, TieredPgo + TieredCompilation + TC_QuickJit + TC_QuickJitForLoops + JitObjectStackAllocation
  PGO + No EA : .NET 6.0.0 (6.0.21.37719), X64 RyuJIT, TieredPgo + TieredCompilation + TC_QuickJit + TC_QuickJitForLoops

Runtime=.NET 6.0  
Method Job Mean Error StdDev Median Code Size Gen 0 Allocated
PointClassTest PGO + EA 0.7836 ns 0.0415 ns 0.0857 ns 0.7720 ns 157 B - -
PointStructTest PGO + EA 0.0045 ns 0.0100 ns 0.0083 ns 0.0000 ns 6 B - -
PointClassTest PGO + No EA (Baseline) 12.2350 ns 0.5015 ns 1.4548 ns 11.5586 ns 120 B 0.0344 72 B
PointStructTest PGO + No EA 0.0000 ns 0.0000 ns 0.0000 ns 0.0000 ns 6 B - -

Codegen:

.NET 6.0.0 (6.0.21.37719), X64 RyuJIT, Config: PGO + EA
; DevirtualizationTest.PointClassTest()
       sub       rsp,38
       xor       eax,eax
       mov       [rsp+8],rax
       vxorps    xmm4,xmm4,xmm4
       vmovdqa   xmmword ptr [rsp+10],xmm4
       vmovdqa   xmmword ptr [rsp+20],xmm4
       mov       [rsp+30],rax
       mov       rax,offset MT_DevirtualizationTest+PointClass
       mov       [rsp+28],rax
       mov       dword ptr [rsp+30],4
       mov       dword ptr [rsp+34],5
       lea       rax,[rsp+28]
       mov       rdx,offset MT_DevirtualizationTest+PointClass
       mov       [rsp+18],rdx
       mov       dword ptr [rsp+20],3
       mov       dword ptr [rsp+24],7
       lea       rdx,[rsp+18]
       mov       ecx,[rax+8]
       add       ecx,[rdx+8]
       mov       eax,[rax+0C]
       mov       r8,offset MT_DevirtualizationTest+PointClass
       mov       [rsp+8],r8
       add       eax,[rdx+0C]
       mov       [rsp+10],ecx
       mov       [rsp+14],eax
       lea       rax,[rsp+8]
       mov       edx,[rax+8]
       add       edx,[rax+0C]
       mov       eax,edx
       add       rsp,38
       ret
; Total bytes of code 157
; DevirtualizationTest.PointStructTest()
       mov       eax,13
       ret
; Total bytes of code 6
.NET 6.0.0 (6.0.21.37719), X64 RyuJIT, Config: PGO + No EA
; DevirtualizationTest.PointClassTest()
       push      rdi
       push      rsi
       push      rbx
       sub       rsp,20
       mov       rcx,offset MT_DevirtualizationTest+PointClass
       call      CORINFO_HELP_NEWSFAST
       mov       rsi,rax
       mov       dword ptr [rsi+8],4
       mov       dword ptr [rsi+0C],5
       mov       rcx,offset MT_DevirtualizationTest+PointClass
       call      CORINFO_HELP_NEWSFAST
       mov       rdi,rax
       mov       dword ptr [rdi+8],3
       mov       dword ptr [rdi+0C],7
       mov       ebx,[rsi+8]
       add       ebx,[rdi+8]
       mov       esi,[rsi+0C]
       mov       rcx,offset MT_DevirtualizationTest+PointClass
       call      CORINFO_HELP_NEWSFAST
       add       esi,[rdi+0C]
       mov       [rax+8],ebx
       mov       [rax+0C],esi
       mov       edx,[rax+8]
       add       edx,[rax+0C]
       mov       eax,edx
       add       rsp,20
       pop       rbx
       pop       rsi
       pop       rdi
       ret
; Total bytes of code 120
; DevirtualizationTest.PointStructTest()
       mov       eax,13
       ret
; Total bytes of code 6

Is there any roadmap to make further progress on this?

@TonyValenti
Copy link

Is this actively being worked on?

@Suchiman
Copy link
Contributor

@TonyValenti no, see #1661 (comment)

@AndyAyersMS
Copy link
Member

We will shortly be working on our post 7.0 plans and this is definitely in the mix. In particular I would like to see us working towards stack allocation of some boxes, since:

  • these were not accounted for in the initial (somewhat discouraging) scouting for opportunities and may be relatively abundant.
  • struct methods with gc type fields already use checked write barriers) so stack allocating them is pay for play. That's not the case for ref classes.
  • we can trivially rely on many struct methods not to capture the struct address in any way (so they can't make the struct escape) so we may not need to uplevel inlining or do some kind of interprocedural analysis to find viable candidates.
  • we may be able to optimize away unboxing stubs and (in some cases) not require the stack allocation to represent the full boxed object, just its payload (or sometimes even, just its type).
  • in conjunction with GDV driven loop test cloning and increasing use of struct enumerators we may be able to de-abstract some IEnumerator<T> uses and avoid heap allocation of the enumerator in the "happy path" case.

AndyAyersMS added a commit that referenced this issue Jul 1, 2024
Enable object stack allocation for ref classes and extend the support to include boxed value classes. Use a specialized unbox helper for stack allocated boxes, both to avoid apparent escape of the box by the helper, and to ensure all box field accesses are visible to the JIT. Update the local address visitor to rewrite trees involving address of stack allocated boxes in some cases to avoid address exposure. Disable old promotion for stack allocated boxes (since we have no field handles) and allow physical promotion to enregister the box method table and/or payload as appropriate. In OSR methods handle the fact that the stack allocation may actually have been a heap allocation by the Tier0 method.

The analysis TP cost is around 0.4-0.7% (notes below). Boxes are much less likely to escape than ref classes (roughly ~90% of boxes escape, ~99.8% of ref classes escape). Codegen impact is diminished somewhat because many of the boxes are dead and were already getting optimized away.
 
Fixes #4584, #9118, #10195, #11192, #53585, #58554, #85570

---------

Co-authored-by: Jakob Botsch Nielsen <jakob.botsch.nielsen@gmail.com>
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
@teo-tsirpanis
Copy link
Contributor

Fixed by #103361.

@hez2010
Copy link
Contributor

hez2010 commented Jul 1, 2024

Fixed by #103361.

btw there're still some unfinished items like:

  • make the analysis more sophisticated to handle increasingly more complex examples (eg fields of value classes, or fields of unescaped ref classes
  • enable stack allocation of constant-sized arrays
  • enable stack allocation of strings
  • R2R/NAOT support

@CodingMadness
Copy link

@hez2010 But if you guys could get the point in the list finished: enable stack allocation of constant-sized arrays , wouldn't that also not get rid off automatically of enable stack allocation of strings , since strings are constant sized arrays of char or not, with some more hidden meta-data stuff backed up, or not?

@hez2010
Copy link
Contributor

hez2010 commented Jul 20, 2024

@hez2010 But if you guys could get the point in the list finished: enable stack allocation of constant-sized arrays , wouldn't that also not get rid off automatically of enable stack allocation of strings , since strings are constant sized arrays of char or not, with some more hidden meta-data stuff backed up, or not?

No. string is completely different and unlike any other types, allocations of string are done by calling to either FastAllocateString or String:.ctor, where both of them are internal methods implemented in unmanaged code. To support string stack allocation, we would need to add specific support.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization
Projects
None yet
Development

No branches or pull requests