Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: suboptimal generated code for struct overlapped field manipulation #69254

Open
Tracked by #110642
hez2010 opened this issue May 12, 2022 · 6 comments
Open
Tracked by #110642
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@hez2010
Copy link
Contributor

hez2010 commented May 12, 2022

Description

Repro (compile with roslyn features/ref-fields branch):

using System.Diagnostics.CodeAnalysis;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

var color = new Color { R = 1, G = 2, B = 3, A = 4 };
color.Raw[2] = 3;
color.SRaw[1] = 4;
Console.WriteLine(color.G);

[StructLayout(LayoutKind.Explicit)]
struct Color
{
    [FieldOffset(0)] public byte R;
    [FieldOffset(1)] public byte G;
    [FieldOffset(2)] public byte B;
    [FieldOffset(3)] public byte A;

    [FieldOffset(0)] public int Rgba;

    [UnscopedRef] public ColorView<byte> Raw => new(ref this);
    [UnscopedRef] public ColorView<short> SRaw => new(ref this);
}

ref struct ColorView<T> where T : unmanaged
{
    private ref Color color;

    public ColorView(ref Color color)
    {
        this.color = ref color;
    }
    private static ref T Throw() => throw new IndexOutOfRangeException();

    public ref T this[uint index]
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get
        {
            unsafe
            {
                return ref sizeof(T) * index >= sizeof(Color) ?
                    ref Throw() :
                    ref Unsafe.Add(ref Unsafe.As<Color, T>(ref color), index);
            }
        }
    }
}

Codegen for Main (tier1):

G_M61237_IG01:  ;; offset=0000H
       sub      rsp, 56
						;; size=4 bbWeight=1 PerfScore 0.25

G_M61237_IG02:  ;; offset=0004H
       xor      ecx, ecx
       mov      dword ptr [rsp+28H], ecx
       mov      byte  ptr [rsp+28H], 1
       mov      byte  ptr [rsp+29H], 2
       mov      byte  ptr [rsp+2AH], 3
       mov      byte  ptr [rsp+2BH], 4
       mov      ecx, dword ptr [rsp+28H]
       mov      dword ptr [rsp+30H], ecx
       lea      rcx, bword ptr [rsp+30H]
       add      rcx, 2
       mov      byte  ptr [rcx], 3
       lea      rcx, bword ptr [rsp+30H]
       mov      word  ptr [rcx+02H], 4
       movzx    rcx, byte  ptr [rsp+31H]
       call     [System.Console:WriteLine(int)]
       nop      
						;; size=69 bbWeight=1 PerfScore 14.75

G_M61237_IG03:  ;; offset=0049H
       add      rsp, 56
       ret      
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 78

G_M56640_IG02 can be folded into a constant.

Expected codegen of G_M56640_IG02:

mov ecx, 2
call [System.Console:WriteLine(int)]

Clang codegen: https://godbolt.org/z/W7416EsrT

Configuration

.NET: build from main

category:cq
theme:structs
skill-level:expert
cost:large
impact:medium

@hez2010 hez2010 added the tenet-performance Performance related issue label May 12, 2022
@ghost ghost added the untriaged New issue has not been triaged by the area owner label May 12, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 12, 2022
@ghost
Copy link

ghost commented May 12, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Repro (compile with roslyn features/ref-fields branch):

using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

var color = new Color { R = 1, G = 2, B = 3, A = 4 };
color.Raw[2] = 3;
color.SRaw[1] = 4;
Console.WriteLine(color.G);

[StructLayout(LayoutKind.Explicit)]
struct Color
{
    [FieldOffset(0)] public byte R;
    [FieldOffset(1)] public byte G;
    [FieldOffset(2)] public byte B;
    [FieldOffset(3)] public byte A;

    [FieldOffset(0)] public int Rgba;

    public ColorView<byte> Raw => new(ref this);
    public ColorView<short> SRaw => new(ref this);
}

ref struct ColorView<T> where T : unmanaged
{
    private ref Color color;
    public ColorView(ref Color color)
    {
        this.color = ref color;
    }
    private static ref T Throw() => throw new IndexOutOfRangeException();
    
    public ref T this[uint index]
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get
        {
            unsafe
            {
                return ref sizeof(T) * index >= sizeof(Color) ? 
                    ref Throw() : 
                    ref Unsafe.Add(ref Unsafe.AsRef<T>(Unsafe.AsPointer(ref color)), index);
            }
        }
    }
}

Codegen for Main (tier1):

G_M56640_IG01:
       sub      rsp, 56
                                                ;; size=4 bbWeight=1    PerfScore 0.25
G_M56640_IG02:
       xor      ecx, ecx
       mov      dword ptr [rsp+28H], ecx
       mov      byte  ptr [rsp+28H], 1
       mov      byte  ptr [rsp+29H], 2
       mov      byte  ptr [rsp+2AH], 3
       mov      byte  ptr [rsp+2BH], 4
       mov      ecx, dword ptr [rsp+28H]
       mov      dword ptr [rsp+30H], ecx
       lea      rcx, bword ptr [rsp+30H]
       add      rcx, 2
       mov      byte  ptr [rcx], 3
       lea      rcx, bword ptr [rsp+30H]
       add      rcx, 2
       mov      word  ptr [rcx], 4
       movzx    rcx, byte  ptr [rsp+31H]
       call     [System.Console:WriteLine(int)]
       nop
                                                ;; size=72 bbWeight=1    PerfScore 15.00
G_M56640_IG03:
       add      rsp, 56
       ret

Expected codegen of G_M56640_IG02:

mov ecx, 2
call [System.Console:WriteLine(int)]

Clang codegen: https://godbolt.org/z/W7416EsrT

Configuration

.NET: build from main

Author: hez2010
Assignees: -
Labels:

tenet-performance, area-CodeGen-coreclr, untriaged

Milestone: -

@hez2010 hez2010 changed the title JIT: suboptimal generated for ref struct manipulation JIT: suboptimal generated for struct overlapped field manipulation May 12, 2022
@AndyAyersMS
Copy link
Member

The jit historically has been super-cautious about structs with overlapping fields because they are a hallmark of aliasing. But now that we have the beginnings of a physical alias model (thanks to #68712) we might be able to revisit some of this.

cc @dotnet/jit-contrib

@BruceForstall BruceForstall changed the title JIT: suboptimal generated for struct overlapped field manipulation JIT: suboptimal generated code for struct overlapped field manipulation May 13, 2022
@BruceForstall BruceForstall added this to the Future milestone May 13, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label May 13, 2022
@BruceForstall BruceForstall removed the untriaged New issue has not been triaged by the area owner label May 13, 2022
@jakobbotsch
Copy link
Member

#76928 would solve the basic issues around this by making promotion more "physical" instead of based upon struct layout.

@jakobbotsch
Copy link
Member

One major blocker here is that we end up address exposing color because we end up with separate statements for all the block copies between different temps for ColorView, and the inlined constructor that assigns the field to the address of it. We would somehow need to collapse all of this at an early point to realize that it ends up immediately dereferenced. That seems quite hard.

@MichalPetryka
Copy link
Contributor

ref Unsafe.AsRef<T>(Unsafe.AsPointer(ref color))

FYI this is a GC hole, use Unsafe.As<Color, T> instead.

@hez2010
Copy link
Contributor Author

hez2010 commented Jun 30, 2023

ref Unsafe.AsRef<T>(Unsafe.AsPointer(ref color))

FYI this is a GC hole, use Unsafe.As<Color, T> instead.

Thanks for pointing out. I have fixed the code in the original issue, while the codegen is still not changed.

@jakobbotsch jakobbotsch self-assigned this Aug 4, 2023
@jakobbotsch jakobbotsch modified the milestones: Future, 9.0.0 Aug 4, 2023
@jakobbotsch jakobbotsch added the Priority:2 Work that is important, but not critical for the release label May 3, 2024
jakobbotsch added a commit to jakobbotsch/runtime that referenced this issue May 29, 2024
This changes local morph to run in RPO when optimizations are enabled.
It adds infrastructure to track and propagate LCL_ADDR values assigned
to locals (or struct fields) during local morph. This allows us to avoid
address exposure in cases where the destination local does not actually
end up escaping in any way.

Example:
```csharp
public struct Awaitable
{
    public int Opts;

    public Awaitable(bool value)
    {
        Opts = value ? 1 : 2;
    }
}

[MethodImpl(MethodImplOptions.NoInlining)]
public static int Test() => new Awaitable(false).Opts;
```

Before:
```asm
G_M59043_IG01:  ;; offset=0x0000
       push     rax
						;; size=1 bbWeight=1 PerfScore 1.00

G_M59043_IG02:  ;; offset=0x0001
       xor      eax, eax
       mov      dword ptr [rsp], eax
       mov      dword ptr [rsp], 2
       mov      eax, dword ptr [rsp]
						;; size=15 bbWeight=1 PerfScore 3.25

G_M59043_IG03:  ;; offset=0x0010
       add      rsp, 8
       ret
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 21

```

After:
```asm
G_M59043_IG02:  ;; offset=0x0000
       mov      eax, 2
						;; size=5 bbWeight=1 PerfScore 0.25

G_M59043_IG03:  ;; offset=0x0005
       ret
```

Propagating the addresses works much like local assertion prop in morph
does. Proving that the locations that were stored to do not escape
afterwards is done with a simplistic approach: we check globally that no
reads of the location exists, and if so, we replace the `LCL_ADDR`
stored to them by a constant 0. We leave it up to liveness to clean up
the stores themselves.

This could be more sophisticated, but in practice this handles the
reported cases just fine.

If we were able to remove any `LCL_ADDR` in this way then we run an
additional pass over the locals of the IR to compute the final set of
exposed locals.

Fix dotnet#87072
Fix dotnet#102273
Fix dotnet#102518

This is still not sufficient to handle dotnet#69254. To handle that case we
need to handle transferring assertions for struct copies, and also
handle proving that specific struct fields containing local addresses do
not escape. It is probably doable, but for now I will leave it as future
work.
@jakobbotsch jakobbotsch removed the Priority:2 Work that is important, but not critical for the release label May 30, 2024
@jakobbotsch jakobbotsch modified the milestones: 9.0.0, Future May 30, 2024
jakobbotsch added a commit that referenced this issue May 31, 2024
This changes local morph to run in RPO when optimizations are enabled. It adds
infrastructure to track and propagate LCL_ADDR values assigned to locals during
local morph. This allows us to avoid address exposure in cases where the
destination local does not actually end up escaping in any way.

Example:
```csharp
public struct Awaitable
{
    public int Opts;

    public Awaitable(bool value)
    {
        Opts = value ? 1 : 2;
    }
}

[MethodImpl(MethodImplOptions.NoInlining)]
public static int Test() => new Awaitable(false).Opts;
```

Before:
```asm
G_M59043_IG01:  ;; offset=0x0000
       push     rax
						;; size=1 bbWeight=1 PerfScore 1.00

G_M59043_IG02:  ;; offset=0x0001
       xor      eax, eax
       mov      dword ptr [rsp], eax
       mov      dword ptr [rsp], 2
       mov      eax, dword ptr [rsp]
						;; size=15 bbWeight=1 PerfScore 3.25

G_M59043_IG03:  ;; offset=0x0010
       add      rsp, 8
       ret
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 21

```

After:
```asm
G_M59043_IG02:  ;; offset=0x0000
       mov      eax, 2
						;; size=5 bbWeight=1 PerfScore 0.25

G_M59043_IG03:  ;; offset=0x0005
       ret
```

Propagating the addresses works much like local assertion prop in morph does.
Proving that the locals that were stored to do not escape afterwards is done
with a simplistic approach: we check globally that no reads of the locals
exists, and if so, we replace the `LCL_ADDR` stored to them by a constant 0. We
leave it up to liveness to clean up the stores themselves.

If we were able to remove any `LCL_ADDR` in this way then we run an additional
pass over the locals of the IR to compute the final set of exposed locals.

This could be more sophisticated, but in practice this handles the reported
cases just fine.

Fix #87072
Fix #102273
Fix #102518

This is still not sufficient to handle #69254. To handle that we would need more
support around tracking the values of struct fields, and handling of promoted
fields. This PR currently does not handle promoted fields at all; we use
`lvHasLdAddrOp` as a conservative approximation of address exposure on the
destination locals, and promoted structs almost always have this set. If we were
to handle promoted fields we would need some other way to determine that a
destination holding a local address couldn't be exposed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

5 participants