Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unroll Buffer.Memmove for arm64 #83740

Merged
merged 14 commits into from
Mar 25, 2023
Merged

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Mar 21, 2023

Follow up to #83638 to do the same on arm64.

void CopySpan(Span<byte> dst, ReadOnlySpan<byte> src) =>
    src.Slice(0, 10).TryCopyTo(dst);

AltJit-arm64:

; Method Prog:CopySpan(System.Span`1[ubyte],System.ReadOnlySpan`1[ubyte]):this
G_M41396_IG01:              
        A9BF7BFD          stp     fp, lr, [sp, #-0x10]!
        910003FD          mov     fp, sp
G_M41396_IG02:              
        7100289F          cmp     w4, #10
        54000123          blo     G_M41396_IG05
        7100285F          cmp     w2, #10
        540000A3          blo     G_M41396_IG04
G_M41396_IG03:              
        F9400060          ldr     x0, [x3]         ;; <---
        F8402062          ldr     x2, [x3, #0x02]  ;; <---
        F9000020          str     x0, [x1]         ;; <---
        F8002022          str     x2, [x1, #0x02]  ;; <---
G_M41396_IG04:              
        A8C17BFD          ldp     fp, lr, [sp], #0x10
        D65F03C0          ret     lr
G_M41396_IG05:              
        D2960C00          movz    x0, #0xB060      // code for System.ThrowHelper:ThrowArgumentOutOfRangeException()
        F2A90A80          movk    x0, #0x4854 LSL #16
        F2CFFF20          movk    x0, #0x7FF9 LSL #32
        F9400000          ldr     x0, [x0]
        D63F0000          blr     x0
        D43E0000          brk_windows #0
; Total bytes of code: 72

Range: 1..64 bytes (x64 handles 1..128 bytes or 1..256 bytes if we enable AVX-512)

@ghost ghost assigned EgorBo Mar 21, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 21, 2023
@ghost
Copy link

ghost commented Mar 21, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Follow up to #83638 to do the same on arm64.

void CopySpan(Span<byte> dst, ReadOnlySpan<byte> src) =>
    src.Slice(0, 10).TryCopyTo(dst);

AltJit-arm64:

; Method Prog:CopySpan(System.Span`1[ubyte],System.ReadOnlySpan`1[ubyte]):this
G_M41396_IG01:              
        A9BF7BFD          stp     fp, lr, [sp, #-0x10]!
        910003FD          mov     fp, sp
G_M41396_IG02:              
        7100289F          cmp     w4, #10
        54000123          blo     G_M41396_IG05
        7100285F          cmp     w2, #10
        540000A3          blo     G_M41396_IG04
G_M41396_IG03:              
        F9400060          ldr     x0, [x3]         ;; <---
        F8402062          ldr     x2, [x3, #0x02]  ;; <---
        F9000020          str     x0, [x1]         ;; <---
        F8002022          str     x2, [x1, #0x02]  ;; <---
G_M41396_IG04:              
        A8C17BFD          ldp     fp, lr, [sp], #0x10
        D65F03C0          ret     lr
G_M41396_IG05:              
        D2960C00          movz    x0, #0xB060      // code for System.ThrowHelper:ThrowArgumentOutOfRangeException()
        F2A90A80          movk    x0, #0x4854 LSL #16
        F2CFFF20          movk    x0, #0x7FF9 LSL #32
        F9400000          ldr     x0, [x0]
        D63F0000          blr     x0
        D43E0000          brk_windows #0
; Total bytes of code: 72
Author: EgorBo
Assignees: EgorBo
Labels:

area-CodeGen-coreclr

Milestone: -

@EgorBo EgorBo marked this pull request as ready for review March 21, 2023 20:49
@EgorBo
Copy link
Member Author

EgorBo commented Mar 21, 2023

PTAL @dotnet/jit-contrib @jakobbotsch @SingleAccretion (since you reviewed the previous version) 🙂

It is possible some parts can be shared but it doesn't look to be a lot of code to bother probably and there are still differences.

@MichalPetryka volanteered to address some TODO-CQ on both x64 and arm64.

@EgorBo
Copy link
Member Author

EgorBo commented Mar 22, 2023

/azp list

@azure-pipelines

This comment was marked as outdated.

@EgorBo
Copy link
Member Author

EgorBo commented Mar 22, 2023

/azp run runtime-coreclr r2r, runtime-coreclr crossgen2, runtime-coreclr jitstress2-jitstressregs, runtime-coreclr jitstressregs, runtime-coreclr jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 5 pipeline(s).

assert((cur != nullptr) && "Not enough user arguments in GetUserArgByIndex");
for (unsigned i = 0; i < index || cur->IsArgAddedLate();)
{
if (!cur->IsArgAddedLate())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this use cur->GetWellKnownArg() != WellKnownArg::None instead? (And then the Remarks "The current implementation doesn't..." can be removed?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would ignore the 'this' arg and also ShiftLow/ShiftHigh for x86 shift helpers are arguably "user args" (in that they manifest in the IL). Maybe we should add an IsUserArg() or IsILArg() that has these special cases and use it here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added IsUserArg()

src/coreclr/jit/compiler.h Show resolved Hide resolved
@EgorBo
Copy link
Member Author

EgorBo commented Mar 24, 2023

@jakobbotsch @BruceForstall I've addressed your feedback

@EgorBo
Copy link
Member Author

EgorBo commented Mar 24, 2023

SPMI diffs (didn't expect to see them, but looks like Kunal triggered a new SPMI collection yesterday)

Size regressions are expected in cases where we unroll large blocks, didn't expect to see TP improvements

src/coreclr/jit/gentree.cpp Outdated Show resolved Hide resolved
Co-authored-by: Bruce Forstall <brucefo@microsoft.com>
@neon-sunset
Copy link
Contributor

SPMI diffs

@a74nh It looks like this PR introduces a lot more opt. opportunities for #83773 just thought you might want to know 🙂

e.g.

@@ -156,27 +156,24 @@ G_M51463_IG11:        ; bbWeight=0.50, gcrefRegs=100000 {x20}, byrefRegs=0000 {}
             mov     w0, #25
             bl      <unknown method>
             ; gcrRegs +[x0]
-            mov     x19, x0
-            ; gcrRegs +[x19]
-            ldrsb   wzr, [x19]
-            add     x0, x19, #12
-            ; gcrRegs -[x0]
-            ; byrRegs +[x0]
-            add     x1, x20, #12
+            ldrsb   wzr, [x0]
+            add     x1, x0, #12
             ; byrRegs +[x1]
-            mov     x2, #50
-            movz    x3, #0xD1FFAB1E      // code for <unknown method>
-            movk    x3, #0xD1FFAB1E LSL #16
-            movk    x3, #1 LSL #32
-            ldr     x3, [x3]
-            blr     x3
-            ; gcrRegs -[x20]
-            ; byrRegs -[x0-x1]
-            mov     x20, x19
-            ; gcrRegs +[x20]
-						;; size=52 bbWeight=0.50 PerfScore 6.25
+            add     x2, x20, #12
+            ; byrRegs +[x2]
+            ldr     q16, [x2]
+            ldr     q17, [x2, #0x10]
+            ldr     q18, [x2, #0x20]
+            ldr     q19, [x2, #0x22]
+            str     q16, [x1]
+            str     q17, [x1, #0x10]
+            str     q18, [x1, #0x20]
+            str     q19, [x1

@EgorBo
Copy link
Member Author

EgorBo commented Mar 24, 2023

Yes, I could emit ldp/stp in the first place but decided to leave that for an upcomming peephole 🙂 (to keep my impl simpler)

@EgorBo
Copy link
Member Author

EgorBo commented Mar 24, 2023

Failures: #83655

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants