Consider making Interlocked.And and friends into JIT intrinsics #32239

stephentoub · 2020-02-13T19:02:40Z

#32216 adds new methods like `Interlocked.And`. Some of these could be done as JIT intrinsics and replaced by better implementations/instructions in some cases, like `lock and` on platforms that have it.

cc: @tannergooding

category:cq
theme:jit-intrinsics
skill-level:intermediate
cost:medium
impact:medium
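
For readers unfamiliar with what a non-intrinsic version has to do: without a dedicated instruction, an atomic AND is typically built from a compare-exchange retry loop. The following is a minimal sketch of that pattern, not the actual CoreLib source; the class name `InterlockedSketch` and the helper `AndViaCas` are hypothetical names made up for illustration.

```csharp
using System;
using System.Threading;

public static class InterlockedSketch
{
    // Hypothetical helper (not a real API): atomically ANDs `value` into
    // `location` with a compare-exchange retry loop and returns the value
    // that was stored there before the update.
    public static int AndViaCas(ref int location, int value)
    {
        int current = Volatile.Read(ref location);
        while (true)
        {
            int desired = current & value;
            int observed = Interlocked.CompareExchange(ref location, desired, current);
            if (observed == current)
            {
                return observed; // the swap succeeded; this is the original value
            }
            current = observed; // another thread changed the value; retry
        }
    }

    public static void Main()
    {
        int flags = 0b1010;
        int old = AndViaCas(ref flags, 0b0110);
        Console.WriteLine($"old={old}, new={flags}"); // old=10, new=2
    }
}
```

An intrinsic lets the JIT replace the whole loop with a single atomic instruction where the hardware offers one.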


BruceForstall · 2020-02-14T01:29:49Z

@AndyAyersMS

GrabYourPitchforks · 2020-02-14T20:33:58Z

For context, it turns out the JIT already optimizes this for the existing Interlocked.Add and similar APIs.

using System;
using System.Threading;

public class C {
    public static int M(ref int i) {
        return Interlocked.Add(ref i, 1);
    }
    
    public static void N(ref int i) {
        Interlocked.Add(ref i, 1);
    }
}

; C.M(Int32 ByRef)
    L0000: mov eax, 0x1
    L0005: lock xadd [rcx], eax
    L0009: inc eax
    L000b: ret

; C.N(Int32 ByRef)
    L0000: lock add dword [rcx], 0x1
    L0004: ret

stephentoub · 2020-02-14T20:35:06Z

~~Nice. This can be closed then, right?~~

Oh.. you said Add, not And. Those are way too closely named :-) I was very impressed it was able to reverse the pattern in Interlocked.And :-)

AndyAyersMS · 2020-02-18T19:52:07Z

These should be done via new-style intrinsics. We may also want to convert some of the existing intrinsified interlocked ops over to this mechanism as well.

These APIs were introduced recently in dotnet/runtime#33042 and are already used in several places, e.g. in `Task`, `InterlockedBitVector32`, `SafeHandle`, and `RegexCharClass`. It should emit `%reg = atomicrmw and (or)` in LLVM IR (and unlike `Interlocked.Add`, these APIs return the old value; see dotnet/runtime#33102). Addresses dotnet/runtime#32239 for Mono.
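
The return-value difference mentioned here is easy to miss, so a small illustration may help. This sketch uses only the public `Interlocked.Add`, `Interlocked.And`, and `Interlocked.Or` methods (the latter two need .NET 5 or later); the class name `ReturnValueDemo` and the variable names are placeholders.

```csharp
using System;
using System.Threading;

public static class ReturnValueDemo
{
    public static void Main()
    {
        // Interlocked.Add returns the NEW value stored at the location.
        int counter = 5;
        int afterAdd = Interlocked.Add(ref counter, 1);
        Console.WriteLine($"Add returned {afterAdd}, counter={counter}"); // Add returned 6, counter=6

        // Interlocked.And and Interlocked.Or return the ORIGINAL value.
        int flags = 0b0111;
        int beforeAnd = Interlocked.And(ref flags, 0b0101);
        Console.WriteLine($"And returned {beforeAnd}, flags={flags}"); // And returned 7, flags=5

        int beforeOr = Interlocked.Or(ref flags, 0b1000);
        Console.WriteLine($"Or returned {beforeOr}, flags={flags}"); // Or returned 5, flags=13
    }
}
```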


AndyAyersMS · 2020-04-30T21:41:27Z

@Stoub we're triaging proposed 5.0 codegen work -- any thoughts on the priority of this issue?

stephentoub · 2020-04-30T21:43:49Z

From my perspective, "nice to have", but I expect there's other higher priority JIT work that'll be more impactful. These APIs are brand new.

AndyAyersMS · 2020-04-30T21:57:22Z

@GrabYourPitchforks can you make a case for addressing this in 5.0?

GrabYourPitchforks · 2020-04-30T22:35:58Z

I agree with Steve's assessment. Optimizing these APIs might save a few cycles in a handful of places, such as below.

runtime/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs, lines 739 to 740 in ef2ea8a:

    // Atomically clear the TASK_STATE_WAIT_COMPLETION_NOTIFICATION bit
    Interlocked.And(ref m_stateFlags, ~TASK_STATE_WAIT_COMPLETION_NOTIFICATION);

runtime/src/libraries/System.Private.CoreLib/src/System/Runtime/InteropServices/SafeHandle.cs, lines 104 to 105 in ef2ea8a:

    // Set closed state (low order bit of the _state field).
    Interlocked.Or(ref _state, StateBits.Closed);

These are hot code paths, but not so hot that a few extra cycles are going to kill us. :) If you have other more pressing matters, so be it.

AndyAyersMS · 2020-05-01T00:36:55Z

Ok, I'm going to move this to future.

JulieLeeMSFT · 2020-09-24T20:43:19Z

@echesakovMSFT Possible candidate for .NET 6.

EgorBo · 2020-10-15T21:22:49Z

this is a bit tricky for the cases when return value of And/Or is used 🙂

GrabYourPitchforks · 2020-10-15T22:07:45Z

Wouldn't it be able to use the same logic Interlocked.Add and friends use? See my earlier comment at #32239 (comment).

EgorBo · 2020-10-15T22:41:11Z

@GrabYourPitchforks here is what LLVM emits for us (mono-llvm): https://godbolt.org/z/vbsYz3

Quoting LLVM:

      // There are two different ways of expanding RMW instructions:
      // - into a load if it is idempotent
      // - into a Cmpxchg/LL-SC loop otherwise

Or/And aren't idempotent, Add is.

EgorBo · 2020-10-15T22:47:55Z

better demo: https://godbolt.org/z/cMj5Tx

echesakov · 2020-10-23T02:11:28Z

When using Armv8.1 instructions, the clang/llvm generated code seems much better than its Armv8.0 counterpart (https://godbolt.org/z/Yvjcqb), and no loops are required.

Armv8.0

InterlockedAdd(int*, int):                  // @InterlockedAdd(int*, int)
.LBB0_1:                                // =>This Inner Loop Header: Depth=1
        ldaxr   w8, [x0]
        add     w8, w8, w1
        stlxr   w9, w8, [x0]
        cbnz    w9, .LBB0_1
        mov     w0, w8
        ret
InterlockedAnd(int*, int):                  // @InterlockedAnd(int*, int)
.LBB1_1:                                // =>This Inner Loop Header: Depth=1
        ldaxr   w8, [x0]
        and     w8, w8, w1
        stlxr   w9, w8, [x0]
        cbnz    w9, .LBB1_1
        mov     w0, w8
        ret
InterlockedOr(int*, int):                   // @InterlockedOr(int*, int)
.LBB2_1:                                // =>This Inner Loop Header: Depth=1
        ldaxr   w8, [x0]
        orr     w8, w8, w1
        stlxr   w9, w8, [x0]
        cbnz    w9, .LBB2_1
        mov     w0, w8
        ret

Armv8.1

InterlockedAdd(int*, int):                  // @InterlockedAdd(int*, int)
        ldaddal w1, w8, [x0]
        add     w0, w8, w1
        ret
InterlockedAnd(int*, int):                  // @InterlockedAnd(int*, int)
        mvn     w8, w1
        ldclral w8, w8, [x0]
        and     w0, w8, w1
        ret
InterlockedOr(int*, int):                   // @InterlockedOr(int*, int)
        ldsetal w1, w8, [x0]
        orr     w0, w8, w1
        ret
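
The `mvn` before `ldclral` in the Armv8.1 `InterlockedAnd` listing can look odd at first. `ldclr` clears the bits that are set in its operand (it stores `old & ~operand`) and returns the old memory contents, so an AND with mask `m` becomes a "clear" with `~m`. The following is a single-threaded model of those three instructions, written only to check the bit identity; it is not atomic, and `LdclrModel`, `LoadClear`, and `AndFetch` are hypothetical names.

```csharp
using System;

public static class LdclrModel
{
    // Models the data flow of an atomic "load and clear bits" instruction:
    // memory becomes (old & ~bitsToClear) and the old contents are returned.
    // This model is NOT atomic; it only mirrors the arithmetic.
    public static int LoadClear(ref int memory, int bitsToClear)
    {
        int old = memory;
        memory = old & ~bitsToClear;
        return old;
    }

    // Mirrors the three-instruction sequence from the listing above.
    public static int AndFetch(ref int memory, int mask)
    {
        int inverted = ~mask;                      // mvn     w8, w1
        int old = LoadClear(ref memory, inverted); // ldclral w8, w8, [x0]
        return old & mask;                         // and     w0, w8, w1 (new value)
    }

    public static void Main()
    {
        int cell = 0b1110;
        int result = AndFetch(ref cell, 0b0110);
        Console.WriteLine($"result={result}, cell={cell}"); // result=6, cell=6
    }
}
```

Note that the godbolt functions return the new value, which is why a final `and`/`orr`/`add` follows each atomic instruction; a caller that wants the old value, as `Interlocked.And` and `Interlocked.Or` do, would not need that last step.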

EgorBo · 2020-12-12T11:19:57Z

@echesakovMSFT Do you mind if I try to implement Or and And for arm (with the atomics)?
Only if you haven't started on it yet, of course. I just want to do something for the Arm back-end 🙂

echesakov · 2020-12-14T21:28:57Z

@EgorBo I haven't started working on this. If you want to work on this, feel free to do so and un-assign me.

EgorBo · 2021-02-10T18:04:27Z

Reopening since #46253 only handled them for arm64

EgorBo · 2021-04-09T08:44:38Z

Moving to Future. Unfortunately it's not easy to optimize this for x86, because we can only optimize cases where the return value of these methods isn't used, and we can't easily check that at the importer phase.

tannergooding · 2021-04-09T14:47:17Z

> Moving to Future. Unfortunately it's not easy to optimize this for x86, because we can only optimize cases where the return value of these methods isn't used, and we can't easily check that at the importer phase.

This is actually a decently broad issue with intrinsics. There are many cases where we want to optimistically create an intrinsic tree node but where we won't know if it can actually be an intrinsic or call until much later.
With hardware intrinsics, this leads to codegen issues for things that are detected to be constants after importation, for example.

Some of the older intrinsics (GT_INTRINSIC) have support in rationalizer to be rewritten as user calls: RewriteIntrinsicAsUserCall
I had taken a look at enabling this more broadly for cases like GT_HWINTRINSIC so we could fix cases like this but ran into issues due to the non-primitive inputs/returns (and these were out of my depth to resolve at the time): #11062 (comment)

Given that these interlocked methods only deal with primitives, it's possible that it won't hit the same problems.

EgorBo · 2021-04-09T15:00:46Z

> Some of the older intrinsics (GT_INTRINSIC) have support in rationalizer to be rewritten as user calls: RewriteIntrinsicAsUserCall

@tannergooding the problem is that if we rewrite GT_XAND to a call like that, we'll regress the current case where we inline it as is, as a complex expression (it contains a loop). See https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIAYACY8gOgBUALWbAEwEsAdgHMA3DRrEAzE1IMAwgwDeNBqoYq1G1VKYoGbGLgwAKWADMGgjA2zk0lgdeykAlFqXu1DAPTeGAQQEeS1wHABtBGGCOGFhPNQBJR1iwiDAAayiWQJ5TGAtbe2cXMWovAF8acqA==

Perhaps it's not a big deal: "call" overhead for the case where the return value is used, and a super fast implementation for the cases where it's not. E.g. all BCL usages (there are just a few of them) don't care about the return value.
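
To make the two shapes discussed here concrete, the sketch below shows one call site that discards the result and one that consumes it. x86/x64 has `lock or` and `lock and`, but neither hands back the previous value, so only the first shape maps onto a single locked instruction; the second needs something like a `cmpxchg` loop. How any particular JIT version actually lowers each shape is not something this sketch asserts. `FlagOps`, `Closed`, `MarkClosed`, and `TryMarkClosed` are hypothetical names.

```csharp
using System;
using System.Threading;

public static class FlagOps
{
    private const int Closed = 0b0001; // hypothetical flag bit for illustration

    // Result discarded: in principle expressible as a single `lock or`.
    public static void MarkClosed(ref int state)
    {
        Interlocked.Or(ref state, Closed);
    }

    // Result consumed: the caller needs the previous value, which
    // `lock or` cannot provide on x86/x64.
    public static bool TryMarkClosed(ref int state)
    {
        int previous = Interlocked.Or(ref state, Closed);
        return (previous & Closed) == 0; // true only for the first closer
    }

    public static void Main()
    {
        int state = 0b0100;
        bool first = TryMarkClosed(ref state);
        bool second = TryMarkClosed(ref state);
        Console.WriteLine($"{first} {second} {state}"); // True False 5
    }
}
```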

Dotnet-GitSync-Bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Feb 13, 2020

AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label Feb 18, 2020

AndyAyersMS added this to the 5.0 milestone Feb 18, 2020

EgorBo mentioned this issue Mar 5, 2020

[Mono] Intrinsify Interlocked.And and Interlocked.Or using LLVM #33253

Merged

monojenkins mentioned this issue Mar 5, 2020

[Mono] Intrinsify Interlocked.And and Interlocked.Or using LLVM mono/mono#19135

Merged

AndyAyersMS modified the milestones: 5.0, Future May 1, 2020

JulieLeeMSFT assigned echesakov Sep 24, 2020

echesakov mentioned this issue Oct 20, 2020

[Arm64] Planned JIT work in .NET 6 #43629

Closed

29 tasks

EgorBo assigned EgorBo and unassigned echesakov Dec 15, 2020

EgorBo mentioned this issue Dec 19, 2020

[RyuJIT] Implement Interlocked.And and Interlocked.Or for arm64-v8.1 #46253

Merged

echesakov modified the milestones: Future, 6.0.0 Jan 27, 2021

echesakov linked a pull request Jan 27, 2021 that will close this issue

[RyuJIT] Implement Interlocked.And and Interlocked.Or for arm64-v8.1 #46253

Merged

EgorBo closed this as completed in #46253 Feb 10, 2021

EgorBo reopened this Feb 10, 2021

JulieLeeMSFT added the preview label Mar 15, 2021

EgorBo removed the preview label Apr 9, 2021

EgorBo modified the milestones: 6.0.0, Future Apr 9, 2021

stephentoub mentioned this issue Apr 9, 2022

[API Proposal]: Interlocked class should contain Xor and Not members #67809

Open

EgorBo mentioned this issue Dec 21, 2023

Intrinsify Interlocked.And and Interlocked.Or on XARCH #96258

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label Dec 21, 2023

stephentoub modified the milestones: Future, 9.0.0 Dec 26, 2023

EgorBo closed this as completed in #96258 Jan 5, 2024

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 5, 2024

github-actions bot locked and limited conversation to collaborators Feb 5, 2024
