[arm64] Add RCPC ISA (8.3+) and use ldap for volatile reads #67384

Merged: 12 commits, Apr 12, 2022

@EgorBo (Member) commented Mar 31, 2022

This PR adds a new ISA, RCPC (Release Consistent, Processor Consistent), for arm64 v8.3+ (optionally available on v8.2). With it, volatile reads can be emitted as ldapr/ldaprb/ldaprh while still honoring the acquire/release semantics of volatile accesses; see #67374.

Apple M1 appears to support it, but there it is most likely just an alias for ldar, so no speedup is expected.

Closes #67374

```csharp
static volatile int a;

static void Test() => a++;
```

Codegen diff:

```diff
; Assembly listing for method Test()
G_M16289_IG01:
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
G_M16289_IG02:
        D287BA80          movz    x0, #0x3dd4
        F2B00C00          movk    x0, #0x8060 LSL #16
        F2C00040          movk    x0, #2 LSL #32
-       88DFFC01          ldar    w1, [x0]
+       B8BFC001          ldapr   w1, [x0]
        11000421          add     w1, w1, #1
        889FFC01          stlr    w1, [x0]
G_M16289_IG03:
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
; Total bytes of code 40
; ============================================================
```
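For readers more at home in C++, below is a rough analogue of the C# test case. It assumes that a C# volatile read is an acquire load and a volatile write is a release store, which is what the ldar/ldapr + stlr pair in the listing implements. `Get` is a hypothetical helper added only for checking the result. Whether a C++ compiler lowers the acquire load to LDAPR depends on the compiler version and on targeting a CPU with RCPC; otherwise it uses LDAR.

```cpp
#include <atomic>

// Rough C++ analogue of:
//   static volatile int a;
//   static void Test() => a++;
// Like the C# original, this is NOT an atomic read-modify-write:
// it is an acquire load followed by a separate release store.
static std::atomic<int> a{0};

void Test() {
    int v = a.load(std::memory_order_acquire);  // LDAR, or possibly LDAPR when targeting RCPC
    a.store(v + 1, std::memory_order_release);  // STLR
}

// Hypothetical helper used only to observe the value.
int Get() { return a.load(std::memory_order_acquire); }
```

Calling `Test()` three times from a single thread leaves `Get()` at 3. With concurrent callers, increments can be lost, exactly as with the C# `a++` on a volatile field.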

@dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Mar 31, 2022
@ghost assigned EgorBo Mar 31, 2022

@EgorBo changed the title from "Arm64 rcpc" to "[arm64] Add RCPC ISA (8.3+) and use ldap for volatile reads" Mar 31, 2022
@EgorBo closed this Mar 31, 2022
@EgorBo reopened this Mar 31, 2022
@EgorBo marked this pull request as ready for review March 31, 2022 17:31
Co-authored-by: Adeel Mujahid <3840695+am11@users.noreply.github.com>
@EgorBo (Member, Author) commented Apr 1, 2022

@VSadov do you have a benchmark/scenario in mind to see improvements from this change?

@VSadov (Member) commented Apr 1, 2022

@EgorBo This can have an effect on scenarios that mix volatile writes and reads, for example writing to and reading from a ConcurrentQueue in a loop.

The new instruction is not necessarily cheaper by itself; scenarios that only do lots of volatile reads may not be affected. There need to be some volatile writes (or reference writes to heap locations) in the mix. LDAR must be ordered after any preceding STLR, while LDAPR need not be.

Also note that to see gains, the hardware must actually take advantage of the relaxed semantics. Some early implementations of LDAPR may just be aliases of LDAR.
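The mixed read/write pattern described above can be illustrated with a hypothetical single-producer/single-consumer ring buffer in C++. It is a sketch, not the actual ConcurrentQueue implementation, which is C# and considerably more involved.

Each operation pairs an acquire read of the other side's index with a release write of its own index. In a loop, a release store (STLR) is therefore followed by an acquire load (LDAR or LDAPR) of a different location. With LDAR that load must stay ordered after the earlier STLR; with LDAPR it need not, which is where RCPC-capable hardware may gain.

```cpp
#include <atomic>
#include <cstddef>
#include <optional>

// Hypothetical SPSC ring buffer. Exactly one thread may call TryPush and
// exactly one (other) thread may call TryPop.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2, "one slot stays empty to tell full from empty");

public:
    bool TryPush(const T& value) {
        // Only the producer writes tail_, so a relaxed read of it is enough.
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        std::size_t next = (tail + 1) % Capacity;
        // Acquire read of the consumer's index (LDAR, or LDAPR with RCPC).
        if (next == head_.load(std::memory_order_acquire))
            return false;  // full
        buffer_[tail] = value;
        // Release write (STLR) publishes the slot to the consumer.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    std::optional<T> TryPop() {
        // Only the consumer writes head_, so a relaxed read of it is enough.
        std::size_t head = head_.load(std::memory_order_relaxed);
        // Acquire read of the producer's index.
        if (head == tail_.load(std::memory_order_acquire))
            return std::nullopt;  // empty
        T value = buffer_[head];
        // Release write hands the slot back to the producer.
        head_.store((head + 1) % Capacity, std::memory_order_release);
        return value;
    }

private:
    T buffer_[Capacity]{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};
```

A producer thread calling `TryPush` in a loop alternates STLR (publishing `tail_`) with an acquire load of `head_`. That STLR-then-load sequence is the one that LDAPR can relax.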

@EgorBo (Member, Author) commented Apr 2, 2022

I wasn't able to reproduce any improvement on M1, so it is probably just a renamed ldar there, but I think it still makes sense to have. I've attached a codegen diff example to the description.

@dotnet/jit-contrib PTAL. No diffs are reported, but in fact all volatile loads were changed from ldar[b/h] to ldapr[b/h].
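The size-dependent choice above can be sketched as a small helper. The function name and shape are illustrative assumptions, not code from the JIT. The access size selects the B, H, or plain form, and RCPC availability selects LDAPR over LDAR.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper, not taken from the JIT: picks the arm64 load-acquire
// mnemonic for a volatile read of `size` bytes.
//   - With RCPC, use the LDAPR family (Load-Acquire RCpc).
//   - Without it, fall back to the LDAR family.
// 1- and 2-byte reads use the B/H forms; 4- and 8-byte reads use the base
// mnemonic with a W or X destination register respectively.
std::string_view VolatileLoadMnemonic(std::size_t size, bool hasRcpc) {
    switch (size) {
        case 1: return hasRcpc ? "ldaprb" : "ldarb";
        case 2: return hasRcpc ? "ldaprh" : "ldarh";
        case 4:
        case 8: return hasRcpc ? "ldapr" : "ldar";
        default:
            assert(false && "unsupported volatile read size");
            return {};
    }
}
```

For the 4-byte `int` field in the codegen diff, `VolatileLoadMnemonic(4, true)` yields `"ldapr"`, matching `ldapr w1, [x0]`.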

@ghost locked this conversation as resolved and limited it to collaborators May 12, 2022
@EgorBo deleted the arm64-rcpc branch October 5, 2022 02:16
@EgorBo (Member, Author) commented Oct 5, 2022

NOTE: Unfortunately, this PR doesn't detect the RCPC feature on Windows, as there is no official API for it yet.

@kunalspathak (Member) commented

> NOTE: Unfortunately, this PR doesn't detect RCPC feature set on Windows as there is no official API for that yet.

This was originally added for Mac, right? Were you trying it on Ampere for Windows?

@EgorBo (Member, Author) commented Oct 5, 2022

> NOTE: Unfortunately, this PR doesn't detect RCPC feature set on Windows as there is no official API for that yet.
>
> This was originally added for Mac, right? Were you trying it on Ampere for Windows?

Yes, and on Linux. Unfortunately, this happened before we added Ampere + Linux to our perf infrastructure. For Windows, we are waiting until the official API, IsProcessorFeaturePresent, is updated.
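The sketch below illustrates what runtime detection of the feature can look like on arm64 Linux. It assumes the kernel advertises FEAT_LRCPC as the `HWCAP_LRCPC` bit (bit 15) of `AT_HWCAP`, read via `getauxval`. It is not the runtime's actual detection code. Every other platform, including Windows as discussed above, simply reports false here, so a JIT using it would keep emitting LDAR.

```cpp
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP (and usually the HWCAP_* bits)
#endif

// Assumption: arm64 Linux reports FEAT_LRCPC (LDAPR/LDAPRB/LDAPRH) as bit 15
// of AT_HWCAP. Provide a fallback definition in case the libc headers lack it.
#ifndef HWCAP_LRCPC
#define HWCAP_LRCPC (1UL << 15)
#endif

// Returns true when the CPU advertises RCPC. All other platforms return
// false, so the caller falls back to the LDAR family.
bool HasRcpc() {
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_LRCPC) != 0;
#else
    return false;
#endif
}
```

Querying once at startup and caching the result is the natural design, since the answer cannot change while the process runs.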

Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)

Linked issue: [ARM64] Consider using LDAPR to implement volatile reads when instruction is available.

6 participants