-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[arm64] Add RCPC ISA (8.3+) and use ldap for volatile reads #67384
Conversation
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsThis PR adds a new ISA RCPC ( Closes #67374
|
Co-authored-by: Adeel Mujahid <3840695+am11@users.noreply.github.com>
@VSadov do you have a benchmark/scenario in mind to see improvements from this change? |
@EgorBo This can have effect on scenarios that mix volatile writes and reads. Like writing/reading to ConcurrentQueue in a loop. The new instruction is not necessarily cheaper by itself - scenarios just doing lots of volatile reads may not be affected. There need to be some volatile writes (or reference writes to heap locations) in the mix as LDAR needs to consider preceding STLR while LDAPR does not. Also note that in order to see gains, the hardware should take advantage of relaxed semantics. Some early implementations of LDAPR could be just aliases of LDAR. |
I wasn't able to reproduce improvements on M1 so probably it is indeed is just a renamed @dotnet/jit-contrib PTAL, no diffs but in fact all volatile loads were changed from ldar[b/h] to ldapr[b/h] |
NOTE: Unfortunately, this PR doesn't detect RCPC feature set on Windows as there is no official API for that yet. |
This was originally added for Mac, right? Were you trying it on Ampere for Windows? |
Yep, and Linux. Unfortunately it happened before we added Ampere+Linux to our perf infra. For windows we wait till the official API is updated |
This PR adds a new ISA RCPC (
Release Consistent Processor Consistent support
) for arm64-v8.3+ (optionally available on arm64-v8.2) in order to rely onldapr/b/h
for volatile reads with acquire/release semantics, see #67374Apple M1 seems support it but most likely it's just an alias for
ldar
there so no boost.Closes #67374
codegen diff