Re-enable atomic loads and stores for all RISC-V targets #98333

SimonSapin · 2022-06-21T10:40:59Z

This roughly reverts PR #66548

Atomic "CAS" are still disabled for targets without the “A” Standard Extension for Atomic Instructions. However this extension only adds instructions for operations more complex than simple loads and stores, which are always atomic when aligned.

In the Unprivileged Spec v. 20191213 section 2.6 Load and Store Instructions of chapter 2 RV32I Base Integer Instruction Set (emphasis mine):

Even when misaligned loads and stores complete successfully, these accesses might run extremely slowly depending on the implementation (e.g., when implemented via an invisible trap). Further-more, whereas naturally aligned loads and stores are guaranteed to execute atomically, misaligned loads and stores might not, and hence require additional synchronization to ensure atomicity.

Unfortunately PR #66548 did not provide much details on the bug that motivated it, but #66240 and #85736 appear related and happen with targets that do have the A extension.

This roughly reverts PR rust-lang#66548 Atomic "CAS" are still disabled for targets without the *“A” Standard Extension for Atomic Instructions*. However this extension only adds instructions for operations more complex than simple loads and stores, which are always atomic when aligned. In the [Unprivileged Spec v. 20191213](https://riscv.org/technical/specifications/) section 2.6 *Load and Store Instructions* of chapter 2 *RV32I Base Integer Instruction Set* (emphasis mine): > Even when misaligned loads and stores complete successfully, > these accesses might run extremely slowly depending on the implementation > (e.g., when implemented via an invisible trap). Further-more, whereas > **naturally aligned loads and stores are guaranteed to execute atomically**, > misaligned loads and stores might not, and hence require > additional synchronization to ensure atomicity. Unfortunately PR rust-lang#66548 did not provide much details on the bug that motivated it, but rust-lang#66240 and rust-lang#85736 appear related and happen with targets that do have the A extension.

rust-highfive · 2022-06-21T10:41:05Z

r? @nagisa

(rust-highfive has picked a reviewer for you, use r? to override)

rust-highfive · 2022-06-21T10:41:06Z

⚠️ Warning ⚠️

These commits modify compiler targets. (See the Target Tier Policy.)

SimonSapin · 2022-06-21T10:57:00Z

#66240 and #85736 appear related and happen with targets that do have the A extension.

I don’t know if this is relevant or what LLVM does there, but the A extension has a limitation in that single-instruction atomic swap/add/and/or/xor/min/max only exists for 32-bit or (on RV64) 64-bit values. Similar operations on 8-bit or 16-bit values would have to be a loop based on 32-bit load-reserved and store-conditional. I think this is the kind of loop AtomicU8::fetch_update does?

MabezDev · 2022-06-21T11:12:30Z

I don't think changing the max_atomic_width is enough. The LLVM backend currently will expand load/stores to libcalls on targets without the +a extension. I'm not sure what codegen granularity we have in rustc, but it would require us to do some lowering ourselves before passing to LLVM to make sure there are appropriate fences in place. We would also need to guarentee this custom lowering never happens on targets with atomic_cas: true, just incase the CAS routines use a locking data structure to implement CAS.

More details in here: #81752 (comment).

jamesmunns · 2022-06-21T11:30:11Z

As an analogy, this should be the same status as armv5te and thumbv6m targets, where they do not naturally have any ability to perform CAS operations, but do have atomic load/stores of their native word width.

It seems as though this may be handled differently upstream in LLVM for RV32I targets, though that seems like something that should be addressed upstream for the sake of consistency.

SimonSapin · 2022-06-21T14:55:13Z

Trying this on a somewhat(?) simple project fails:

error: linking with `rust-lld` failed: exit status: 1
  |
  = note: "rust-lld" "-flavor" "gnu" "/tmp/rustcZzSyCr/symbols.o" […]
  = note: rust-lld: error: undefined symbol: __atomic_load_4
          >>> referenced by compiler_builtins.5931568b-cgu.83
          >>>               compiler_builtins-35b1260b00e1afde.compiler_builtins.5931568b-cgu.83.rcgu.o:(compiler_builtins::mem::memcpy::h4f55b8ec9004b8fa) in archive /home/simon/projects/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/riscv32i-unknown-none-elf/lib/libcompiler_builtins-35b1260b00e1afde.rlib

Which looks like #85736

MabezDev · 2022-06-21T15:25:41Z

I think that is the intended behaviour, even though we're setting the max_atomic_width, LLVM still won't generate the code because it knows the target doesn't have the A extension, instead, it defers to libcalls.

Along with this PR, we need to convince upstream LLVM that adding atomic load/stores for RISCVI targets is safe. The only way we can do that is if we can convey to LLVM that mixing true atomic load/stores with locking atomic CAS implementations is not possible. I think this is only possible under two conditions:

atomic_cas is set to false in the target definition

or

the cas implementation is known to be lock-free

The second seems quite difficult to prove, but the first one is easy. We just need a way of exposing this information to LLVM.

MabezDev · 2022-06-21T15:32:17Z

I suppose another option is to implement the libcalls in software, inside https://github.com/rust-lang/compiler-builtins, but that would require convincing them to change their current stance which is

Rust only exposes atomic types on platforms that support them and therefore does not need to fall back to software implementations.

We would also have to abide by the rules stated above (not mixing real atomics with locking atomic implementations) but perhaps this is easier to verify on the Rust side?

@Amanieu any thoughts on this?

SimonSapin · 2022-06-22T15:55:51Z

The fence instruction is part of the base RV32I ISA, and core::sync::atomic::fence seems to be available on all targets.

As a work around, this generates on both riscv32i-unknown-none-elf and riscv32imac-unknown-none-elf the same machine code as AtomicU8 does on riscv32imac-unknown-none-elf:

pub struct MyAtomicU8(UnsafeCell<u8>);

unsafe impl Sync for MyAtomicU8 {}

impl MyAtomicU8 {
    pub fn store(&self, value: u8, ordering: Ordering) {
        fence(ordering);
        unsafe { self.0.get().write(value) }
    }

    pub fn load(&self, ordering: Ordering) -> u8 {
        let value = unsafe { self.0.get().read() };
        fence(ordering);
        value
    }
}

Does it look correct?

taiki-e · 2022-06-22T18:19:19Z

As a work around, this generates on both riscv32i-unknown-none-elf and riscv32imac-unknown-none-elf the same machine code as AtomicU8 does on riscv32imac-unknown-none-elf:

Does it look correct?

It does not look correct:

ptr::{write,read} are not atomic operations. core::ptr docs says:

All accesses performed by functions in this module are non-atomic in the sense of atomic operations used to synchronize between threads. This means it is undefined behavior to perform two concurrent accesses to the same location from different threads unless both accesses only read from memory. Notice that this explicitly includes read_volatile and write_volatile: Volatile accesses cannot be used for inter-thread synchronization.
~~IIUC, atomic fences are guaranteed to work only for atomic operations. The compiler can probably reorder non-atomic operations.~~ EDIT: this does not seem correct

AFAIK, the only sound way with pure Rust currently available is to use inline assembly like my portable-atomic crate does.

(In this particular case, volatile read/write + fence probably work too, but is also not sound because volatile read/write are not guaranteed to be atomic. Actualy, it seems LLVM uses instructions that are not guaranteed to be atomic in aarch64's volatile read/write.)

SimonSapin · 2022-06-22T18:49:51Z

This is very helpful, thanks @taiki-e!

Amanieu · 2022-06-23T00:55:10Z

Overall I'm in favor of this change, but it requires either one of the following to happen:

LLVM stops emitting libcalls for RISC-V atomic load/store.
We implement __atomic_load/__atomic_store in compiler-builtins.

I'm happy with either approach.

IIUC, atomic fences are guaranteed to work only for atomic operations. The compiler can probably reorder non-atomic operations. core::sync::atomic::fence docs says:

No, atomic fences also enforce ordering on non-atomic operations.

MabezDev · 2022-06-23T09:13:57Z

Thanks for weighing in @Amanieu! The LLVM changes are a bit out of my depth, but I could PR some __atomic_load/__atomic_store implementations for RISCV in compiler-builtins?

To expand on my earlier comment about mixing lock-free load/stores with locking cas implementations, see this LLVM discussion thread: https://reviews.llvm.org/D47553. How do you think we should avoid that? Only provide libcall implementations if atomic_cas: false? Is there a compiler cfg for this? Maybe not worrying about this at all is an option, thats what GCC does...

SimonSapin · 2022-06-23T10:49:42Z

Looks like libcore uses cfg(target_has_atomic = "8") as "has CAS atomics for size 8". (cfg(target_has_atomic_load_store = "8") is what makes the Atomic* types exist at all.)

Amanieu · 2022-06-23T14:00:37Z

To expand on my earlier comment about mixing lock-free load/stores with locking cas implementations, see this LLVM discussion thread: reviews.llvm.org/D47553. How do you think we should avoid that? Only provide libcall implementations if atomic_cas: false? Is there a compiler cfg for this? Maybe not worrying about this at all is an option, thats what GCC does...

The fundamental issue is summary in the last comment here: this is unsound when mixing locked-based CAS operations with lock-free load/store operations. While this doesn't apply to Rust since we don't expose CAS operations on this target, we still shouldn't override the __atomic_* symbols since those may be used by C code.

A better solution might be to explicitly lower atomic load/store to a volatile load/store + a fence in rustc or in the standard library.

nagisa · 2022-07-23T12:33:25Z

r? @Amanieu

Amanieu · 2022-07-23T16:15:41Z

LLVM seems to be moving away from supporting atomic load/store independently of full atomic support. We need to make a decision of what to support in Rust: #99595 (comment)

jamesmunns · 2022-07-24T10:32:05Z

LLVM seems to be moving away from supporting atomic load/store independently of full atomic support. We need to make a decision of what to support in Rust: #99595 (comment)

Hey @Amanieu, what would be the best way to open up a meta issue and get the embedded-wg involved?

It seems like this is a change to llvm, and it might impact many embedded targets, potentially in a breaking way if atomics with just load/stores are removed for stable targets.

bors · 2023-08-05T00:37:05Z

📌 Commit 20bd0c3 has been approved by Amanieu

It is now in the queue for this repository.

bors · 2023-08-05T01:53:42Z

⌛ Testing commit 20bd0c3 with merge 90f0b24...

bors · 2023-08-05T03:43:00Z

☀️ Test successful - checks-actions
Approved by: Amanieu
Pushing 90f0b24 to master...

taiki-e · 2023-08-05T03:43:32Z

@Amanieu I believe we need to also enable forced-atomics target feature for these targets (otherwise, libcalls are generated): https://godbolt.org/z/433qeG7vd

And, forced-atomics target feature itself is currently broken due to another bug: #114153

…anieu" This reverts commit 90f0b24, reversing changes made to e173a8e.

rust-timer · 2023-08-05T06:09:31Z

Finished benchmarking commit (90f0b24): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.8%	[0.4%, 1.3%]	18
Regressions ❌ (secondary)	0.6%	[0.2%, 0.9%]	12
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.8%	[0.4%, 1.3%]	18

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.5%	[-1.8%, -1.2%]	2
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.5%	[-1.8%, -1.2%]	2

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 650.501s -> 650.303s (-0.03%)

lqd · 2023-08-05T08:02:17Z

While this PR may be reverted, the performance results are noise, since it only modified RISC-V target specs, so: @rustbot label: +perf-regression-triaged

taiki-e · 2023-08-06T02:45:00Z

Ok, I can confirm this PR break the build for these targets with "undefined symbol" error.

rust-lang/rust#98333 broke RISC-V targets without A-extension. This will be fixed by rust-lang/rust#114497 or rust-lang/rust#114499. ``` = note: rust-lld: error: undefined symbol: __atomic_load_4 >>> referenced by mod.rs:1242 (/rustc/eb088b8b9d98f1af1b0e61bbdcd8686e1b0db7b6/library/core/src/num/mod.rs:1242) >>> compiler_builtins-d066fd6ed508b6b5.compiler_builtins.b1b28d926042a9f7-cgu.004.rcgu.o:(compiler_builtins::mem::memcpy::he6d5500b219c1d3d) in archive /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/riscv32i-unknown-none-elf/lib/libcompiler_builtins-d066fd6ed508b6b5.rlib ```

rust-lang/rust#98333 broke RISC-V targets without A-extension. This will be fixed by rust-lang/rust#114497 or rust-lang/rust#114499. ``` = note: rust-lld: error: undefined symbol: __atomic_load_4 >>> referenced by uint_macros.rs:1230 (/home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/num/uint_macros.rs:1230) >>> compiler_builtins-a15f77f0f647aa99.compiler_builtins.eedcbccd0d1b9b88-cgu.1.rcgu.o:(compiler_builtins::mem::memcpy::hedd00e0c59d2a943) in archive /home/runner/work/portable-atomic/portable-atomic/target/riscv32im-unknown-none-elf/debug/deps/libcompiler_builtins-a15f77f0f647aa99.rlib ```

…nieu Revert rust-lang#98333 "Re-enable atomic loads and stores for all RISC-V targets" This reverts rust-lang#98333. As said in rust-lang#98333 (comment), `forced-atomics` target feature is also needed to enable atomic load/store on these targets (otherwise, libcalls are generated): https://godbolt.org/z/433qeG7vd However, `forced-atomics` target feature is currently broken (rust-lang#114153), so AFAIK, there is currently no way to enable atomic load/store (via core::intrinsics) on these targets properly. r? `@Amanieu`

…iaskrgr Rollup of 7 pull requests Successful merges: - rust-lang#114376 (Avoid exporting __rust_alloc_error_handler_should_panic more than once.) - rust-lang#114413 (Warn when #[macro_export] is applied on decl macros) - rust-lang#114497 (Revert rust-lang#98333 "Re-enable atomic loads and stores for all RISC-V targets") - rust-lang#114500 (Remove arm crypto target feature) - rust-lang#114566 (Store the laziness of type aliases in their `DefKind`) - rust-lang#114594 (Structurally normalize weak and inherent in new solver) - rust-lang#114596 (Rename method in `opt-dist`) r? `@ghost` `@rustbot` modify labels: rollup

Mark-Simulacrum · 2023-08-08T13:37:02Z

While this PR may be reverted, the performance results are noise, since it only modified RISC-V target specs, so:
@rustbot label: +perf-regression-triaged

Agreed with changing the label, just also wanted to reference #114481 -- my sense is the noise here is quite similar to that PRs.

Pass +forced-atomics feature for riscv32{i,im,imc}-unknown-none-elf As said in rust-lang#98333 (comment), `forced-atomics` target feature is also needed to enable atomic load/store on these targets (otherwise, libcalls are generated): https://godbolt.org/z/433qeG7vd ~~This PR is currently marked as a draft because:~~ - ~~`forced-atomics` target feature is currently broken (rust-lang#114153 EDIT: Fixed - ~~`forced-atomics` target feature has been added in LLVM 16 (llvm/llvm-project@f5ed0cb), but the current minimum LLVM version [is 15](https://github.com/rust-lang/rust/blob/90f0b24ad3e7fc0dc0e419c9da30d74629cd5736/src/bootstrap/llvm.rs#L557). In LLVM 15, the atomic load/store of these targets generates libcalls anyway.~~ EDIT: LLVM 15 has been dropped Depending on the policy on the minimum LLVM version for these targets, this may be blocked until the minimum LLVM version is increased to 16. r? `@Amanieu`

Pass +forced-atomics feature for riscv32{i,im,imc}-unknown-none-elf As said in rust-lang/rust#98333 (comment), `forced-atomics` target feature is also needed to enable atomic load/store on these targets (otherwise, libcalls are generated): https://godbolt.org/z/433qeG7vd ~~This PR is currently marked as a draft because:~~ - ~~`forced-atomics` target feature is currently broken (rust-lang/rust#114153 EDIT: Fixed - ~~`forced-atomics` target feature has been added in LLVM 16 (llvm/llvm-project@f5ed0cb), but the current minimum LLVM version [is 15](https://github.com/rust-lang/rust/blob/90f0b24ad3e7fc0dc0e419c9da30d74629cd5736/src/bootstrap/llvm.rs#L557). In LLVM 15, the atomic load/store of these targets generates libcalls anyway.~~ EDIT: LLVM 15 has been dropped Depending on the policy on the minimum LLVM version for these targets, this may be blocked until the minimum LLVM version is increased to 16. r? `@Amanieu`

rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jun 21, 2022

rust-highfive assigned nagisa Jun 21, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 21, 2022

SimonSapin added A-target-specs Area: Compile-target specifications O-riscv Target: RISC-V architecture labels Jun 21, 2022

SimonSapin marked this pull request as draft June 21, 2022 14:49

SimonSapin mentioned this pull request Jun 25, 2022

Add boot_argument(), returns the value an a* register had at boot time rust-embedded/riscv-rt#94

Closed

nagisa added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 25, 2022

rust-highfive assigned Amanieu and unassigned nagisa Jul 23, 2022

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 5, 2023

bors added the merged-by-bors This PR was explicitly merged by bors. label Aug 5, 2023

bors merged commit 90f0b24 into rust-lang:master Aug 5, 2023

rustbot added this to the 1.73.0 milestone Aug 5, 2023

taiki-e added a commit to taiki-e/rust that referenced this pull request Aug 5, 2023

Revert "Auto merge of rust-lang#98333 - SimonSapin:riscv-atomic, r=Am…

b47e4a4

…anieu" This reverts commit 90f0b24, reversing changes made to e173a8e.

This was referenced Aug 5, 2023

Revert #98333 "Re-enable atomic loads and stores for all RISC-V targets" #114497

Merged

Pass +forced-atomics feature for riscv32{i,im,imc}-unknown-none-elf #114499

Merged

rustbot added the perf-regression Performance regression. label Aug 5, 2023

rustbot added the perf-regression-triaged The performance regression has been triaged. label Aug 5, 2023

lqd mentioned this pull request Aug 5, 2023

Rollup of 9 pull requests #114481

Merged

matthiaskrgr mentioned this pull request Aug 8, 2023

Rollup of 7 pull requests #114604

Merged

Re-enable atomic loads and stores for all RISC-V targets #98333

Re-enable atomic loads and stores for all RISC-V targets #98333

Uh oh!

Conversation

SimonSapin commented Jun 21, 2022

Uh oh!

rust-highfive commented Jun 21, 2022

Uh oh!

rust-highfive commented Jun 21, 2022

Uh oh!

SimonSapin commented Jun 21, 2022

Uh oh!

MabezDev commented Jun 21, 2022

Uh oh!

jamesmunns commented Jun 21, 2022

Uh oh!

SimonSapin commented Jun 21, 2022

Uh oh!

MabezDev commented Jun 21, 2022

Uh oh!

MabezDev commented Jun 21, 2022

Uh oh!

SimonSapin commented Jun 22, 2022

Uh oh!

taiki-e commented Jun 22, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SimonSapin commented Jun 22, 2022

Uh oh!

Amanieu commented Jun 23, 2022

Uh oh!

MabezDev commented Jun 23, 2022

Uh oh!

SimonSapin commented Jun 23, 2022

Uh oh!

Amanieu commented Jun 23, 2022

Uh oh!

nagisa commented Jul 23, 2022

Uh oh!

Amanieu commented Jul 23, 2022

Uh oh!

jamesmunns commented Jul 24, 2022

Uh oh!

bors commented Aug 5, 2023

Uh oh!

bors commented Aug 5, 2023

Uh oh!

bors commented Aug 5, 2023

Uh oh!

taiki-e commented Aug 5, 2023

Uh oh!

rust-timer commented Aug 5, 2023

Overall result: ❌ regressions - ACTION NEEDED

Instruction count

Max RSS (memory usage)

Cycles

Binary size

Uh oh!

lqd commented Aug 5, 2023

Uh oh!

taiki-e commented Aug 6, 2023

Uh oh!

Mark-Simulacrum commented Aug 8, 2023

Uh oh!

Uh oh!

taiki-e commented Jun 22, 2022 •

edited

Loading