Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-enable atomic loads and stores for all RISC-V targets #98333

Merged
merged 1 commit into from
Aug 5, 2023

Conversation

SimonSapin
Copy link
Contributor

This roughly reverts PR #66548

Atomic "CAS" are still disabled for targets without the “A” Standard Extension for Atomic Instructions. However this extension only adds instructions for operations more complex than simple loads and stores, which are always atomic when aligned.

In the Unprivileged Spec v. 20191213 section 2.6 Load and Store Instructions of chapter 2 RV32I Base Integer Instruction Set (emphasis mine):

Even when misaligned loads and stores complete successfully, these accesses might run extremely slowly depending on the implementation (e.g., when implemented via an invisible trap). Further-more, whereas naturally aligned loads and stores are guaranteed to execute atomically, misaligned loads and stores might not, and hence require additional synchronization to ensure atomicity.

Unfortunately PR #66548 did not provide much details on the bug that motivated it, but #66240 and #85736 appear related and happen with targets that do have the A extension.

This roughly reverts PR rust-lang#66548

Atomic "CAS" are still disabled for targets without the
*“A” Standard Extension for Atomic Instructions*.
However this extension only adds instructions for operations more complex
than simple loads and stores, which are always atomic when aligned.

In the [Unprivileged Spec v. 20191213](https://riscv.org/technical/specifications/)
section 2.6 *Load and Store Instructions* of
chapter 2 *RV32I Base Integer Instruction Set* (emphasis mine):

> Even when misaligned loads and stores complete successfully,
> these accesses might run extremely slowly depending on the implementation
> (e.g., when implemented via an invisible trap). Further-more, whereas
> **naturally aligned loads and stores are guaranteed to execute atomically**,
> misaligned loads and stores might not, and hence require
> additional synchronization to ensure atomicity.

Unfortunately PR rust-lang#66548 did not provide
much details on the bug that motivated it, but
rust-lang#66240 and
rust-lang#85736 appear related
and happen with targets that do have the A extension.
@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jun 21, 2022
@rust-highfive
Copy link
Collaborator

r? @nagisa

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive
Copy link
Collaborator

⚠️ Warning ⚠️

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 21, 2022
@SimonSapin SimonSapin added A-target-specs Area: compile-target specifications O-riscv Target: RISC-V architecture labels Jun 21, 2022
@SimonSapin
Copy link
Contributor Author

#66240 and #85736 appear related and happen with targets that do have the A extension.

I don’t know if this is relevant or what LLVM does there, but the A extension has a limitation in that single-instruction atomic swap/add/and/or/xor/min/max only exists for 32-bit or (on RV64) 64-bit values. Similar operations on 8-bit or 16-bit values would have to be a loop based on 32-bit load-reserved and store-conditional. I think this is the kind of loop AtomicU8::fetch_update does?

@MabezDev
Copy link
Contributor

I don't think changing the max_atomic_width is enough. The LLVM backend currently will expand load/stores to libcalls on targets without the +a extension. I'm not sure what codegen granularity we have in rustc, but it would require us to do some lowering ourselves before passing to LLVM to make sure there are appropriate fences in place. We would also need to guarentee this custom lowering never happens on targets with atomic_cas: true, just incase the CAS routines use a locking data structure to implement CAS.

More details in here: #81752 (comment).

@jamesmunns
Copy link
Member

As an analogy, this should be the same status as armv5te and thumbv6m targets, where they do not naturally have any ability to perform CAS operations, but do have atomic load/stores of their native word width.

It seems as though this may be handled differently upstream in LLVM for RV32I targets, though that seems like something that should be addressed upstream for the sake of consistency.

@SimonSapin SimonSapin marked this pull request as draft June 21, 2022 14:49
@SimonSapin
Copy link
Contributor Author

Trying this on a somewhat(?) simple project fails:

error: linking with `rust-lld` failed: exit status: 1
  |
  = note: "rust-lld" "-flavor" "gnu" "/tmp/rustcZzSyCr/symbols.o" []
  = note: rust-lld: error: undefined symbol: __atomic_load_4
          >>> referenced by compiler_builtins.5931568b-cgu.83
          >>>               compiler_builtins-35b1260b00e1afde.compiler_builtins.5931568b-cgu.83.rcgu.o:(compiler_builtins::mem::memcpy::h4f55b8ec9004b8fa) in archive /home/simon/projects/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/riscv32i-unknown-none-elf/lib/libcompiler_builtins-35b1260b00e1afde.rlib

Which looks like #85736

@MabezDev
Copy link
Contributor

I think that is the intended behaviour, even though we're setting the max_atomic_width, LLVM still won't generate the code because it knows the target doesn't have the A extension, instead, it defers to libcalls.

Along with this PR, we need to convince upstream LLVM that adding atomic load/stores for RISCVI targets is safe. The only way we can do that is if we can convey to LLVM that mixing true atomic load/stores with locking atomic CAS implementations is not possible. I think this is only possible under two conditions:

atomic_cas is set to false in the target definition

or

the cas implementation is known to be lock-free

The second seems quite difficult to prove, but the first one is easy. We just need a way of exposing this information to LLVM.

@MabezDev
Copy link
Contributor

I suppose another option is to implement the libcalls in software, inside https://github.com/rust-lang/compiler-builtins, but that would require convincing them to change their current stance which is

Rust only exposes atomic types on platforms that support them and therefore does not need to fall back to software implementations.

We would also have to abide by the rules stated above (not mixing real atomics with locking atomic implementations) but perhaps this is easier to verify on the Rust side?

@Amanieu any thoughts on this?

@SimonSapin
Copy link
Contributor Author

The fence instruction is part of the base RV32I ISA, and core::sync::atomic::fence seems to be available on all targets.

As a work around, this generates on both riscv32i-unknown-none-elf and riscv32imac-unknown-none-elf the same machine code as AtomicU8 does on riscv32imac-unknown-none-elf:

pub struct MyAtomicU8(UnsafeCell<u8>);

unsafe impl Sync for MyAtomicU8 {}

impl MyAtomicU8 {
    pub fn store(&self, value: u8, ordering: Ordering) {
        fence(ordering);
        unsafe { self.0.get().write(value) }
    }

    pub fn load(&self, ordering: Ordering) -> u8 {
        let value = unsafe { self.0.get().read() };
        fence(ordering);
        value
    }
}

Does it look correct?

@taiki-e
Copy link
Member

taiki-e commented Jun 22, 2022

As a work around, this generates on both riscv32i-unknown-none-elf and riscv32imac-unknown-none-elf the same machine code as AtomicU8 does on riscv32imac-unknown-none-elf:

Does it look correct?

It does not look correct:

  • ptr::{write,read} are not atomic operations. core::ptr docs says:

    All accesses performed by functions in this module are non-atomic in the sense of atomic operations used to synchronize between threads. This means it is undefined behavior to perform two concurrent accesses to the same location from different threads unless both accesses only read from memory. Notice that this explicitly includes read_volatile and write_volatile: Volatile accesses cannot be used for inter-thread synchronization.

  • IIUC, atomic fences are guaranteed to work only for atomic operations. The compiler can probably reorder non-atomic operations. EDIT: this does not seem correct

AFAIK, the only sound way with pure Rust currently available is to use inline assembly like my portable-atomic crate does.

(In this particular case, volatile read/write + fence probably work too, but is also not sound because volatile read/write are not guaranteed to be atomic. Actualy, it seems LLVM uses instructions that are not guaranteed to be atomic in aarch64's volatile read/write.)

@SimonSapin
Copy link
Contributor Author

This is very helpful, thanks @taiki-e!

@Amanieu
Copy link
Member

Amanieu commented Jun 23, 2022

Overall I'm in favor of this change, but it requires either one of the following to happen:

  • LLVM stops emitting libcalls for RISC-V atomic load/store.
  • We implement __atomic_load/__atomic_store in compiler-builtins.

I'm happy with either approach.

  • IIUC, atomic fences are guaranteed to work only for atomic operations. The compiler can probably reorder non-atomic operations. core::sync::atomic::fence docs says:

No, atomic fences also enforce ordering on non-atomic operations.

@MabezDev
Copy link
Contributor

Thanks for weighing in @Amanieu! The LLVM changes are a bit out of my depth, but I could PR some __atomic_load/__atomic_store implementations for RISCV in compiler-builtins?

To expand on my earlier comment about mixing lock-free load/stores with locking cas implementations, see this LLVM discussion thread: https://reviews.llvm.org/D47553. How do you think we should avoid that? Only provide libcall implementations if atomic_cas: false? Is there a compiler cfg for this? Maybe not worrying about this at all is an option, thats what GCC does...

@SimonSapin
Copy link
Contributor Author

Looks like libcore uses cfg(target_has_atomic = "8") as "has CAS atomics for size 8". (cfg(target_has_atomic_load_store = "8") is what makes the Atomic* types exist at all.)

@Amanieu
Copy link
Member

Amanieu commented Jun 23, 2022

To expand on my earlier comment about mixing lock-free load/stores with locking cas implementations, see this LLVM discussion thread: reviews.llvm.org/D47553. How do you think we should avoid that? Only provide libcall implementations if atomic_cas: false? Is there a compiler cfg for this? Maybe not worrying about this at all is an option, thats what GCC does...

The fundamental issue is summary in the last comment here: this is unsound when mixing locked-based CAS operations with lock-free load/store operations. While this doesn't apply to Rust since we don't expose CAS operations on this target, we still shouldn't override the __atomic_* symbols since those may be used by C code.

A better solution might be to explicitly lower atomic load/store to a volatile load/store + a fence in rustc or in the standard library.

@nagisa nagisa added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 25, 2022
@nagisa
Copy link
Member

nagisa commented Jul 23, 2022

r? @Amanieu

@rust-highfive rust-highfive assigned Amanieu and unassigned nagisa Jul 23, 2022
@Amanieu
Copy link
Member

Amanieu commented Jul 23, 2022

LLVM seems to be moving away from supporting atomic load/store independently of full atomic support. We need to make a decision of what to support in Rust: #99595 (comment)

@jamesmunns
Copy link
Member

LLVM seems to be moving away from supporting atomic load/store independently of full atomic support. We need to make a decision of what to support in Rust: #99595 (comment)

Hey @Amanieu, what would be the best way to open up a meta issue and get the embedded-wg involved?

It seems like this is a change to llvm, and it might impact many embedded targets, potentially in a breaking way if atomics with just load/stores are removed for stable targets.

@bors
Copy link
Contributor

bors commented Aug 5, 2023

📌 Commit 20bd0c3 has been approved by Amanieu

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 5, 2023
@bors
Copy link
Contributor

bors commented Aug 5, 2023

⌛ Testing commit 20bd0c3 with merge 90f0b24...

@bors
Copy link
Contributor

bors commented Aug 5, 2023

☀️ Test successful - checks-actions
Approved by: Amanieu
Pushing 90f0b24 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Aug 5, 2023
@taiki-e
Copy link
Member

taiki-e commented Aug 5, 2023

@Amanieu I believe we need to also enable forced-atomics target feature for these targets (otherwise, libcalls are generated): https://godbolt.org/z/433qeG7vd

And, forced-atomics target feature itself is currently broken due to another bug: #114153

@bors bors merged commit 90f0b24 into rust-lang:master Aug 5, 2023
@rustbot rustbot added this to the 1.73.0 milestone Aug 5, 2023
taiki-e added a commit to taiki-e/rust that referenced this pull request Aug 5, 2023
…anieu"

This reverts commit 90f0b24, reversing
changes made to e173a8e.
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (90f0b24): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.8% [0.4%, 1.3%] 18
Regressions ❌
(secondary)
0.6% [0.2%, 0.9%] 12
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.8% [0.4%, 1.3%] 18

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.5% [-1.8%, -1.2%] 2
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -1.5% [-1.8%, -1.2%] 2

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 650.501s -> 650.303s (-0.03%)

@rustbot rustbot added the perf-regression Performance regression. label Aug 5, 2023
@lqd
Copy link
Member

lqd commented Aug 5, 2023

While this PR may be reverted, the performance results are noise, since it only modified RISC-V target specs, so: @rustbot label: +perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Aug 5, 2023
@lqd lqd mentioned this pull request Aug 5, 2023
@taiki-e
Copy link
Member

taiki-e commented Aug 6, 2023

taiki-e added a commit to taiki-e/semihosting that referenced this pull request Aug 6, 2023
rust-lang/rust#98333 broke RISC-V targets
without A-extension.
This will be fixed by rust-lang/rust#114497 or
rust-lang/rust#114499.

```
  = note: rust-lld: error: undefined symbol: __atomic_load_4
          >>> referenced by mod.rs:1242 (/rustc/eb088b8b9d98f1af1b0e61bbdcd8686e1b0db7b6/library/core/src/num/mod.rs:1242)
          >>>               compiler_builtins-d066fd6ed508b6b5.compiler_builtins.b1b28d926042a9f7-cgu.004.rcgu.o:(compiler_builtins::mem::memcpy::he6d5500b219c1d3d) in archive /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/riscv32i-unknown-none-elf/lib/libcompiler_builtins-d066fd6ed508b6b5.rlib
```
taiki-e added a commit to taiki-e/atomic-maybe-uninit that referenced this pull request Aug 6, 2023
rust-lang/rust#98333 broke RISC-V targets
without A-extension.
This will be fixed by rust-lang/rust#114497 or
rust-lang/rust#114499.

```
  = note: rust-lld: error: undefined symbol: __atomic_load_4
          >>> referenced by mod.rs:1242 (/rustc/eb088b8b9d98f1af1b0e61bbdcd8686e1b0db7b6/library/core/src/num/mod.rs:1242)
          >>>               compiler_builtins-d066fd6ed508b6b5.compiler_builtins.b1b28d926042a9f7-cgu.004.rcgu.o:(compiler_builtins::mem::memcpy::he6d5500b219c1d3d) in archive /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/riscv32i-unknown-none-elf/lib/libcompiler_builtins-d066fd6ed508b6b5.rlib
```
taiki-e added a commit to taiki-e/portable-atomic that referenced this pull request Aug 6, 2023
rust-lang/rust#98333 broke RISC-V targets
without A-extension.
This will be fixed by rust-lang/rust#114497 or
rust-lang/rust#114499.

```
  = note: rust-lld: error: undefined symbol: __atomic_load_4
          >>> referenced by uint_macros.rs:1230 (/home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/num/uint_macros.rs:1230)
          >>>               compiler_builtins-a15f77f0f647aa99.compiler_builtins.eedcbccd0d1b9b88-cgu.1.rcgu.o:(compiler_builtins::mem::memcpy::hedd00e0c59d2a943) in archive /home/runner/work/portable-atomic/portable-atomic/target/riscv32im-unknown-none-elf/debug/deps/libcompiler_builtins-a15f77f0f647aa99.rlib
```
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 8, 2023
…nieu

Revert rust-lang#98333 "Re-enable atomic loads and stores for all RISC-V targets"

This reverts rust-lang#98333.

As said in rust-lang#98333 (comment), `forced-atomics` target feature is also needed to enable atomic load/store on these targets (otherwise, libcalls are generated): https://godbolt.org/z/433qeG7vd

However, `forced-atomics` target feature is currently broken (rust-lang#114153), so AFAIK, there is currently no way to enable atomic load/store (via core::intrinsics) on these targets properly.

r? `@Amanieu`
bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 8, 2023
…iaskrgr

Rollup of 7 pull requests

Successful merges:

 - rust-lang#114376 (Avoid exporting __rust_alloc_error_handler_should_panic more than once.)
 - rust-lang#114413 (Warn when #[macro_export] is applied on decl macros)
 - rust-lang#114497 (Revert rust-lang#98333 "Re-enable atomic loads and stores for all RISC-V targets")
 - rust-lang#114500 (Remove arm crypto target feature)
 - rust-lang#114566 (Store the laziness of type aliases in their `DefKind`)
 - rust-lang#114594 (Structurally normalize weak and inherent in new solver)
 - rust-lang#114596 (Rename method in `opt-dist`)

r? `@ghost`
`@rustbot` modify labels: rollup
@Mark-Simulacrum
Copy link
Member

While this PR may be reverted, the performance results are noise, since it only modified RISC-V target specs, so:
@rustbot label: +perf-regression-triaged

Agreed with changing the label, just also wanted to reference #114481 -- my sense is the noise here is quite similar to that PRs.

bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 29, 2023
Pass +forced-atomics feature for riscv32{i,im,imc}-unknown-none-elf

As said in rust-lang#98333 (comment), `forced-atomics` target feature is also needed to enable atomic load/store on these targets (otherwise, libcalls are generated): https://godbolt.org/z/433qeG7vd

~~This PR is currently marked as a draft because:~~
- ~~`forced-atomics` target feature is currently broken (rust-lang#114153 EDIT: Fixed
- ~~`forced-atomics` target feature has been added in LLVM 16 (llvm/llvm-project@f5ed0cb), but the current minimum LLVM version [is 15](https://github.com/rust-lang/rust/blob/90f0b24ad3e7fc0dc0e419c9da30d74629cd5736/src/bootstrap/llvm.rs#L557). In LLVM 15, the atomic load/store of these targets generates libcalls anyway.~~ EDIT: LLVM 15 has been dropped

Depending on the policy on the minimum LLVM version for these targets, this may be blocked until the minimum LLVM version is increased to 16.

r? `@Amanieu`
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 29, 2023
Pass +forced-atomics feature for riscv32{i,im,imc}-unknown-none-elf

As said in rust-lang#98333 (comment), `forced-atomics` target feature is also needed to enable atomic load/store on these targets (otherwise, libcalls are generated): https://godbolt.org/z/433qeG7vd

~~This PR is currently marked as a draft because:~~
- ~~`forced-atomics` target feature is currently broken (rust-lang#114153 EDIT: Fixed
- ~~`forced-atomics` target feature has been added in LLVM 16 (llvm/llvm-project@f5ed0cb), but the current minimum LLVM version [is 15](https://github.com/rust-lang/rust/blob/90f0b24ad3e7fc0dc0e419c9da30d74629cd5736/src/bootstrap/llvm.rs#L557). In LLVM 15, the atomic load/store of these targets generates libcalls anyway.~~ EDIT: LLVM 15 has been dropped

Depending on the policy on the minimum LLVM version for these targets, this may be blocked until the minimum LLVM version is increased to 16.

r? `@Amanieu`
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 30, 2023
Pass +forced-atomics feature for riscv32{i,im,imc}-unknown-none-elf

As said in rust-lang#98333 (comment), `forced-atomics` target feature is also needed to enable atomic load/store on these targets (otherwise, libcalls are generated): https://godbolt.org/z/433qeG7vd

~~This PR is currently marked as a draft because:~~
- ~~`forced-atomics` target feature is currently broken (rust-lang#114153 EDIT: Fixed
- ~~`forced-atomics` target feature has been added in LLVM 16 (llvm/llvm-project@f5ed0cb), but the current minimum LLVM version [is 15](https://github.com/rust-lang/rust/blob/90f0b24ad3e7fc0dc0e419c9da30d74629cd5736/src/bootstrap/llvm.rs#L557). In LLVM 15, the atomic load/store of these targets generates libcalls anyway.~~ EDIT: LLVM 15 has been dropped

Depending on the policy on the minimum LLVM version for these targets, this may be blocked until the minimum LLVM version is increased to 16.

r? `@Amanieu`
bors added a commit to rust-lang/miri that referenced this pull request Nov 30, 2023
Pass +forced-atomics feature for riscv32{i,im,imc}-unknown-none-elf

As said in rust-lang/rust#98333 (comment), `forced-atomics` target feature is also needed to enable atomic load/store on these targets (otherwise, libcalls are generated): https://godbolt.org/z/433qeG7vd

~~This PR is currently marked as a draft because:~~
- ~~`forced-atomics` target feature is currently broken (rust-lang/rust#114153 EDIT: Fixed
- ~~`forced-atomics` target feature has been added in LLVM 16 (llvm/llvm-project@f5ed0cb), but the current minimum LLVM version [is 15](https://github.com/rust-lang/rust/blob/90f0b24ad3e7fc0dc0e419c9da30d74629cd5736/src/bootstrap/llvm.rs#L557). In LLVM 15, the atomic load/store of these targets generates libcalls anyway.~~ EDIT: LLVM 15 has been dropped

Depending on the policy on the minimum LLVM version for these targets, this may be blocked until the minimum LLVM version is increased to 16.

r? `@Amanieu`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-atomic Area: atomics, barriers, and sync primitives A-target-specs Area: compile-target specifications merged-by-bors This PR was explicitly merged by bors. O-riscv Target: RISC-V architecture perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.