ESP-IDF targets' Atomic*64 is not lock-free #117305

taiki-e · 2023-10-28T04:57:54Z

max_atomic_width of these targets is currently set to 64

rust/compiler/rustc_target/src/spec/riscv32imac_esp_espidf.rs

Line 20 in 0237aa3

max_atomic_width: Some(64),

rust/compiler/rustc_target/src/spec/riscv32imc_esp_espidf.rs

Line 23 in 0237aa3

max_atomic_width: Some(64),

However, as mentioned in #115577 (comment), esp-idf's 64-bit atomic builtins use global spinlocks on multi-core systems:
https://github.com/espressif/esp-idf/blob/3b748a6cb76c2db7c6368a0dea32a88bc58bc44d/components/newlib/stdatomic.c#L64

And it violates what the standard library is intended to guarantee.

https://doc.rust-lang.org/nightly/std/sync/atomic/index.html#portability

All atomic types in this module are guaranteed to be lock-free if they’re available. This means they don’t internally acquire a global mutex.

Since riscv32 does not have 64-bit atomic instructions, I do not believe there is any way to fix this problem other than setting max_atomic_width of these targets to 32.

cc @ivmarkov @MabezDev (mentioned because you both are target maintainers)

@rustbot label +A-atomic +I-unsound

apiraino · 2023-10-30T11:18:15Z

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-low

Set max_atomic_width for riscv32*-esp-espidf to 32 Fixes rust-lang#117305 > Since riscv32 does not have 64-bit atomic instructions, I do not believe there is any way to fix this problem other than setting max_atomic_width of these targets to 32. This is a breaking change because Atomic\*64 will become unavailable, but all affected targets are tier 3, and the current Atomic*64 violates the standard library's API contract and can cause problems with code that rely on the standard library's atomic types being lock-free. r? `@Amanieu` cc `@ivmarkov` `@MabezDev`

ivmarkov · 2024-08-31T11:41:06Z

max_atomic_width of these targets is currently set to 64

rust/compiler/rustc_target/src/spec/riscv32imac_esp_espidf.rs

Line 20 in 0237aa3

max_atomic_width: Some(64),

rust/compiler/rustc_target/src/spec/riscv32imc_esp_espidf.rs

Line 23 in 0237aa3

max_atomic_width: Some(64),

However, as mentioned in #115577 (comment), esp-idf's 64-bit atomic builtins use global spinlocks on multi-core systems: https://github.com/espressif/esp-idf/blob/3b748a6cb76c2db7c6368a0dea32a88bc58bc44d/components/newlib/stdatomic.c#L64

And it violates what the standard library is intended to guarantee.

https://doc.rust-lang.org/nightly/std/sync/atomic/index.html#portability

All atomic types in this module are guaranteed to be lock-free if they’re available. This means they don’t internally acquire a global mutex.

Since riscv32 does not have 64-bit atomic instructions, I do not believe there is any way to fix this problem other than setting max_atomic_width of these targets to 32.

cc @ivmarkov @MabezDev (mentioned because you both are target maintainers)

@rustbot label +A-atomic +I-unsound

@taiki-e

Sorry to revive this old issue. At the time I just trusted your judgement when you removed the AtomicU64 implementation from the riscv32imac-esp-espidf as my understanding of this subject mater was (and still is) limited.

The problem is, we now we have this new issue - and - without diving into the details of the new issue just yet - we (or I) really need to understand what exactly a "lock-free" atomic does mean and how exactly an atomic that is allowed to "wait" is different from a "lockful" atomic (all atomics in core::sync::atomic are allowed to wait but are NOT allowed to use locks as per the documentation).

To give you a concrete example where I struggle: for the riscv32imac-esp-espidf target, you removed the support for AtomicU64 on the basis that it uses a global spinlock (in the presence of multiple cores).

But why did you consider this spinlock a "lock" and not a "wait" (as per the terminology of the core::sync::atomic documentation from above)?
Because this spinlock (oversimplifying a bit here) is nothing else than a loop. A "wait", if you wish.

The spinlock implementation is essentially (in pseudocode):

static hardware_atomic_bool_spinlock; // A `bool` **hardware** atomic declared statically somewhere

fn software_cmpxchg_for_something_big(something: *mut SomethingBig, old: SomethingBig, new: SomethingBig) -> SomethingBig {
    disable_all_interrupts_for_the_core_that_runs_this_code();

    while hardware_cmpxchg(&hardware_atomic_bool_spinlock, false, true) == true {
        // Ah, the spinlock is set to true; must be the other core; re-try in a busy-loop until we succeed
    }

    // Here we have interrupts disabled for our core, _and_ we have indicated to the other core (via the `hardware_atomic_bool_spinlock`) to wait until we do our job so we can just manipulate `something` now
    let current = unsafe { *something };
    if current == old {
        unsafe { *something = new; }
    }

    let ret = hardware_cmpxchg(&hardware_atomic_bool_spinlock, true, false);
    assert!(ret); // Should always return `true`

    enable_all_interrupts_for_the_core_that_runs_this_code();

    ret
}

So why is the above ^^^ a "lock" and not just a "wait"? And could you give an example of a "wait" which is not a "lock"?

bjorn3 · 2024-08-31T12:51:16Z

But why did you consider this spinlock a "lock" and not a "wait" (as per the terminology of the core::sync::atomic documentation from above)?

To quote wikipedia

Wait-freedom is the strongest non-blocking guarantee of progress, combining guaranteed system-wide throughput with starvation-freedom. An algorithm is wait-free if every operation has a bound on the number of steps the algorithm will take before the operation completes.[15] This property is critical for real-time systems and is always nice to have as long as the performance cost is not too high.
[...]
Lock-freedom allows individual threads to starve but guarantees system-wide throughput. An algorithm is lock-free if, when the program threads are run for a sufficiently long time, at least one of the threads makes progress (for some sensible definition of progress). All wait-free algorithms are lock-free.

Your software_cmpxchg_for_something_big would starve all threads if any thread that calls it permanently gets stopped while the lock on hardware_atomic_bool_spinlock is held. CPU's generally have NMI's which bypass disable_all_interrupts_for_the_core_that_runs_this_code();, so if the NMI handler gets stuck (which it is allowed to, eg because it depends on hardware_atomic_bool_spinlock itself) or NMI's are triggered back to back without any instruction executing in between, disable_all_interrupts_for_the_core_that_runs_this_code(); has no effect at all on the lock-freedom.

A lock-free but non-wait-free implementation allows some threads get stuck, but still requires that at least one thread can continue at some point. Only a non-lock-free implementation is allowed to have all threads that attempt to perform atomic operations to get permanently stuck.

ivmarkov · 2024-08-31T13:00:18Z

But why did you consider this spinlock a "lock" and not a "wait" (as per the terminology of the core::sync::atomic documentation from above)?

To quote wikipedia

Wait-freedom is the strongest non-blocking guarantee of progress, combining guaranteed system-wide throughput with starvation-freedom. An algorithm is wait-free if every operation has a bound on the number of steps the algorithm will take before the operation completes.[15] This property is critical for real-time systems and is always nice to have as long as the performance cost is not too high.
[...]
Lock-freedom allows individual threads to starve but guarantees system-wide throughput. An algorithm is lock-free if, when the program threads are run for a sufficiently long time, at least one of the threads makes progress (for some sensible definition of progress). All wait-free algorithms are lock-free.

Your software_cmpxchg_for_something_big would starve all threads if any thread that calls it permanently gets stopped while the lock on hardware_atomic_bool_spinlock is held. CPU's generally have NMI's which bypass disable_all_interrupts_for_the_core_that_runs_this_code();, so if the NMI handler gets stuck (which it is allowed to, eg because it depends on hardware_atomic_bool_spinlock itself) or NMI's are triggered back to back without any instruction executing in between, disable_all_interrupts_for_the_core_that_runs_this_code(); has no effect at all on the lock-freedom.

A lock-free but non-wait-free implementation allows some threads get stuck, but still requires that at least one thread can continue at some point. Only a non-lock-free implementation is allowed to have all threads that attempt to perform atomic operations to get permanently stuck.

So are you saying that

In the presence of NMI the above code is NOT lock free
However in the absence of NMI, the above code would be lock free? (it is another topic if we can guarantee the absence of NMI or its weaker variant - that the NMI code never touches hardware_atomic_bool_spinlock)?

ivmarkov · 2024-08-31T13:05:56Z

As for

or NMI's are triggered back to back without any instruction executing in between, disable_all_interrupts_for_the_core_that_runs_this_code(); has no effect at all on the lock-freedom.

Wouldn't that mean the whole "user-space" (non-interrupt code) = all threads would starve, and nothing would make progress anyway. Including threads that utilize native atomics only. Asking, because I don't see how that particular example (NMIs firing back to back) applies here, as it would (obviously?) starve absolutely everything else, hardware atomics or not?

bjorn3 · 2024-08-31T13:12:29Z

However in the absence of NMI, the above code would be lock free?

I think it would be assuming that the underlying CPU core ensures that it will always keep making forward progress itself. So for example another core must not be able to shutdown any core that has hardware_atomic_bool_spinlock held.

(it is another topic if we can guarantee the absence of NMI or its weaker variant - that the NMI code never touches hardware_atomic_bool_spinlock)?

Even an NMI handler which immediately returns would make it non-lock-free if there is a possibility that NMI's arrive back to back. If however the CPU guarantees to execute at least one instruction between returning from an NMI and calling the NMI handler again (eg because the only source of NMI's is a watchdog timer and the NMI handler is fast enough), then it should be fine when the NMI handler doesn't touch hardware_atomic_bool_spinlock.

Wouldn't that mean the whole "user-space" (non-interrupt code) = all threads would starve, and nothing would make progress anyway. Including threads that utilize native atomics only. Asking, because I don't see how that particular example (NMIs firing back to back) applies here, as it would (obviously?) starve absolutely everything else, hardware atomics or not?

With lock-free atomics if you suspend all userspace threads except for one, that one thread will always be able to make forward progress with regards to executing atomic operations. In case of a non-lock-free implementation it is possible that it blocks forever waiting on one of the suspended threads releasing the lock.

ivmarkov · 2024-08-31T14:23:28Z

I think I might understand it a bit now.
If you could confirm my two examples/statements below that would be greatly appreciated (thank you for bearing with me!):

=====================

Example why spinlock+disable-all-interrupts-except-nmi is not lock free in the presence of any NMI:

Two cores, core0 and core1
Two threads, thread0 - pinned to core0, and thread1 - pinned to core1, both currently inside software_cmpxchg_for_something_big, but thread0 also got the "lock" and is in the midst of mutating something, while thread1 still busyloops on the spinlock
Suddenly thread0 gets suspended while still mutating something because core0 suddenly receives a storm of never-ending empty NMIs and therefore thread0 cannot progress
=> ...but that's OK, because it is "suspended" by the NMI storm
thread1 - unfortunately - cannot progress either, as it is stuck in the spinlock busy-loop - waiting on thread0, which is suspended forever because of the never-ending NMI storm
=> that's NOT OK, because thread1 is not suspended (because the NMI storm happens on the other core) - yet - it cannot make any progress

=====================

Why a simple "disable-all-interrupts-except-NMI" implementation of software atomics (obviously without a spinlock as it is a single core) on single core systems is lock-free:

Because - even though a thread might be in the midst of modifying the value of such a software atomic when interrupted by an avalanche of never-ending NMIs, by your definition, this thread (as well as all others on the single-core system) are "suspended" (by the NMI avalanche). So these not making progress is actually fine and the algorithms still qualifies as lock-free.

Note also that the condition that the NMI handlers should not touch such software atomics is a MUST for this case (2), or else such a software implementation would of course be unsound. It is a must for the "lockful" implementation in case (1) from above as well, but for other reason (the NMI might deadlock).

bjorn3 · 2024-08-31T15:36:29Z

Both examples are correct afaik.

taiki-e added the C-bug Category: This is a bug. label Oct 28, 2023

This was referenced Oct 28, 2023

document when atomic loads are guaranteed read-only #115577

Merged

Set max_atomic_width for riscv32*-esp-espidf to 32 #117307

Merged

saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 28, 2023

rustbot added P-low Low priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Oct 30, 2023

apiraino added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 30, 2023

bors closed this as completed in f3457db Nov 1, 2023

torkleyy mentioned this issue Dec 12, 2023

crossbeam-utils fails to compile on esp32* targets starting Rust 1.74 crossbeam-rs/crossbeam#1048

Closed

HenningHolmDE mentioned this issue Dec 15, 2023

Compilation Fails jasta/esp32-tokio-demo#3

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ESP-IDF targets' Atomic*64 is not lock-free #117305

ESP-IDF targets' Atomic*64 is not lock-free #117305

taiki-e commented Oct 28, 2023

apiraino commented Oct 30, 2023

ivmarkov commented Aug 31, 2024

bjorn3 commented Aug 31, 2024

ivmarkov commented Aug 31, 2024

ivmarkov commented Aug 31, 2024

bjorn3 commented Aug 31, 2024

ivmarkov commented Aug 31, 2024 •

edited

Loading

bjorn3 commented Aug 31, 2024

ESP-IDF targets' Atomic*64 is not lock-free #117305

ESP-IDF targets' Atomic*64 is not lock-free #117305

Comments

taiki-e commented Oct 28, 2023

apiraino commented Oct 30, 2023

ivmarkov commented Aug 31, 2024

bjorn3 commented Aug 31, 2024

ivmarkov commented Aug 31, 2024

ivmarkov commented Aug 31, 2024

bjorn3 commented Aug 31, 2024

ivmarkov commented Aug 31, 2024 • edited Loading

bjorn3 commented Aug 31, 2024

ivmarkov commented Aug 31, 2024 •

edited

Loading