Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AtomicBool layout changes to become amo biased rather than bool biased #249

Closed
SUPERCILEX opened this issue Jul 27, 2023 · 17 comments
Closed
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@SUPERCILEX
Copy link

Proposal

Problem statement

See Zulip thread: https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/AtomicBool.20useless.20on.20riscv.

Gist: AtomicBool uses a u8 under the hood, but not all hardware supports 8/16 bit atomics even if it support 32 bit atomics. Change the memory layout of bool to match the hardware's amo type size.

Solution sketch

+ pub type AtomicBoolType = cfg(smallest_type_supported_by_hardware)

- fn from_ptr(ptr: *mut bool) -> &AtomicBool
+ fn from_ptr(ptr: *mut AtomicBoolType) -> &AtomicBool

- fn from_mut(v: &mut bool) -> &mut AtomicBool
+ fn from_mut(v: &mut AtomicBoolType) -> &mut AtomicBool

- fn get_mut_slice(this: &mut [AtomicBool]) -> &mut [bool]
+ fn get_mut_slice(this: &mut [AtomicBool]) -> &mut [AtomicBoolType]

- fn from_mut_slice(v: &mut [bool]) -> &mut [AtomicBool]
+ fn from_mut_slice(v: &mut [AtomicBoolType]) -> &mut [AtomicBool]

Downsides

People that are relying to AtomicBool being layout compatible with bool are going to be sad.

Upside

We can be maximally efficient on all hardware while still letting people do weird transmutes since they have AtomicBoolType.

@SUPERCILEX SUPERCILEX added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Jul 27, 2023
@SUPERCILEX
Copy link
Author

I need to think about whether or not AtomicBoolType should always be a u* or if it can be a bool on the platforms that support u8s.

@BurntSushi
Copy link
Member

I don't think we can/should do this. The docs on AtomicBool are crystal clear:

This type has the same in-memory representation as a bool.

Changing that seems like a breaking change of the worst kind. I realize the actual impact of this will be limited to some rather niche platforms, but we've promised it will have the same layout as bool in very clear terms.

How common of a need is it to be maximally efficient here on platforms where our current AtomicBool is inappropriate? If it's rare, inline assembly seems like an appropriate solution to me. But I'm not 100% certain.

@workingjubilee
Copy link
Member

workingjubilee commented Jul 27, 2023

That change was a regression from using a usize, the costs of which were not discussed, and previously it was also a breaking change (de facto if not de jure).

@BurntSushi
Copy link
Member

@workingjubilee Is that a rebuttal? An FYI? Can you elaborate please?

@workingjubilee
Copy link
Member

Mostly just a remark, I guess?

The flipside of this is that RISCV itself has a de facto (if not de jure) encoding for byte-level atomic instructions, and it's not clear what the performance characteristics of most RISCV architectures with high core counts and full implementation of the atomic instruction set will be, because... most RISCV architectures aren't printed yet.

@thomcc
Copy link
Member

thomcc commented Jul 28, 2023

Yeah, I think I'm inclined to agree with @BurntSushi -- I don't think we should make a habit of considering concrete promises we make about type layout to be things we can go back on.

@the8472
Copy link
Member

the8472 commented Jul 28, 2023

It's not even obvious if that's a win. Making the type larger can blow up the size of structs containing them. It's a size vs. instruction-overhead tradeoff.

@SUPERCILEX
Copy link
Author

It's a size vs. instruction-overhead tradeoff.

What? It's the difference between looping load-reserve/store-conditional instructions and an amo instruction. I don't have an arm or riscv machine to benchmark things, but I would be very surprised if those had comparable performance under contention.

@the8472
Copy link
Member

the8472 commented Jul 28, 2023

That's the overhead part. Smaller structs, more expensive instructions.

@SUPERCILEX
Copy link
Author

The comparison you're making is bizarre. In what scenario do people have millions of atomic booleans laying around?

@programmerjake
Copy link
Member

how about a lock-free hash table of some sort?

@programmerjake
Copy link
Member

or Java-style mutex-per-object where you want them to be 1 byte to save space

@SUPERCILEX
Copy link
Author

Both of those would be questionable implementations without some way to wait (consider the case where the OS context switches in the middle of the critical section) which on linux requires a 32-bit atomic.

But if people are against changing the layout of AtomicBool, then I can switch this proposal to AtomicFlag which would look just like the AtomicBool I proposed.

@thomcc
Copy link
Member

thomcc commented Jul 29, 2023

Both of those would be questionable implementations without some way to wait (consider the case where the OS context switches in the middle of the critical section) which on linux requires a 32-bit atomic.

Are you suggesting changing AtomicBool to be 32 bits everywhere then?

But if people are against changing the layout of AtomicBool, then I can switch this proposal to AtomicFlag which would look just like the AtomicBool I proposed.

I think this would be more reasonable, although I'm not sure it's worth having, given the additional complexity it adds.

@SUPERCILEX
Copy link
Author

Are you suggesting changing AtomicBool to be 32 bits everywhere then?

That doesn't sound too crazy to me, but no. I think there are still many use cases where you just want to know if something is active. Then again, AtomicFlag would be simpler if it was always a u32, but it's easy to imagine regretting that later when some architecture decides it only supports 42 bit atomics or something.

I think this would be more reasonable, although I'm not sure it's worth having, given the additional complexity it adds.

Yeah, I think that's probably what this issue should be about. I personally don't think having a memory layout compatible type and an amo compatible type is too much added complexity.

@thomcc
Copy link
Member

thomcc commented Jul 29, 2023

The fact that we also need to create another type that only allows 0 and 1 for this is a more significant issue in terms of complexity, I think. It will need its own API and conversions, for example.

@SUPERCILEX
Copy link
Author

Ok, the discussion on Zulip concluded that rust-lang/rust#114034 and llvm/llvm-project#64090 were satisfactory solutions.

@dtolnay dtolnay closed this as not planned Won't fix, can't repro, duplicate, stale Nov 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api
Projects
None yet
Development

No branches or pull requests

7 participants