Representation of bool, integers and floating points #9
Comments
Don't forget Another topic: Does FFI code need to use types like |
Edited, thanks!
Good question! I presume it does not, but I'd be curious if there is another side to the discussion. *Related note: clearly |
Much of the rust ecosystem (e.g. webrender and therefore firefox) assumes uint64_t and u64 are the same ABI-wise, and I'm unaware of any reason to prevent that assumption.
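A minimal sketch of what that assumption looks like in code. `add_u64` and `u64_size` are hypothetical names invented for this illustration; the function is written in Rust so the example is self-contained, and a real export would also need a no-mangle attribute to be linkable by name from C.

```rust
use std::mem::size_of;

/// Hypothetical function using the C calling convention.
/// Under the assumption discussed above, a C header would declare it as:
///     uint64_t add_u64(uint64_t a, uint64_t b);
pub extern "C" fn add_u64(a: u64, b: u64) -> u64 {
    // Wrapping addition mirrors C's unsigned overflow behavior.
    a.wrapping_add(b)
}

/// Size of `u64` in bytes; `uint64_t` is exactly 64 bits wide by definition.
pub fn u64_size() -> usize {
    size_of::<u64>()
}
```

Matching size is only part of the assumption; the claim in this thread is that the two types are also passed the same way by value.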
I believe pointer size is the correct definition but I haven't read the link, so grain of salt
There's no good reason, it's just because llvm has a quirky definition of in-bounds pointer calculations
Signaling NaNs only merit discussion insofar as the IEEE spec defines some random operations to act differently on them. (e.g. Signaling itself is, I think, largely a failed experiment and worth ignoring (soft cc on @stephentyrone in case i'm misremembering) |
Wrt signaling NaNs, it's more of a question for the (now deferred, cc #8) discussion of valid values. There's a persistent rumor (including, at times, among LLVM contributors) that handling an sNaN or doing certain operations on it will cause a trap or is undefined behavior in LLVM. This is not the case, but I've encountered enough people thinking it's true that I think it would be best to explicitly state that signaling NaNs are perfectly fine, and thus that all bit patterns are valid floats. |
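As a small illustration of "all bit patterns are valid floats", the sketch below builds an `f32` from a signaling-NaN bit pattern. The constant and function names are made up for this example, and the "quiet bit clear means signaling" reading assumes the IEEE 754-2008 convention used by most current hardware.

```rust
/// Bit pattern with all exponent bits set, a non-zero mantissa, and the
/// most significant mantissa bit clear: a signaling NaN under the
/// IEEE 754-2008 convention (an assumption about the target platform).
pub const SNAN_BITS: u32 = 0x7FA0_0000;

/// Mask of the most significant mantissa bit of an `f32` (the "quiet" bit).
pub const QUIET_BIT: u32 = 0x0040_0000;

/// Builds an `f32` from the signaling-NaN bit pattern.
/// `f32::from_bits` is a safe function: no bit pattern is rejected.
pub fn make_snan() -> f32 {
    f32::from_bits(SNAN_BITS)
}
```

The example deliberately does not check that the payload survives arithmetic or by-value passing, since that is a separate question from layout.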
+1 From the point-of-view of just layout, SNaNs are not really that interesting and the easiest thing is to just allow them. AFAIK |
The link is to a comment from @gnzlbg and states:
|
Do you think we should write down that this is something that is presently true but which may be changed in the future (so unsafe code should not rely on it being true)? It seems like it might affect quite a bit how one writes code. |
(In particular, it seems to imply that it is safe to use |
I don't think we can ever change it since it's baked into ptr::offset. If we did it would be in a way where negative offsets were a valid very-large-positive offset, so isize would still "work" but be weird. |
also fwiw I think gcc also gets sad with huge offsets |
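To make the `ptr::offset` point concrete, here is a hedged sketch showing where `isize` appears in raw pointer arithmetic. The helper names are hypothetical; `offset`, `add`, and `offset_from` are the standard raw-pointer methods.

```rust
/// Distance in elements from the start of `slice` to one past its end,
/// computed through raw pointers. `offset_from` returns an `isize`.
pub fn element_distance(slice: &[u32]) -> isize {
    let start = slice.as_ptr();
    // SAFETY: `start + len` is at most one past the end of the same allocation.
    let end = unsafe { start.add(slice.len()) };
    // SAFETY: both pointers are derived from the same slice.
    unsafe { end.offset_from(start) }
}

/// Reads the last element by stepping back from the end of the slice
/// with a negative `isize` offset.
pub fn last_via_negative_offset(slice: &[u32]) -> Option<u32> {
    if slice.is_empty() {
        return None;
    }
    // SAFETY: the slice is non-empty, so `end - 1` points at its last element.
    let end = unsafe { slice.as_ptr().add(slice.len()) };
    Some(unsafe { *end.offset(-1) })
}
```

Because both the offset argument and the pointer difference are signed, an object larger than `isize::MAX` bytes could not have all of its internal distances represented.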
I think that the bare minimal guarantee here is that the Rust extern "C" function declarations need to use types that match in size and alignment with the types of the C function declaration. That is, if C uses That would be the bare minimum, and I think that would already be ok since we are just passing bags of bytes here and it is all unsafe anyways. @mw might know whether this can result in any issues due to, e.g., cross-language inlining. If we wanted to extend this minimum, we could map the C types to the Rust types, e.g. saying that if a C's function declaration uses I think if we can get by with only the size and alignment requirements, we should. If someone then uses a |
size and alignment aren't sufficient for ABI. The entire reason we have Similarly the CC for passing size and alignment are only sufficient if you're passing by-reference (and copying the value out manually in the callee). I believe you need to know:
For all of these u64 and uint64_t match perfectly |
I vaguely recall intending to tell the reference folks that they should explicitly distinguish layout (size+align+field offsets) and abi (layout + primitive-ness + homogeneousness). Compatible layouts are sufficient to make type punning tricks work with transmute/pointers, but compatible ABIs are necessary for correctly passing by-value across the FFI boundary. |
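A rough sketch of the layout half of that distinction. `same_layout` and `pun_f32_to_u32` are hypothetical helpers; the example assumes a mainstream target where `f32` and `u32` are both 4-byte aligned. It shows that matching size and alignment is enough for type punning, while saying nothing about how the two types are passed by value (on many targets floats and integers travel in different register classes).

```rust
use std::mem::{align_of, size_of};

/// True when two types have the same size and alignment.
/// This checks layout only; it cannot detect ABI differences.
pub fn same_layout<A, B>() -> bool {
    size_of::<A>() == size_of::<B>() && align_of::<A>() == align_of::<B>()
}

/// Reinterprets the bits of an `f32` as a `u32`.
pub fn pun_f32_to_u32(x: f32) -> u32 {
    // SAFETY: both types are 4 bytes and every bit pattern is a valid `u32`.
    unsafe { std::mem::transmute::<f32, u32>(x) }
}
```

In real code `f32::to_bits` does the same thing safely; `transmute` is used here only because it is the operation the comment above mentions.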
One thing we might want to think about is whether the Rust semantics of base types needs any shadow state, e.g. provenance information. IIUC C does :/ since pointers in C have provenance, which is expected to be maintained by casts to/from usize. |
Could you say more on why we would need that? segmented architectures?
(nb I believe miri maintains provenance to avoid smuggling illegal pointer
ops at compile time)
|
@gankro C interop mainly. IIUC in C, casting a |
@asajeffrey I think @RalfJung's post (https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html) might say that yes, we need to track provenance when casting to integers and back, and that just because two pointers have the same numeric value when interpreted as a usize does not mean that they are interchangeable. Whether this implies that two usizes that have the same numeric value are interchangeable when casting them to a pointer... I don't know. I would expect that for these usizes, where they come from is important as well. |
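For reference, the uncontroversial case looks like the sketch below: a pointer is cast to `usize` and back while the pointee is still alive. The function name is hypothetical. The example does not settle the open question above, namely whether an equal `usize` obtained some other way would be just as usable.

```rust
/// Casts a pointer to `usize` and back, then reads through the result.
pub fn roundtrip_through_usize(value: &u32) -> u32 {
    let ptr: *const u32 = value;
    // Pointer-to-integer cast: only the address is visible in `addr`.
    let addr = ptr as usize;
    // Integer-to-pointer cast: the address is the same as before.
    let back = addr as *const u32;
    // SAFETY: `back` has the address of `value`, which is still borrowed
    // and therefore live, and the integer came from a pointer to it.
    unsafe { *back }
}
```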
I don't see what this has to do with the memory layout of primitives. Whatever model we choose has to allow implementing Rust with pointers being mere addresses, just as C can be implemented that way. Further state might be needed to determine whether an execution is UB or not, but that's
|
@rkruppe Fair enough, if we're tabling what the semantics of primitives is for the moment, as long as people are aware that there might be more to the semantics of primitives than just their memory layout. |
The obvious question when talking about interacting with native C ABIs is what about platforms where
My reading of the C standard does not agree that this is the correct ABI. I went into it in detail on Zulip, but I believe that it is possible for a (Note that this discussion also relates to rust-lang/rfcs#992.) |
Yes, I believe that we don't care about:
and almost certainly don't care about:
I expect we don't care about a platform with weird bools, but I didn't follow that RFC so idk |
Agreed.
I don't think there's any fundamental reason not to support architectures that, for instance, distinguish between code and data memory.
We do need to support platforms that have real memory at 0, though writing to that memory might require some care. But yeah, we don't need to support platforms where
Agreed.
We shouldn't make any design decisions that would absolutely rule them out in the (distant) future, though. |
Many comments on a recent Hackaday article were complaining that Rust is not a language they can consider for their applications because it can't target X. If we make it impossible to support these, we are making room for languages lower-level than Rust, but higher-level than assembly (e.g. C and C++, which support most of these). I'm not saying we have to support all of these, but I'd be more comfortable knowing exactly which hardware Rust will never be able to target because of these decisions. |
Note that C++20 will likely specify two's complement as the representation of signed integers and rule out other representations like sign-magnitude or one's complement (http://wg21.link/p0907). Apparently the C standard committee is inclined to do the same (https://twitter.com/jfbastien/status/989242576598327296). A more general point regarding extremely niche implementation choices such as non-octet bytes or NULL at a nonzero address: people are going to write code that relies on assumptions that are true on every platform they have ever heard of, and for good reason, as it simplifies their code at effectively no loss of portability. We can't prevent that, nor should we IMO; at most we could tell these people they are relying on implementation-defined behavior, which just makes it a de facto standard rather than a de jure one. The only benefit for those who port Rust to such oddball architectures is the reassurance that their port technically conforms to "the Rust(tm) language" rather than technically being an extremely close dialect of it, but it won't change the fact that they can't run a ton of real Rust code without auditing it and removing these hard-coded assumptions. So I do not worry very much about accommodating architectural choices that deviate from the overwhelming consensus of today's platforms. This is of course assuming there is such an overwhelming consensus, thus I agree with the need for a survey that @gnzlbg raised. |
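On the Rust side, two's complement is already observable through integer casts, as this short sketch shows. `reinterpret_i8` and `minus_one_bytes` are hypothetical names used only for illustration.

```rust
/// Reinterprets an `i8` as a `u8` without changing its bits.
/// In Rust, `as` between same-width integers is a no-op on the bit pattern.
pub fn reinterpret_i8(x: i8) -> u8 {
    x as u8
}

/// Native-endian bytes of `-1i32`; all bits are set under two's complement,
/// so the result is the same regardless of endianness.
pub fn minus_one_bytes() -> [u8; 4] {
    (-1i32).to_ne_bytes()
}
```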
@gankro For segmented architectures, WASM may end up with a memory architecture that distinguishes between shared- and non-shared memory. Many systems already do this for processes, WASM may end up doing this for threads too. Not sure how this will play with APIs like Rust mutexes. |
@gankro mentioned "segmented architectures". There are many 16-bit Intel CPUs like the 8086 that need segmented memory, people like to hack on, and LLVM can target (x86 in 16-bit mode). Whether a Rust dialect for targeting the 8086 might be easy to create and closely resemble Rust, or not end up looking like Rust at all, will depend on which choices we make here.
I think it might also be worth surveying how hard it would be to support some of the things @gankro mentioned, both implementation-wise and from the language complexity perspective, and comparing that to the hardware they would enable targeting. For most of them I'd guess it's probably: "very hard to implement", "significantly complicates the language", "allows us to target almost no new hardware". But for some of them, like "segmented architectures", it might be "not that hard to support", "does not significantly complicate the language", and "enables a lot of hardware". In particular, the decisions here don't have to be black and white (have feature => support hardware vs no feature => no hardware support). It might be interesting to consider an extra constraint where we don't have the feature in Rust, but this is done in such a way that creating a Rust dialect (e.g. via a nightly feature) that still resembles Rust, and can target more esoteric hardware, remains possible. |
I read rust-lang/rust#46176 and I understand that it was decided to not reject use of |
In particular, my understanding is that it was decided to let people assume |
I wrote this big thing detailing what I believe to be true about layouts and ABIs in rust: https://gankro.github.io/blah/rust-layouts-and-abis/ |
Thanks. That matches what I would expect. One nit: "Here is a table of the ABIs of the core primitives in Rust, which C/C++ types they are guaranteed to be ABI compatible with," I'm not sure if you're saying that you already think that statement is true (that official documentation somewhere guarantees that equivalence). The problem that this issue is attempting to address is that there isn't such a guarantee in any official documentation yet. |
We're relying on these bridgings being accurate in Firefox, as is every other project using bindgen/cbindgen. And these projects have worked closely with the Rust team to make sure we're not running afoul of anything. I agree these claims should however be formally documented in e.g. The Reference or something. |
should this discussion deal with all scalar types (aka should we include characters in this discussion?) |
@avadacatavra People have argued that they shouldn't be ABI compatible, since |
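Whatever the outcome for ABI compatibility, one concrete difference between `char` and a plain 32-bit integer is that not every bit pattern is a valid `char`. The sketch below, with hypothetical helper names, shows this using only safe standard-library calls.

```rust
use std::mem::size_of;

/// Size of `char` in bytes.
pub fn char_size() -> usize {
    size_of::<char>()
}

/// True when `code` is a valid Unicode scalar value, i.e. a valid `char`.
pub fn is_valid_char(code: u32) -> bool {
    char::from_u32(code).is_some()
}
```

Surrogate code points such as `0xD800` and values above `0x10FFFF` are rejected, whereas any 32-bit pattern is a valid `u32`.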
In https://gankro.github.io/blah/rust-layouts-and-abis/#the-layoutsabis-of-builtins, it would be useful to define the ABI correspondence for function parameters |
@briansmith I believe that is implicit in pointer ABI matching and array layout matching. I'm not aware of any system under which the ABI of a pointer depends on the pointee's type, and array types in function parameters are just sugar for pointers. |
Summarizing the discussion about The representation of
etc. We should document these, but they don't change This definition would also limit the problematic platforms to those that either do not have a native pointer size (can't think of any) or those that have multiple native pointer sizes (near and far pointers in segmented architectures). I'd say it's ok to worry about them when someone tries to add support for them (for segmented archs one could pick one of the pointer types as "native" and add newer types for the rest). |
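A minimal check of the "pointer-sized" reading of `usize`, assuming a target with a single native pointer size as described above. The function name is hypothetical.

```rust
use std::mem::size_of;

/// True when `usize` and `isize` both have the size of a thin data pointer.
pub fn usize_matches_pointer_size() -> bool {
    size_of::<usize>() == size_of::<*const u8>()
        && size_of::<isize>() == size_of::<usize>()
}

/// Width of `usize` in bits, derived from its size in bytes.
pub fn usize_bits() -> u32 {
    (size_of::<usize>() * 8) as u32
}
```

On a segmented target with near and far pointers, the comparison would only hold for whichever pointer type is chosen as "native".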
The merged PR specifies
and with more rationale here:
We have to decide whether we want [0] The MSVC2012 docs mention that MSVC <= 4.2 |
FWIW, this is what @gankro 's document documents (https://gankro.github.io/blah/rust-layouts-and-abis/):
|
Just to add more context: MSVC 5.0 fixed the sizeof bool in 1997, meaning this was fixed before windows 98 was released. Considering windows xp is a tier 3 (cross-compile only) platform because it's so janky and officially unsupported by the vendor, I find it dubious that we have interest in supporting compatibility with pre-windows-98 systems. |
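For comparison, this is what Rust itself exposes about `bool`: it occupies one byte, and casting it to an integer yields 0 or 1. The helper names below are hypothetical; whether this matches a given C compiler's `_Bool` is exactly the question discussed above.

```rust
use std::mem::size_of;

/// Size of `bool` in bytes.
pub fn bool_size() -> usize {
    size_of::<bool>()
}

/// Converts a `bool` to its integer value: 0 for `false`, 1 for `true`.
pub fn bool_as_byte(b: bool) -> u8 {
    b as u8
}
```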
Great presentation by JF Bastien on efforts to standardize int/bool repr in C++20: talk: https://www.youtube.com/watch?v=JhUxIVf1qok |
I've gone through the reference and all the comments:
If you feel I missed anything, it would be better to open a specific issue to discuss that. |
This issue is to discuss the memory layout for integral and floating point types:
bool
u8..u128, i8..i128
usize, isize
f32, f64
For the most part, these are relatively uncontroversial. However, there are some interesting things worth discussing:
#[repr(C)] vs #[repr(Rust)] variants here. The size is always fixed and well-defined across FFI boundaries. The types map to their corresponding integral types in the surrounding C ABI.
usize intended to be defined on various platforms?
isize - Can we say a bit more about why? (e.g., ensuring that "pointer diff" is representable)