-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Definitions of "representation", "validity" etc. #50
Comments
This bogged down the discussion at #16 |
I'll add this to the introduction i'm working on 🥇 |
@RalfJung's blog post on the subject... https://www.ralfj.de/blog/2018/08/22/two-kinds-of-invariants.html |
I think #16 got bogged down because there was so little to talk about in terms of representation so folks went on chatting about validity instead.^^ Part of the process here is even figuring out what some of these terms mean. IMO https://github.com/rust-rfcs/unsafe-code-guidelines/blob/master/active_discussion/representation.md separates representation quite clearly from any discussion of "invariants". |
One thing I'm not sure about is that because of niche optimization, representation can depend on validity, which is a bit odd because it depends on the state of memory. Should we have something like "bitstring validity" which doesn't depend on memory state? Then repr and "bitstring validity" would be mutually dependent, so repr wouldn't depend on memory. |
I guess you can think of it that way, but a compiler can't know the full set of addresses that are allocated or not allocated anyway, so it's not like type layout could actually depend on memory. (I guess hypothetically a JIT compiler could know that some address range is not available for Rust data and use that for layout optimizations, but would that be actually something we want to rule out?) |
Layout computation just has to do stuff that works with validity for any memory. I do not see a problem. |
@rkruppe: This is an issue about making the spec human-readable, not really about the technical impact. |
Oh, and where do discussions of provenance end up? Are they part of validity, or part of safety? |
Provenance are part of "messy stuff that's barely understood, do we really want to talk about it". ;) More seriously though, I'd say that's the "aliasing model" discussion, with Stacked Borrows and so on. It's part of "specifying the behavior of Rust programs", but I'd keep it separate from validity. |
I'd be quite happy if we end up without provenance. An example which is UB in C due to provenance: fn foo() {
let x = false;
let y = true;
if (&x as usize + 1) == (&y as usize) {
let z: &[bool; 2] = unsafe { &*(&x as *const bool as *const [bool;2]); }
bar(z);
}
} Should we consider |
@asajeffrey if that isn't UB, that would be crazy to me. |
Can we please not discuss provenance in this issue? Open a new one or better a topic on Zulip if you really want to go there at this time already, but this issue seems like the wrong place. |
So... is it worth introducing a concept of "bitstring validity" (name to be bikeshedded), which does not depend on the state of memory? The idea being that representation and bitstring validity are mutually dependent, and that validity depends on both of them? Examples:
Pro: bitstring validity is simpler than validity (especially if that ends up bringing in shadow state like lifetimes or provenance) and makes it clear what representation can depend on. Con: yet another term to define. |
@asajeffrey I have now several times defined "bitstring validity" in terms of validity and you have not yet commented on that. :) So I am wondering if, from your perspective, there is anything wrong with just saying "bitstring validity means validity for some (existentially quantified) memory"? Slightly more formally:
|
@RalfJung the problem with that defn is that it means that representation still depends on all of validity. Imagine the poor reader trying to workout the bitstring representation of a value, and discovering that they have to read about lifetimes, stacked borrows, etc. I agree that it should be the case that bitstring-validity is validity in some memory, but the question is should that be a defn or a derived property? |
I think it would be silly to define a property when it can be exactly derived. That will just lead to inconsistencies. Also, since bitstring validity is literally only relevant for enum layout optimizations, most users should not ever get in contact with that notion. When you are writing unsafe code, if you are wondering about bistring validity, you are doing something wrong. |
@RalfJung "silly" unless you take into account the poor human reader. Bitstring validity is needed in more cases than just enum layout, e.g. if I'm writing a (de)serializer like https://github.com/frankmcsherry/abomonation, I might want to know what the valid bitstrings are. |
As discussed on https://rust-lang.zulipchat.com/#narrow/stream/136281-wg-unsafe-code-guidelines/topic/meeting-2018-12-13, we could do with getting some definitions in place, to avoid us getting bogged down again. AFAIK the terms that are uncontroversial are "size", "alignment", "validity" and "safety". We are discussing whether we also need "bitstring validity", and whether "representation" means "size"+"alignment" or if it also includes "bitstring validity". |
So this came on the meeting today. There was interest in doing a blog post about the work in this WG, to advertise it a bit more, try to involve more people, etc. I think that without an informal definition of what we mean by "representation, validity, and safety invariants of a type", new people will have trouble getting up to speed, separating the different discussions or figuring out what they are about, etc. So it might be worth it to at least write some informal explanations of what these terms mean to help people get started and follow along. We should obviously be clear that these definitions aren't necessarily "precise". |
We can spell out what this derived property looks like. That doesn't make it a definition though.
Why would you? Whenever you construct values, they must be valid, not just valid for some other memory. |
As long as there's some way for the reader to understand that a valid |
Oh, and note to self, if we define " |
It is not a defined but a derived property, but it satisfies the structural principles. |
Concerning "representation", why don't we just replace that term by "layout"? That's how it is called in the Rust compiler. Layout consists of: Size, alignment, ABI details (for passing things by-value) and field offsets. |
Er, what do you mean by "the structural principles"? There's no inductive definition equivalent to "V is bitstring valid for T", c.f. example above. "V is valid for T in memory M" is inductive in T.
Fine, though we've used the word "representation" a lot, replacing it by "layout" would be quite a bit of work. |
It seems weird to call it layout - layout sounds to me like a property of subobjects in structures, not of distinct objects, and only in memory on computers, not in the value system of Rust - distinguishing between value representation and object representation makes sense, while value layout doesn't make sense at all. |
@ubsan You'd reserve the word "layout" just for objects in memory, e.g. on the heap or pointed to on the stack? So we'd reserve the right for the compiler to do what it likes with locals that aren't addressed using |
@asajeffrey under as-if, the compiler can do whatever. I just think layout is not a good name for "representation". |
Of course it is a property of both whole objects and their subobjects -- as is the representation we discussed here: We talked e.g. about how Nothing in our discussion was exclusively about distinct objects. When we talk about single-field structs having the same representation/layout as their only field, of course this is still true when you put them into e.g. an array and want to use the AFAIK not-yet-decided rule that arrays are represented just like homologous tuples. The discussions we have had in the representations area were all about size, alignment, field offset and ABI concerns. We didn't talk about an abstract set of values vs a concrete set of bits and how they map to each other (which is what "value vs object representation" means to me, but maybe I am misinterpreting?). |
I was basically just confused when I wanted to write the definition and found no way to distinguish "layout" from what we discussed as "representation". |
I mean the fact that bitstring validity of a struct is equivalent to bitstring validity of every field. That's a property inductive definitions have by construction, but it can still also hold for derived definitions such as "valid for some memory". But, anyway, I propose we continue this discussion as part of the validity invariant for references. That's the only type where the distinction between validity and bitstring validity matters, right? |
Yes, the problem is that it doesn't hold for "valid for some memory", e.g. any non-zero word-aligned address
Fine by me, especially if we can confine all discussions of memory and related shadow state to the defn of validity of references. |
What makes you say that? I think it is. In fact it could even be safe. We don't have TBAA. |
For
then any non-zero address |
@rkruppe You are assuming that validity for |
Yeah I realized as soon as I sent the comment and tried to sneak it in as an edit but I was too slow 😅 |
A memory that makes
A memory that makes
I don't think there's any memory that makes both of them valid (since refs have to be non-zero and aligned, so |
Ah, this may be because your defn of validity doesn't require p to point to valid data. |
So, we have some definitions in the glossary now. @asajeffrey are there more terms that you think should be added? If not, can we close this? |
Sure, fine by me to close this. |
We should add basic definitions of terms like "representation", "validity" and friends.
The text was updated successfully, but these errors were encountered: