Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Definitions of "representation", "validity" etc. #50

Closed
asajeffrey opened this issue Nov 29, 2018 · 42 comments
Closed

Definitions of "representation", "validity" etc. #50

asajeffrey opened this issue Nov 29, 2018 · 42 comments

Comments

@asajeffrey
Copy link

We should add basic definitions of terms like "representation", "validity" and friends.

@asajeffrey
Copy link
Author

This bogged down the discussion at #16

@avadacatavra
Copy link
Contributor

I'll add this to the introduction i'm working on 🥇

@asajeffrey
Copy link
Author

@RalfJung
Copy link
Member

I think #16 got bogged down because there was so little to talk about in terms of representation so folks went on chatting about validity instead.^^

Part of the process here is even figuring out what some of these terms mean. IMO https://github.com/rust-rfcs/unsafe-code-guidelines/blob/master/active_discussion/representation.md separates representation quite clearly from any discussion of "invariants".

@asajeffrey
Copy link
Author

One thing I'm not sure about is that because of niche optimization, representation can depend on validity, which is a bit odd because it depends on the state of memory. Should we have something like "bitstring validity" which doesn't depend on memory state? Then repr and "bitstring validity" would be mutually dependent, so repr wouldn't depend on memory.

@hanna-kruppe
Copy link

I guess you can think of it that way, but a compiler can't know the full set of addresses that are allocated or not allocated anyway, so it's not like type layout could actually depend on memory.

(I guess hypothetically a JIT compiler could know that some address range is not available for Rust data and use that for layout optimizations, but would that be actually something we want to rule out?)

@RalfJung
Copy link
Member

Layout computation just has to do stuff that works with validity for any memory. I do not see a problem.

@asajeffrey
Copy link
Author

@rkruppe: This is an issue about making the spec human-readable, not really about the technical impact.

@asajeffrey
Copy link
Author

Oh, and where do discussions of provenance end up? Are they part of validity, or part of safety?

@RalfJung
Copy link
Member

RalfJung commented Nov 29, 2018

Provenance are part of "messy stuff that's barely understood, do we really want to talk about it". ;)

More seriously though, I'd say that's the "aliasing model" discussion, with Stacked Borrows and so on. It's part of "specifying the behavior of Rust programs", but I'd keep it separate from validity.

@asajeffrey
Copy link
Author

I'd be quite happy if we end up without provenance. An example which is UB in C due to provenance:

  fn foo() {
    let x = false;
    let y = true;
    if (&x as usize + 1) == (&y as usize) {
      let z: &[bool; 2] = unsafe { &*(&x as *const bool as *const [bool;2]); }
      bar(z);
    }
  }

Should we consider foo() to be UB? What if bar is a C function?

@strega-nil
Copy link

@asajeffrey if that isn't UB, that would be crazy to me.

@RalfJung
Copy link
Member

Can we please not discuss provenance in this issue? Open a new one or better a topic on Zulip if you really want to go there at this time already, but this issue seems like the wrong place.

@asajeffrey
Copy link
Author

@RalfJung good point, we're off-topic. I filed #52.

@asajeffrey
Copy link
Author

So... is it worth introducing a concept of "bitstring validity" (name to be bikeshedded), which does not depend on the state of memory? The idea being that representation and bitstring validity are mutually dependent, and that validity depends on both of them?

Examples:

  • a bitstring valid bool is 0x0 or 0x1,
  • a bitstring valid &T is non-zero and a multiple of Ts alignment,
  • etc.

Pro: bitstring validity is simpler than validity (especially if that ends up bringing in shadow state like lifetimes or provenance) and makes it clear what representation can depend on.

Con: yet another term to define.

@RalfJung
Copy link
Member

RalfJung commented Nov 30, 2018

@asajeffrey I have now several times defined "bitstring validity" in terms of validity and you have not yet commented on that. :)

So I am wondering if, from your perspective, there is anything wrong with just saying "bitstring validity means validity for some (existentially quantified) memory"?

Slightly more formally:

bitsring_valid(type, value) := exists memory, valid(type, value, memory)

@asajeffrey
Copy link
Author

@RalfJung the problem with that defn is that it means that representation still depends on all of validity. Imagine the poor reader trying to workout the bitstring representation of a value, and discovering that they have to read about lifetimes, stacked borrows, etc.

I agree that it should be the case that bitstring-validity is validity in some memory, but the question is should that be a defn or a derived property?

@RalfJung
Copy link
Member

RalfJung commented Dec 2, 2018

I think it would be silly to define a property when it can be exactly derived. That will just lead to inconsistencies.

Also, since bitstring validity is literally only relevant for enum layout optimizations, most users should not ever get in contact with that notion. When you are writing unsafe code, if you are wondering about bistring validity, you are doing something wrong.

@asajeffrey
Copy link
Author

@RalfJung "silly" unless you take into account the poor human reader.

Bitstring validity is needed in more cases than just enum layout, e.g. if I'm writing a (de)serializer like https://github.com/frankmcsherry/abomonation, I might want to know what the valid bitstrings are.

@asajeffrey
Copy link
Author

As discussed on https://rust-lang.zulipchat.com/#narrow/stream/136281-wg-unsafe-code-guidelines/topic/meeting-2018-12-13, we could do with getting some definitions in place, to avoid us getting bogged down again.

AFAIK the terms that are uncontroversial are "size", "alignment", "validity" and "safety". We are discussing whether we also need "bitstring validity", and whether "representation" means "size"+"alignment" or if it also includes "bitstring validity".

@gnzlbg
Copy link
Contributor

gnzlbg commented Dec 13, 2018

So this came on the meeting today. There was interest in doing a blog post about the work in this WG, to advertise it a bit more, try to involve more people, etc.

I think that without an informal definition of what we mean by "representation, validity, and safety invariants of a type", new people will have trouble getting up to speed, separating the different discussions or figuring out what they are about, etc. So it might be worth it to at least write some informal explanations of what these terms mean to help people get started and follow along. We should obviously be clear that these definitions aren't necessarily "precise".

@RalfJung
Copy link
Member

"silly" unless you take into account the poor human reader.

We can spell out what this derived property looks like. That doesn't make it a definition though.

if I'm writing a (de)serializer like frankmcsherry/abomonation, I might want to know what the valid bitstrings are.

Why would you? Whenever you construct values, they must be valid, not just valid for some other memory.

@asajeffrey
Copy link
Author

As long as there's some way for the reader to understand that a valid bool is either true or false, without having to read anything about the semantics of memory, then I don't really mind. I'm just worried that a reader is going to get lost in the definitions if we don't have a clear way of saying "you only need to read these defns if you're worried about refererence types".

@asajeffrey
Copy link
Author

Oh, and note to self, if we define "V is bitstring valid for type T" as "there exists a memory M s.t. V is valid for type T in memory M" then this isn't a property we can define by induction on the structure of T. For example with this defn, any non-zero word-aligned address p is valid for type &&usize and for type &bool but (p,p) is not valid for type (&&usize, &bool).

@RalfJung
Copy link
Member

this isn't a property we can define by induction on the structure of T

It is not a defined but a derived property, but it satisfies the structural principles.

@RalfJung
Copy link
Member

Concerning "representation", why don't we just replace that term by "layout"? That's how it is called in the Rust compiler. Layout consists of: Size, alignment, ABI details (for passing things by-value) and field offsets.

@asajeffrey
Copy link
Author

this isn't a property we can define by induction on the structure of T

It is not a defined but a derived property, but it satisfies the structural principles.

Er, what do you mean by "the structural principles"? There's no inductive definition equivalent to "V is bitstring valid for T", c.f. example above. "V is valid for T in memory M" is inductive in T.

Concerning "representation", why don't we just replace that term by "layout"? That's how it is called in the Rust compiler. Layout consists of: Size, alignment, ABI details (for passing things by-value) and field offsets.

Fine, though we've used the word "representation" a lot, replacing it by "layout" would be quite a bit of work.

@strega-nil
Copy link

It seems weird to call it layout - layout sounds to me like a property of subobjects in structures, not of distinct objects, and only in memory on computers, not in the value system of Rust - distinguishing between value representation and object representation makes sense, while value layout doesn't make sense at all.

@asajeffrey
Copy link
Author

@ubsan You'd reserve the word "layout" just for objects in memory, e.g. on the heap or pointed to on the stack? So we'd reserve the right for the compiler to do what it likes with locals that aren't addressed using &x?

@strega-nil
Copy link

@asajeffrey under as-if, the compiler can do whatever. I just think layout is not a good name for "representation".

@RalfJung
Copy link
Member

RalfJung commented Dec 15, 2018

layout sounds to me like a property of subobjects in structures, not of distinct objects,

Of course it is a property of both whole objects and their subobjects -- as is the representation we discussed here: We talked e.g. about how repr(C) unions have offset 0 for their fields, and then of course the representation of these fields apply.

Nothing in our discussion was exclusively about distinct objects. When we talk about single-field structs having the same representation/layout as their only field, of course this is still true when you put them into e.g. an array and want to use the AFAIK not-yet-decided rule that arrays are represented just like homologous tuples.

The discussions we have had in the representations area were all about size, alignment, field offset and ABI concerns. We didn't talk about an abstract set of values vs a concrete set of bits and how they map to each other (which is what "value vs object representation" means to me, but maybe I am misinterpreting?).

@RalfJung
Copy link
Member

RalfJung commented Dec 15, 2018

Fine, though we've used the word "representation" a lot, replacing it by "layout" would be quite a bit of work.

I was basically just confused when I wanted to write the definition and found no way to distinguish "layout" from what we discussed as "representation".

@RalfJung
Copy link
Member

Er, what do you mean by "the structural principles"? There's no inductive definition equivalent to "V is bitstring valid for T", c.f. example above. "V is valid for T in memory M" is inductive in T.

I mean the fact that bitstring validity of a struct is equivalent to bitstring validity of every field. That's a property inductive definitions have by construction, but it can still also hold for derived definitions such as "valid for some memory".

But, anyway, I propose we continue this discussion as part of the validity invariant for references. That's the only type where the distinction between validity and bitstring validity matters, right?

@asajeffrey
Copy link
Author

Er, what do you mean by "the structural principles"? There's no inductive definition equivalent to "V is bitstring valid for T", c.f. example above. "V is valid for T in memory M" is inductive in T.

I mean the fact that bitstring validity of a struct is equivalent to bitstring validity of every field. That's a property inductive definitions have by construction, but it can still also hold for derived definitions such as "valid for some memory".

Yes, the problem is that it doesn't hold for "valid for some memory", e.g. any non-zero word-aligned address p is "valid for some memory" at type &&usize and at type &bool but (p,p) is not "valid for some memory" at type (&&usize, &bool).

But, anyway, I propose we continue this discussion as part of the validity invariant for references. That's the only type where the distinction between validity and bitstring validity matters, right?

Fine by me, especially if we can confine all discussions of memory and related shadow state to the defn of validity of references.

@RalfJung
Copy link
Member

(p,p) is not "valid for some memory" at type (&&usize, &bool).

What makes you say that? I think it is. In fact it could even be safe. We don't have TBAA.

@hanna-kruppe
Copy link

hanna-kruppe commented Dec 18, 2018

For (&bool, &&usize) things should be OK, but for example given

#[repr(u8)] enum Foo { A = 1, B = 2 }
#[repr(u8)] enum Bar { C = 3, D = 4 }

then any non-zero address p is valid for &Foo and &Bar individually in some memory, but (p, p): (&Foo, &Bar) is not valid in any memory edit: in case valid references have to point to valid memory (which AIUI is still a possibility at this point).

@RalfJung
Copy link
Member

RalfJung commented Dec 18, 2018

@rkruppe You are assuming that validity for Foo requires the pointer to point to valid data. Some people have been suggesting this, but I think that's actually a bad idea (and I'll spell out my reasoning in the issue for reference validity once it is opened).

@hanna-kruppe
Copy link

Yeah I realized as soon as I sent the comment and tried to sneak it in as an edit but I was too slow 😅

@asajeffrey
Copy link
Author

(p,p) is not "valid for some memory" at type (&&usize, &bool).
What makes you say that? I think it is. In fact it could even be safe. We don't have TBAA.

A memory that makes p valid at type &&usize is:

  p -> q
  q -> 0x1234

A memory that makes p valid at type &bool is:

  p -> 0x1

I don't think there's any memory that makes both of them valid (since refs have to be non-zero and aligned, so bool and &usize have no overlap) so there's no memory which makes (p,p) valid at type (&&usize, &bool).

@asajeffrey
Copy link
Author

Ah, this may be because your defn of validity doesn't require p to point to valid data.

@RalfJung
Copy link
Member

So, we have some definitions in the glossary now. @asajeffrey are there more terms that you think should be added? If not, can we close this?

@asajeffrey
Copy link
Author

Sure, fine by me to close this.

@RalfJung RalfJung closed this as completed Mar 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants