Definitions of "representation", "validity" etc. #50

asajeffrey · 2018-11-29T16:36:30Z

We should add basic definitions of terms like "representation", "validity" and friends.

asajeffrey · 2018-11-29T16:37:48Z

This bogged down the discussion at #16

avadacatavra · 2018-11-29T16:38:43Z

I'll add this to the introduction i'm working on 🥇

asajeffrey · 2018-11-29T16:40:17Z

@RalfJung's blog post on the subject... https://www.ralfj.de/blog/2018/08/22/two-kinds-of-invariants.html

RalfJung · 2018-11-29T16:41:12Z

I think #16 got bogged down because there was so little to talk about in terms of representation so folks went on chatting about validity instead.^^

Part of the process here is even figuring out what some of these terms mean. IMO https://github.com/rust-rfcs/unsafe-code-guidelines/blob/master/active_discussion/representation.md separates representation quite clearly from any discussion of "invariants".

asajeffrey · 2018-11-29T16:43:06Z

One thing I'm not sure about is that because of niche optimization, representation can depend on validity, which is a bit odd because it depends on the state of memory. Should we have something like "bitstring validity" which doesn't depend on memory state? Then repr and "bitstring validity" would be mutually dependent, so repr wouldn't depend on memory.

hanna-kruppe · 2018-11-29T16:47:02Z

I guess you can think of it that way, but a compiler can't know the full set of addresses that are allocated or not allocated anyway, so it's not like type layout could actually depend on memory.

(I guess hypothetically a JIT compiler could know that some address range is not available for Rust data and use that for layout optimizations, but would that be actually something we want to rule out?)

RalfJung · 2018-11-29T16:48:24Z

Layout computation just has to do stuff that works with validity for any memory. I do not see a problem.

asajeffrey · 2018-11-29T16:48:41Z

@rkruppe: This is an issue about making the spec human-readable, not really about the technical impact.

asajeffrey · 2018-11-29T16:52:51Z

Oh, and where do discussions of provenance end up? Are they part of validity, or part of safety?

RalfJung · 2018-11-29T16:56:22Z

Provenance are part of "messy stuff that's barely understood, do we really want to talk about it". ;)

More seriously though, I'd say that's the "aliasing model" discussion, with Stacked Borrows and so on. It's part of "specifying the behavior of Rust programs", but I'd keep it separate from validity.

asajeffrey · 2018-11-29T17:17:18Z

I'd be quite happy if we end up without provenance. An example which is UB in C due to provenance:

  fn foo() {
    let x = false;
    let y = true;
    if (&x as usize + 1) == (&y as usize) {
      let z: &[bool; 2] = unsafe { &*(&x as *const bool as *const [bool;2]); }
      bar(z);
    }
  }

Should we consider foo() to be UB? What if bar is a C function?

strega-nil · 2018-11-29T17:19:47Z

@asajeffrey if that isn't UB, that would be crazy to me.

RalfJung · 2018-11-29T17:23:08Z

Can we please not discuss provenance in this issue? Open a new one or better a topic on Zulip if you really want to go there at this time already, but this issue seems like the wrong place.

asajeffrey · 2018-11-29T17:28:47Z

@RalfJung good point, we're off-topic. I filed #52.

asajeffrey · 2018-11-30T15:08:16Z

So... is it worth introducing a concept of "bitstring validity" (name to be bikeshedded), which does not depend on the state of memory? The idea being that representation and bitstring validity are mutually dependent, and that validity depends on both of them?

Examples:

a bitstring valid bool is 0x0 or 0x1,
a bitstring valid &T is non-zero and a multiple of Ts alignment,
etc.

Pro: bitstring validity is simpler than validity (especially if that ends up bringing in shadow state like lifetimes or provenance) and makes it clear what representation can depend on.

Con: yet another term to define.

RalfJung · 2018-11-30T15:10:50Z

@asajeffrey I have now several times defined "bitstring validity" in terms of validity and you have not yet commented on that. :)

So I am wondering if, from your perspective, there is anything wrong with just saying "bitstring validity means validity for some (existentially quantified) memory"?

Slightly more formally:

bitsring_valid(type, value) := exists memory, valid(type, value, memory)

asajeffrey · 2018-11-30T15:14:18Z

@RalfJung the problem with that defn is that it means that representation still depends on all of validity. Imagine the poor reader trying to workout the bitstring representation of a value, and discovering that they have to read about lifetimes, stacked borrows, etc.

I agree that it should be the case that bitstring-validity is validity in some memory, but the question is should that be a defn or a derived property?

RalfJung · 2018-12-02T12:46:06Z

I think it would be silly to define a property when it can be exactly derived. That will just lead to inconsistencies.

Also, since bitstring validity is literally only relevant for enum layout optimizations, most users should not ever get in contact with that notion. When you are writing unsafe code, if you are wondering about bistring validity, you are doing something wrong.

asajeffrey · 2018-12-13T16:43:45Z

@RalfJung "silly" unless you take into account the poor human reader.

Bitstring validity is needed in more cases than just enum layout, e.g. if I'm writing a (de)serializer like https://github.com/frankmcsherry/abomonation, I might want to know what the valid bitstrings are.

asajeffrey · 2018-12-13T16:47:21Z

As discussed on https://rust-lang.zulipchat.com/#narrow/stream/136281-wg-unsafe-code-guidelines/topic/meeting-2018-12-13, we could do with getting some definitions in place, to avoid us getting bogged down again.

AFAIK the terms that are uncontroversial are "size", "alignment", "validity" and "safety". We are discussing whether we also need "bitstring validity", and whether "representation" means "size"+"alignment" or if it also includes "bitstring validity".

gnzlbg · 2018-12-13T16:50:17Z

So this came on the meeting today. There was interest in doing a blog post about the work in this WG, to advertise it a bit more, try to involve more people, etc.

I think that without an informal definition of what we mean by "representation, validity, and safety invariants of a type", new people will have trouble getting up to speed, separating the different discussions or figuring out what they are about, etc. So it might be worth it to at least write some informal explanations of what these terms mean to help people get started and follow along. We should obviously be clear that these definitions aren't necessarily "precise".

RalfJung · 2018-12-13T20:22:00Z

"silly" unless you take into account the poor human reader.

We can spell out what this derived property looks like. That doesn't make it a definition though.

if I'm writing a (de)serializer like frankmcsherry/abomonation, I might want to know what the valid bitstrings are.

Why would you? Whenever you construct values, they must be valid, not just valid for some other memory.

asajeffrey · 2018-12-13T21:26:21Z

As long as there's some way for the reader to understand that a valid bool is either true or false, without having to read anything about the semantics of memory, then I don't really mind. I'm just worried that a reader is going to get lost in the definitions if we don't have a clear way of saying "you only need to read these defns if you're worried about refererence types".

asajeffrey · 2018-12-13T21:33:37Z

Oh, and note to self, if we define "V is bitstring valid for type T" as "there exists a memory M s.t. V is valid for type T in memory M" then this isn't a property we can define by induction on the structure of T. For example with this defn, any non-zero word-aligned address p is valid for type &&usize and for type &bool but (p,p) is not valid for type (&&usize, &bool).

RalfJung · 2018-12-14T09:24:40Z

this isn't a property we can define by induction on the structure of T

It is not a defined but a derived property, but it satisfies the structural principles.

RalfJung · 2018-12-14T10:55:01Z

Concerning "representation", why don't we just replace that term by "layout"? That's how it is called in the Rust compiler. Layout consists of: Size, alignment, ABI details (for passing things by-value) and field offsets.

asajeffrey · 2018-12-14T17:58:35Z

this isn't a property we can define by induction on the structure of T

It is not a defined but a derived property, but it satisfies the structural principles.

Er, what do you mean by "the structural principles"? There's no inductive definition equivalent to "V is bitstring valid for T", c.f. example above. "V is valid for T in memory M" is inductive in T.

Concerning "representation", why don't we just replace that term by "layout"? That's how it is called in the Rust compiler. Layout consists of: Size, alignment, ABI details (for passing things by-value) and field offsets.

Fine, though we've used the word "representation" a lot, replacing it by "layout" would be quite a bit of work.

strega-nil · 2018-12-14T19:51:54Z

It seems weird to call it layout - layout sounds to me like a property of subobjects in structures, not of distinct objects, and only in memory on computers, not in the value system of Rust - distinguishing between value representation and object representation makes sense, while value layout doesn't make sense at all.

asajeffrey · 2018-12-15T00:19:11Z

@ubsan You'd reserve the word "layout" just for objects in memory, e.g. on the heap or pointed to on the stack? So we'd reserve the right for the compiler to do what it likes with locals that aren't addressed using &x?

strega-nil · 2018-12-15T01:06:09Z

@asajeffrey under as-if, the compiler can do whatever. I just think layout is not a good name for "representation".

RalfJung · 2018-12-15T10:14:29Z

layout sounds to me like a property of subobjects in structures, not of distinct objects,

Of course it is a property of both whole objects and their subobjects -- as is the representation we discussed here: We talked e.g. about how repr(C) unions have offset 0 for their fields, and then of course the representation of these fields apply.

Nothing in our discussion was exclusively about distinct objects. When we talk about single-field structs having the same representation/layout as their only field, of course this is still true when you put them into e.g. an array and want to use the AFAIK not-yet-decided rule that arrays are represented just like homologous tuples.

The discussions we have had in the representations area were all about size, alignment, field offset and ABI concerns. We didn't talk about an abstract set of values vs a concrete set of bits and how they map to each other (which is what "value vs object representation" means to me, but maybe I am misinterpreting?).

RalfJung · 2018-12-15T10:16:09Z

Fine, though we've used the word "representation" a lot, replacing it by "layout" would be quite a bit of work.

I was basically just confused when I wanted to write the definition and found no way to distinguish "layout" from what we discussed as "representation".

RalfJung · 2018-12-15T12:12:01Z

Er, what do you mean by "the structural principles"? There's no inductive definition equivalent to "V is bitstring valid for T", c.f. example above. "V is valid for T in memory M" is inductive in T.

I mean the fact that bitstring validity of a struct is equivalent to bitstring validity of every field. That's a property inductive definitions have by construction, but it can still also hold for derived definitions such as "valid for some memory".

But, anyway, I propose we continue this discussion as part of the validity invariant for references. That's the only type where the distinction between validity and bitstring validity matters, right?

asajeffrey · 2018-12-17T16:01:56Z

Er, what do you mean by "the structural principles"? There's no inductive definition equivalent to "V is bitstring valid for T", c.f. example above. "V is valid for T in memory M" is inductive in T.

I mean the fact that bitstring validity of a struct is equivalent to bitstring validity of every field. That's a property inductive definitions have by construction, but it can still also hold for derived definitions such as "valid for some memory".

Yes, the problem is that it doesn't hold for "valid for some memory", e.g. any non-zero word-aligned address p is "valid for some memory" at type &&usize and at type &bool but (p,p) is not "valid for some memory" at type (&&usize, &bool).

But, anyway, I propose we continue this discussion as part of the validity invariant for references. That's the only type where the distinction between validity and bitstring validity matters, right?

Fine by me, especially if we can confine all discussions of memory and related shadow state to the defn of validity of references.

RalfJung · 2018-12-18T16:39:52Z

(p,p) is not "valid for some memory" at type (&&usize, &bool).

What makes you say that? I think it is. In fact it could even be safe. We don't have TBAA.

hanna-kruppe · 2018-12-18T16:43:49Z

For (&bool, &&usize) things should be OK, but for example given

#[repr(u8)] enum Foo { A = 1, B = 2 }
#[repr(u8)] enum Bar { C = 3, D = 4 }

then any non-zero address p is valid for &Foo and &Bar individually in some memory, but (p, p): (&Foo, &Bar) is not valid in any memory edit: in case valid references have to point to valid memory (which AIUI is still a possibility at this point).

RalfJung · 2018-12-18T16:47:57Z

@rkruppe You are assuming that validity for Foo requires the pointer to point to valid data. Some people have been suggesting this, but I think that's actually a bad idea (and I'll spell out my reasoning in the issue for reference validity once it is opened).

hanna-kruppe · 2018-12-18T17:07:21Z

Yeah I realized as soon as I sent the comment and tried to sneak it in as an edit but I was too slow 😅

asajeffrey · 2018-12-18T17:09:49Z

(p,p) is not "valid for some memory" at type (&&usize, &bool).
What makes you say that? I think it is. In fact it could even be safe. We don't have TBAA.

A memory that makes p valid at type &&usize is:

  p -> q
  q -> 0x1234

A memory that makes p valid at type &bool is:

  p -> 0x1

I don't think there's any memory that makes both of them valid (since refs have to be non-zero and aligned, so bool and &usize have no overlap) so there's no memory which makes (p,p) valid at type (&&usize, &bool).

asajeffrey · 2018-12-18T17:13:48Z

Ah, this may be because your defn of validity doesn't require p to point to valid data.

RalfJung · 2019-02-28T16:28:41Z

So, we have some definitions in the glossary now. @asajeffrey are there more terms that you think should be added? If not, can we close this?

asajeffrey · 2019-02-28T23:33:37Z

Sure, fine by me to close this.

RalfJung mentioned this issue Jan 10, 2019

Validity of references: Memory-related properties #77

Closed

RalfJung closed this as completed Mar 1, 2019

Definitions of "representation", "validity" etc. #50

Definitions of "representation", "validity" etc. #50

Comments

asajeffrey commented Nov 29, 2018

asajeffrey commented Nov 29, 2018

avadacatavra commented Nov 29, 2018

asajeffrey commented Nov 29, 2018

RalfJung commented Nov 29, 2018

asajeffrey commented Nov 29, 2018

hanna-kruppe commented Nov 29, 2018

RalfJung commented Nov 29, 2018

asajeffrey commented Nov 29, 2018

asajeffrey commented Nov 29, 2018

RalfJung commented Nov 29, 2018 • edited Loading

asajeffrey commented Nov 29, 2018

strega-nil commented Nov 29, 2018

RalfJung commented Nov 29, 2018

asajeffrey commented Nov 29, 2018

asajeffrey commented Nov 30, 2018

RalfJung commented Nov 30, 2018 • edited Loading

asajeffrey commented Nov 30, 2018

RalfJung commented Dec 2, 2018

asajeffrey commented Dec 13, 2018

asajeffrey commented Dec 13, 2018

gnzlbg commented Dec 13, 2018

RalfJung commented Dec 13, 2018

asajeffrey commented Dec 13, 2018

asajeffrey commented Dec 13, 2018

RalfJung commented Dec 14, 2018

RalfJung commented Dec 14, 2018

asajeffrey commented Dec 14, 2018

strega-nil commented Dec 14, 2018

asajeffrey commented Dec 15, 2018

strega-nil commented Dec 15, 2018

RalfJung commented Dec 15, 2018 • edited Loading

RalfJung commented Dec 15, 2018 • edited Loading

RalfJung commented Dec 15, 2018

asajeffrey commented Dec 17, 2018

RalfJung commented Dec 18, 2018

hanna-kruppe commented Dec 18, 2018 • edited Loading

RalfJung commented Dec 18, 2018 • edited Loading

hanna-kruppe commented Dec 18, 2018

asajeffrey commented Dec 18, 2018

asajeffrey commented Dec 18, 2018

RalfJung commented Feb 28, 2019

asajeffrey commented Feb 28, 2019

RalfJung commented Nov 29, 2018 •

edited

Loading

RalfJung commented Nov 30, 2018 •

edited

Loading

RalfJung commented Dec 15, 2018 •

edited

Loading

RalfJung commented Dec 15, 2018 •

edited

Loading

hanna-kruppe commented Dec 18, 2018 •

edited

Loading

RalfJung commented Dec 18, 2018 •

edited

Loading