Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

validity invariant for types #8

Closed
nikomatsakis opened this issue Aug 30, 2018 · 13 comments
Closed

validity invariant for types #8

nikomatsakis opened this issue Aug 30, 2018 · 13 comments
Labels
A-validity Topic: Related to validity invariants

Comments

@nikomatsakis
Copy link
Contributor

@RalfJung introduced the idea of validity invariants in their blog post "Two kinds of invariants". Presuming we agree with this framing (I do), we need to define these validity invariants.

These invariants need to justify the various sorts of optimizations that we currently do:

  • For example, Option<&T> layout optimization
  • Marking pointers as deferenceable

We need to discuss also the role of uninitialized memory and how it fits in. For example, when are invariants required to hold? Only when "compiler thinks memory is initialized" -- can/should we make that more precise? Also, what about loads of uninitialized integral values (a sometimes useful hack) -- are those valid? What is the effect?

@RalfJung
Copy link
Member

RalfJung commented Oct 9, 2018

Noting it here so we don't forget about it:

  • We will likely say that references have to be aligned to be valid; what does "aligned" mean for an extern type where we don't know the alignment?
  • For fat raw pointers, do we require the metadata to make enough sense to determine size and alignment?

@RalfJung
Copy link
Member

As but two of many examples for people using &mut to uninitialized data:

(Collecting this here so I will find the links when the discussion starts.)

@RalfJung
Copy link
Member

An example of uninitialized raw pointers and integers: https://github.com/carllerche/bytes/blob/456221d16521cf54cea0e6569669e47120a1b738/src/bytes.rs#L1810

@RalfJung
Copy link
Member

Related discussion about which invariants we require to hold on &*x: rust-lang/rust#56190

@RalfJung
Copy link
Member

str is an interesting case: It usually only exists behind a ptr indirection, so an &str to non-UTF8 data would not be UB if references do not have to be transitively valid.

Maybe the UTF-8 thing is just a safety invariant, never a validity invariant?

@gnzlbg
Copy link
Contributor

gnzlbg commented Nov 26, 2018

@RalfJung I think that, ideally, for DSTs, validity should just be specified for custom DSTs. That is, the validity of trait objects, slices, and str, should follow from that.

We don't have custom DSTs yet, so I think it is ok to also consider validity of trait objects and slices, and in the future, when custom DSTs are available, try to reconcile those into a more general framework.

Currently, str is an special DST in Rust that's part of the language, but I think it might be worth it to consider it as a custom DST that's just defined as struct str([u8]) so that its validity would follow from the validity of slices. This is how I expect AsciiStr to be implemented as well once we have custom DSTs, and I think it would be weird for str to be more special than it. Also, CStr would be a custom DST as well.

Thinking more into the future, if str were to be implemented in core as a custom DST (@rkruppe discovered: rust-lang/rust#19612), we would still need to make it a lang item, which would still enable the compiler to do specific str optimizations. I don't know if the compiler is already doing some, and it is unclear to me how much value these add. But the requirement that "constructing a &str to a non-UTF8 string is undefined behavior" appears to me to be a requirement that, while checkable on miri, I don't think is checkable in general. For example, with custom DSTs I could have a slice containing only prime numbers: &PrimeNumberSlice. If constructing one that points to non-prime numbers is undefined behavior, then miri would have to detect it at construction point "somehow".

So I like to think that the UTF8 requirement on &str is more like an user-defined pre-condition on some of the &str methods. For example, str::len(&self) returns the length of the &str in bytes, so this method does not necessarily have to require the contents of str to be valid UTF-8 to work "correctly". Creating a str that does not point to an UTF8 string, and then calling some methods whose pre-conditions are violated, could lead to undefined behavior down the road (e.g. a null pointer dereference, read out-of-bounds, etc.) but I don't think it should be a new special type of undefined behavior per se.

@RalfJung
Copy link
Member

For example, with custom DSTs I could have a slice containing only prime numbers: &PrimeNumberSlice. If constructing one that points to non-prime numbers is undefined behavior, then miri would have to detect it at construction point "somehow".

Sure, custom DSTs come with custom invariants that rustc and miri do not know. These are safety invariants, not validity invariants.

What you are saying is that we could make UTF-8 a mere safety invariant for str. I agree. The open question is whether that reflects reality, and what it is that we want to do.

@RalfJung
Copy link
Member

RalfJung commented Dec 3, 2018

ManuallyDrop is an interesting case. It certainly has to be at least "bitstring-valid" (using @asajeffrey's term), does it have to be anything more? Maybe we should, after all, separate the things references say about their pointee from the rest of validity?

@gnzlbg
Copy link
Contributor

gnzlbg commented Dec 3, 2018 via email

@RalfJung
Copy link
Member

RalfJung commented Dec 3, 2018

We want (and have) ManuallyDrop<T: ?Sized>, while we are unlikely to get unsized unions any time soon.

@RalfJung
Copy link
Member

RalfJung commented Dec 3, 2018

Basically, ManuallyDrop must still be valid because we must still be able to determine its size.

@gnzlbg
Copy link
Contributor

gnzlbg commented Dec 3, 2018 via email

@RalfJung RalfJung added the A-validity Topic: Related to validity invariants label Jan 10, 2019
@JakobDegen
Copy link
Contributor

Closing. There are lots of open questions here, almost all of which are in other issues. This one doesn't seem useful

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-validity Topic: Related to validity invariants
Projects
None yet
Development

No branches or pull requests

4 participants