- Feature Name:
gradual_struct_init
- Start Date: 2018-10-24
- RFC PR: _
- Rust Issue: _
Permit the gradual initialization of structs like so:
struct Point<T> { x: T, y: T }
let pt: Point<_>;
pt.x = 42;
pt.y = 24;
drop(pt);
Additionally, initialized fields of a struct may be referenced by other fields rendering the struct immovable. For example:
struct Foo<'a> { a: u8, b: &'a u8 }
let f;
f.a = 42;
f.b = &f.a;
// drop(f);
// ------- not allowed since this moves `f` into `drop`.
The RFC has two main motivations. TODO: improve section prelude.
TODO: Felix and/or Niko: please fill in this section :)
Currently, if you want to encode an immovable self-referential type,
you'll have to use Cell
s and Option
s such as with:
#[derive(Debug)]
struct S<'a> {
val: u32,
ptr: Cell<Option<&'a u32>>,
}
fn main() {
let s = S { val: 10, ptr: Cell::new(None), };
s.ptr.set(Some(&s.val));
println!("s: {:?}", s);
}
With gradual initialization, we can instead allow you to write:
#[derive(Debug)]
struct S<'a> {
val: u32,
ptr: &'a u32,
}
fn main() {
let s: S;
s.val = 10;
s.ptr = &s.val;
println!("s: {:?}", s);
}
You still cannot move s
because s.val
is borrowed in s.ptr
;
However, the advantage here is that we avoid the indirection of Option<T>
.
Let's first get some definitions out of the way; if you are familiar with the Rust language, you can skip this section.
-
A local binding or variable is a name for a component on the stack which holds a value. An example (1):
let foo = 1;
A binding may be immutable, which
foo
is, or mutable, whichbar
is (2):let mut foo = 1;
-
An uninhabited type is a type which cannot be constructed since it has no values that inhabit the type. An example of such a type is:
enum Void {}
-
Initialization is the process of assigning a value to a binding or more generally place for the first time. Examples are in (1) and (2).
-
Reinitialization is the process of assigning a value to a place that has already been initialized but then moved out of. Immutable bindings may not be reinitialized.
-
A tuple type is denoted as
(T0, ..., Tn)
, whereTi
are types. Tuples are structurally typed homogeneous product types. -
A struct type, declared as
struct Name { field: T0, ... }
is a nominally typed homogeneous product type. Structs consist of a set of fields with typesTi
. Given a placex
of typeName
it is possible to project out the value of a field withx.field
.Each field of a struct type may optionally be assigned a visibility. If no visibility is specified, it is assumed to be private by default. Visibility is context dependent; a field may be visible in some context but not in some other.
A struct type can also be of the tuple variety and is then declared with the syntax
struct Name(T0, ..., Tn);
. In that case, the field names are positional indices (i ∈ 0 ... n
).A struct type may also be of the unit variety, e.g.
struct Name;
in which case it has no fields.
Suppose that you have a struct
(1):
struct Foo<T> {
bar: usize,
qux: T,
}
Suppose also that you have a function that consumes a Foo<T>
(2):
fn consume<T>(_foo: Foo<T>) {}
Currently, if you want to make a Foo
then you have to write (3):
let foo = Foo {
bar: 42usize,
qux: 24u8,
};
We propose that you should be able to also write (4):
let foo;
foo.bar = 42usize;
foo.qux = 24u8;
consume(foo); // OK!
Note that in snippet (4), you are not mutating foo
, foo.bar
, or foo.qux
.
These fields are only being initialized. If you however assign to foo.bar
twice as in (5):
let foo: Foo<u8>;
foo.bar = 42usize;
foo.qux = 24u8;
// Error! Second time you assign to `foo.bar` but `foo` is not a mutable binding!
foo.bar = 43;
drop(foo);
then the compiler will reject the program as ill-formed because
foo
was not declared as a mutable binding with mut foo
.
To remedy this, you may write (6):
let mut foo: Foo<u8>;
foo.bar = 42usize;
foo.qux = 24u8;
foo.bar = 43; // OK!
drop(foo);
Note that in both (4) and (6), you can only drop(foo)
, consume(foo)
,
or take references to foo
because you have fully initialized foo
.
Suppose instead that you wrote (7):
let mut foo;
foo.qux = 42;
consume(foo); // Error! `foo.bar` is not initialized.
The snippet in (7) would be rejected by the compiler because
otherwise foo.bar
would be uninitialized and thus you would not
have a valid Foo
, thus causing the type system to be unsound and
the program to exhibit undefined behaviour.
In our proposal, it is also not legal to take a reference to foo
, as with
&mut foo
or &foo
while parts of it isn't initialized. You also cannot
copy foo
before it is fully initialized. Neither may you take references to,
move, or copy the parts of foo
(in this case foo.bar
) that are not yet
initialized. This extends to conditional initialization. The compiler will
allow you to write (8):
let foo;
foo.bar = 42;
if random_bool() {
foo.qux = 1;
} else {
foo.qux = 2;
}
// Nota bene:
// In this case foo.qux = if random_bool() { 1 } else { 2 };
// is preferable as a matter of style.
consume(foo); // OK!
The snippet in (8) is allowed because when consume(foo)
is reached, foo.qux
is provably initialized no matter what branch is taken in the conditional.
However, the following snippet would be rejected because foo.qux
may be
uninitialized in the else
branch (9):
let foo;
foo.bar = 42;
if random_bool() {
foo.qux = 1;
} else {
// nothing here.
}
// foo.qux = 2; -- adding this line would not help because now `foo.qux` may
// be initialized twice and `foo` is not a `mut` binding.
consume(foo); // ERROR! `foo.qux` not initialized in `else { .. }`.
The general condition here is that all fields must be definitely initialized using the same logic for determining definite initialization as for local fields.
We previously noted that you may not move or reference parts of foo
that are
not yet initialized. The converse also applies. You may reference, move,
or copy the parts that are initialized while the whole type isn't. (10):
let mut foo: Foo<u8>;
foo.qux = 1;
drop(foo.qux);
foo.bar = 2;
{
let my_bar: &usize = &foo.bar; // OK!
} // <- Shared borrow ends here.
{
let my_mut_bar: &mut usize = &mut foo.bar; // OK!
} // <- Mutable borrow ends here.
foo.qux = 1;
Note in particular that foo.qux
is not initialized when we refer to foo.bar
.
As long as you don't refer to foo.qux
and foo
while it is uninitialized this
is ok. You must also make sure that foo
is definitively initialized at some
point even if you move out of some fields before foo
goes out of scope.
Furthermore, it is not sufficient to simply initialize parts of foo
at
various times; all of foo
must be initialized at the same point.
These restriction apply to make adding fields cause type errors so that
you can refactor more easily.
Because you are now able to construct Foo<T>
piecemeal, this enables you
to reference parts of Foo<T>
that have already been initialized in other
parts of Foo<T>
. For example, you may write (11):
let foo: Foo<&usize>;
foo.bar = 42; // <--
// |
foo.qux = &foo.bar; // --/ The compiler will ensure that this is dropped first
// so that there are no dangling references.
// We can take a reference to `&foo`, that doesn't move `foo` anywhere:
drop(&foo);
So far so good. However, if you add a line at the end and try to move foo
itself, you will run into trouble (12):
// Error! We can't move `foo` because `foo.qux` borrows `foo.bar`.
consume(foo);
With the addition of (12), an error arises because if it were otherwise,
foo.qux
would point to the old address of foo.bar
which would now be
invalid. This would be unsound due to the dangling reference.
Suppose that you have consume
d foo
as in (4).
In such a case, assuming Foo<T>
is not Copy
,
the binding foo
becomes uninitialized because we have moved out of foo
.
Before this RFC, you had to reinitialize foo
with foo = Foo { ... }
.
We propose that the gradual method should work for reinitialization as well (13):
let mut foo;
foo.bar = 42usize;
foo.qux = 24u8;
consume(foo); // OK!
foo.bar = 1;
foo.qux = 2;
consume(foo); // OK!
Thus far, only Foo<u8>
and Foo<&usize>
have been used as the type of foo
,
which means T
in Foo<T>
has always been inhabited.
Suppose instead that you defined an uninhabited type (14):
enum Void {}
Let's then write (15):
let foo: Foo<Void>;
foo.bar = 42;
foo.qux = ???;
consume(foo);
What do you replace ???
with? The field qux
is of the uninhabited type Void
.
Thus, there can be nothing you initialize foo.qux
with or otherwise it
wouldn't be of an uninhabited type. Indeed, you cannot initialize foo.qux
and so you will never be able to form a valid value of type Foo<Void>
.
This is to be expected, after all, you can't use the syntax
Foo { bar: 42, qux: <value> }
for this either because you cannot produce a
<value>
of type Void
.
Until now, all examples have been in contexts where the fields bar
and qux
have been visible. In a case where the fields are not visible, you won't be able
to construct a value of Foo<T>
with Foo { bar: x, qux: y }
. The gradual
method is no different; to initialize the fields with the gradual method,
you must be able to refer to the fields. If you are not, then it follows that
you won't be able to construct a Foo<T>
. In other words, the following would
not be a valid Rust program (16):
mod my_module {
pub struct Point {
x: u8,
y: u8
}
}
fn main() {
let pt: Point;
pt.x = 1; // Error! pt.x is not visible in this context.
pt.y = 2; // Error! pt.y isn't visible either.
drop(pt);
}
Per RFC 2008, the type system will also respect #[non_exhaustive]
annotations with respect to gradual initialization. That is, if you mark a
struct
with #[non_exhaustive]
in one crate, then a downstream crate won't
be able to use the gradual syntax to make a value of the type.
Consider a type which implements Drop
by printing something (17):
struct SpecialDrop {
alpha: u8,
beta: u16,
gamm: u32,
}
impl Drop for SpecialDrop {
fn drop(&mut self) {
println!("Dropping in a special way!");
}
}
Let's then use the gradual initialization mechanism in this RFC (18):
fn foo() {
let sd: SpecialDrop;
sd.alpha = 1;
sd.beta = 2;
some_action_that_can_panic();
sd.gamma = 3;
}
If you don't have the definition of SpecialDrop
in your near vicinity,
it can become difficult to know whether the destructor, and thus the side effect
in SpecialDrop::drop
will run or when it will run. In (18), to know if and
when the destructor will run, you would need to know that SpecialDrop
has the
fields alpha
, beta
, and gamma
and that gamma
hasn't been initialized yet.
Thus, while it would be sound to allow it, if a type implements Drop
,
the compiler will reject gradual initialization.
Hitherto, we have only gradually initialized struct
s with named fields.
However, this RFC makes no distinction between those and other forms of
heterogeneous product types. In other words, you can also initialize
tuple structs and normal tuples with the gradualist syntax.
That is, you can write (19):
struct Point(f32, f32);
let pt: Point;
pt.0 = 42.24;
pt.1 = 13.37;
drop(pt);
as well as (20):
fn consume_pair((x, y): (u8, u8)) { ... }
let pt;
pt.0 = 42;
pt.1 = 24;
consume_pair(pt);
What we have said so far does not just apply to local bindings. You can use the gradualist mechanism in fields as well. For example, you may use the gradual notation as a sort of builder notation (21):
struct Config {
window: WindowConfig,
runtime: RuntimeConfig,
}
struct WindowConfig {
height: usize,
width: usize,
}
struct RuntimeConfig {
threads: usize,
max_memory: usize,
}
fn make_config() -> Config
let cfg;
cfg.window.width = 1920;
cfg.window.height = 1080;
// cfg.window is definitely initialized now.
cfg.runtime.threads = 8;
cfg.runtime.max_memory = 1024;
// cfg.runtime is definitely initialized now.
// therefore cfg is also.
cfg
}
The usual same rules with respect to eventual definite initialization,
privacy, uninhabited types, prohibitions on Drop
types, etc. apply
here as well.
There are no changes to the grammar.
Let:
-
n
range over the natural numbers. -
Γ
be a well-formed typing environment. -
σ
be a well-formed type inΓ
. -
typeof(e)
denote the typeσ
of an expressione
inΓ
. -
downstream(σ)
be a predicate holding iffσ
is not defined in the current crate. -
non_exhaustive(σ)
be a predicate holding iff:σ
has#[non_exhaustive]
directly applied to it,downstream(σ)
holds.
-
visible(σ, f)
be a predicate holding iff forσ
, the fieldf
is visible inΓ
. -
implemented(σ, τ)
be a predicate holding iffσ
implements the traitτ
. -
D
range over valid identifiers. -
G
denote the generic parameters on a data type definition. -
P
be all the product types inσ
, namely:P : struct D ; // Unit data types. | struct D < G > ( σ_0, ..., σ_n ); // Tuple struct data types. | struct D < G > { f_0: σ_0, ..., f_n: σ_n } // Named field struct data types. | ( σ_0, ..., σ_n ) // Structural tuple types. ;
(If structural records were ever to be added they would also be included.)
-
PG = { σ ∈ P | implemented(P, Drop) ≡ ⊥ ∧ non_exhaustive(σ) ≡ ⊥ }
-
p
denote a place expression. -
m(p)
be a predicate holding iffp
is a mutable place.
Given a place p
, iff typeof(p) ∈ PG
, then p
may be gradually initialized.
This means that:
-
an uninitialized field
f
ofp
, whetherf
be a numbered tuple index or a named field, may be initialized with an assignment expressionp.f = value_expr
, ifvisible(typeof(p), f)
. -
iff
m(p)
, thenp.f
may be assigned to more than once. Otherwise, it may only be assigned to once in any path in the control graph. -
a field or sub-field
f
ofp
may be moved out of (unlessimplemented(typeof(p), Drop)
or ifp
orp.f
is borrowed) or borrowed (including mutably iffm(p)
) ifp.f
is definitely initialized. -
the place
p
is valid, meaning that it may be moved or referenced, once all of its fields have been definitely initialized. -
before
p
goes out of scope,p
must have been definitively initialized at some point at least once.
In all cases, definitive initialization of fields are checked in the same manner as for local fields prior to this RFC.
The usual drawbacks with respect to having more ways to do it applies, in particular, it could lead to decision fatigue between the struct literal approach and the gradualist approach.
Another drawback is that we don't provide an similar way for enums to be gradually constructed.
Furthermore, the gradual approach could be considered more imperative than functional which some functional programmers may not prefer and this would increase the likelihood of that in the ecosystem. However, the author of this proposal is a functional programmer (of the Haskell variety) and they do like what they are proposing.
Yet another drawback is that it could become harder to see when a value is constructed.
Perhaps the most major drawback in this proposal is the complexity increase
in the type system. This will make alternative (to rustc
) Rust compilers
more difficult to write.
With respect to the current design, the choice of surface syntax is quite straightforward. We simply extend the assignment syntax that already exists.
However, some tweaks in various directions could be made to the RFC:
-
We could limit gradual initialization to top level bindings. This would limit the power and is seemingly an arbitrary restriction that could cause surprises.
-
We could require that a gradually initialized place be definitively initialized before allowing references or moving out of parts. This would mean that we could not write:
let foo: Foo; foo.bar = 1; foo.baz = &foo.bar;
That would remove one of the uniquely useful things about the proposed syntax.
-
We could limit this to nominal types or just to types with named fields. While that is possible, it complicates understanding of the type system.
-
We could allow
Drop
types to be gradually initialized. This is discussed in the unresolved questions.
None to our knowledge.
-
Should we lift the requirement that each field must be definitively initialized at least once before a gradually initialized place goes out of scope?
We have imposed this restriction to make sure that adding a new field causes a type error in all cases; if we lifted the restriction, that would no longer hold.
-
Should we allow gradual initialization for types implementing
Drop
?While this might make the language more consistent and uniform, it might also become too much magic. In particular, it becomes less clear when destructors will run and understanding when it happens might require tracking "more boolean variables" in your head. For example, if we write:
let foo; foo.bar = <expr>; foo.baz = <expr>; foo.quux = <expr>; may_panic(); // a few lines later... foo.wibble = <expr>;
Then we have to know that
foo
has 4 fields to know thatmay_panic()
will not run the destructor oftypeof(foo)
.It is however unclear if this would be a problem in practice and it could be solved by privacy.
If we do however allow
Drop
types to be gradually initialized, then we should also lift the restriction on moving out ofDrop
types to make things consistent. If that restriction is not removed, it would be to not allow gradual initialization ofDrop
types as well for consistency.If we want to allow
Drop
types, then the restriction in 1. is helpful because there's less to track in your head. -
If we permit structural records, and if the user writes:
let foo; foo.bar = 1; foo.baz = true; foo.qux = "hello"; drop(foo);
Should then
foo
, if not otherwise constrained, be inferred to the type?:{ bar: i32, baz: bool, qux: &'static str }
This could be quite ergonomic, but it could also be too much magic.
Partially initialized objects are not exposed to the user in the type system.
We could allow this with &uninit T
references. That would allow uninitialized
fields to pass into functions which can then initialize them.
See RFC 2534 for a discussion on &uninit T
.
A further extension to this RFC could be to permit the gradual initialization of monomorphically sized arrays by using assignment to constant indices. For example, we could allow:
let arr: [i32; 3];
arr[0] = 0;
arr[1] = -12;
arr[2] = 42;
drop(arr);
Here, the indices 0
, 1
, and 2
are all known at compile time and the
size of the array is also known. Thus, the compiler can know whether the
array has been definitively initialized and enforce that it is before the
array is used. The usual rules with respect to gradual initialization as
outlined in this RFC would otherwise apply.
Similar to the type-defaulting idea for structural records,
we could also assume that arr
is an array if the user writes:
let arr;
arr[0] = 0;
arr[1] = -12;
arr[2] = 42;
drop(arr);
Since the idea of gradual array initialization is orthogonal and
because it overloads the meaning of the place[idx] = <val_expr>
,
we leave this for future consideration and do not propose such a
facility at this time.