RFC: Uninitialized Pointers #98

Closed
wants to merge 1 commit into from

@gereeter gereeter commented Jun 1, 2014

No description provided.

@dobkeratops

I can see the use for this sort of thing; IMO a similar "&overwrite" pointer would be interesting to further clarify 'input'/'output' information for a function. By default function arguments are inputs; then you annotate "mut" for something changeable - which is both an input and an output - but an "overwrite" pointer is purely an output, which completes the range of cases.

Some CPU architectures offer 'cache-line allocate-zero' instructions that enable an optimisation here: if you know ahead of time that some memory will be completely overwritten, you can avoid reading it into the CPU before it's modified, reducing the memory bandwidth requirement.

But even without that, I think it's nice to be able to communicate more about what a function does in its signature.

(My idea would be better named "&overwrite", since you're saying what you will do with it, not what its existing state is. As you indicate, you would be able to create an "&uninit" from an existing "&mut", even though that may be initialised, but a very common use would be from data that is uninitialised.
Would it be correct for 'malloc' to return an &overwrite pointer?)

@Valloric

Valloric commented Jun 1, 2014

IMO this doesn't seem worth the extra complexity of yet another pointer type. Everyone will have to learn what this does but it will be used very rarely. Rust is already getting the reputation (fairly or unfairly, doesn't really matter) of "that language with too many different pointer types," let's not make that worse unless strictly necessary.

@dobkeratops

I wouldn't worry about that - there's a difference between necessary and accidental complexity: without a garbage collector to lean on, it's inevitable that this complexity gets shifted to the user.

If you've reasoned fully about all the possibilities for a pointer - then you already know this case, we'd just be assigning a name to it and potential for the compiler to catch some more errors/more potential to communicate through types.

I suppose it's possible that RVO catches some of the need for an overwrite type? But for those of us who've come from C, the ability to state something explicitly can be comforting.
My perspective might be different, having come from C and asm. I personally think it's better to learn exactly how pointers and allocations work in C before learning the shortcuts and abstractions C++ (or Rust) gives you, whereas I know many others would say the exact opposite: that C would poison your mind.

It would be a rarely used case, so I wouldn't mind it having a long, verbose name (one that doesn't clash with existing variables); you won't be using it often. But if, as the OP claims, it can reduce the number of unsafe blocks, then that makes the case more compelling, beyond the simple 'communication' I thought it would be for.

@Valloric

Valloric commented Jun 1, 2014

If you've reasoned fully about all the possibilities for a pointer - then you already know this case, we'd just be assigning a name to it and potential for the compiler to catch some more errors/more potential to communicate through types.

I'm not saying this doesn't address a use-case; it appears that it does. But that same use-case is already being addressed with unsafe blocks. That's an unfortunate solution, granted, but it might be worth sticking to that since the extra complexity might not be worth it and the use-case is rare.

BTW this issue doesn't seem like a 1.0 blocker, so could be revisited in the future if more experience with Rust shows that this is necessary. I'd just rather not go through the 1.0 gate and then teach people Rust with "and here are the 11 different pointer types; ignore 8 of them for now."

@dobkeratops

Fair enough; I would certainly agree there are far more important feature requests, and it could come later.

teach people Rust with "and here are the 11 different pointer types; ignore 8 of them for now."

You could say it's an annotation of an existing type - and look what's happened with @.
Did they reduce the number of pointer types?
No, they increased it, by adding Rc, Gc (and probably more), and by generalising the mechanism to create more.

The truth is, pointers are just complex :) I think it is valid to cover more and just sort by importance when you teach. And the language should sort by frequency when naming/selecting defaults.

The way I see it, a reference has 3 potential 'bits' of information: aliasable, writable, readable. So the default is 'readable, aliasable'; then &mut (the opposite of C's const) imposes 'writable, non-aliasable'. The whole business of 'Cell' could be addressed by adding an '&alias' (the opposite of C's restrict)? And yet another keyword to disable 'readable' would go beyond C in expressiveness and safety for low-level code: what's returned by 'malloc' is not safely readable, but the type system doesn't tell you.
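To make the 'aliasable and writable' combination concrete, here is a minimal sketch using the standard library's std::cell::Cell, which is how Rust expresses it today. The function names bump and aliased_writes are made up for this illustration; no '&alias' reference type exists.

```rust
use std::cell::Cell;

/// Increments a counter through a shared (aliasable) reference.
/// `Cell` permits the write even though `counter` is `&`, not `&mut`.
fn bump(counter: &Cell<i32>) {
    counter.set(counter.get() + 1);
}

/// Holds two shared references to the same cell and writes through both.
/// (Hypothetical helper, only for this illustration.)
fn aliased_writes() -> i32 {
    let counter = Cell::new(0);
    let a = &counter;
    let b = &counter; // aliases `a`; this would be rejected for `&mut`
    bump(a);
    bump(b);
    counter.get()
}
```

Calling aliased_writes() yields 2: both aliases observed and modified the same value.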

@dobkeratops

Would more demand for this appear with the potential box() arguments ? (implementing emplace_back and so on..) - would that be a situation where you'll want to pass an 'overwrite' or 'uninit' pointer?

@pczarn

pczarn commented Jun 1, 2014

Mostly a nice feature although it's not a necessity. However, I don't like implicit type states.

Destructuring an uninit pointer should be done in a way analogous to ref mut. (even though the & pattern currently matches both &mut/&)

let &uninit(ref uninit a, ref uninit b) = ptr;

@glaebhoerl
Contributor

I would call this &out, because (a) intuitive, (b) C# precedent, (c) short. Some earlier musings about this idea here.

@gereeter
Author

gereeter commented Jun 1, 2014

I can see the use for this sort of thing; IMO a similar "&overwrite" pointer would be interesting to further clarify 'input'/'output' information for a function. By default function arguments are inputs; then you annotate "mut" for something changeable - which is both an input and an output - but an "overwrite" pointer is purely an output, which completes the range of cases.

I was deliberately trying to avoid this connotation, as I really don't like output pointers. However, as @glaebhoerl's placement new formulation shows, this does have a use case, and so I may weaken on this point.

you would be able to create an "&uninit" from an existing "&mut", even though that may be initialised

While you do create &uninit pointers from existing &mut pointers, you do it by moving the data out, which means that you have no guarantee as to what is left behind. As such, I'd say that the resultant data is uninitialized. However, in practice, the move is done with a copy, so you do have a point.
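As an illustration of "moving the data out" of a &mut, here is a sketch of a swap written with the existing std::ptr::read and std::ptr::write functions. The name swap_by_moving is hypothetical. The comments mark the points where the proposal would regard a location as uninitialized; today that bookkeeping lives only in the programmer's head, hence the unsafe block.

```rust
use std::ptr;

/// Swaps two values by moving them through raw pointers.
/// (Hypothetical helper; `std::mem::swap` is the real, safe API.)
fn swap_by_moving<T>(a: &mut T, b: &mut T) {
    let pa: *mut T = a;
    let pb: *mut T = b;
    // SAFETY: `a` and `b` are distinct, valid `&mut T`, and every location
    // that is read is written again before the function returns. Nothing
    // between the reads and writes can panic.
    unsafe {
        let tmp = ptr::read(pa);       // `*a` is now logically uninitialized
        ptr::write(pa, ptr::read(pb)); // `*a` initialized, `*b` uninitialized
        ptr::write(pb, tmp);           // `*b` initialized again
    }
}
```

As noted above, the move is physically a bitwise copy, so the old bytes are still present after ptr::read; treating them as uninitialized is what prevents a double drop.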

IMO this doesn't seem worth the extra complexity of yet another pointer type.

This is, unfortunately, a valid point. Although I was more worried about the drop flag, I was hesitant to post this RFC because it has very clear downsides.

I suppose it's possible that RVO catches some of the need for an overwrite type?

As mentioned before, this is why I was explicitly avoiding any examples of using this pointer as a place to output values.

BTW this issue doesn't seem like a 1.0 blocker

This is definitely not a 1.0 blocker. It is completely backwards compatible (unless we get rid of the drop flag) and could be added at any time.

Would more demand for this appear with the potential box() arguments ?

This type of pointer could subsume the placement new optimization for pointers and add it to a number of other data structures. However, because of this, box already takes care of a large use case of &uninit. That reasoning was one of the reasons I didn't talk about placement new in this RFC.

However, I don't like implicit type states.

To some extent, I agree, and I mentioned this in the drawbacks section. However, this can be fixed with explicit movement functions, as discussed in the issue @glaebhoerl mentioned. I'll add this to the alternatives section.

Destructuring an uninit pointer should be done in a way analogous to ref mut.

This sounds like the logical approach, though it might confuse people as to why they can't only partially destructure the value.

I would call this &out

As mentioned above, I wanted to avoid the connection with output parameters because that would be two different ways to achieve the same thing. However, your formulation of placement new requires it, so I might change my mind. Regardless, I'll add the possibility to my RFC.

Some earlier musings about this idea here.

Very interesting - I hadn't seen that discussion before. The point about generic functions is quite worrying, and I'll make sure to add it in. I also like your formulation of placement new - I hadn't figured out a way to return something and a borrowed version of it simultaneously. I think that this cannot be used for tying the knot, but I do think it can do arbitrary permutations, as you mentioned.

drop(*ptr);

// This drops the whole vector, but the pointer to 2 is not freed twice because it was zeroed when
// ptr was encountered.
Contributor


References don't get freed. If by "pointer to 2" you mean the box in the vector, we can't rely on zeroing to work, because we'd like to get rid of it and move to precisely tracked destructors instead.

Author


  1. Yes, I do mean the Box in the vector.
  2. I mentioned the reliance on a drop flag in the drawbacks section, and this is the most compelling drawback that I see.

@glaebhoerl
Contributor

Some thoughts, partly borrowed from the earlier thread:

There is a use case for this feature, right in Rust's target audience in fact. In low level / embedded code it's a common technique for functions not to allocate memory for their own results, but to return the result into a pointer provided by the caller, which could be to memory allocated by the caller if necessary, but also to the caller's stack, into an existing structure, etc. This feature would allow the pattern to be expressed directly and proved safe by the compiler, which is not currently possible. The current alternative is a contorted inversion of control using closures.

I agree with @pczarn about destructuring (except of course I would call it ref out).

fn foo() -> T is equivalent to fn foo(ret: &out T). Because of this, for a long time I didn't think &out added any expressiveness. But it does. The added expressiveness comes from being able to tie lifetimes together. I was out of the loop for a couple of months so I don't know what the current formulation of it is, but for instance, the trait for box could be written as trait Box<T> { fn alloc<'s>(&'s out self) -> &'s out T; }.
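A rough approximation of this caller-provided-storage pattern, including the tied lifetime, can be sketched with std::mem::MaybeUninit, a library type added to Rust after this discussion. It is not the proposed &out: the compiler does not check that the slot was written, so the caller still needs an unsafe assume_init. Reading, read_sensor and caller are names invented for this example.

```rust
use std::mem::MaybeUninit;

#[derive(Debug, PartialEq)]
struct Reading {
    channel: u8,
    value: u32,
}

/// Writes the result into storage owned by the caller and hands back a
/// reference to the now-initialized value, tied to the same lifetime `'s`.
fn read_sensor<'s>(out: &'s mut MaybeUninit<Reading>) -> &'s mut Reading {
    out.write(Reading { channel: 1, value: 42 })
}

/// The caller decides where the storage lives; here it is the stack.
fn caller() -> Reading {
    let mut slot = MaybeUninit::<Reading>::uninit();
    read_sensor(&mut slot);
    // SAFETY: `read_sensor` always writes the slot. With `&out`, this
    // obligation would be verified by the compiler instead of asserted here.
    unsafe { slot.assume_init() }
}
```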

I don't think we should, or need to, have variables change their types for this to work. Linear types already allow us to express everything that we need within the current system. If you want to turn an &mut into an &out, just provide a function which takes &mut as argument and returns &out. Analogously for the other cases.

You write "[it] points to possibly uninitialized data", but in fact it must be uninitialized data. If it points to initialized data and you overwrite it through the pointer, its destructor should be run, but isn't. In the types-are-propositions sense an &out pointer is evidence that the object it points to has been deinitialized (or never was initialized), and carries the obligation to initialize it. In fact you should be able to take an &out pointer to an initialized object, with the effect that it gets deinitialized (its destructor runs).

EDIT: Now I remember that you were relying on drop flags. In that case read this as how to avoid drop flags. Getting rid of drop flags, and tracking moves statically, is something I believe we're planning to do anyways. Every variable is always either definitely initialized or definitely uninitialized, and the compiler (the borrow checker) knows which.
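The static tracking described here can already be seen for plain local variables: a binding may be declared without a value, and the compiler accepts a read only where every path has initialized it. A minimal sketch, with the function name pick chosen for this example:

```rust
/// Returns a label chosen by `flag`, using deferred initialization.
fn pick(flag: bool) -> String {
    let label: String; // declared, but definitely uninitialized here
    if flag {
        label = String::from("yes");
    } else {
        label = String::from("no");
    }
    // Both branches assigned `label`, so it is definitely initialized here.
    // Removing either assignment turns this read into a compile error.
    label
}
```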

The issue with fail!() and generic functions is trickier than I first gave it credit for. My original idea was that (a) when the compiler knows about an &out, and sees it go out of scope without being written to, it would issue a compile error; and (b) to deal with the problem of passing an &out to a generic function, where the compiler can no longer see it, &out would have a destructor which invokes fail!(), thereby enforcing the invariant that the program can't ever be in a state where an &out should have been written to, but wasn't, dynamically instead of statically.

For the case I was originally concerned about, viz. what happens if you have an &out in scope and something else fail!()s, this works fine: the &out is itself dropped, its destructor runs and fail!()s, and double failure is program abort, so there's no opportunity to observe the violation of any invariants. The harder case is actually when the &out itself causes the failure. Say you do this: let out_ptr = my_vec.emplace_back(); drop(out_ptr). Failure happens inside drop(), the destructor for my_vec is run, and it attempts to deinitialize the elements of the array. But the last element is still uninitialized. Oops. This couldn't work safely without using some kind of dynamic drop flag. It's the same with ~/Box and other things.

So here's the new plan:

  • As before, an &out going out of scope without being written to is a compile error.
  • Prohibit passing &out to generic functions straight-up. The way this would work is that every type parameter of every generic function is considered to have an implicit Drop bound, and &out would not be considered to implement Drop, nor would Drop be "automatically derived" for types which contain &out. Therefore you couldn't instantiate any type parameter of a function with &out or with a type which contains &out. You also wouldn't be allowed to hide an &out in an existential type. (The constructor of an existential type is itself universally quantified, so this is actually just a specific case of the previous point.) Therefore the problematic case from above becomes impossible. This means you wouldn't be able to swap() two &out pointers, nor work generically with e.g. containers of &out or whatever. This might be annoying, but the alternative is requiring explicit Drop bounds on all generic functions which want to drop values of the given type, or (transitively) call other functions which do, which would be much more annoying.
  • The remaining case is the possibility of something else fail!()ing while an &out is in scope, which, as in the original plan, could continue to be handled as a double failure, or just a program abort directly (given that failure is now the only case where the destructor for an &out could ever be invoked). Not the prettiest thing in the world, but it preserves all invariants, and for the awesomeness that is &out, I think it's a small price worth paying.

LATE EDIT: For what it's worth, we could do without the static restriction in the second point above and just make the destructor for &out be program abort in all circumstances.
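The "destructor is program abort" variant from the late edit can be approximated as a library type. The sketch below is an assumption-laden illustration, not the proposed language feature: OutSlot, new and fill are invented names, the storage is a std::mem::MaybeUninit, and the "must be written" rule is enforced at run time by aborting rather than at compile time.

```rust
use std::mem::MaybeUninit;
use std::process;

/// A write-once handle to uninitialized storage (hypothetical type).
struct OutSlot<'a, T> {
    // `Some` until the slot has been filled.
    target: Option<&'a mut MaybeUninit<T>>,
}

impl<'a, T> OutSlot<'a, T> {
    fn new(target: &'a mut MaybeUninit<T>) -> Self {
        OutSlot { target: Some(target) }
    }

    /// Consumes the handle, initializes the storage, and returns a
    /// reference to the initialized value.
    fn fill(mut self, value: T) -> &'a mut T {
        let target = self.target.take().expect("slot is present until filled");
        target.write(value)
    }
}

impl<'a, T> Drop for OutSlot<'a, T> {
    fn drop(&mut self) {
        // Dropped without being filled: abort, so that no code can observe
        // the storage in its uninitialized state.
        if self.target.is_some() {
            process::abort();
        }
    }
}
```

Because fill takes self by value, a handle can be written at most once; the abort in drop covers the "at least once" half dynamically, which is the trade-off described in the late edit.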

@glaebhoerl
Contributor

Here's another really cool thing with &out references:

Array slices.

The problem I was thinking about was, how could Rust provide analogues to some of the classic list-based functions in Haskell's Prelude, such as iterate and replicate, except using arrays instead? For reference:

replicate :: Int -> a -> [a]
iterate :: (a -> a) -> a -> [a]

We obviously can't make infinite arrays in the case of iterate, so we would have to specify the number of iterations explicitly in that case as well. I'll use replicate as the example from here because the two are very similar. If we had integer generics parameters and specified the size statically, it would be straightforward to express using fixed-length arrays:

fn replicate<T: Clone, static N: uint>(elem: &T) -> [T, ..N];

(Or at least, the type signature. Not necessarily the implementation.)

But if the size is determined at runtime:

fn replicate<T: Clone>(elem: &T, num: uint) -> ???

What can we return? We could return a Vec, but can we avoid allocating? We clearly can't return an unboxed array of dynamic size.

But we could do this:

fn replicate<T: Clone>(elem: &T, into: &out [T]);

Here into is, like other array slices, a fat pointer, containing a pointer to the array along with its length. So the number of elements is specified implicitly by the size of the slice we pass in, rather than as a separate parameter. Being an &out pointer, it points to uninitialized memory, and can only, and must be, written to, not read from. Here's how it could be implemented:

fn replicate<T: Clone>(elem: &T, mut into: &out [T])
{
    loop {
        into = match into {
            &out [ref out head, ..ref out tail] => {
                *head = elem.clone();
                tail
            }
            &out [] => {
                break
            }
        }
    }
}

iterate could be implemented similarly, except using a caller-provided closure and an accumulator instead of clone(). I don't believe it's possible to do this - initialize an array of dynamic size without allocating or initializing it twice, using only safe code - without &out.

Edit: While thinking about the match in the above code, another thing became clear to me: it's not just that you could use ref out in pattern matches, but the compiler must enforce that if you pattern match an &out, every subcomponent of it gets bound to a ref out, because only then can we be sure that the whole thing will get initialized. This also implies that matching an &out, or any part of it, with a wildcard _ pattern would be illegal.
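For comparison, here is a sketch of replicate against the std::mem::MaybeUninit type that Rust later gained. It is an approximation under stated assumptions: the slice type &mut [MaybeUninit<T>] stands in for &out [T], the return value is an addition so the caller gets the initialized slice back, and an unsafe cast is still required because nothing in the type system proves that every element was written.

```rust
use std::mem::MaybeUninit;

/// Fills caller-provided, uninitialized storage with clones of `elem`.
/// The number of elements is implied by the length of `into`.
fn replicate<'a, T: Clone>(elem: &T, into: &'a mut [MaybeUninit<T>]) -> &'a mut [T] {
    for slot in into.iter_mut() {
        slot.write(elem.clone());
    }
    // SAFETY: the loop above wrote every element, and `MaybeUninit<T>` has
    // the same layout as `T`. If `clone` panics part-way, the elements
    // written so far are leaked rather than dropped, which is safe.
    unsafe { &mut *(into as *mut [MaybeUninit<T>] as *mut [T]) }
}
```

The storage can live on the caller's stack, for example an array of MaybeUninit values, so nothing is allocated and no element is initialized twice.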

@huonw
Member

huonw commented Jun 14, 2014

@glaebhoerl that's interesting. It seems like &out pointers are legitimate linear types (rather than affine) in that they must be "used" precisely once.

(FWIW, those two functions would normally be written as iterators, especially iterate, since it is infinite.)

@ftxqxd
Contributor

ftxqxd commented Jun 14, 2014

+1 This is a very interesting proposal, and I can see that it could be very useful, although I’m not so sure about the name—&uninit is too verbose (and hard to pronounce—‘uninininit’ :P), while &out doesn’t convey how it behaves properly. Maybe &empty, &blank, &hole or &fill (that one does sound a bit backwards, though)?

I was also wondering how this would interact with rust-lang/rust#12624, given that it (AIUI) allows moves out of &mut pointers. I don’t think it would interfere with this, since this only specifies what happens to an &mut pointer between it being moved out of and it being re-assigned to.
Edit: Actually, on second thoughts, I think this basically just specifies a way to implement those rules concerning &mut pointers.


There’s one thing I don’t understand, however—why does the state of the pointer have to be defined at all times? Surely it would be safe to simply assume that, when it’s indeterminate, it’s uninitialised? Example:

let x = &mut 3;
if condition {
    drop(*x);
}
// x is always assumed to be &uninit here
*x = 3;

This behaviour is similar to that of regular moves—if it’s indeterminate, it defaults to being moved. Example (that works today):

let x = box 3;
if condition {
    drop(x);
}
// x is moved here, so referring to it is invalid
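In current Rust syntax, the owned-value analogue of the first example looks like the sketch below (Box::new replaces the old box keyword; the function name conditional_reinit is invented here). After a conditional move the variable counts as possibly moved, so reading it is rejected, but assigning to it again is accepted.

```rust
/// Conditionally drops a box, then re-initializes the same variable.
fn conditional_reinit(condition: bool) -> i32 {
    let mut x = Box::new(3);
    if condition {
        drop(x); // `x` is moved on this path only
    }
    // `x` is possibly moved here: reading `*x` at this point would not compile.
    x = Box::new(4); // assignment makes `x` definitely initialized again
    *x
}
```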

@glaebhoerl
Contributor

@huonw

that's interesting. It seems like &out pointers are legitimate linear types (rather than affine) in that they must be "used" precisely once.

As I pointed out in a line comment, any type with a destructor is linear :). But &out is unique in that it's linear without having a destructor.

(FWIW, those two functions would normally be written as iterators, especially iterate, since it is infinite.)

Yeah I guess that makes sense. (But you still couldn't initialize an uninitialized array of dynamic size with them in safe code without &out, which was the point, the rest was just context about my thought process.)

@alexcrichton
Member

This is definitely a nice feature to have, especially for writing small cases of a simple swap function, as pointed out. I believe that @nikomatsakis has many thoughts on how we may be able to work this into the existing system by allowing you to move out of a &mut with certain restrictions around it.

For now, however, this is a backwards compatible change, so we're going to close this as postponed. We would like to revisit this, however, as this would definitely make fighting with the borrow checker easier in some cases.

As always, thank you for the RFC! We're all quite interested in seeing the various alternatives for having a system such as this!

@glaebhoerl glaebhoerl mentioned this pull request Aug 26, 2014
@glaebhoerl glaebhoerl mentioned this pull request Oct 25, 2014
@kennytm kennytm mentioned this pull request Mar 11, 2015
@Centril Centril added A-syntax Syntax related proposals & ideas A-typesystem Type system related proposals & ideas A-references Proposals related to references labels Nov 26, 2018
@Centril Centril added A-machine Proposals relating to Rust's abstract machine. A-uninit &uninit related proposals & ideas labels Nov 26, 2018