Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Scoped threads, take 2 #1084

Closed
wants to merge 2 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
262 changes: 262 additions & 0 deletions text/0000-scoped-take-2.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,262 @@
- Feature Name: scoped
- Start Date: 2015-04-16
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary

This RFC proposes an alternative to the `thread::scoped` API that does not rely
on RAII, and is therefore memory safe even if destructors can be leaked.

*This RFC was inspired by ideas from @nikomatsakis and @arielb1.*

# Motivation

The `thread::scoped` API allows parent threads to launch child threads that can
have references to *stack data* owned by the parent. This style of API makes
fork-join programming easy and efficient. It is also safe, *as long as the
parent's stack frame outlives the children threads*.

In the initial version of the API, safety was "guaranteed" through an RAII-style
guard which, upon destruction, would join (wait for) any still-living children
threads. This join guard is connected to the lifetime of any borrows by children
threads, thereby (we thought) preventing the related parent stack frames from
being popped until the joins were complete.

Unfortunately,
[it is possible to avoid the guard destructor in safe Rust](https://github.com/rust-lang/rust/issues/24292). Rust
does not guarantee that the destructor for a value that escapes will be run.

Note that this does *not* mean that RAII should be avoided in general. **RAII is
still a safe, idiomatic, and ergonomic style of programming for Rust.** It just
means that you cannot write `unsafe` code whose safety relies on RAII to run
destructors.

It may be possible to finesse this issue by locating every possible source of
destructor leakage (such as `Rc` cycles) and tying it to some kind of "`Leak`"
marker trait, which would be implemented by default for all Rust types (and
basically means "safe to leak"). Ideas along these lines have been discussed
extensively [elsewhere](https://github.com/rust-lang/rfcs/pull/1066), but they
represent longer-term language changes that may or may not work out.

This RFC, by contrast, proposes a general API that works in today's Rust, and
can:

* Guarantee execution of a piece of code prior to a scope being exited, even in
the presence of unwinding, `Rc` cycles, and so on.

* Thereby be used to recover a safe `thread::scoped` API.

While the API proposed here is slightly less ergonomic than the RAII-style
`scoped` API, **it has the benefit of much more clearly marking the point at
which children threads will be joined, which was left more implicit in the old
API.**

# Detailed design

The proposal has two pieces: a general mechanism for "deferred computation",
and a new `thread::scoped` API that takes advantage of it.

## Deferred computation

Here's a very simple API for deferring computation:

```
// in std:

mod thread {
pub struct Scope<'a> { ... }

pub fn scope<'a, F, R>(f: F) -> R where F: FnOnce(&Scope<'a>) -> R;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, could the lifetime parameter be omitted here as it's otherwise just anonymous?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that would actually kick in HRTB :)


impl<'a> Scope<'a> {
pub fn defer<F>(&self, f: F) where F: FnOnce() + 'a;
}
}
```

(This is put in the `std::thread` module because (1) scopes are per-thread
concepts (2) it fits nicely with other functionality like `panicking` and
`catch_panic`.)

You call `scope` to introduce a new `Scope` value, which is passed into a
closure that is immediately invoked.

The closure can use the `defer` method to register callbacks to invoke upon
*any* exit from the callback, including unwinding. Unlike RAII guards, there is
no way to leak these callbacks, and the implementation shown below works around
[known cases](https://github.com/rust-lang/rust/issues/14875) of destructor
leakage.

This is a generally useful mechanism that can avoid the need to create custom
RAII guards for situations where you might use `try`/`finally` in other
languages. But it is also just the support needed for scoped threads.

## Scoped threads

To recover scoped threads, we extend the `Scope` type with a method for spawning
threads:

```rust
impl<'defer> Scope<'defer> {
pub fn spawn<'body, F, T>(&self, f: F) -> thread::JoinHandle<T> where
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to remove this 'body lifetime parameter altogether and instead bound by 'defer below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this rules out some more flavorful cases where multiple scopes are involved -- you don't get sufficient variance for that to work. (I tried this for passing down a Scope reference and closing over data with multiple lifetimes, and it required the API as written.)

'defer: 'body, // anything borrowed by the body must outlive the Scope
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the wrong way around? It is saying 'defer must outlive 'body, whereas we want 'body to outlive 'defer. (I guess @alexcrichton's point above makes this moot.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't lifetimes "backwards" in their subtyping relationship and &-refs contravariant on the lifetime? If so, then 'defer: 'body is correct as it says 'defer is a subtype of (and therefore, smaller than) 'body.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kballard 'defer: 'body means 'defer outlives 'body. That means it's safe to use an &'defer in place of an &'body. So the relationship is covariant the way it is expressed. This is somewhat arbitrary though, since lifetimes can't be used in isolation. The only subtyping relation that can really exist for them is in the variance of types using them, so it made sense to make their "subtyping" covariant with how they're typically used.

Edit: As far as I can tell, this means that the bound should be the other way around, as @huonw suggests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, you're right. I was going off of my recollection that internally, subtyping of lifetimes actually means that if 'a <: 'b then 'b outlives 'a and that &-references are therefore contravariant on lifetimes, but a re-read of the RFC #192 shows that (whether or not the internals still behave as I described) the 'a: 'b syntax means 'a outlives 'b.

F: FnOnce() -> T + Send + 'body,
T: Send + 'body;
}
```

Like the original `thread::scoped` API, this allows for the child thread's
closure to borrow data (with lifetime `'body`) from the parent thread. These
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "with lifetime 'body" is no longer true.

borrows are bounded by the lifetime of the `Scope` value, and the implementation
uses the `defer` method to add a callback that will join on (wait for completion
of) the child thread on exit from the scope -- thus restoring memory safety.

Note that, while previously one might return `JoinGuard`s outward to expand the
scope of joining, the pattern here is reversed: you call `scope` at the
outer-most scope, and then pass a reference to the `Scoped` value inward.

Putting it all together, here's an example from TRPL, and a version adjust to the new API:

```rust
// using the thread::scoped API
fn old_trpl_example() {
let data = Mutex::new(vec![1u32, 2, 3]);

let guards: Vec<_> = (0..2).iter().map(|_| {
thread::scoped(|| {
let mut data = data.lock().unwrap();
data[i] += 1;
})
}).collect();

// threads implicitly joined here, when `guards` goes out of scope and drops
// its contents
}
```

```rust
// using the proposed thread::scope API
fn new_trpl_example() {
let data = Mutex::new(vec![1u32, 2, 3]);

thread::scope(|s| {
for i in 0..2 {
s.spawn(|| {
let mut data = data.lock().unwrap();
data[i] += 1;
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ I just tried implementing this in a hacky prototype (source). I took the implementation sketch of Scope and added a spawn() method that uses transmutes to trick thread::spawn() into accepting its closure. I also changed the return type as outlined in my previous huge comment to one that wraps JoinHandle. And I discovered a serious problem: this example is not thread-safe, but it's accepted by the API.

Specifically, the i variable is captured by the closures, but the parent thread doesn't realize this and allows the value to be mutated on the stack. With thread::scoped(), attempting to do this same for loop correctly complains that "error: i does not live long enough". The correct example would have used a move closure and an extra line let data = &data; before the closure.

I'm not really sure how to fix this. AIUI, the issue seems to be that the lifetime used in Scope only ensures that borrowck knows that it borrows any lifetimes from outside the scope. But values that are created inside the scope are effectively invisible to borrowck. When the call to spawn() returns, borrowck believes that those values are no longer borrowed, even though the function is in fact executing on another thread. It would seem as though the bound F: FnOnce() -> T + Send + 'a should ensure that borrowck treats the function as potentially living for the entire lifetime of the scope (and therefore that any borrows taken by the function should be valid until the end of the lifetime 'a) but it doesn't seem to actually be doing this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange. A simpler version seems to detect this properly (http://is.gd/5VuMUV).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kballard

I'm a bit confused by this comment. My understanding here is that the variable i is inferred to capture by value, because indexing works by value. Since it's a Copy type, that means that a copy is moved into each closure as it's created. So I don't see any data race with the example as given. Can you clarify? I'd also be curious to see the exact use of scoped that produced the borrowck error.

EDIT: I initially said that "addition" works by value but I meant indexing (the use of i in data[i]).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that @arielb1's example, unlike the one in the RFC, captures the loop variable by reference due to its use of println rather than indexing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aturon Are we supposed to be capturing Copy variables by-value even in a closure that otherwise captures things by-reference? I'm not aware of such a rule. And it certainly is behaving on my machine in a manner consistent with capturing it by-reference. As for the exact use of scoped, that's present in my source link above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aturon: That example only works (after fixing the attribute to be a crate attribute and adding use std::sync::Mutex;) because you're dropping the guard inside of the for loop. It is in fact borrowing the i value, but the borrow expires inside of the loop. So your code is actually not being parallel at all (which seems like an easy mistake to make with that RAII pattern). Rewriting it to do the loop in parallel:

#![feature(scoped)]
use std::thread;
use std::sync::Mutex;

fn main() {
    let data = Mutex::new(vec![1u32, 2, 3]);

    let mut guards = vec![];
    for i in 0..3 {
        guards.push(thread::scoped(|| {
            let mut data = data.lock().unwrap();
            data[i] += 1;
        }));
    }
    drop(guards);

    println!("{:?}", data);
}

yields the following error:

unnamed.rs:10:36: 13:10 error: `i` does not live long enough
unnamed.rs:10         guards.push(thread::scoped(|| {
unnamed.rs:11             let mut data = data.lock().unwrap();
unnamed.rs:12             data[i] += 1;
unnamed.rs:13         }));
unnamed.rs:6:45: 18:2 note: reference must be valid for the block suffix following statement 0 at 6:44...
unnamed.rs:6     let data = Mutex::new(vec![1u32, 2, 3]);
unnamed.rs:7
unnamed.rs:8     let mut guards = vec![];
unnamed.rs:9     for i in 0..3 {
unnamed.rs:10         guards.push(thread::scoped(|| {
unnamed.rs:11             let mut data = data.lock().unwrap();
              ...
unnamed.rs:9:9: 9:10 note: ...but borrowed value is only valid for the for at 9:8
unnamed.rs:9     for i in 0..3 {
                     ^

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kballard

First, thanks for catching the mistakes in the example -- I was jotting it off quickly on the train :)

I talked to @nikomatsakis about this this morning, and there are two salient points:

  1. You are correct that Copy data is treated specially from the perspective of capture inference. In particular, Copy data is never captured by value, unless you write move. By-value uses of Copy data are simply ignored for capture inference. So indeed, i is taken by reference here, which means that the error produced by thread::scoped is necessary.
  2. There is a bug related to lifetimes not being fully constrained that @nikomatsakis has been looking into already, and he suspects that this bug is causing the API of this RFC to not catch the borrow problem. I've sent him a test case, and he plans to include it in his branch to ensure that the problem is fixed.

Thanks for spotting this, @kballard!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a bug related to lifetimes not being fully constrained that @nikomatsakis has been looking into already

Ah hah. I thought the current API should have been correct (see the last sentence of my first comment in this thread), so I was quite confused as to why it didn't work. I'm glad to have confirmation that I'm not slowly going crazy 😉

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kballard What was the outcome of this discussion? There was a serious problem with the API after all? Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bluss The outcome seems to be that there's a bug with handling of lifetimes that @nikomatsakis is looking in to. I hope that this bug is the source of the problem I documented here, although I can't be sure of that at the moment.

}
})
}
```

In the original version of the example, the join guards from the scoped threads
were explicitly collected into a vector (to ensure that the joins did not happen
too early).

With the new version, by contrast, the scope is more clearly marked by an indent,
all *all* joins within the scope automatically happen at the end of the block,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo double all here

making scoped threads feel more like a first-class control-flow construct. In
this RFC author's opinion, this actually winds up *clarifying* the semantics of
scoped threads, and so may be a better API than the original `thread::scoped`.

On a separate note, in practice one will usually not want to spawn full-fledged
threads when doing data-parallel, fork-join style computations; instead you want
to use a thread pool, work stealing, and higher-level combinators. One nice
aspect of `Scope` is that crates providing such parallelism frameworks can
easily hook into the API with their own means of spawning lightweight tasks;
this should also facilitate a [decoupling of thread pools from the lifetimes
bounding the tasks they are running](https://github.com/rust-lang/threadpool/issues/7).

## Implementation

Here's a sketch of the implementation, inspired in part by
[@arielb1's ideas](https://gist.github.com/arielb1/5eb299a87546ce8829b3):

```rust
pub struct Scope<'a> {
dtors: RefCell<Option<DtorChain<'a>>>
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's point out that Scope<'a> must be invariant in 'a (like it is in this sketch, since it is nested inside a Cell).


struct DtorChain<'a> {
dtor: Box<FnBox() + 'a>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FnBox? I assume you meant FnOnce.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately Box<FnOnce()> isn't callable because by-value self methods on trait objects are not yet callable. FnBox is a trait that has a call_box methods which takes Box<Self>, allowing it to be called. It's hoped that this will be removed one day though!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah hah, that makes sense. I thought we only had Fn / FnMut / FnOnce, I hadn't heard of FnBox before.

next: Option<Box<DtorChain<'a>>>
}

pub fn scope<'a, F, R>(f: F) -> R where F: FnOnce(&Scope<'a>) -> R {
let mut scope = Scope { dtors: RefCell::new(None) };
let ret = f(&scope);
scope.drop_all();
ret
}

impl<'a> Scope<'a> {
// This method is carefully written in a transactional style, so
// that it can be called directly and, if any dtor panics, can be
// resumed in the unwinding this causes. By initially running the
// method outside of any destructor, we avoid any leakage problems
// due to #14875.
fn drop_all(&mut self) {
loop {
// use a separate scope to ensure that the RefCell borrow
// is relinquished before running `dtor`
let dtor = {
let mut dtors = self.dtors.borrow_mut();
if let Some(mut node) = dtors.take() {
*dtors = node.next.take().map(|b| *b);
node.dtor
} else {
return
}
};
dtor()
}
}

pub fn defer<F>(&self, f: F) where F: FnOnce() + 'a {
let mut dtors = self.dtors.borrow_mut();
*dtors = Some(DtorChain {
dtor: Box::new(f),
next: dtors.take().map(Box::new)
});
}
}

impl<'a> Drop for Scope<'a> {
fn drop(&mut self) {
self.drop_all()
}
}
```

This implementation does a few interesting things:

* It avoids any allocation in the case of a single deferred computation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I think my comment to this effect earlier may have been wrong as a Box<FnBox()> is created at least for each node in the chain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point. It still avoids an allocation :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is just an impl detail, but I'm curious why not just rustc's None | One(T) | Many(Vec<T>) for this? Once Many is achieved you can potentially reuse the allocation by reusing a scope (where a linked list is likely just going to have all its nodes junked).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gankro Because it's an unboxed closure, so it can't be stored inline in the type, it needs to be stored in a Box.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kballard Yes, I meant as opposed to the linked list. As you discuss lower down, I'm guessing it has to do with panic guarantees, but I'm not certain.


* It works around issue 14875 in a somewhat subtle way: the `drop_all` method is
called *both* normally within `scope`, and in the `Scope` destructor. The
method is also coded transactionally. This means that the first panic (if any)
in a deferred computation triggers the drop for `Scoped` (without any leakage,
avoiding #14875), and any remaining panics wind up aborting.

Note that this is just a sketch: the use of a linked list here, and `RefCell`
internally, could both be switched for something more interesting.

# Drawbacks
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More than the ergonomics, the fundamental drawback of this approach is that it is an inversion of the normal RAII flow: functions acquiring some resource return a guard (representing the resource itself) which can be dropped to release the resource. If you acquire an RAII resource internally, you can provide the same API by forwarding the guard.

With APIs like the above, this is reversed - to use an RAII resource, you accept a guard as an argument rather than returning it, and you cannot acquire the resource internally and return it, you must have it passed to you.

This is not necessarily a death blow, just a significant drawback in terms of consistency. For instance, the original thread::scoped API meant that thread::spawn was really just a special case of scoped where 'a is 'static and the default behavior of the guard is different. This API is fundamentally different, and encourages totally different patterns.


The main drawback of this approach is that it is arguably less ergonomic (and
less pretty) than the original `thread::scoped` API. On the flip side, as the
RFC argues, this API makes the control-flow/synchronization of the joins much
clearer, resolving a tension in the design of `thread::scoped` (which `must_use`
only somewhat mitigated).

# Alternatives

The main alternative would be to make the original `thread::scoped` API safe by
changing some other aspect of the language. Ideas along these lines are being
[debated elsewhere](https://github.com/rust-lang/rfcs/pull/1066).

# Unresolved questions

Is there any reason that `Scope` should be `Send` or `Sync`?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recursion without defining new scopes and unnecessarily tying up system resources by keeping old threads around.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see any reason for Send (we never get an owned Scope so we can't possibly send it anywhere) but we definitely want Sync, so we can pass the Scope to a fork-join thread in case it needs to conditionally spawn more threads (or otherwise register "finalizers" that work on shared stack data).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add that the correct question should be "is there a reason not to make Scope Send and/or Sync." Rust should deliver maximum flexibility for concurrency, so long as it can do so safely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pythonesque Fair point, although if making it Send and/or Sync requires using more expensive operations (e.g. it's internally mutable through a &-reference so any mutations would need to be thread-safe), then it's a valid question to ask.