
Cross item dependencies, take 2 #30532

Merged
12 commits merged into rust-lang:master from the cross-item-dependencies branch on Jan 6, 2016

Conversation

nikomatsakis
Contributor

This is roughly the same as my previous PR that created a dependency graph, except that:

  1. The dependency graph is only optionally constructed, though this doesn't seem to make much of a difference in terms of overhead (see measurements below).
  2. The dependency graph is simpler (I combined a lot of nodes).
  3. The dependency graph debugging facilities are much better: you can now use `RUST_DEP_GRAPH_FILTER` to filter the dep graph to just the nodes you are interested in, which is super helpful.
  4. The tests are somewhat more elaborate, including a few known bugs I need to fix in a second pass.

This is potentially a `[breaking-change]` for plugin authors. If you are poking about in tcx state or something like that, you probably want to add `let _ignore = tcx.dep_graph.in_ignore();`, which will cause your reads/writes to be ignored and not affect the dep-graph.

After this, or perhaps as an add-on to this PR in some cases, what I would like to do is the following:

  • Write up a little guide to how to use this system, the debugging options available, and what the possible failure modes are.
  • Introduce read-only and perhaps the Meta node
  • Replace "memoization tasks" with a node from the map itself
  • Fix the shortcomings, obviously! Notably, the HIR map needs to register reads, and there is some state that is not yet tracked. (Maybe as a separate PR.)
  • Refactor the dep-graph code so that the actual maintenance of the dep-graph occurs in a parallel thread, and the main thread simply throws things into a shared channel (probably a fixed-size channel). There is no reason for dep-graph construction to be on the main thread. (Maybe as a separate PR; a rough sketch follows below.)
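
To make that last bullet concrete, here is a rough sketch of the producer/consumer setup I have in mind (the message type and helper function below are invented for illustration; nothing here is part of this PR):

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread;

// Hypothetical message type; the real encoding of nodes and edges would come
// from the dep-graph implementation itself.
enum DepMessage {
    Read(u64),
    Write(u64),
    PushTask(u64),
    PopTask,
}

fn spawn_dep_graph_thread() -> (SyncSender<DepMessage>, thread::JoinHandle<()>) {
    // Fixed-size channel: the main thread only blocks if the worker falls behind.
    let (tx, rx) = sync_channel(1024);
    let handle = thread::spawn(move || {
        for msg in rx {
            match msg {
                DepMessage::Read(_) => { /* record a read by the current task */ }
                DepMessage::Write(_) => { /* record a write by the current task */ }
                DepMessage::PushTask(_) => { /* push a new current task */ }
                DepMessage::PopTask => { /* pop the current task */ }
            }
        }
        // Channel closed: compilation is done; finalize the graph here.
    });
    (tx, handle)
}
```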

Regarding performance: adding this tracking does add some overhead, approximately 2% in my measurements (I was comparing the build times for rustdoc). Interestingly, enabling or disabling tracking doesn't seem to do very much. I want to poke at this some more and gather a bit more data -- in some tests I've seen that 2% go away, but on others it comes back. It's not entirely clear to me if that 2% is truly due to constructing the dep-graph at all.

The next big step after this is to write some code to dump the dep-graph to disk and reload it.

r? @michaelwoerister

@michaelwoerister
Member

I'll take a close look at this later today.

@nikomatsakis
Contributor Author

@michaelwoerister

I never replied to this comment of yours, sorry:

So, one other way to structure this would be to not have separate TypeScheme and TCache nodes at all, because essentially they describe the same thing

I think this is equivalent to the current setup in terms of the final result. It might result in a more compact dependency graph (fewer edges), which seems like a good thing, but I see it as a bit of an optimization that strays from the mostly "pure" form that I've done so far. Still, might be a good idea just because a smaller dep graph is generally a win.

The "pure" form I am describing is that we keep a split between "fn" and "data". That is, we have various task nodes, like CollectItem and TypeScheme, that correspond to bits of code that are executing, and then we have nodes that correspond to data, like Signature (was TCache before). You always push and keep track of your current fn node, and you track which data nodes it accesses and which subprocedures it invokes.

That would avoid adding artificial edges ...

Hmm. Well, first, I don't see that the edge from task to subtask is "artificial": it models return values. That is, pushing a subtask corresponds to calling a fn or something like that, which has a return value. If we were to omit the edges from task to subtask, then there would be cases where that return value allows data to leak in a kind of subtle way, like so:

fn foo(tcx: DepGraph, key: Key) -> Ty<'tcx> {
    let _task = tcx.dep_graph.in_task(Foo);
    tcx.some_map[&key] // this read is charged to `Foo`, but the value escapes to the caller
}

Note though that there is a corresponding potential for a leak that comes through the arguments. That is, when entering a new task, you must "account for" all the data you implicitly gained access to from your enclosing scope:

fn foo(tcx: DepGraph, def_id: DefId, ty: Ty<'tcx>) -> Ty<'tcx> {
    // here, `ty` is the type of `def_id`
    let _task = tcx.dep_graph.in_task(Foo);
    use(ty); // this read would otherwise go unnoticed
    tcx.some_map[&def_id]
}

So in a case like this you should add tcx.dep_graph.read(Something(def_id)). But because the tasks are relatively coarse, I haven't found this to come up that often, though I may be missing something. We could require that task bodies are fn() pointers to try and rule this out statically, but it seems unnecessary to me. Time will tell I guess.
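
For the record, a sketch of what that fn-pointer restriction might look like (the enter_task_fn helper and the stub types are hypothetical, not part of this PR):

```rust
// Stub stand-ins for the real types; only the shape of the API matters here.
struct DepNode;
struct DepGraph;

impl DepGraph {
    // Because `body` is a plain fn pointer, it cannot close over `ty` or any
    // other tracked data from the caller's scope; everything the task touches
    // has to come in through `arg`, where it can be accounted for explicitly.
    fn enter_task_fn<A, R>(&self, _node: DepNode, arg: A, body: fn(&DepGraph, A) -> R) -> R {
        // push `_node` as the current task (elided), run the body, pop it again
        body(self, arg)
    }
}
```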

So I guess I could see an argument that we should treat arguments/return-values the same way: you must account for any and all data that comes in/out of a task with explicit reads/writes (or perhaps we add some helper called read_task that adds the read for you, but in_task does not). And so long as tasks communicate by mutating shared maps (or other tracked state), this all happens automatically.

...and lead to a cleaner, more stable dependency graph (in this case at least). More stable because the graph would not depend on (re-)execution order.

I guess here you are suggesting that one task might invoke another in some kind of random order?
I don't think that happens in our code base at the moment: tasks only invoke one another when they are a logical subtask. To have a problem there you'd have to have a setup where a task recursively invokes another subtask, but that subtask is chosen from a queue or some random order. I don't see why one would ever do that: it'd be wasteful of your stack, to start.

Still, I do sort of like the idea of distinguishing read_task from enter_task (today's in_task). IIRC, the vast majority of tasks today communicate only via shared state anyway; it's a small number that use return values. And as I wrote above, you always want to be aware of tracked state that is entering your task, so maybe it's easier to think consistently about all tracked state that enters or exits your task.

@nikomatsakis
Contributor Author

@michaelwoerister so I sort of forgot the context of the last comment in writing my reply. I think I stand by everything I said, but I had forgotten we were talking about memoized maps in particular. Thinking about this more, I agree with you that the best approach is to say that the "current task" in such a case is the node for the map-entry itself that we are looking up. I still think it's equivalent to the setup that exists today though.

I was debating about integrating this into the memoized helper routine itself. That is, rather than generating a read (as today) when the value is already present, and otherwise invoking the closure and generating a write, it would instead push a new task corresponding to the key entry (and add a read from the previous task to this new one). We can then ignore the reads/writes on the map itself. So for the example where you have both foo and bar that access baz, if our map entry is Signature(baz), then you'd wind up with the following edges:

Signature(baz) -> foo
Signature(baz) -> bar
Hir(baz) -> Signature(baz)

The interesting tricky case is that final edge. If we go back to the code in question, it would now look something like:

fn signature(ccx, item: &hir::Item) {
    memoized(..., || {
        ccx.tcx.dep_graph.read(Hir(item_def_id)); // XXX
        compute_signature(ccx, item)
    });
}

Note the line marked XXX: this is where the Hir(baz) -> Signature(baz) edge would derive from. This seems awfully subtle: the task creation is now kind of non-obvious, because it is packed into the memoized routine. This seems bad.
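
Concretely, that variant of the memoized helper would look something like this (a sketch with stub types, not the helper in this PR) -- note how the task push is buried inside it:

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::hash::Hash;

// Stub stand-ins for the PR's dep-graph types.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct DepNode(u32);
struct Task;
#[derive(Default)]
struct DepGraph;

impl DepGraph {
    fn read(&self, _node: DepNode) { /* edge: _node -> current task */ }
    fn in_task(&self, _node: DepNode) -> Task { /* push _node as the current task */ Task }
}

fn memoized<K, V>(
    graph: &DepGraph,
    map: &RefCell<HashMap<K, V>>,
    node: DepNode,
    key: K,
    compute: impl FnOnce() -> V,
) -> V
where
    K: Eq + Hash,
    V: Clone,
{
    graph.read(node); // the caller gets a read edge to this map entry's node
    if let Some(v) = map.borrow().get(&key) {
        return v.clone();
    }
    let _task = graph.in_task(node); // the entry itself becomes the current task
    let v = compute();
    map.borrow_mut().insert(key, v.clone());
    v
}
```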

Other options:

  • leave the code as it is, but just remove the TypeScheme task and change it to the corresponding data node (with a read edge)
  • the memoized routine could assert that the current task is the one it wants.
  • the memoized routine could use OIBITs to rule out that you pass in any HIR nodes or other tracked data (*)

For now I think I prefer the first choice, which is roughly what you suggested. The last idea sounds cool but seems like it's getting just a bit too clever. Maybe worth experimenting with though, it would really help make the system more bullet-proof.

(*) I'm actually not entirely clear on what that set is. Certainly HIR nodes. But probably Ty<'tcx> and other computed results too.

@nikomatsakis
Contributor Author

I've also been thinking about these "meta-nodes" like Trans, Coherence, and Dropck. These basically correspond to big passes that need to read some state and traverse a bunch of items, but never represent a phase that we would skip outright, unless we've concluded that the entire crate is precisely the same (which IS something we should check for; but we should bail way earlier in that case).

It's not clear to me that it is useful to push such tasks. It seems like pure overhead. It would be better to replace them with two distinct tasks, I think: "read-only" and "meta". The rule of thumb would be something like this:

  • push a new task if the following conditions are met:
    • you can narrow down the set of inputs to this task beyond "anything in the entire krate"
    • the task generates output
  • otherwise, push one of the following:
    • read-only if you believe this task does not mutate any shared state
    • the meta task otherwise

Note that using meta would be best avoided, because it basically corresponds to "work that has to be redone if ANYTHING AT ALL changed". So, if you wind up writing to shared state foo and bar from the meta task, you will have a graph like:

krate -> meta
meta -> foo
meta -> bar

Now anything that depends on foo or bar will have to be redone if anything in the krate changes.

Note that I'm actually not aware of any places that we need the Meta task. I think read-only suffices. But it does seem like something we might want in general.

@michaelwoerister
Member

The "pure" form I am describing is that we keep a split between "fn" and "data".

I think we are just talking about two different but equivalent dependency tracking models here. In mine, there's only data, while yours also has nodes for functions/tasks being executed. There is not much of a difference though, as long as functions don't have side-effects other than writing the piece of data that they are supposed to produce; it's just that from the perspective of a "data-only" model, the "fn+data" model introduces an "artificial" intermediate node:

fn+data graph:

input0--+
        |
input1--+---> fn --> output
        |
input2--+

data-only graph:

input0--+
        |
input1--+---> output
        |
input2--+

As long as we are only interested in data, and each piece of data has a unique key, I think it's simple to reason about. Each piece of code that contributes to the value of some piece of data just has to say so by specifying the key of that piece of data before reading any inputs -- i.e., just what enter_task does. Which function or pass creates an edge is only interesting for debugging purposes (the RFC implicitly says as much, if I remember correctly, by stating that fn nodes could be omitted from the graph before persisting).

In the end, it does not matter which model we want to follow, as long as we know how to reason about it and it is simple to use in practice. The data-only model is simpler, I think, but it might be less obvious to use, since it doesn't correspond as closely to the structure of the code itself. On the other hand, a "false correspondence", i.e. something that works almost like something else but doesn't in corner cases, can be its own source of bugs and confusion.

Well, first, I don't see that the edge from task to subtask is "artificial": it models return values.

True, when function executions are modeled within the dependency graph, they are not. As said above, it's modeling function executions in the graph itself that can be viewed as artificial.

I guess here you are suggesting that one task might invoke another in some kind of random order?

I was thinking of two things here:

  • Introducing intra-pass parallelism would lead to local execution order indeterminism
  • Changing the program in semantically insignificant ways, like changing the order of items within a module, would also lead to a different dependency graph for the "same" program.

Both of these things would not lead to incorrect results, but I do see them as kind of a code smell.

Note though that there is a corresponding potential for a leak that comes through the arguments. That is, when entering a new task, you must "account for" all the data you implicitly gained access to from your enclosing scope.

Maybe we could think about introducing something like a Tracked<T> type (with a better name) that is just a thin wrapper around something that needs to be tracked. Accessing the value would then introduce the edge, and as much of the code as possible would use these instead of the "raw" values (i.e. a Tracked<Ty> instead of a Ty). The upside of this would be that you don't have to worry about dependency tracking when you use such a value. Downsides of this approach would be that (1) it would potentially introduce overhead if the underlying value is accessed more than once (though that could be alleviated through some optimizations) and (2) it might clutter internal APIs if introduced pervasively.
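
Roughly what I have in mind (a toy sketch with stub types, just to show the shape of it):

```rust
use std::cell::RefCell;

// Stub stand-ins for the PR's dep-graph types.
#[derive(Clone, Copy, PartialEq, Eq)]
struct DepNode(u32);

#[derive(Default)]
struct DepGraph {
    reads: RefCell<Vec<DepNode>>,
}

impl DepGraph {
    fn read(&self, node: DepNode) {
        // edge: node -> current task (the task stack is elided here)
        self.reads.borrow_mut().push(node);
    }
}

// Thin wrapper: you cannot get at the value without going through the graph.
struct Tracked<T> {
    value: T,
    node: DepNode,
}

impl<T> Tracked<T> {
    fn new(value: T, node: DepNode) -> Tracked<T> {
        Tracked { value, node }
    }

    // Every access registers the read edge before handing out the value.
    fn get(&self, graph: &DepGraph) -> &T {
        graph.read(self.node);
        &self.value
    }
}

fn main() {
    let graph = DepGraph::default();
    let ty = Tracked::new("some Ty placeholder".to_string(), DepNode(0));
    println!("{}", ty.get(&graph)); // the read of DepNode(0) is recorded automatically
}
```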

We could require that task bodies are fn() pointers to try and rule this out statically, but it seems unnecessary to me.

It's definitely a neat idea. I like how Rust allows one to express such things :)

Note the line marked XXX: this is where the Hir(baz) -> Signature(baz) edge would derive from. This seems awfully subtle: the task creation is now kind of non-obvious, because it is packed into the memoized routine. This seems bad.

I would probably move the read from the closure into compute_signature, since computing a signature will always add a dependency on the HIR node. It would then not seem subtle to me. Although I certainly also see the advantages of trying to keep dependency tracking separate from pure algorithm implementations.

@nikomatsakis
Contributor Author

I think we are just talking about two different but equivalent dependency tracking models here. In mine, there's only data, while yours also has nodes for functions/tasks being executed.

This seems about right.

What is appealing to me about the design I originally put forward is that one can instrument the code in a naive and direct way, just by mapping out the structure of the tasks it is doing, and by identifying the shared state. This accommodates all kinds of code, including functions that produce multiple values or which don't produce any value at all (e.g., lint checking).

However, I think what I really want is to adopt a kind of hybrid. That is, where it makes sense, we can conflate the task nodes with the nodes for the value that they are producing. I do this already for memoized tables in some places, and it probably makes sense to do it in the type-checking case. Basically, there are places where having distinct nodes for the table and the code that populates it feels sort of artificial, and I think in those cases it makes sense to use a single node.

@nikomatsakis force-pushed the cross-item-dependencies branch from f9cebb2 to 11c671b on January 6, 2016 02:07
@nikomatsakis
Contributor Author

@michaelwoerister ok, pushed an updated version. It turned out I was wrong about the overhead -- measurements now show around 1 or 2% building rustdoc, and it appears to be slightly faster if you check for a boolean enabled flag. In any case, this isn't quite ready to land -- there are some TODOs for which I should open separate issues, for example -- but it's ready to review, I think. (Essentially, I'd like to land basically this code and go from there.)

Instead, the idea is to *encapsulate* shared state into some API that
will invoke `read` and `write` automatically. The most common way to
do this is to use a `DepTrackingMap`, described in the next section,
but any sort of abstraction brarier will do. In general, the strategy
@michaelwoerister
Member

brarier => barrier

pub rcache: RefCell<FnvHashMap<ty::CReaderCacheKey, Ty<'tcx>>>,

// Cache for the type-contents routine. FIXME -- track deps?
@michaelwoerister
Member

Good question. Initially I thought that it's probably not necessary, since relevant dependencies should already be covered by previous phases. But with generics in play it becomes less clear...

@michaelwoerister
Member

Looks very good! Great readme, nice incremental improvements over the previous version too.
r=me when nits are addressed. (See also the `make tidy` errors.)

@nikomatsakis
Contributor Author

@bors r=mw

@bors
Contributor

bors commented Jan 6, 2016

📌 Commit 93996b1 has been approved by mw

@bors
Contributor

bors commented Jan 6, 2016

⌛ Testing commit 93996b1 with merge e8c337b...

bors added a commit that referenced this pull request Jan 6, 2016
@bors bors merged commit 93996b1 into rust-lang:master Jan 6, 2016
@nikomatsakis nikomatsakis deleted the cross-item-dependencies branch March 30, 2016 16:12