-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CGU size heuristic for partitioning #47415
Conversation
|
||
// Since `sort_by_key` currently recomputes the keys for each comparison, | ||
// we can save unnecessary recomputations by storing size estimates for | ||
// each `MonoItem`. Storing estimates for `CodegenUnit` might be preferable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether it's worth trying to store estimates for CodegenUnit
instead. I found it somewhat troublesome to use CodegenUnit
as a key, but perhaps it'd be worth introducing an index for the units to use instead.
I was also looking at this issue. |
Good spot! You're right — I'll include that in the changes. (The fact that each |
Thank you both for looking into this, @varkor and @JustAPerson! Here is what I would do:
|
Okay, I was wondering whether I could get away with extending |
f055bd0
to
f085542
Compare
@varkor perhaps update the comments at those two sites, because now they're wrong. |
@JustAPerson: I saw the comments, thought to myself: "I must remember to update those before committing", and then completely forgot! 😅 Thank you! |
OK, that looks like a correct implementation to me. Now that I see it though, I think it is doing more work than necessary:
I think (2) is more of a problem than (1). It could be solved by adding an Such a query could also be optimized by just assigning a fixed small size for compiler generated shims like Would one of you be willing to implement that? |
@JustAPerson: If you'd like to finish the change off to share the work, you're very welcome to! I guess the easiest way would be to make a PR on my branch, which would then appear here after being merged in? I'm happy to make the change if you don't want to though! |
@varkor I appreciate the offer but I think I'm a bit busy today so go ahead :) |
|
Hm, OK, interesting approach; and certainly efficient! But it doesn't feel quite right to me. The first reason is that it's hard to uphold the condition you state: The code will copy The second reason is that it kind of conflates separate concerns by putting something into I suggest to go the "query" route. It might be slightly less efficient but it will be more robust in the end. If you need any help with setting up a new query, let me know! |
@michaelwoerister: Ah, I misunderstood — I didn't realise that queries were provided with |
Oh sorry, yeah, it's easy to slip into rustc-internals-lingo without noticing. Yes, "query" is what we call the things in |
a20b94d
to
98752ea
Compare
This addresses the concern of rust-lang#47316 by estimating CGU size based on the size of its MIR. Looking at the size estimate differences for a small selection of crates, this heuristic produces different orderings, which should more accurately reflect optimisation time. Fixes rust-lang#47316.
98752ea
to
5c9a8b5
Compare
I've added an |
@@ -638,6 +638,7 @@ define_dep_nodes!( <'tcx> | |||
[input] TargetFeaturesWhitelist, | |||
[] TargetFeaturesEnabled(DefId), | |||
|
|||
[] InstanceDefSizeEstimate { instance_def: InstanceDef<'tcx> }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is []
correct here? I wasn't sure what this meant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct. Leaving the brackets empty means that this is a regular query/dep-node.
@@ -761,6 +761,7 @@ pub fn force_from_dep_node<'a, 'gcx, 'lcx>(tcx: TyCtxt<'a, 'gcx, 'lcx>, | |||
DepKind::EraseRegionsTy | | |||
DepKind::NormalizeTy | | |||
DepKind::SubstituteNormalizeAndTestPredicates | | |||
DepKind::InstanceDefSizeEstimate | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the correct location? Or should it force!
something like some of the queries below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's the correct location. Very good! :)
force_from_dep_node
takes a DepNode
and tries to execute the query corresponding to it. However, a DepNode
only consists of a DepKind
and an opaque hash value (which acts as a UUID). It is only in special cases, like when the hash value is actually a DefPathHash
that we can look up in a table, that we can map from DepNode
to the correct query and its arguments. For ty::Instance
we don't have such a table that would make it possible to do the mapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent! This looks almost done. I left some comments below. I'm impressed by how well you managed to navigate our woefully under-documented query system :)
@@ -638,6 +638,7 @@ define_dep_nodes!( <'tcx> | |||
[input] TargetFeaturesWhitelist, | |||
[] TargetFeaturesEnabled(DefId), | |||
|
|||
[] InstanceDefSizeEstimate { instance_def: InstanceDef<'tcx> }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct. Leaving the brackets empty means that this is a regular query/dep-node.
src/librustc/mir/mono.rs
Outdated
pub fn size_estimate(&self) -> usize { | ||
// Should only be called if `estimate_size` has previously been called. | ||
assert!(self.size_estimate.is_some()); | ||
self.size_estimate.unwrap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could replace this combination of assert!()
and unwrap()
with a call to Option::expect()
. That amounts to the same thing but is a bit more concise.
src/librustc/mir/mono.rs
Outdated
@@ -140,6 +176,7 @@ impl<'tcx> HashStable<StableHashingContext<'tcx>> for CodegenUnit<'tcx> { | |||
let CodegenUnit { | |||
ref items, | |||
name, | |||
.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We keep these destructuring let
s exhaustive in order not to miss handling fields that are added later. In a case like this you can explicitly ignore the field:
let CodegenUnit {
ref items,
name,
// The size estimate is not relevant to the hash
size_estimate: _,
} = *self;
Making these exhaustive has proven to be effective for preventing errors introduced by unrelated refactorings.
@@ -761,6 +761,7 @@ pub fn force_from_dep_node<'a, 'gcx, 'lcx>(tcx: TyCtxt<'a, 'gcx, 'lcx>, | |||
DepKind::EraseRegionsTy | | |||
DepKind::NormalizeTy | | |||
DepKind::SubstituteNormalizeAndTestPredicates | | |||
DepKind::InstanceDefSizeEstimate | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's the correct location. Very good! :)
force_from_dep_node
takes a DepNode
and tries to execute the query corresponding to it. However, a DepNode
only consists of a DepKind
and an opaque hash value (which acts as a UUID). It is only in special cases, like when the hash value is actually a DefPathHash
that we can look up in a table, that we can map from DepNode
to the correct query and its arguments. For ty::Instance
we don't have such a table that would make it possible to do the mapping.
src/librustc/ty/mod.rs
Outdated
InstanceDef::Item(def_id) => { | ||
let mir = tcx.optimized_mir(def_id); | ||
mir.basic_blocks().iter().map(|bb| bb.statements.len()).sum() | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should estimate the size of DropGlue
via its MIR.
Also a little bit of clean up.
⌛ Testing commit 62703cf with merge 94f7e25e204d8c8fdde9f55ae8a9e08512f7d63d... |
@bors: retry
|
⌛ Testing commit 62703cf with merge 061b57274985c173d76ab948510d43168ddc0a12... |
@bors: retry
|
…ster Add CGU size heuristic for partitioning This addresses the concern of #47316 by estimating CGU size based on the size of its MIR. Looking at the size estimate differences for a small selection of crates, this heuristic produces different orderings, which should more accurately reflect optimisation time. (Fixes #47316.) r? @michaelwoerister
💔 Test failed - status-appveyor |
⌛ Testing commit 62703cf with merge 8952c0af0dcc0effb95544b730b6cd1e396c1feb... |
@bors: retry
|
⌛ Testing commit 62703cf with merge 1e968bad4676ef2cabe13b1c9eca697997151ada... |
@bors: retry
|
Add slice::sort_by_cached_key as a memoised sort_by_key At present, `slice::sort_by_key` calls its key function twice for each comparison that is made. When the key function is expensive (which can often be the case when `sort_by_key` is chosen over `sort_by`), this can lead to very suboptimal behaviour. To address this, I've introduced a new slice method, `sort_by_cached_key`, which has identical semantic behaviour to `sort_by_key`, except that it guarantees the key function will only be called once per element. Where there are `n` elements and the key function is `O(m)`: - `slice::sort_by_cached_key` time complexity is `O(m n log m n)`, compared to `slice::sort_by_key`'s `O(m n + n log n)`. - `slice::sort_by_cached_key` space complexity remains at `O(n + m)`. (Technically, it now reserves a slice of size `n`, whereas before it reserved a slice of size `n/2`.) `slice::sort_unstable_by_key` has not been given an analogue, as it is important that unstable sorts are in-place, which is not a property that is guaranteed here. However, this also means that `slice::sort_unstable_by_key` is likely to be slower than `slice::sort_by_cached_key` when the key function does not have negligible complexity. We might want to explore this trade-off further in the future. Benchmarks (for a vector of 100 `i32`s): ``` # Lexicographic: `|x| x.to_string()` test bench_sort_by_key ... bench: 112,638 ns/iter (+/- 19,563) test bench_sort_by_cached_key ... bench: 15,038 ns/iter (+/- 4,814) # Identity: `|x| *x` test bench_sort_by_key ... bench: 1,346 ns/iter (+/- 238) test bench_sort_by_cached_key ... bench: 1,839 ns/iter (+/- 765) # Power: `|x| x.pow(31)` test bench_sort_by_key ... bench: 3,624 ns/iter (+/- 738) test bench_sort_by_cached_key ... bench: 1,997 ns/iter (+/- 311) # Abs: `|x| x.abs()` test bench_sort_by_key ... bench: 1,546 ns/iter (+/- 174) test bench_sort_by_cached_key ... bench: 1,668 ns/iter (+/- 790) ``` (So it seems functions that are single operations do perform slightly worse with this method, but for pretty much any more complex key, you're better off with this optimisation.) I've definitely found myself using expensive keys in the past and wishing this optimisation was made (e.g. for rust-lang#47415). This feels like both desirable and expected behaviour, at the small cost of slightly more stack allocation and minute degradation in performance for extremely trivial keys. Resolves rust-lang#34447.
This addresses the concern of #47316 by estimating CGU size based on
the size of its MIR. Looking at the size estimate differences for a
small selection of crates, this heuristic produces different orderings,
which should more accurately reflect optimisation time. (Fixes #47316.)
r? @michaelwoerister