Cache "global" trait resolutions #26326

nikomatsakis · 2015-06-15T22:11:38Z

When we successfully resolve a trait reference with no type/lifetime parameters, like i32: Foo or Box<u32>: Sized, this is in fact globally true. This patch adds a simple global to the tcx to cache such cases. The main advantage of this is really about caching things like Box<Vec<Foo>>: Sized. It also points to the need to revamp our caching infrastructure -- the current caches make selection cost cheaper, but we still wind up paying a high cost in the confirmation process, and in particular unrolling out dependent obligations. Moreover, we should probably do caching more uniformly and with a key that takes the where-clauses into account. But that's for later.

For me, this shows up as a reasonably nice win (20%) on Servo's script crate (when built in dev mode). This is not as big as my initial measurements suggested, I think because I was building my rustc with more debugging enabled at the time. I've not yet done follow-up profiling and so forth to see where the new hot spots are. Bootstrap times seem to be largely unaffected.

cc @pcwalton

This is technically a [breaking-change] in that functions with unsatisfiable where-clauses may now yield errors where before they may have been accepted. Even before, these functions could never have been called by actual code. In the future, such functions will probably become illegal altogether, but in this commit they are still accepted, so long as they do not rely on the unsatisfiable where-clauses. As before, the functions still cannot be called in any case.

that are known to have been satisfied *somewhere*. This means that if one fn finds that `SomeType: Foo`, then every other fn can just consider that to hold. Unfortunately, there are some complications: 1. If `SomeType: Foo` includes dependent conditions, those conditions may trigger an error. This error will be repored in the first fn where `SomeType: Foo` is evaluated, but not in the other fns, which can lead to uneven error reporting (which is sometimes confusing). 2. This kind of caching can be unsound in the presence of unsatisfiable where clauses. For example, suppose that the first fn has a where-clause like `i32: Bar<u32>`, which in fact does not hold. This will "fool" trait resolution into thinking that `i32: Bar<u32>` holds. This is ok currently, because it means that the first fn can never be calle (since its where clauses cannot be satisfied), but if the first fn's successful resolution is cached, it can allow other fns to compile that should not. This problem is fixed in the next commit.

again, do it once and then just remember the expanded form. At the same time, filter globally nameable predicates out of the environment, since they can cause cache errors (and they are not necessary in any case).

different parts of the crate, so modify the test cases that were relying on that to test distinct types etc.

rust-highfive · 2015-06-15T22:11:43Z

r? @pcwalton

(rust_highfive has picked a reviewer for you, use r? to override)

nikomatsakis · 2015-06-15T22:12:01Z

r? @nrc

metajack · 2015-06-15T22:13:53Z

cc @metajack

eddyb · 2015-06-15T22:49:34Z

src/test/compile-fail/cross-fn-cache-hole.rs

+    // Bar<u32>. But the code doesn't check that this could never be
+    // satisfied.
+    require::<i32, u32>();
+    //~^ ERROR the trait `Bar<u32>` is not implemented for the type `i32`


Is this a "breaking change" by any means?

Kind of. This function does compile uselessly in stable/nightly.

arielb1 · 2015-06-15T23:17:57Z

trans creates tons of these useless nested obligations - I think the primary benefits may be
there. Could you post some performance numbers?

Empirically, dealing with Box<Foo<Bar>> wasn't that much of an improvement (I tried - typeck prefers the likes of &'foo Box<Bar<'baz>>).

Maybe script is just a different workload - we should be profiling that too.

arielb1 · 2015-06-16T00:18:55Z

Anyway, rustc perf numbers:

Before:
548.54user 4.54system 7:16.79elapsed 126%CPU (0avgtext+0avgdata 1174632maxresident)k

After:
537.65user 4.22system 7:04.81elapsed 127%CPU (0avgtext+0avgdata 1175460maxresident)k

This is a 2%-3% improvement on rustc. It is indeed mostly a trans improvement (improves translation from 89.6s to 79.3s, but type-checking from 62.6s to 59s) - probably because of workload differences between rustc and trans. Still nice.

I guess the mostly-uniformity of profiling info across rustc isn't worth much - I really need to profile other crates too.

nrc · 2015-06-16T00:43:41Z

src/librustc/middle/ty.rs

@@ -3018,11 +3066,12 @@ impl FlagComputation {
            }

            &TyClosure(_, substs) => {
-                self.add_flags(TypeFlags::HAS_TY_CLOSURE);
+                self.add_flags(TypeFlags::HAS_LOCAL_NAMES);


This looks to be a mistake - at least the removal.

nrc · 2015-06-16T00:45:42Z

r=me with the rebasing errors fixed

nikomatsakis · 2015-06-16T15:55:06Z

@arielb1 time numbers for compiling script (in dev mode) on my machine:

After:

real 2m2.024s
user 1m58.926s
sys 0m3.098s

Before:

real 2m35.145s
user 2m32.113s
sys 0m2.980s

Regarding error messages, I discussed it with @pcwalton, and we felt like it was worth landing this sooner and dealing with error messages later -- though I left a note for what I think is the best fix. Doesn't seem too hard, I could do some local experiments.

nikomatsakis · 2015-06-16T16:07:20Z

@arielb1 based on the profiling I did, the benefits were not just in trans, but they were primarily due to Sized and Copy bounds. However, the numbers I see now look quite different than what I was seeing then, so I ought to re-run the profiles (I think this is due to how I compiled rustc, but I'm not sure).

arielb1 · 2015-06-16T18:38:12Z

@nikomatsakis

I feel like a better way of doing this is to do each global predicate in a fresh "global" fulfillcx, and merge duplicates if fulfillment is successful (I don't think it is worth to keep track of errors). I don't like the idea of polluting a global cache with potentially-bogus results - it seems to me like it would easily cause mysterious ICE-s or other error cascades.

nikomatsakis · 2015-06-16T19:05:46Z

@arielb1 well, it doesn't really matter so much what fulfill cx we do it in, the problem is more that we have to track when all dependent resolutions are complete. I do agree there is the potential for ICEs here.

arielb1 · 2015-06-16T20:44:02Z

This does indeed seem to produce 13-14% improvements in both typeck and trans time when assertions are enabled (I guess these debug!-s add up).

@nikomatsakis
If the predicate is global then we can just do select_all_or_error (and panic when it is violated - must be careful not to panic or cache before coherence runs through).

pcwalton · 2015-06-16T21:30:26Z

@bors: r+

bors · 2015-06-16T21:30:28Z

📌 Commit 957935a has been approved by pcwalton

bors · 2015-06-17T22:50:18Z

⌛ Testing commit 957935a with merge 713d917...

@pcwalton

…, r=pcwalton When we successfully resolve a trait reference with no type/lifetime parameters, like `i32: Foo` or `Box<u32>: Sized`, this is in fact globally true. This patch adds a simple global to the tcx to cache such cases. The main advantage of this is really about caching things like `Box<Vec<Foo>>: Sized`. It also points to the need to revamp our caching infrastructure -- the current caches make selection cost cheaper, but we still wind up paying a high cost in the confirmation process, and in particular unrolling out dependent obligations. Moreover, we should probably do caching more uniformly and with a key that takes the where-clauses into account. But that's for later. For me, this shows up as a reasonably nice win (20%) on Servo's script crate (when built in dev mode). This is not as big as my initial measurements suggested, I think because I was building my rustc with more debugging enabled at the time. I've not yet done follow-up profiling and so forth to see where the new hot spots are. Bootstrap times seem to be largely unaffected. cc @pcwalton This is technically a [breaking-change] in that functions with unsatisfiable where-clauses may now yield errors where before they may have been accepted. Even before, these functions could never have been *called* by actual code. In the future, such functions will probably become illegal altogether, but in this commit they are still accepted, so long as they do not rely on the unsatisfiable where-clauses. As before, the functions still cannot be called in any case.

bors · 2015-06-18T00:26:21Z

☀️ Test successful - auto-linux-32-nopt-t, auto-linux-32-opt, auto-linux-64-nopt-t, auto-linux-64-opt, auto-linux-64-x-android-t, auto-mac-32-opt, auto-mac-64-nopt-t, auto-mac-64-opt, auto-win-gnu-32-nopt-t, auto-win-gnu-32-opt, auto-win-gnu-64-nopt-t, auto-win-gnu-64-opt

Regressed by rust-lang#26326

nikomatsakis added 3 commits June 15, 2015 17:31

Rather than expanding the where-clauses in the environment over and over

f0f13f8

again, do it once and then just remember the expanded form. At the same time, filter globally nameable predicates out of the environment, since they can cause cache errors (and they are not necessary in any case).

This new caching sometimes suppresses duplicate errors that occur in

13cee1e

different parts of the crate, so modify the test cases that were relying on that to test distinct types etc.

rust-highfive assigned pcwalton Jun 15, 2015

rust-highfive assigned nrc and unassigned pcwalton Jun 15, 2015

eddyb reviewed Jun 15, 2015
View reviewed changes

nrc reviewed Jun 16, 2015
View reviewed changes

Fix merge errors

957935a

bors merged commit 957935a into rust-lang:master Jun 18, 2015

bors mentioned this pull request Jun 18, 2015

Prevent coercions from polluting the fulfillment context #26324

Closed

bors mentioned this pull request Jun 18, 2015

Use TLS to store a type context for pretty-printing types. #26351

Merged

nrc added a commit to nrc/rust that referenced this pull request Jun 23, 2015

Fix parallel codegen regression

ce0c257

Regressed by rust-lang#26326

nikomatsakis deleted the optimize-fulfillment-cache-in-tcx branch March 30, 2016 16:16

Cache "global" trait resolutions #26326

Cache "global" trait resolutions #26326

Uh oh!

Conversation

nikomatsakis commented Jun 15, 2015

Uh oh!

rust-highfive commented Jun 15, 2015

Uh oh!

nikomatsakis commented Jun 15, 2015

Uh oh!

metajack commented Jun 15, 2015

Uh oh!

eddyb Jun 15, 2015

Choose a reason for hiding this comment

Uh oh!

arielb1 Jun 15, 2015

Choose a reason for hiding this comment

Uh oh!

arielb1 commented Jun 15, 2015

Uh oh!

arielb1 commented Jun 16, 2015

Uh oh!

nrc Jun 16, 2015

Choose a reason for hiding this comment

Uh oh!

nrc commented Jun 16, 2015

Uh oh!

nikomatsakis commented Jun 16, 2015

Uh oh!

nikomatsakis commented Jun 16, 2015

Uh oh!

arielb1 commented Jun 16, 2015

Uh oh!

nikomatsakis commented Jun 16, 2015

Uh oh!

arielb1 commented Jun 16, 2015

Uh oh!

pcwalton commented Jun 16, 2015

Uh oh!

bors commented Jun 16, 2015

Uh oh!

bors commented Jun 17, 2015

Uh oh!

bors commented Jun 18, 2015

Uh oh!

Uh oh!