Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache "global" trait resolutions #26326

Merged
merged 4 commits into from
Jun 18, 2015

Conversation

nikomatsakis
Copy link
Contributor

When we successfully resolve a trait reference with no type/lifetime parameters, like i32: Foo or Box<u32>: Sized, this is in fact globally true. This patch adds a simple global to the tcx to cache such cases. The main advantage of this is really about caching things like Box<Vec<Foo>>: Sized. It also points to the need to revamp our caching infrastructure -- the current caches make selection cost cheaper, but we still wind up paying a high cost in the confirmation process, and in particular unrolling out dependent obligations. Moreover, we should probably do caching more uniformly and with a key that takes the where-clauses into account. But that's for later.

For me, this shows up as a reasonably nice win (20%) on Servo's script crate (when built in dev mode). This is not as big as my initial measurements suggested, I think because I was building my rustc with more debugging enabled at the time. I've not yet done follow-up profiling and so forth to see where the new hot spots are. Bootstrap times seem to be largely unaffected.

cc @pcwalton

This is technically a [breaking-change] in that functions with unsatisfiable where-clauses may now yield errors where before they may have been accepted. Even before, these functions could never have been called by actual code. In the future, such functions will probably become illegal altogether, but in this commit they are still accepted, so long as they do not rely on the unsatisfiable where-clauses. As before, the functions still cannot be called in any case.

that are known to have been satisfied *somewhere*. This means that if
one fn finds that `SomeType: Foo`, then every other fn can just consider
that to hold.

Unfortunately, there are some complications:

1. If `SomeType: Foo` includes dependent conditions, those conditions
   may trigger an error. This error will be repored in the first fn
   where `SomeType: Foo` is evaluated, but not in the other fns, which
   can lead to uneven error reporting (which is sometimes confusing).

2. This kind of caching can be unsound in the presence of
   unsatisfiable where clauses. For example, suppose that the first fn
   has a where-clause like `i32: Bar<u32>`, which in fact does not
   hold. This will "fool" trait resolution into thinking that `i32:
   Bar<u32>` holds. This is ok currently, because it means that the
   first fn can never be calle (since its where clauses cannot be
   satisfied), but if the first fn's successful resolution is cached, it
   can allow other fns to compile that should not. This problem is fixed
   in the next commit.
again, do it once and then just remember the expanded form. At the same
time, filter globally nameable predicates out of the environment, since
they can cause cache errors (and they are not necessary in any case).
different parts of the crate, so modify the test cases that were relying
on that to test distinct types etc.
@rust-highfive
Copy link
Collaborator

r? @pcwalton

(rust_highfive has picked a reviewer for you, use r? to override)

@nikomatsakis
Copy link
Contributor Author

r? @nrc

@rust-highfive rust-highfive assigned nrc and unassigned pcwalton Jun 15, 2015
@metajack
Copy link
Contributor

cc @metajack

// Bar<u32>. But the code doesn't check that this could never be
// satisfied.
require::<i32, u32>();
//~^ ERROR the trait `Bar<u32>` is not implemented for the type `i32`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a "breaking change" by any means?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of. This function does compile uselessly in stable/nightly.

@arielb1
Copy link
Contributor

arielb1 commented Jun 15, 2015

trans creates tons of these useless nested obligations - I think the primary benefits may be
there. Could you post some performance numbers?

Empirically, dealing with Box<Foo<Bar>> wasn't that much of an improvement (I tried - typeck prefers the likes of &'foo Box<Bar<'baz>>).

Maybe script is just a different workload - we should be profiling that too.

@arielb1
Copy link
Contributor

arielb1 commented Jun 16, 2015

Anyway, rustc perf numbers:

Before:
548.54user 4.54system 7:16.79elapsed 126%CPU (0avgtext+0avgdata 1174632maxresident)k

After:
537.65user 4.22system 7:04.81elapsed 127%CPU (0avgtext+0avgdata 1175460maxresident)k

This is a 2%-3% improvement on rustc. It is indeed mostly a trans improvement (improves translation from 89.6s to 79.3s, but type-checking from 62.6s to 59s) - probably because of workload differences between rustc and trans. Still nice.

I guess the mostly-uniformity of profiling info across rustc isn't worth much - I really need to profile other crates too.

@@ -3018,11 +3066,12 @@ impl FlagComputation {
}

&TyClosure(_, substs) => {
self.add_flags(TypeFlags::HAS_TY_CLOSURE);
self.add_flags(TypeFlags::HAS_LOCAL_NAMES);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to be a mistake - at least the removal.

@nrc
Copy link
Member

nrc commented Jun 16, 2015

r=me with the rebasing errors fixed

@nikomatsakis
Copy link
Contributor Author

@arielb1 time numbers for compiling script (in dev mode) on my machine:

After:

  • real 2m2.024s
  • user 1m58.926s
  • sys 0m3.098s

Before:

  • real 2m35.145s
  • user 2m32.113s
  • sys 0m2.980s

Regarding error messages, I discussed it with @pcwalton, and we felt like it was worth landing this sooner and dealing with error messages later -- though I left a note for what I think is the best fix. Doesn't seem too hard, I could do some local experiments.

@nikomatsakis
Copy link
Contributor Author

@arielb1 based on the profiling I did, the benefits were not just in trans, but they were primarily due to Sized and Copy bounds. However, the numbers I see now look quite different than what I was seeing then, so I ought to re-run the profiles (I think this is due to how I compiled rustc, but I'm not sure).

@arielb1
Copy link
Contributor

arielb1 commented Jun 16, 2015

@nikomatsakis

I feel like a better way of doing this is to do each global predicate in a fresh "global" fulfillcx, and merge duplicates if fulfillment is successful (I don't think it is worth to keep track of errors). I don't like the idea of polluting a global cache with potentially-bogus results - it seems to me like it would easily cause mysterious ICE-s or other error cascades.

@nikomatsakis
Copy link
Contributor Author

@arielb1 well, it doesn't really matter so much what fulfill cx we do it in, the problem is more that we have to track when all dependent resolutions are complete. I do agree there is the potential for ICEs here.

@arielb1
Copy link
Contributor

arielb1 commented Jun 16, 2015

This does indeed seem to produce 13-14% improvements in both typeck and trans time when assertions are enabled (I guess these debug!-s add up).

@nikomatsakis
If the predicate is global then we can just do select_all_or_error (and panic when it is violated - must be careful not to panic or cache before coherence runs through).

@pcwalton
Copy link
Contributor

@bors: r+

@bors
Copy link
Contributor

bors commented Jun 16, 2015

📌 Commit 957935a has been approved by pcwalton

@bors
Copy link
Contributor

bors commented Jun 17, 2015

⌛ Testing commit 957935a with merge 713d917...

bors added a commit that referenced this pull request Jun 17, 2015
…, r=pcwalton

When we successfully resolve a trait reference with no type/lifetime parameters, like `i32: Foo` or `Box<u32>: Sized`, this is in fact globally true. This patch adds a simple global to the tcx to cache such cases. The main advantage of this is really about caching things like `Box<Vec<Foo>>: Sized`. It also points to the need to revamp our caching infrastructure -- the current caches make selection cost cheaper, but we still wind up paying a high cost in the confirmation process, and in particular unrolling out dependent obligations. Moreover, we should probably do caching more uniformly and with a key that takes the where-clauses into account. But that's for later.

For me, this shows up as a reasonably nice win (20%) on Servo's script crate (when built in dev mode). This is not as big as my initial measurements suggested, I think because I was building my rustc with more debugging enabled at the time. I've not yet done follow-up profiling and so forth to see where the new hot spots are. Bootstrap times seem to be largely unaffected.

cc @pcwalton 

This is technically a [breaking-change] in that functions with unsatisfiable where-clauses may now yield errors where before they may have been accepted. Even before, these functions could never have been *called* by actual code. In the future, such functions will probably become illegal altogether, but in this commit they are still accepted, so long as they do not rely on the unsatisfiable where-clauses. As before, the functions still cannot be called in any case.
@bors bors merged commit 957935a into rust-lang:master Jun 18, 2015
nrc added a commit to nrc/rust that referenced this pull request Jun 23, 2015
@nikomatsakis nikomatsakis deleted the optimize-fulfillment-cache-in-tcx branch March 30, 2016 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants