Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some data is not hashed by the SVH that ought to be #32753

Closed
2 of 7 tasks
nikomatsakis opened this issue Apr 5, 2016 · 10 comments
Closed
2 of 7 tasks

Some data is not hashed by the SVH that ought to be #32753

nikomatsakis opened this issue Apr 5, 2016 · 10 comments
Assignees
Labels
A-incr-comp Area: Incremental compilation T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Apr 5, 2016

New summary:

I think we've all agreed by now that we will focus on the ICH (incremental compilation hash) and revisit the question of a SVH (stable version hash, used to detect ABI-incompatible changes) at some future date. The "original summary" (below) goes into some of the discussion.

I've thus repurposed this issue to target addressing the shortcomings of the existing hash for the purposes of incremental compilation. The following checklist is derived from @michaelwoerister's comment:

  • incorporate results of name resolution pass, since name resolution is not tracked
  • incorporate spans, when needed (Incremental compilation: Be smart about hashing spans #33888)
  • ignore nested items, since the use of items is tracked independently
    • note that visit_stmt() and visit_decl() by themselves should not change the hash, since they could just represent a nested item (which we either want to ignore completely or handle in a position independent way).
  • the CaptureClause of closures should not be ignored
  • walk_generics() in visit_variant() and visit_variant_data() are redundant; this is already handled by visit_item()
  • explicitly hashing the label names in ExprLoop, ExprBreak, and ExprAgain is redundant, visit_name() is called for those anyway (as already done for ExprWhile)

There are also optional refinements:

  • do not hash the actual names of local variables etc, but just map them to unique identifiers or something like that; seems challenging to get something that is both deterministic, however, and also independent from the name itself.

Original summary:

In #32016, I (ab)used the SVH to serve as a hash for the purposes of incremental compilation. This is simply wrong, because the SVH intentionally omits some details that do not affect the ABI, but which very much affect whether code needs to be recompiled (e.g., the value of a constant). This needs to be fixed, but in one of two ways:

  • Create an alternative incremental compilation hash (ICH), keeping the existing consumers of the SVH.
  • Convert the SVH (in place) into the ICH, and make the existing consumers use that.

At this point, there are really very few users of the SVH:

  • Each crate stores the SVH in its metadata, which the compiler then uses to detect transitive mismatches, where you compile crate B with some version of crate A, but you are now compiling C (which uses B) with a distinct version of crate A in scope.
  • Debuginfo uses it for some reason or other.
  • There is no third use. (I think)

The second use is unnecessary. The first use is interesting. It's unclear how flexible this check is. But if we used the ICH for this purpose, it would basically be equivalent to the SVH today (the SVH today is looser, as I wrote in the beginning, but effectively any change will cause the SVH to change, with very few exceptions).

Moreover, I claim that even if, someday, we wanted to modify the check to be true ABI compatibility, we probably wouldn't want a single SVH hash. We'd probably want each function hash to include information about the way the argument types are represented, so that we can detect mismatches on a fine-grained level, rather than just "some fn, which you may not even call, changed". But I'm not sure.

One problem with using the ICH everywhere, though, is that we must be careful about endianness. We'd prefer if cross-compiling did not affect the ICH.

I guess I've wasted more time writing this paragraph than it really merits. End of the day, the choice is:

  • Keep the SVH around, weird as it is.
  • Kill the SVH and just ICH, which should be stricter.

cc @michaelwoerister

@nikomatsakis nikomatsakis added the A-incr-comp Area: Incremental compilation label Apr 5, 2016
nikomatsakis added a commit to nikomatsakis/rust that referenced this issue Apr 5, 2016
@alexcrichton
Copy link
Member

Having some sort of hash match for transitive dependencies seems like it may still be important for the time being? I don't think there's basically any use case where you can recompile an upstream dependency without recompiling intermediate ones and expect it to actually work robustly, so this is largely just preventing us from shooting ourselves in the foot.

In any case, changing the SVH to ICH seems fine by me, I think they'd both satisfy the only remaining purpose I know of for SVH at least.

@michaelwoerister
Copy link
Member

A proper ICH implementation would probably be a better SVH than the current one, so I'm for re-using the ICH as SVH in the long run. I would keep the SVH around for now and then do the switch when the ICH implementation seems complete enough.

Regarding debuginfo: It should be fine to just use the crate-disambiguator instead of the SVH there. I'll make a PR changing that later this week.

nikomatsakis added a commit to nikomatsakis/rust that referenced this issue Apr 6, 2016
@michaelwoerister
Copy link
Member

OK, I just took a closer look at the current SVH implementation, gauging its suitability for being used as ICH. My assessment is that we are almost there. Here are the specific points that need to be addressed from my point of view:

  • The most important shortcoming of the current SVH implementation is that it does not incorporate the results of the name resolution pass. See below for an example.
  • Spans are not taken into account but need to be (1) when compiling with debug info, and (2) when some kind of span information is used in constants or symbol names (as happens when using file!() and line!() for example). See also Incremental compilation: Be smart about hashing spans #33888.
  • The order and position of nested items in the AST influences the hash, but it should not. This is something that can easily be fixed, since the SVH and ICH want the same behavior here.
  • The hash currently is sensitive to the actual names of local variables, lifetime, etc. although they don't matter to the ICH or the SVH (except when compiling with debuginfo). But it might not be worth the trouble to do something about this since these names will be rather stable.
  • visit_stmt() and visit_decl() by themselves should not change the hash, since they could just represent a nested item (which we either want to ignore completely or handle in a position independent way).
  • walk_generics() in visit_variant() and visit_variant_data() are redundant; this is already handled by visit_item()
  • explicitly hashing the label names in ExprLoop, ExprBreak, and ExprAgain is redundant, visit_name() is called for those anyway (as already done for ExprWhile)
  • the CaptureClause of closures should not be ignored

Regarding name resolution, consider the following case:

mod mod1 {
    struct Foo(u32);
}

mod mod2 {
    struct Foo(i64);
}

mod mod3 {
    use mod1::Foo;

    fn bar() -> usize { 
        mem::size_of::<Foo>()
    }
}

Using the current algorithm, neither the hash of mod3::bar, nor the hash of mod1::Foo will change, if we change the use-statement from use mod1::Foo to use mod2::Foo. Consequently, when using the dependency graph to validate our cache, it will look like we don't need to retranslate mod3::bar.
In order to detect the change, the hash of bar must include information about what the things in it refer to.

One possible solution to this would be: For each NodeIds in the item to be hashed, check for an entry in the DefMap. If there is one, get the DefId it refers to and map it to the corresponding DefPath. Feed this DefPath into the hash.

In the above example, we would then have fed mod2[0]::Foo[0] into the hash instead of mod1[0]::Foo[0] and thus the change would have been detected.

@nikomatsakis nikomatsakis added this to the Incremental compilation alpha milestone Jul 23, 2016
@nikomatsakis
Copy link
Contributor Author

I'm going to take a stab at hacking on this.

@nikomatsakis nikomatsakis self-assigned this Jul 26, 2016
@nikomatsakis
Copy link
Contributor Author

Somewhat surprisingly, I'm finding that the hash of items where the only change is a use in scope does change. Trying to get to the bottom of that.

@nikomatsakis
Copy link
Contributor Author

Also, it seems likely that we don't need to include nested items in the hash at all, no?

@michaelwoerister
Copy link
Member

Also, it seems likely that we don't need to include nested items in the hash at all, no?

I would think so too.

@nikomatsakis
Copy link
Contributor Author

Somewhat surprisingly, I'm finding that the hash of items where the only change is a use in scope does change. Trying to get to the bottom of that.

Never mind, my test was bogus.

@nikomatsakis
Copy link
Contributor Author

I edited the summary to incorporate @michaelwoerister's analysis results as a check-list of changes.

@michaelwoerister
Copy link
Member

Note: It's probably a good idea to add a visit_span method to intravisit::Visitor when implementing the span-related parts of this.

bors added a commit that referenced this issue Aug 10, 2016
Various improvements to the SVH

This fixes a few points for the SVH:

- incorporate resolve results into the SVH;
- don't include nested items.

r? @michaelwoerister

cc #32753 (not fully fixed I don't think)
@nikomatsakis nikomatsakis added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Aug 10, 2016
@nikomatsakis nikomatsakis changed the title Resolve the question of SVH vs ICH Some data is not hashed by the SVH that ought to be Aug 10, 2016
@nikomatsakis nikomatsakis modified the milestone: Incremental compilation alpha Aug 10, 2016
@nikomatsakis nikomatsakis removed their assignment Aug 10, 2016
@michaelwoerister michaelwoerister self-assigned this Aug 16, 2016
bors added a commit that referenced this issue Sep 6, 2016
…atsakis

incr. comp.: Take spans into account for ICH

This PR makes the ICH (incr. comp. hash) take spans into account when debuginfo is enabled.

A side-effect of this is that the SVH (which is based on the ICHs of all items in the crate) becomes sensitive to the tiniest change in a code base if debuginfo is enabled. Since we are not trying to model ABI compatibility via the SVH anymore (this is done via the crate disambiguator now), this should be not be a problem.

Fixes #33888.
Fixes #32753.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-incr-comp Area: Incremental compilation T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

3 participants