Some data is not hashed by the SVH that ought to be #32753

nikomatsakis · 2016-04-05T22:02:56Z

New summary:

I think we've all agreed by now that we will focus on the ICH (incremental compilation hash) and revisit the question of an SVH (stable version hash, used to detect ABI-incompatible changes) at some future date. The "original summary" (below) goes into some of the discussion.

I've thus repurposed this issue to target addressing the shortcomings of the existing hash for the purposes of incremental compilation. The following checklist is derived from @michaelwoerister's comment:

- incorporate results of name resolution pass, since name resolution is not tracked
- incorporate spans, when needed (Incremental compilation: Be smart about hashing spans #33888)
- ignore nested items, since the use of items is tracked independently
  - note that visit_stmt() and visit_decl() by themselves should not change the hash, since they could just represent a nested item (which we either want to ignore completely or handle in a position-independent way).
- the CaptureClause of closures should not be ignored
- the walk_generics() calls in visit_variant() and visit_variant_data() are redundant; this is already handled by visit_item()
- explicitly hashing the label names in ExprLoop, ExprBreak, and ExprAgain is redundant; visit_name() is called for those anyway (as already done for ExprWhile)

There are also optional refinements:

do not hash the actual names of local variables etc, but just map them to unique identifiers or something like that; seems challenging to get something that is both deterministic, however, and also independent from the name itself.
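
A minimal sketch of the "map them to unique identifiers" idea, assuming the simplest possible scheme: number each distinct local name by the position of its first occurrence. The function `canonicalize_locals` is hypothetical, and it deliberately ignores scoping and shadowing, which is where the real difficulty mentioned above lies.

```rust
use std::collections::HashMap;

/// Hypothetical helper: replace each local name with the index of its
/// first occurrence, so the result no longer depends on the spelling.
/// Assumption: a flat list of names, with no scopes or shadowing.
fn canonicalize_locals(names: &[&str]) -> Vec<usize> {
    let mut ids: HashMap<&str, usize> = HashMap::new();
    names
        .iter()
        .map(|name| {
            // The next free id is simply the number of names seen so far.
            let next = ids.len();
            *ids.entry(*name).or_insert(next)
        })
        .collect()
}
```

With this scheme, the sequences `x, y, x` and `a, b, a` canonicalize to the same ids, while `a, a, b` does not, so feeding the ids (rather than the names) into the hash would make it insensitive to a pure rename.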

Original summary:

In #32016, I (ab)used the SVH to serve as a hash for the purposes of incremental compilation. This is simply wrong, because the SVH intentionally omits some details that do not affect the ABI, but which very much affect whether code needs to be recompiled (e.g., the value of a constant). This needs to be fixed, in one of two ways:

1. Create an alternative incremental compilation hash (ICH), keeping the existing consumers of the SVH.
2. Convert the SVH (in place) into the ICH, and make the existing consumers use that.

At this point, there are really very few users of the SVH:

1. Each crate stores the SVH in its metadata, which the compiler then uses to detect transitive mismatches, where you compile crate B with some version of crate A, but you are now compiling C (which uses B) with a distinct version of crate A in scope.
2. Debuginfo uses it for some reason or other.
3. There is no third use. (I think)

The second use is unnecessary. The first use is interesting. It's unclear how flexible this check is. But if we used the ICH for this purpose, it would basically be equivalent to the SVH today (the SVH today is looser, as I wrote in the beginning, but effectively any change will cause the SVH to change, with very few exceptions).
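
To make the first use concrete, here is a rough sketch of a transitive-mismatch check. The `CrateMetadata` struct and `find_mismatches` function are hypothetical stand-ins, not the compiler's actual metadata format; the only assumption is that each crate records the hash of every dependency it was compiled against.

```rust
use std::collections::HashMap;

/// Hypothetical crate metadata: for each dependency, the hash it had
/// when this crate was compiled.
struct CrateMetadata {
    deps: HashMap<String, u64>,
}

/// Return the (sorted) names of dependencies whose recorded hash differs
/// from the hash of the version currently in scope.
fn find_mismatches(krate: &CrateMetadata, in_scope: &HashMap<String, u64>) -> Vec<String> {
    let mut mismatched: Vec<String> = krate
        .deps
        .iter()
        .filter(|(name, recorded)| {
            // Only a dependency that is in scope with a different hash counts.
            in_scope.get(*name).map_or(false, |current| current != *recorded)
        })
        .map(|(name, _)| name.clone())
        .collect();
    mismatched.sort();
    mismatched
}
```

In the scenario above, B's metadata would record the hash of the A it was built against; when compiling C with a different A in scope, `find_mismatches` reports A. Any hash that changes whenever A changes would serve here, which is why a stricter ICH could fill the same role.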

Moreover, I claim that even if, someday, we wanted to modify the check to be true ABI compatibility, we probably wouldn't want a single SVH hash. We'd probably want each function hash to include information about the way the argument types are represented, so that we can detect mismatches on a fine-grained level, rather than just "some fn, which you may not even call, changed". But I'm not sure.

One problem with using the ICH everywhere, though, is that we must be careful about endianness. We'd prefer if cross-compiling did not affect the ICH.
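
As an illustration of the endianness concern, a small sketch that feeds integers to the hasher in a fixed byte order instead of the host's native one. The helper names `write_u64_stable` and `stable_hash` are hypothetical, and `DefaultHasher` is used only as a convenient stand-in; this fixes the byte order of the input only and says nothing about which hash function the compiler uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical helper: feed a u64 as little-endian bytes, regardless of
/// the byte order of the machine running the compiler.
fn write_u64_stable(hasher: &mut impl Hasher, value: u64) {
    hasher.write(&value.to_le_bytes());
}

/// Hash a sequence of values using only fixed-order writes.
fn stable_hash(values: &[u64]) -> u64 {
    let mut hasher = DefaultHasher::new();
    // Length prefix, also written in a fixed byte order.
    write_u64_stable(&mut hasher, values.len() as u64);
    for &value in values {
        write_u64_stable(&mut hasher, value);
    }
    hasher.finish()
}
```

The point of the design is that every multi-byte quantity goes through one choke point that pins the byte order, so the bytes seen by the hasher do not vary with the host.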

I guess I've wasted more time writing this paragraph than it really merits. End of the day, the choice is:

1. Keep the SVH around, weird as it is.
2. Kill the SVH and just use the ICH, which should be stricter.

cc @michaelwoerister


alexcrichton · 2016-04-05T22:15:05Z

Having some sort of hash match for transitive dependencies seems like it may still be important for the time being? I don't think there's really any use case where you can recompile an upstream dependency without recompiling the intermediate ones and expect it to actually work robustly, so this is largely just preventing us from shooting ourselves in the foot.

In any case, changing the SVH to ICH seems fine by me, I think they'd both satisfy the only remaining purpose I know of for SVH at least.

michaelwoerister · 2016-04-05T22:52:16Z

A proper ICH implementation would probably be a better SVH than the current one, so I'm for re-using the ICH as SVH in the long run. I would keep the SVH around for now and then do the switch when the ICH implementation seems complete enough.

Regarding debuginfo: It should be fine to just use the crate-disambiguator instead of the SVH there. I'll make a PR changing that later this week.

michaelwoerister · 2016-07-15T21:23:46Z

OK, I just took a closer look at the current SVH implementation, gauging its suitability for being used as ICH. My assessment is that we are almost there. Here are the specific points that need to be addressed from my point of view:

- The most important shortcoming of the current SVH implementation is that it does not incorporate the results of the name resolution pass. See below for an example.
- Spans are not taken into account but need to be (1) when compiling with debug info, and (2) when some kind of span information is used in constants or symbol names (as happens when using file!() and line!() for example). See also Incremental compilation: Be smart about hashing spans #33888.
- The order and position of nested items in the AST influences the hash, but it should not. This is something that can easily be fixed, since the SVH and ICH want the same behavior here.
- The hash currently is sensitive to the actual names of local variables, lifetimes, etc., although they don't matter to the ICH or the SVH (except when compiling with debuginfo). But it might not be worth the trouble to do something about this since these names will be rather stable.
- visit_stmt() and visit_decl() by themselves should not change the hash, since they could just represent a nested item (which we either want to ignore completely or handle in a position-independent way).
- the walk_generics() calls in visit_variant() and visit_variant_data() are redundant; this is already handled by visit_item()
- explicitly hashing the label names in ExprLoop, ExprBreak, and ExprAgain is redundant; visit_name() is called for those anyway (as already done for ExprWhile)
- the CaptureClause of closures should not be ignored

Regarding name resolution, consider the following case:

mod mod1 {
    // `pub` so that mod3 is allowed to import it
    pub struct Foo(u32);
}

mod mod2 {
    pub struct Foo(i64);
}

mod mod3 {
    use std::mem;
    use mod1::Foo;

    fn bar() -> usize {
        mem::size_of::<Foo>()
    }
}

Using the current algorithm, neither the hash of mod3::bar, nor the hash of mod1::Foo will change, if we change the use-statement from use mod1::Foo to use mod2::Foo. Consequently, when using the dependency graph to validate our cache, it will look like we don't need to retranslate mod3::bar.
In order to detect the change, the hash of bar must include information about what the things in it refer to.

One possible solution to this would be: For each NodeId in the item to be hashed, check for an entry in the DefMap. If there is one, get the DefId it refers to and map it to the corresponding DefPath. Feed this DefPath into the hash.

In the above example, we would then have fed mod2[0]::Foo[0] into the hash instead of mod1[0]::Foo[0] and thus the change would have been detected.
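
The proposal can be sketched with a toy model. Everything here is a hypothetical stand-in: `Resolutions` plays the role of the DefMap lookup, a `String` plays the role of a DefPath, and an item body is reduced to a list of names. It is meant only to show why feeding the resolved path into the hash catches the change.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a DefPath such as `mod1::Foo`.
type DefPath = String;

/// Hypothetical stand-in for the name resolution results: maps a name as
/// written in the item body to the definition it resolves to.
type Resolutions = HashMap<String, DefPath>;

/// Hash an item body (reduced here to a list of names). If resolution
/// results are supplied, also feed the resolved path of each name.
fn hash_item(names: &[&str], resolutions: Option<&Resolutions>) -> u64 {
    let mut hasher = DefaultHasher::new();
    for name in names {
        name.hash(&mut hasher);
        if let Some(path) = resolutions.and_then(|r| r.get(*name)) {
            path.hash(&mut hasher);
        }
    }
    hasher.finish()
}
```

Hashing the body of bar without resolution results gives the same value before and after the use-statement changes, since the body text is identical. Passing a table that maps Foo to mod1::Foo in one case and mod2::Foo in the other yields different hashes.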

nikomatsakis · 2016-07-26T22:11:28Z

I'm going to take a stab at hacking on this.

nikomatsakis · 2016-07-26T23:35:03Z

Somewhat surprisingly, I'm finding that the hash of items where the only change is a use in scope does change. Trying to get to the bottom of that.

nikomatsakis · 2016-07-26T23:35:41Z

Also, it seems likely that we don't need to include nested items in the hash at all, no?

michaelwoerister · 2016-07-27T06:27:35Z

> Also, it seems likely that we don't need to include nested items in the hash at all, no?

I would think so too.

nikomatsakis · 2016-07-27T13:47:30Z

> Somewhat surprisingly, I'm finding that the hash of items where the only change is a use in scope does change. Trying to get to the bottom of that.

Never mind, my test was bogus.

nikomatsakis · 2016-07-27T13:53:59Z

I edited the summary to incorporate @michaelwoerister's analysis results as a check-list of changes.

michaelwoerister · 2016-07-29T15:10:07Z

Note: It's probably a good idea to add a visit_span method to intravisit::Visitor when implementing the span-related parts of this.
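
A rough sketch of what such a hook could look like, using a toy `Visitor` trait rather than the real intravisit::Visitor. The `Span`, `Expr`, and `HashingVisitor` types and the `hash_spans` flag are all hypothetical; the flag stands in for "debuginfo is enabled".

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified span: byte offsets into the source.
#[derive(Clone, Copy, Hash)]
struct Span {
    lo: u32,
    hi: u32,
}

/// Hypothetical, simplified expression node.
struct Expr {
    name: &'static str,
    span: Span,
}

/// Toy visitor trait (not the real intravisit::Visitor).
trait Visitor {
    fn visit_name(&mut self, name: &str);

    /// Default implementation ignores spans, so existing visitors are unaffected.
    fn visit_span(&mut self, _span: Span) {}

    fn visit_expr(&mut self, expr: &Expr) {
        self.visit_name(expr.name);
        self.visit_span(expr.span);
    }
}

/// Hashing visitor that takes spans into account only when asked to.
struct HashingVisitor {
    hasher: DefaultHasher,
    hash_spans: bool,
}

impl Visitor for HashingVisitor {
    fn visit_name(&mut self, name: &str) {
        name.hash(&mut self.hasher);
    }

    fn visit_span(&mut self, span: Span) {
        if self.hash_spans {
            span.hash(&mut self.hasher);
        }
    }
}

fn hash_expr(expr: &Expr, hash_spans: bool) -> u64 {
    let mut visitor = HashingVisitor { hasher: DefaultHasher::new(), hash_spans };
    visitor.visit_expr(expr);
    visitor.hasher.finish()
}
```

Routing every span through one method keeps the "when do spans matter" decision in a single place: with `hash_spans` off, moving an expression does not change its hash; with it on, it does.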


Various improvements to the SVH This fixes a few points for the SVH: - incorporate resolve results into the SVH; - don't include nested items. r? @michaelwoerister cc #32753 (not fully fixed I don't think)

…atsakis incr. comp.: Take spans into account for ICH This PR makes the ICH (incr. comp. hash) take spans into account when debuginfo is enabled. A side-effect of this is that the SVH (which is based on the ICHs of all items in the crate) becomes sensitive to the tiniest change in a code base if debuginfo is enabled. Since we are not trying to model ABI compatibility via the SVH anymore (this is done via the crate disambiguator now), this should not be a problem. Fixes #33888. Fixes #32753.

nikomatsakis added the A-incr-comp Area: Incremental compilation label Apr 5, 2016

nikomatsakis added a commit to nikomatsakis/rust that referenced this issue Apr 5, 2016

add FIXME rust-lang#32753 markers: SVH vs ICH

c7100b0

nikomatsakis added a commit to nikomatsakis/rust that referenced this issue Apr 6, 2016

add FIXME rust-lang#32753 markers: SVH vs ICH

50a40e1

nikomatsakis added this to the Incremental compilation alpha milestone Jul 23, 2016

nikomatsakis self-assigned this Jul 26, 2016

nikomatsakis mentioned this issue Jul 27, 2016

Various improvements to the SVH #35079

Merged

nikomatsakis added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Aug 10, 2016

nikomatsakis changed the title ~~Resolve the question of SVH vs ICH~~ Some data is not hashed by the SVH that ought to be Aug 10, 2016

nikomatsakis modified the milestone: Incremental compilation alpha Aug 10, 2016

nikomatsakis removed their assignment Aug 10, 2016

michaelwoerister self-assigned this Aug 16, 2016

michaelwoerister mentioned this issue Aug 29, 2016

incr. comp.: Take spans into account for ICH #36025

Merged

bors closed this as completed in #36025 Sep 6, 2016
