-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] mtime+content tracking #8623
Conversation
r? @Eh2406 (rust_highfive has picked a reviewer for you, use r? to override) |
Is is a far more ambitious fix than what I had in mind for #6529, I can see advantages to this approach but want to make sure I understand the tradeoffs. The current mtime based handling happens entirely in Cargo. This PR handles hashes using both Cargo and Rustc. What is the advantage of involving Rustc? Given the advantages of the new way should we move the handling of mtime to it? |
It turns out that rustc hashes all the source files (to put that hash in the debug info) anyway, so if we let rustc do the hashing then we get that initial hash (which I was expecting to be expensive) for free. @bjorn3 mentioned that we should extend this so that we don't solely rely on mtime checking for binary depedenencies too. This seems reasonable though I've not figured out yet if there's some cunning way that those hashes can be gained for 'free' also - that would seem to be too good to be true, but idk, rustc might happen to have that data already somewhere. |
In terms of trade-offs so far it looks very positive - If we can get the initial hashes for free and we only attempt to hash a file if the mtimes are different but the file size is the same then there's a high probability that that file is exactly the same and thus it would be a significant improvement on reducing recompiles.
(I guess the worst worst case is that all files have been touched but only the last file has been modified in a way that didn't change the size. - Thanks that's a good point: we should try and dirty the unit cheaply via a file size mismatch before having to do any hash checks. I'm assuming that hashing the input files is always going to be considerably cheaper than doing the compile.) |
For binary dependencies there is already the SVH. This is not stored in a stable place though. There are two ways I can think of to get a binary dependency hash. The first is to define that the SVH is stored in a stable location. This would require cargo to include a ar archive and object file reader though. The second is to hash the whole file while reading the metadata. This would result in a performance decrease though, as we currently use mmap to prevent reading parts of the binary dependencies and even parts of the actual metadata that we don't need to compile the current crate. |
Ok updated to take advantage of rustc's SourceFile precomputed hashes. Next time to explore bin hashing. |
I worry reproducing a SVH would require quite a bit of parsing compared to md5. Potentially we could use the fact that cargo only refences parts of files as a benefit. - For the parts of the files that cargo does read, would there be things in there very likely to change if the other (unread) parts of those file changed? If so we could store a hash + range(s) of which parts of the file contributed to the hash. I.e. leave instructions as to how one could re-create the hash. |
Ah, my bad, we don't need to recompute a SVH, just compare already created SVHs. As long as fishing the SVH out of a binary then we're good. (I've found out that hashes are inserted when linked with /Brepro for mach-o/win but not for linux.) If we embed the SVH then we have a single consistent mechanism). Comparing SVHs would be much cheaper than hashing files and as a bonus it fixes rust-lang/rust#73917 . |
You can use the |
Svh are never part of the filename as far as I know. |
I might have got the wrong end of the stick but it looks like svh is the hash part that is in quite a few of the filenames: https://github.com/rust-lang/rust/blob/02fe30971ef397bcff3460a9aaf175e0810c2c90/compiler/rustc_incremental/src/persist/fs.rs#L37 - I'm thinking if the svh hash is in the dependencie's filename then as long as that file exists it really ought to contain the right contents. If we can rely on that then we only have to worry about the few bin files without hashes in their filenames. EDIT: I had got the wrong end of the stick and instead we can put the svh inside the .rlib in the lib.rmeta filename. |
The svh is only the hash part for incremental compilation directories. In all other cases it is a Cargo calculated hash passed to rustc using |
☔ The latest upstream changes (presumably #8778) made this pull request unmergeable. Please resolve the merge conflicts. Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:
|
Ah that's better: cx.add_used_global(llglobal); -- that's got me moving again. |
Hmm, now I have more symbols than I can shake a stick at - I have one for each code gen unit, for each crate. I can just create it for the first code-gen unit, but still seems to be one per crate - maybe I can only do it for the current crate. |
For dylibs there is a separate codegen unit for metadata. You can add the symbol there. For executables you can add it to the main shim generation. For rlibs you can add a separate archive member. |
Tests-wise I think we want to run all those freshness tests with content hashing turned on. |
Rust analyzer fresh build when moving the target dir to upset the mtimes currently builds fresh (which is great) but takes 6-12s to figure out everything is fresh. This is much better than a full 2m30 build that would otherwise happen. I suspect where we do have to hash files we should be doing that in parallel. |
@bjorn3 both rustc and cargo PRs are back in sync now. Seems like things are slightly twisted on the hashes front. I think reversing the slices will probably untwist it, but what's there is consistent and works for findshlibs. Now that sha256 has been added in (in master), I'm more minded to plump for 256 rather than 512 as that will tidy the code up a little. |
.files | ||
.iter() | ||
.find(|reference| *dep_in == reference.path); | ||
if let Some(reference) = ref_file { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: no else here marking it as stale.
☔ The latest upstream changes (presumably #8954) made this pull request unmergeable. Please resolve the merge conflicts. Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:
|
Ping @gilescope, are you still interested in moving this forward? Is there somewhere you are stuck or need help with? |
I took a long break parsing integers but finally got my atoi_radix10 crate
out the door. It’s time to crack on with this and get it over the line. My
goal is to get it done before August because at least that way it won’t
have taken a year.
…On Thu, 1 Jul 2021 at 01:56, Eric Huss ***@***.***> wrote:
Ping @gilescope <https://github.com/gilescope>, are you still interested
in moving this forward? Is there somewhere you are stuck or need help with?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8623 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAGEJCB6RYKSRIMWMB2CANDTVO4MJANCNFSM4QA5EKGQ>
.
|
Sorry still intend on breathing life back into this just life is getting in the way. |
I'm going to close this due to inactivity. Unfortunately at this time we don't have the capacity to review major changes like this, but this is still a feature that we would love to see happen someday. If you are interested in resuming the work, please reach out to see if we can accept help with working on it. |
This adds the unstable option
-Zhash-tracking
that uses content hashes and file sizes as well as mtimes to prevent compile cascades when mtimes are messed with but the content is unchanged (Fixes: #6529 ). This paves the way towards fixing #5918.It works even better with
-Zbinary-dep-depinfo
.It depends on rust-lang/rust#75594 with the following enabling rustc changes:
.rmeta
now has SVH in uncompressed header as well as rust version..rlib
previously contained alib.rmeta
file - now renamed tolib-crate-name-svh.rmeta
.d
dependency files contain expected file size and hash.This is almost zero cost when mtimes are up-to-date (rustc already hashed everything it needed to but didn't expose this to cargo). Cost is only incurred when mtime checks fail and typically the filesize check means we can skip a more expensive hash check.
Worst case: if the files mtimes are not up to date and files are different but the same size then it will take cargo slightly longer to detect this and drop back to a full rebuild.
I appreciate that not everyone might be sold on content hashing right now, but it would be great to let them try the functionallity on nightly to see whether it makes some people's lives easier in the field.
Testing:
Still todo:
Unresolved questions: