MCP: Alternate cargo freshness algorithm, unstable flag to annotate depinfo file with checksums and file sizes #765
Labels: major-change (a proposal to make a major change to rustc), major-change-accepted (a major change proposal that was accepted), T-compiler (add this label so rfcbot knows to poll the compiler team)
This is a major change proposal for something I've been working on for about a month. Here's some more context:
Original cargo issue I'm trying to solve: rust-lang/cargo#6529
Cargo implementation PR: rust-lang/cargo#14137
Cargo tracking issue: rust-lang/cargo#14136
Rustc implementation PR (blocked on acceptance of this MCP): rust-lang/rust#126930
Proposal
The cargo project would like to experiment with using source file checksums, as an alternative to file mtimes, to determine whether a Rust crate is fresh. This is particularly valuable on systems with poor mtime accuracy, or otherwise poor mtime implementations. It's also very valuable in CI/CD, where mtimes frequently have no relation to the contents of the build cache.
It is generally not possible to lock files in a way that can be trusted across platforms. This is especially true on Unix systems. So in order to behave correctly, cargo must expect that the input source files could be changed at any time, even in the middle of a build. This means that the moment at which you take a checksum is relevant; otherwise the freshness algorithm is exposed to time-of-check to time-of-use (TOCTOU) errors. Unfortunately, it's not possible for cargo to create a list of which files need to be checked at the beginning of the build, because that list is discovered over the duration of the build. Since rustc is the first to discover a file, it also ought to take the checksum and record it in the dep-info file so that cargo can use it later.
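As a rough illustration of why the process that discovers a file should also checksum it, the sketch below derives both the checksum and the length from the same bytes that were read, so the record describes exactly what was consumed even if the file changes on disk afterwards. `SourceStamp` and `read_and_stamp` are hypothetical names rather than rustc internals, and the FNV-1a hash is only a dependency-free stand-in for a real algorithm such as sha256.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical record of what was actually read (not a rustc type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SourceStamp {
    pub file_len: u64,
    pub checksum: u64,
}

/// Stand-in hash (64-bit FNV-1a). It is NOT cryptographically secure and
/// is used here only to keep the sketch free of external crates.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Read the file once, then compute the stamp from the in-memory bytes.
/// Later writes to the file cannot affect the returned stamp.
pub fn read_and_stamp(path: &Path) -> io::Result<(Vec<u8>, SourceStamp)> {
    let bytes = fs::read(path)?;
    let stamp = SourceStamp {
        file_len: bytes.len() as u64,
        checksum: fnv1a64(&bytes),
    };
    Ok((bytes, stamp))
}
```

A checksum taken by a separate process at a different moment would not have this property, because the file could be rewritten between the two reads.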
I'm not asking for a commitment to use checksums in every build, or even a commitment to make them available indefinitely. I just want to give the ecosystem a chance to experience this and experiment with it. So I'm proposing that rustc receive an unstable flag
--checksum-hash-algorithm=<algorithm>
that a nightly cargo can use on nightly rustc in order to have rustc write these checksums to comments in the dep-info makefile. Cargo would only use the rustc flag when an unstable cargo flag is present. Additionally, these lines would include the file length at the time the checksum was recorded so that cargo can sometimes short-circuit the freshness check. The format would go something like this:
Checksum algorithm
I have no strong feelings on which checksum algorithm is used and have intentionally authored my code to allow for changes to the checksum algorithm later. Right now I've authored support for xxhash and sha256. The team has determined that a cryptographically secure algorithm is necessary, for correctness and security reasons. This means xxhash is not suitable. One alternative that hasn't yet been tried is the blake3 algorithm, which promises to be cryptographically secure and fast. This is mostly a performance question so I feel it should be informed by real world experience and profiling.
I also believe the algorithm used for these checksums should not be required to be the same as the checksums used for debug files. However, if the algorithm happens to match the one used for debug files, that checksum should not be computed twice.
Anticipated performance impact
Taking a checksum is not as fast as reading an mtime, especially for large files. So for totally fresh builds, this is expected to cause a slight performance degradation. That's probably worth exchanging for the improved correctness of the freshness algorithm on systems with poor mtime implementations. It's worth noting this only represents a correctness improvement if the checksum algorithm effectively avoids checksum collisions. The time it takes to compute a checksum pales in comparison to the time it takes to perform a build, so this will probably still be a relatively short check.
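The file-length short circuit mentioned in the proposal could keep this check cheap in the common case where a changed file also changes size. The following is a minimal sketch of that idea, not cargo's implementation: `Recorded`, `Freshness`, and `check_freshness` are hypothetical names, and `placeholder_hash` (FNV-1a) stands in for a cryptographically secure algorithm only to avoid external crates.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical values recovered from a dep-info comment line.
pub struct Recorded {
    pub file_len: u64,
    pub checksum: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Freshness {
    Fresh,
    Dirty,
}

/// Stand-in hash (64-bit FNV-1a); NOT suitable for real freshness checks.
pub fn placeholder_hash(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

pub fn check_freshness(path: &Path, recorded: &Recorded) -> io::Result<Freshness> {
    // Cheap check first: a different length proves the content changed,
    // so the file never has to be read or hashed.
    if fs::metadata(path)?.len() != recorded.file_len {
        return Ok(Freshness::Dirty);
    }
    // Same length: only now pay for reading and hashing the file.
    let bytes = fs::read(path)?;
    if placeholder_hash(&bytes) == recorded.checksum {
        Ok(Freshness::Fresh)
    } else {
        Ok(Freshness::Dirty)
    }
}
```

The length comparison can only prove that a file is dirty. A matching length still requires the full checksum, which is why the short circuit helps only some of the time.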
Build scripts
This proposal has a known deficiency in relation to build scripts. Build scripts may ingest files from all over the system, and as of this writing won't provide a checksum prior to ingesting these files. Ideally, long term, I want build scripts to also compute and output these checksums. That would be made significantly easier if an official build script API were made available. This is being explored in rust-lang/cargo#12432. In the meantime, cargo will continue using mtimes for build script files.
Mentors or Reviewers
I don't have anyone in mind but I'm happy to accept volunteers.
Process
The main points of the Major Change Process are as follows:
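- File an issue describing the proposal.
- A compiler team member or contributor who is knowledgeable in the area can second by writing @rustbot second.
  - Finding a "second" suffices for internal changes. If, however, you are proposing a new public-facing feature, such as a -C flag, then full team check-off is required.
  - Compiler team members can initiate a check-off via @rfcbot fcp merge on either the MCP or the PR.
- Once an MCP is seconded, the Final Comment Period begins. If no objections are raised after 10 days, the MCP is considered approved.
You can read more about Major Change Proposals on forge.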
Comments
This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.