Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make cargo preduce dependency binaries to a shared directory for reuse #6437

Closed
wants to merge 6 commits into from

Conversation

38
Copy link

@38 38 commented Dec 14, 2018

Currently Cargo re-compiles all the dependencies for every package. But in fact, there's many dependency binaries can be shared across crates. In addition, once the target directory is cleaned, cargo will start to build all the dependencies again.

This change makes Cargo able to reuse the library binaries across the packages by setting environment variable CARGO_SHARED_TARGET_DIR. When this environment variable is set, the non-primary unit will produce output in CARGO_SHARED_TARGET_DIR and if the target is already in the directory cargo won't call rustc again.

The target name is determined by the name of the crate and the metadata of the output which also mixed the version, dependency versions, features and cfgs.

Related discussion: https://internals.rust-lang.org/t/idea-cargo-global-binary-cache/9002

@rust-highfive
Copy link

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@38 38 changed the title Make cargo preduce dependency binaries to a shared directory for reuseG Make cargo preduce dependency binaries to a shared directory for reuse Dec 14, 2018
@dwijnand
Copy link
Member

dwijnand commented Dec 14, 2018

I think the solution we've promoted to achieve this is using sscache as a rustc wrapper:

https://doc.rust-lang.org/cargo/guide/build-cache.html

(edit: sscache is also mentioned in the associated thread, but only in passing.)

@matthiaskrgr
Copy link
Member

matthiaskrgr commented Dec 14, 2018

I think sccache is not able to share build artifacts between two different projects in different paths (which this pr is aimed at?)?
Because of this it also does not work with cargo install.

@matthiaskrgr
Copy link
Member

Hmm, it looks like build artifacts are share among directories even if different RUSTFLAGS were used?
This seems to me like it could cause problems.

mkdir /tmp/cache
git clone https://github.com/rust-lang/regex/ regex_1
git clone https://github.com/rust-lang/regex/ regex_2
cd regex_1
CARGO_SHARED_TARGET_DIR=/tmp/ RUSTFLAGS="-C opt-level=3" cargo build --release

    Updating crates.io index
   Compiling version_check v0.1.5
   Compiling libc v0.2.45
   Compiling cfg-if v0.1.6
   Compiling ucd-util v0.1.3
   Compiling lazy_static v1.2.0
   Compiling regex v1.1.0 (/tmp/regex_1)
   Compiling utf8-ranges v1.0.2
   Compiling regex-syntax v0.6.4 (/tmp/regex_1/regex-syntax)
   Compiling memchr v2.1.2
   Compiling thread_local v0.3.6
   Compiling aho-corasick v0.6.9
    Finished release [optimized + debuginfo] target(s) in 12.87s

cd ../regex_2

CARGO_SHARED_TARGET_DIR=/tmp/ RUSTFLAGS="-C opt-level=0" cargo build --release

   Compiling libc v0.2.45
      Cached version_check v0.1.5
      Cached cfg-if v0.1.6
   Compiling regex v1.1.0 (/tmp/regex_2)
      Cached lazy_static v1.2.0
      Cached ucd-util v0.1.3
      Cached utf8-ranges v1.0.2
   Compiling memchr v2.1.2
      Cached thread_local v0.3.6
      Cached regex-syntax v0.6.4 (/tmp/regex_2/regex-syntax)
      Cached aho-corasick v0.6.9
    Finished release [optimized + debuginfo] target(s) in 6.35s

@38
Copy link
Author

38 commented Dec 14, 2018

@matthiaskrgr Ah, I see, it seems metadata doesn't include the extra arguments passed to compiler.

I think there are two potential workaround here.

  1. Detect if the RUSTFLAGS presents, so do not use the shared dir or
  2. Use a different hash which includes the extra flags

I think the first workaround may be better, because once it allows to cache artifacts with extra flags, it may produce too many thing in the shared dir never being used again.

Just dig into the fingerprint code a little bit, seems extra RUSTFLAGS is the only param that is missing?

I am new to Cargo code please let me know if there's anything wrong.

--
[update] I made the change that checks if there is extra rustc flags. If yes, it just disable the logic.

@Eh2406
Copy link
Contributor

Eh2406 commented Dec 15, 2018

This is a good goale, I look forward to finding a solution in this direction. Thank you for pushing to make this happen. Unfortunately, I don't know this part of the code well enough to comment on the implementation details. I have some questions about the bigger picture, that I think will need to be answered and documented before it can get stabilized.

  • How does this work with / compared to the stable CARGO_TARGET_DIR setting?
  • How does this work with / compared to the unstable --output-path argument?
  • What gets put in the shared cache? What is a primary_package? (This may be an existing concept that we need to document better.) How flexible is the design to us changing the heuristics of what is cached?
  • There has been some discussion of moving files that are required to build a dep but not required to use it into the system temp directory, is this compatible with a future change like that?

@38
Copy link
Author

38 commented Dec 16, 2018

Good questions. I have inlined answers.

* What gets put in the shared cache? What is a `primary_package`? (This may be an existing concept that we need to document better.) How flexible is the design to us changing the heuristics of what is cached?

It seems a existing concept, and I am new to Cargo, just pickup the term from the code.
Basically my understand is a primary unit is a target the user directly requires to build.
See cargo::core::compiler::context::Context::prepare_units for the code where I pick up this concept.

The CARGO_SHARED_TARGET_DIR will contains the non-primary binaries (libxxx-yyy.rlib) and dep file (xxx-yyy.d).

* How does this work with / compared to the stable `CARGO_TARGET_DIR` setting?

CARGO_TARGET_DIR is overriding the entire crate-root/target dir to another one. Everything including the build metadata, artifacts(include primary and non-primary) , dependencies will be placed into the CARGO_TARGET_DIR.

CARGO_SHARED_TARGET_DIR doesn't change the build layout. Instead it override the final artifacts (.rlib & .d) of non-primary units. It doesn't override where the primary units will be placed, where the fingerprint is kept, etc.

* How does this work with / compared to the unstable ` --output-path` argument?

For this one, --out-dir seems only copy the primary artifacts. But this change won't affect the primary artifacts, so it's different. And I have tried with the both param, seems works.

Here's the test run.

$ CARGO_SHARED_TARGET_DIR=/tmp/cargo-cache CARGO_TARGET_DIR=/tmp/cargo-output
./cargo build -Z unstable-options --out-dir /tmp
Finished dev [unoptimized + debuginfo] target(s) in 6.65s

$ ls /tmp/cargo-cache | head -n 10
aho_corasick-bb48e126e2a9e0e9.d
ansi_term-e48faae63542482c.d
arrayvec-37c2d8385716947c.d
atty-fdbf38cc66d9acea.d
backtrace-547747f97b55eb84.d
backtrace_sys-a929c6f96af80e07.d
bitflags-7eae9b6bc47269f3.d
bytesize-81052e79c2e769b1.d
cc-c8d1427754717194.d
cfg_if-f53c909d58d8a2e9.d


$ ls /tmp/cargo-output/ | head -n 10
debug/


$ ls /tmp/cargo
/tmp/cargo*
* There has been some discussion of moving files that are required to build a dep but not required to use it into the system `temp` directory, is this compatible with a future change like that?

I believe this won't cause any problem, since what this change did is just override the output location / dep artifacts location. So it won't impact anything related to tempfile.

@alexcrichton
Copy link
Member

Thanks for the PR here! Unfortunately though I think that this sort of functionality will require significant design work before moving forward with simply a PR. This PR is currently incorrect as RUSTFLAGS isn't the only value that affects the output artifact, there's a lot of settings which don't make their way into the file's hashed name (we have a whole module dedicated to doing this). All of these properties need to be compared to ensure the correct build is used.

You can emulate what would naturally happen with Cargo today by using CARGO_TARGET_DIR pointing to the same location. If something isn't cached in those two builds, then that's either a legitimate cache miss or a legitimate bug! An oft-forgotten part of builds is how debuginfo goes into the final aritfact and affects cache hits/misses.

@bors
Copy link
Contributor

bors commented Apr 25, 2019

☔ The latest upstream changes (presumably #6867) made this pull request unmergeable. Please resolve the merge conflicts.

@bors bors added the S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author. label Apr 25, 2019
@alexcrichton
Copy link
Member

I'm gonna go ahead and close this as it's been inactive for quite some time now, but it can of course be resubmitted!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants