-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: trace config
[env]
table in dep-info.
#14701
Conversation
3d35db0
to
1297f40
Compare
// | ||
// For issue#13280, We trace env vars that are defined in the `config.toml`. | ||
on_disk_info.env.retain(|(key, _)| { | ||
!rustc_cmd.get_envs().contains_key(key) || key == CARGO_ENV || config_envs.contains(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, though here is a caveat of this solution:
If people define a Cargo internal env in [env]
table, such as
[env]
CARGO_MANIFEST_DIR = "whatever"
Before this patch, Cargo's depinfo won't track CARGO_MANIFEST_DIR
. After it tracks and records as CARGO_MANIFEST_DIR=/path/to/manifest-dir
.
I wouldn't advertise Cargo supports tracking internal envs, but this could be a hacky workaround for #14154 — by putting an internal env key with random value in [env]
, a user can force Cargo to track it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One clumsy solution is to filter these internal_envs
out of [env]
table
let internal_envs = LazyCell::new(|| HashSet::from([
"CARGO",
"CARGO_MANIFEST_DIR",
"CARGO_MANIFEST_PATH",
"CARGO_PKG_VERSION",
"CARGO_PKG_VERSION_MAJOR",
"CARGO_PKG_VERSION_MINOR",
"CARGO_PKG_VERSION_PATCH",
"CARGO_PKG_VERSION_PRE",
"CARGO_PKG_AUTHORS",
"CARGO_PKG_NAME",
"CARGO_PKG_DESCRIPTION",
"CARGO_PKG_HOMEPAGE",
"CARGO_PKG_REPOSITORY",
"CARGO_PKG_LICENSE",
"CARGO_PKG_LICENSE_FILE",
"CARGO_PKG_RUST_VERSION",
"CARGO_PKG_README",
"CARGO_CRATE_NAME",
"CARGO_BIN_NAME",
"OUT_DIR",
"CARGO_PRIMARY_PACKAGE",
"CARGO_TARGET_TMPDIR",
"CARGO_RUSTC_CURRENT_DIR",
paths::dylib_path_envvar(),
]));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like it is actually doing #14701 (comment) 😆.
src/cargo/core/compiler/mod.rs
Outdated
let config_envs: HashSet<String> = build_runner | ||
.bcx | ||
.gctx | ||
.env_config()? | ||
.keys() | ||
.cloned() | ||
.collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This clone happens for every single unit of work. It may hurt the performance a bit when the [env]
table is large. That's a tradeoff.
An alternative is using Arc
to avoid cloning the entire HashSet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way is we could have a set of internal env + a set of env from cargo::rustc-env
build script invocation. dep_info should first remove those and anything left is what we're interested in (including [env]
).
See the PoC
- Can avoid cloning
cargo::rustc-env
by acquiring mutex again, though this is just a PoC so didn't do that. - Should also need to filter out
CARGO_BIN_<exe>
- Need an
debug_assertion
to check exhaustiveness ofinternal_envs
in rustc invocation.
diff --git a/src/cargo/core/compiler/fingerprint/mod.rs b/src/cargo/core/compiler/fingerprint/mod.rs
index e6fff90932a..03489d2c99a 100644
--- a/src/cargo/core/compiler/fingerprint/mod.rs
+++ b/src/cargo/core/compiler/fingerprint/mod.rs
@@ -361,8 +361,10 @@
mod dirty_reason;
+use std::cell::LazyCell;
use std::collections::hash_map::{Entry, HashMap};
+use std::collections::HashSet;
use std::env;
use std::fmt::{self, Display};
use std::fs::{self, File};
@@ -2130,7 +2132,7 @@ pub fn translate_dep_info(
rustc_cwd: &Path,
pkg_root: &Path,
target_root: &Path,
- rustc_cmd: &ProcessBuilder,
+ envs_from_build_script_output: &HashSet<String>,
allow_package: bool,
) -> CargoResult<()> {
let depinfo = parse_rustc_dep_info(rustc_dep_info)?;
@@ -2140,6 +2142,33 @@ pub fn translate_dep_info(
let mut on_disk_info = EncodedDepInfo::default();
on_disk_info.env = depinfo.env;
+ let internal_envs = LazyCell::new(|| HashSet::from([
+ "CARGO",
+ "CARGO_MANIFEST_DIR",
+ "CARGO_MANIFEST_PATH",
+ "CARGO_PKG_VERSION",
+ "CARGO_PKG_VERSION_MAJOR",
+ "CARGO_PKG_VERSION_MINOR",
+ "CARGO_PKG_VERSION_PATCH",
+ "CARGO_PKG_VERSION_PRE",
+ "CARGO_PKG_AUTHORS",
+ "CARGO_PKG_NAME",
+ "CARGO_PKG_DESCRIPTION",
+ "CARGO_PKG_HOMEPAGE",
+ "CARGO_PKG_REPOSITORY",
+ "CARGO_PKG_LICENSE",
+ "CARGO_PKG_LICENSE_FILE",
+ "CARGO_PKG_RUST_VERSION",
+ "CARGO_PKG_README",
+ "CARGO_CRATE_NAME",
+ "CARGO_BIN_NAME",
+ "OUT_DIR",
+ "CARGO_PRIMARY_PACKAGE",
+ "CARGO_TARGET_TMPDIR",
+ "CARGO_RUSTC_CURRENT_DIR",
+ paths::dylib_path_envvar(),
+ ]));
+
// This is a bit of a tricky statement, but here we're *removing* the
// dependency on environment variables that were defined specifically for
// the command itself. Environment variables returned by `get_envs` includes
@@ -2170,7 +2199,10 @@ pub fn translate_dep_info(
// not tracked elsewhere in the fingerprint.
on_disk_info
.env
- .retain(|(key, _)| !rustc_cmd.get_envs().contains_key(key) || key == CARGO_ENV);
+ .retain(|(key, _)|
+ !internal_envs.contains(key.as_str()) && !envs_from_build_script_output.contains(key)
+ || key == CARGO_ENV
+ );
let serialize_path = |file| {
// The path may be absolute or relative, canonical or not. Make sure
diff --git a/src/cargo/core/compiler/mod.rs b/src/cargo/core/compiler/mod.rs
index ee5fdbbecd6..2f3e547c762 100644
--- a/src/cargo/core/compiler/mod.rs
+++ b/src/cargo/core/compiler/mod.rs
@@ -337,8 +337,11 @@ fn rustc(
// directory is present.
if artifact.is_true() {
paths::create_dir_all(&root)?;
+
}
+ let mut envs_from_build_script_output = HashSet::new();
+
// Only at runtime have we discovered what the extra -L and -l
// arguments are for native libraries, so we process those here. We
// also need to be sure to add any -L paths for our plugins to the
@@ -359,7 +362,7 @@ fn rustc(
)?;
add_plugin_deps(&mut rustc, &script_outputs, &build_scripts, &root_output)?;
}
- add_custom_flags(&mut rustc, &script_outputs, script_metadata)?;
+ add_custom_flags(&mut rustc, &script_outputs, script_metadata, &mut envs_from_build_script_output)?;
}
for output in outputs.iter() {
@@ -456,7 +459,7 @@ fn rustc(
&cwd,
&pkg_root,
&target_dir,
- &rustc,
+ &envs_from_build_script_output,
// Do not track source files in the fingerprint for registry dependencies.
is_local,
)
@@ -880,6 +883,7 @@ fn rustdoc(build_runner: &mut BuildRunner<'_, '_>, unit: &Unit) -> CargoResult<W
&mut rustdoc,
&build_script_outputs.lock().unwrap(),
script_metadata,
+ &mut HashSet::new(),
)?;
// Add the output of scraped examples to the rustdoc command.
@@ -1525,6 +1529,7 @@ fn add_custom_flags(
cmd: &mut ProcessBuilder,
build_script_outputs: &BuildScriptOutputs,
metadata: Option<Metadata>,
+ envs: &mut HashSet<String>,
) -> CargoResult<()> {
if let Some(metadata) = metadata {
if let Some(output) = build_script_outputs.get(metadata) {
@@ -1536,6 +1541,7 @@ fn add_custom_flags(
}
for (name, value) in output.env.iter() {
cmd.env(name, value);
+ envs.insert(name.to_owned());
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very appreciate your examples. I'm sorry, I'm not clear on some questions
Should also need to filter out
CARGO_BIN_<exe>
Does CARGO_BIN_<exe>
means that environment vars prefixed with CARGO_BIN
except CARGO_BIN_NAME
?
I found these enivironments that meet the contions
CARGO_BIN_EXE_<name>
CARGO_<ARTIFACT-TYPE>_DIR_<DEP>
CARGO_<ARTIFACT-TYPE>_FILE_<DEP>_<NAME>
See,
These environments vars has placehold parts, so that we just need to strip them, and it is imporsible to check exhaustiveness of them in rustc invocation.
?
Need an debug_assertion to check exhaustiveness of internal_envs in rustc invocation.
Does it means that check all environment variables in internal_var are present in rust innovations
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the original purpose of excluding Cargo internal environment variables from Cargo's dep-info is just they are already tracked by Cargo, so not necessary. (Though we know the statement is not entirely true according to #14154.)
If we choose to use a patch similar to the one I shared in #14701 (comment), it might become a headache for syncing between the hash set and env vars actually applied to rustc invocations. That's why I propose to check the exhaustiveness in debug_assertion
. That will help at least when we run cargo's test suite.
Does it means that check all environment variables in internal_var are present in rust innovations?
We'll need to do the other way round — check every env var in rustc_cmd.envs()
is an element in internal env var hash set. However, as you've noticed, dynamic ones are hard to track, and may need some more refactors. Yet I think here we focus on environment variables set for crates, not set for running build scripts. Environment variables for artifact-dependency do not seem to belong to this category?
At this stage, I am fine with Arc
on gctx.env_config
to reduce the clone cost, if the second approach looks too brittle and need more explorations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for you explanations. The second approach seems require more preparation and prior knowledge for me.
Besides, I had apply the Arc on gctx.env_config
.
1297f40
to
e058c43
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Only one tiny suggestion.
e058c43
to
2148392
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this patch!
|
||
"#]]) | ||
.with_stderr_data(str![[r#" | ||
[COMPILING] foo v0.5.0 ([ROOT]/foo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is funny that if you change the env config to ENV_TEST = { value = "from-config", force = true }
in this test, this cargo invocation triggers a rebuild, which I expect no.
[DIRTY] foo v0.5.0 ([ROOT]/foo): the environment variable ENV_TEST changed
Before this patch, it coinciddentally didn't trigger a rebuild because [env]
was not tracked in depinfo. After this rebuilds because [env]
doesn't take into account here when checking the old depinfo:
cargo/src/cargo/core/compiler/fingerprint/mod.rs
Lines 851 to 853 in 6f92aaa
} else { | |
gctx.get_env(key).ok() | |
}; |
I think we should add a test case covering force = true
scenario, and fix it. I have one solution in mind:
- In
GlobalContext::env_config
we filter out env vars that are notforce = true
and already exist in the env snapshot (GlobalContext::get_env
). That is,env_config()
always return the set of envs we need to apply. - Could add a doc comment for
env_config()
pointing out the returning is pre-filtered against env snapshot, not the literal table value in[env]
. - In addition, because of this change, we could remove this condition in
apply_env_config()
. - Also, everything, including the filter of disallowed envs, should be inside the
try_borrow_with
if possible to avoid non-necessary computation. Maybe we could go further that also resolve values inside it, so subsequent calls don't need to do it.
What do you think?
(I am afraid of missing any more thing. Environment variables in Cargo are a bit messy)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry that I discovered this problem so late 🙇🏾.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cargo/src/cargo/core/compiler/fingerprint/mod.rs
Lines 851 to 853 in 6f92aaa
} else { | |
gctx.get_env(key).ok() | |
}; |
Thank your for pointing out these. Here shoudld also apply the contions to figure out which env change should trigger a rebuild.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating the PR so swiftly!
d126728
to
5bf7c92
Compare
Cargo fails to detect environment variable is newly set and rebuild if the environment variable is set in `config.toml`
5bf7c92
to
668d82e
Compare
@bors r+ Thanks for working on this along with us! Note that this fix has a caveat #14701 (comment). If a user sets cargo internal env like |
☀️ Test successful - checks-actions |
refactor(env): remove unnecessary clones ### What does this PR try to resolve? This was found during the review of #14701. Not every environment variable access requires an owned type. To be precise, most of our usages don't. ### How should we test and review this PR? Refactor only. Should not have any change in behavior. One drawback is that the fn signatures deviate from `std::env::var{_os}`.
Update cargo 14 commits in cf53cc54bb593b5ec3dc2be4b1702f50c36d24d5..e75214ea4936d2f2c909a71a1237042cc0e14b07 2024-10-18 13:56:15 +0000 to 2024-10-25 16:34:32 +0000 - refactor(env): remove unnecessary clones (rust-lang/cargo#14730) - test(install): Verify 2024 edition / resolver=3 doesn't affect resolution (rust-lang/cargo#14724) - Fix: trace `config` `[env]` table in dep-info. (rust-lang/cargo#14701) - Added unstable-schema generation for Cargo.toml (rust-lang/cargo#14683) - fix: add source replacement info when no matching package found (rust-lang/cargo#14715) - feat(complete): Include descriptions in zsh (rust-lang/cargo#14726) - refactor(fingerprint): avoid unnecessary fopen calls (rust-lang/cargo#14728) - docs(resolver): Make room for v3 resolver (rust-lang/cargo#14725) - test: add fixes in the sat resolver (rust-lang/cargo#14707) - docs(ci): Don't constrainty latest_deps job by MSRV (rust-lang/cargo#14711) - refactor: use `Iterator::is_sorted` (rust-lang/cargo#14702) - refactor(rustfix): minor refactors (rust-lang/cargo#14710) - chore(deps): update msrv (rust-lang/cargo#14705) - fix(renovate): Switch matchPackageNames to matchDepNames (rust-lang/cargo#14704)
What does this PR try to resolve?
The
env
defined inconfig.toml
or--config env
are stripped before recording in the dep-info file. The absence of the env table doesn't rebuild when env changedBy tracing the
env
table into thedep-info
file, If the envs changed then the rebuild can be triggered.Fixes #13280
How should we test and review this PR?
One commit add the test, the latter commit update the test through the fixes.
Additional information
The PR only fixed the
env
table changes can't trigger a rebuild, otherCARGO-like
envsdoesn't take into account.