-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add schema field and features2
to the index.
#9161
Conversation
r? @Eh2406 (rust-highfive has picked a reviewer for you, use r? to override) |
I'm glad I took some more time with this. The original implementation had the |
To make sure I understand right, was the bug you caught here that an older Cargo would cache a list of index entries without v2 ones, but then a newer Cargo would reread the cache and not see the newer v2 entries? If so we may want to factor in the maximum understood version into the cache format and if you alternate back-and-forth Cargo versions it rewrites the cache? Otherwise this looks good to me. I'd be happy to go ahead and land this and start to let this propagate. I do still think that crates.io will need to change, too. At the very least crates.io's feature validation needs to be updated for the new syntaxes. |
The issue was that in #9111, the schema check was done before the summary was saved to the cache (here). If #9111 was merged, and these v2 entries were in the index, those would not be included in the cache. Then, if you ran with This also caused a problem with running with debug assertions. If the index had a v2 entry, and you ran an older cargo (say 1.49), that older cargo would place the v2 entry in the cache. Then, if you ran a cargo after #9111, that cargo would filter out the v2 entries, and then compare that against the cache, which won't match, and it panicked. Here, the filtering is done after caching, so it should work with all versions, and avoids the debug-assertion. I was thinking of backporting the first commit (adding the |
Ok looking at this again, to make sure I write it down again as well my main concern is protecting against what I believe is a preexisting bug in Cargo. This bug is where an old Cargo parses an index entry, creates a cache entry, and then a future version of Cargo reads the cache, skipping the index entirely. In this situation I think that any lines in the registry that the older version of Cargo failed to understand would be missing from the future Cargo's knowledge (since those entries not understood are not cached). I think it might be good to perhaps go ahead and factor in the index version format into the cache entry header? That way usage of old/new Cargo would thrash but I'm not too worried about that, I'm mostly worried about correctness in that a newer version of Cargo should always see all the index entries. |
Yea, that makes sense. To be clear, the concern is that one of these lines could return an error for whatever reason, and it would be good to protect against that? Is the following patch roughly what you're thinking of? diff --git a/src/cargo/sources/registry/index.rs b/src/cargo/sources/registry/index.rs
index 83850c592..11e6da265 100644
--- a/src/cargo/sources/registry/index.rs
+++ b/src/cargo/sources/registry/index.rs
@@ -68,7 +68,7 @@
use crate::core::dependency::Dependency;
use crate::core::{PackageId, SourceId, Summary};
-use crate::sources::registry::{RegistryData, RegistryPackage};
+use crate::sources::registry::{RegistryData, RegistryPackage, INDEX_V_MAX};
use crate::util::interning::InternedString;
use crate::util::paths;
use crate::util::{internal, CargoResult, Config, Filesystem, ToSemver};
@@ -309,7 +309,7 @@ impl<'cfg> RegistryIndex<'cfg> {
// necessary.
let raw_data = &summaries.raw_data;
let max_version = if namespaced_features || weak_dep_features {
- 2
+ INDEX_V_MAX
} else {
1
};
@@ -678,7 +678,7 @@ impl Summaries {
// sha lets us know when the file needs to be regenerated (it needs regeneration
// whenever the index itself updates).
-const CURRENT_CACHE_VERSION: u8 = 1;
+const CURRENT_CACHE_VERSION: u8 = 1 + INDEX_V_MAX as u8;
impl<'a> SummariesCache<'a> {
fn parse(data: &'a [u8], last_index_update: &str) -> CargoResult<SummariesCache<'a>> {
diff --git a/src/cargo/sources/registry/mod.rs b/src/cargo/sources/registry/mod.rs
index c6e50bf53..d43a3d26d 100644
--- a/src/cargo/sources/registry/mod.rs
+++ b/src/cargo/sources/registry/mod.rs
@@ -250,6 +250,10 @@ pub struct RegistryConfig {
pub api: Option<String>,
}
+/// The maximum version of the `v` field in the index this version of cargo
+/// understands.
+pub(crate) const INDEX_V_MAX: u32 = 2;
+
/// A single line in the index representing a single version of a package.
#[derive(Deserialize)]
pub struct RegistryPackage<'a> { Or, what do you think of changing it so that if |
Oh I'm not worried about one of those lines failing, more specifically what I'm worried about is the below sequence of events. To reiterate though this is a preexisting bug and not the fault of this PR, it's something we ideally should have fixed long ago!
My worry is that cargo NEW fails to see the entries that OLD failed to parse. Nothing failed here and nothing was corrupted, it was just unfortunate that the OLD version ran first and cached bad results. |
As for a fix I think what you gisted would fix things, but I think it's probably best to have a leading version byte followed by a maximum understood version byte. I'm a little wary of having the sum, not that I can think of anything concretely going wrong but it feels better to have them separate. |
This is intended to help prevent the following scenario from happening: 1. Old cargo builds an index cache. 2. Old cargo finds an index entry it cannot parse, skipping it, and saving the cache without the entry. 3. New cargo loads the cache with the missing entry, and never sees the new entries that it understands. This may result in more cache thrashing, but that seems better than having new cargos missing entries.
Ok, I pushed a commit that adds the version field to the cache. I still feel like the code is brittle (I added some warning comments to that effect). I was thinking that if |
src/cargo/sources/registry/index.rs
Outdated
.get(..4) | ||
.ok_or_else(|| anyhow::anyhow!("cache expected 4 bytes for index version"))?; | ||
let index_v = u32::from_le_bytes(index_v_bytes.try_into().unwrap()); | ||
if index_v > INDEX_V_MAX { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this check may actually want to be !=
instead of >
perhaps? If Cargo version 3 reads a cache file from Cargo version 2, I think that's the case I was worried about? (where version 2 didn't cache any version 3 entries in this summary file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes... 😡 😡 sorry, been making several elementary mistakes.
Nah I definitely agree this is a bit brittle. One possibility would be to, while testing Cargo itself, we assert there are no errors maybe? Not that that really adds a whole ton of confidence... |
The whole point is that when loading a cache from an earlier version, that cache may be missing new entries. So don't load older caches.
I don't think the testsuite will discover much, since it doesn't test older formats. There is a debug assert that causes it to compare the cache to loading it, which is how I got started down this quest. But that only triggered when I was manually running older cargos with |
@bors: r+ |
📌 Commit f258569 has been approved by |
☀️ Test successful - checks-actions |
[beta] backports for 1.51 Beta backports for the following: * Fix panic with doc collision orphan. (#9142) * This is an important regression that is fairly easy to hit. * Do not exit prematurely if anything failed installing. (#9185) * This is not a regression, but I think an important fix. * Add schema field to the index (#9161) * This is only the first commit from the PR which checks for the `v` field in the index, and skips entries that are not understood. The reason to backport is to get this in as early as possible so that if we do decide to start using it in the future, it works as early as possible. This otherwise doesn't do anything, so I think it should be safe. * Fix warnings of the new non_fmt_panic lint (#9148) * Fixes CI for a new warning in nightly.
[beta] Update cargo Backport of rust-lang/cargo#9196: * Fix panic with doc collision orphan. (rust-lang/cargo#9142) * Do not exit prematurely if anything failed installing. (rust-lang/cargo#9185) * Add schema field to the index (rust-lang/cargo#9161) * Fix warnings of the new non_fmt_panic lint (rust-lang/cargo#9148)
Update cargo 11 commits in bf5a5d5e5d3ae842a63bfce6d070dfd438cf6070..572e201536dc2e4920346e28037b63c0f4d88b3c 2021-02-18 15:49:14 +0000 to 2021-02-24 16:51:20 +0000 - Pass the error message format to rustdoc (rust-lang/cargo#9128) - Fix test target_in_environment_contains_lower_case (rust-lang/cargo#9203) - Fix hang on broken stderr. (rust-lang/cargo#9201) - Make it more clear which module is being tested when running cargo test (rust-lang/cargo#9195) - Updates to edition handling. (rust-lang/cargo#9184) - Add --cfg and --rustc-cfg flags to output compiler configuration (rust-lang/cargo#9002) - Run rustdoc doctests relative to the workspace (rust-lang/cargo#9105) - Add support for [env] section in .cargo/config.toml (rust-lang/cargo#9175) - Add schema field and `features2` to the index. (rust-lang/cargo#9161) - Document the default location where cargo install emitting build artifacts (rust-lang/cargo#9189) - Do not exit prematurely if anything failed installing. (rust-lang/cargo#9185)
This adds a
v
field to the index which indicates a format version for an index entry. If Cargo encounters a version newer than it understands, it will ignore those entries. This makes it safer to make changes to the index entries (such as adding new things), and not need to worry about how older cargos will react to it. In particular, this will make it safer to runcargo update
on older versions if we ever decide to add new things to the index.Currently this is not written anywhere, and is intended as a safety guard for the future. For now I will leave it undocumented until we actually decide to start using it.
This also moves the new syntax for namespaced features and weak dependency features into a new field ("features2") in the index. This is necessary to avoid breaking Cargo versions older than 1.19, which fail to parse the index even if there is a Cargo.lock file.
It is intended that only crates.io will bother with creating this field. Other registries don't need to bother, since they generally don't support Cargo older than 1.19.
I'm uncertain exactly when we should try to update crates.io to start accepting this, as that is a somewhat permanent decision.