Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rust-version field to the database and index #6267

Merged
merged 8 commits into from
Apr 26, 2023

Conversation

cassaundra
Copy link
Contributor

@cassaundra cassaundra commented Apr 2, 2023

Add rust-version to the index to open the way for experimenting with MSRV policies like an MSRV-aware resolvers (see rust-lang/cargo#9930 ).

I have also prepared a patch for cargo publish which includes this metadata in publish requests.

This change is fully backwards compatible with existing metadata in the index as well as older cargo clients which do not have the publish patch applied.

CC @epage

@cassaundra cassaundra changed the title Add rust-version field to the index Draft: Add rust-version field to the index Apr 2, 2023
@cassaundra cassaundra changed the title Draft: Add rust-version field to the index Add rust-version field to the index Apr 2, 2023
@cassaundra cassaundra marked this pull request as draft April 2, 2023 22:27
@cassaundra
Copy link
Contributor Author

cassaundra commented Apr 2, 2023

Few things that need to be done that I forgot about:

  • Use semver::VersionReq instead of String for rust_version fields.
  • Fix the tests.

@Turbo87 Turbo87 added C-enhancement ✨ Category: Adding new behavior or a change to the way an existing feature works A-backend ⚙️ labels Apr 3, 2023
@cassaundra
Copy link
Contributor Author

Fixed the tests. Did not end up storing rust_version as a semver::VersionReq internally since it is already validated by EncodableRustVersion. Ready for review.

@cassaundra cassaundra marked this pull request as ready for review April 5, 2023 01:21
@Turbo87
Copy link
Member

Turbo87 commented Apr 5, 2023

Ready for review.

since this is a significant change to the index and to the publish metadata I'm wondering if an RFC for this would be useful. once it's added to the index it would be hard to remove again so it might be valuable to get some up-front thoughts on this change before we implement it.

/cc @epage

@epage
Copy link

epage commented Apr 5, 2023

since this is a significant change to the index and to the publish metadata I'm wondering if an RFC for this would be useful. once it's added to the index it would be hard to remove again so it might be valuable to get some up-front thoughts on this change before we implement it.

While I've not followed past index changes, I'm a bit surprised by this.

In part its because somehow I got the impression that adding manifest fields to the index was a more straightforward thing and was more driven by the cargo team than crates.io. Granted, showing this field in the frontend would be great!

Also in part its not clear to me what this would gain from the RFC process. The field for Cargo.toml had an RFC and we've had a lot of experience with it since then. This carries forward the syntactic verification (literally copied from cargo) but leaves out the semantic verification (aligns with Cargo.tomls edition, not newer than stable). I'm not really seeing much room for value gained from wider input, more in-depth design, or an exploration of alternatives beyond what a PR discussion would provide.

I could see feeling this should be bundled in the RFC that will use this field but I feel this change would be valuable to have before then to experiment with what behavior we want to put through RFCs and change proposals. This would be similar to how the compiler team implements some unstable features before writing an RFC. While this subset of the work would be insta-stable, I feel like we are confident enough in the value and design to go forward with that. We also wanted this done ahead of that RFC / change-proposal work so the Index would start to have some real world data for us to use in evaluating the designs.

I can definitely see doing either a T-cargo or T-crates.io (or both) FCP on this PR as a check on all of this of course.

@Turbo87
Copy link
Member

Turbo87 commented Apr 5, 2023

I can definitely see doing either a T-cargo or T-crates.io (or both) FCP on this PR as a check on all of this of course.

sounds like a reasonable proposal. I just felt a bit uneasy about just merging this without the relevant teams being aware and agreeing to the change :)

@epage
Copy link

epage commented Apr 5, 2023

@cassaundra Turbo87 and I hashed this out in DMs. We'll be running informal FCPs among both T-cargo and T-crates.io. We'll get back to you with the results.

@Turbo87
Copy link
Member

Turbo87 commented Apr 6, 2023

one thing that came to my mind today is whether we can mark this new field as experimental for now in:

that way we can clearly communicate that this field might be removed again if it turns out that it isn't useful after all. once the resolver is successfully using it we can still stabilize it together with the stabilization of the resolver feature.

@epage
Copy link

epage commented Apr 6, 2023

Once cargo starts publishing to this field, we won't be able to remove it.

@Turbo87
Copy link
Member

Turbo87 commented Apr 6, 2023

Once cargo starts publishing to this field, we won't be able to remove it.

why not?

@Turbo87
Copy link
Member

Turbo87 commented Apr 6, 2023

also, now that I think about it, why don't we read the rust-version field directly from the Cargo.toml file in the .crate file instead of relying on the metadata that cargo sends to crates.io? 🤔

@epage
Copy link

epage commented Apr 6, 2023

We can remove the field from the index itself but we will always have to deal with old cargo versions posting the rust-version.

I also see this unlikely.

  • A MSRV-aware resolver is more of a question of when, and not if. The reason we haven't moved forward with it yet is we didn't think it was possible to do it well with the current resolver but now we think we have a plan
  • Even if we don't do this for the resolver yet, we can do this for cargo add and cargo install when choosing what top-level crate version to use
  • I'd even love for this field to be shown in the frontend, at least when browsing all the versions of crate. If nothing else, that is a big help in the short term for people to identify what is a safe version to downgrade to to maintain their MSRV.

Also, as an update, the cargo team is at 5 yes and 2 who haven't responded yet (one is on vacation)

@epage
Copy link

epage commented Apr 6, 2023

also, now that I think about it, why don't we read the rust-version field directly from the Cargo.toml file in the .crate file instead of relying on the metadata that cargo sends to crates.io? 🤔

That would require us to download all .crate files for a given crate, extract the manifest, and then read that read. A large role of the index is so we can query data about packages without doing that.

@Turbo87
Copy link
Member

Turbo87 commented Apr 6, 2023

That would require us to download all .crate files for a given crate, extract the manifest, and then read that read. A large role of the index is so we can query data about packages without doing that.

sorry, I meant: why don't we (the crates.io server) read the rust-version field directly from the Cargo.toml file in the .crate file instead of relying on the metadata that cargo sends to crates.io?

we would still save the field in the index, but we wouldn't need to change cargo to send us the value in the publish metadata.

@epage
Copy link

epage commented Apr 6, 2023

The main advantage I see to extracting it directly is we'd immediately get support for cargo 1.54+ rather than 1.72+.

imo while this means we don't have any compatibility issues for considering this field experimental, I see that as an aside as I don't think the cargo team is treating it as such.

We'd likely still need to validate the field (while cargo verified it, do we trust that the uploader was cargo?). This means we'd need to do it synchronously rather than asynchronously so we can report the error to the user. What is the limit on how much synchronous processing crates.io should do before pushing this off into the queue?

This also runs into the tech debt issue of "which change has to pay for it". The first time we do this, it will be a larger effort that each individual request seems too trivial to justify it.

Something that limits the pay off of doing this work is that not all fields can be trivially understood from the manifest. For example, cargo does not auto-populate the various inferred [[test]], [[bin]], [lib], and [[bench]] targets (maybe it should?). I say this as the only concrete index change I'm aware of being considered is to be able know if a package has a bin or a lib target (e.g. #5882, rust-lang/cargo#11803, rust-lang/cargo#4682)

@Turbo87
Copy link
Member

Turbo87 commented Apr 6, 2023

This also runs into the tech debt issue of "which change has to pay for it". The first time we do this, it will be a larger effort that each individual request seems too trivial to justify it.

absolutely, though in the long-term I'd like to completely get rid of the metadata and rely only on the data in the Cargo.toml/.crate file. at the moment there is not much stopping people from creating an publish HTTP request with metadata that does not match the contents of the .crate file. we do some basic validation, but not quite as much as we probably should.

The main advantage I see to extracting it directly is we'd immediately get support for cargo 1.54+ rather than 1.72+.

not just that, but if we can read it from the Cargo.toml directly it would also become easier to backfill for existing releases.

What is the limit on how much synchronous processing crates.io should do before pushing this off into the queue?

if we can limit it to just reading the Cargo.toml and the VCS info file (which we already read) then I would assume the overhead should be relatively small. things like README extraction and rendering can be deferred to the background worker.

I say this as the only concrete index change I'm aware of being considered is to be able know if a package has a bin or a lib target

yeah, that part is definitely a bit more tricky. how do you infer those fields? just by looking if a src/bin/foo.rs or similar exists? if we can infer it by just looking at the paths inside the tarball then we should be able to do it ahead of the background worker.

@Eh2406
Copy link
Contributor

Eh2406 commented Apr 11, 2023

I am in support of reading the data from the .crate file. I encouraged @jonhoo to make https://github.com/jonhoo/cargo-index-transit to prepare us for that change. But I think that discussion is entarys orthogonal to adding this field.

@cassaundra
Copy link
Contributor Author

cassaundra commented Apr 25, 2023

Finished rewriting the first implementation so that it now extracts the information from the tarball. I find it a bit weird having a single exception for the field, but I'll leave that to others to decide.

(edit: not complaining though, would be happy to migrate the rest of them after this!)

@cassaundra cassaundra force-pushed the rust-version branch 2 times, most recently from ee48fda to 523c14d Compare April 25, 2023 23:41
@epage
Copy link

epage commented Apr 25, 2023

Finished rewriting the first implementation so that it now extracts the information from the tarball. I find it a bit weird having a single exception for the field, but I'll leave that to others to decide.

You have to start somewhere when transitioning designs / tackling tech debt. Its easy to say "it should be clean; let's do it all at once" but that makes the upfront cost extremely high, making it more likely to be pushed off. Each time it gets pushed off, the amount of code using the old design grows, increasing the cost. By doing this small amount, we are opening the door for new work to use this design and for us to retrofit old work to it as well.

@LawnGnome
Copy link
Contributor

Finished rewriting the first implementation so that it now extracts the information from the tarball. I find it a bit weird having a single exception for the field, but I'll leave that to others to decide.

To back up @epage's point above, I'm literally prototyping some security-oriented work this week that is also going to have to extract information from the tarball, so this probably won't be a single exception for very long.

Copy link

@epage epage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That ended up being fairly small of a change (once its all figured out)

@Turbo87 Turbo87 changed the title Add rust-version field to the index Add rust-version field to the database and index Apr 26, 2023
Copy link
Member

@Turbo87 Turbo87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've split up the single commit into a couple more to make the review a bit more straight-forward, but overall this looks good to me. nice work! :)

epage added a commit to epage/cargo that referenced this pull request Apr 26, 2023
epage added a commit to epage/cargo that referenced this pull request Apr 26, 2023
bors added a commit to rust-lang/cargo that referenced this pull request Apr 26, 2023
docs(ref): Specify 'rust_version' in Index format

See rust-lang/crates.io#6267
charles-r-earp pushed a commit to charles-r-earp/cargo that referenced this pull request May 3, 2023
epage added a commit to epage/rust-crates-index that referenced this pull request May 11, 2023
This was added to the index in rust-lang/crates.io#6267.  crates.io is
automatically injecting it into the index, even without using the
nightly cargo that supports it.

My plan is to use this to make cargo-upgrade more MSRV aware (in version
reqs despite the resolver not yet being MSRV aware).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-backend ⚙️ C-enhancement ✨ Category: Adding new behavior or a change to the way an existing feature works
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants