-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
book: describe how the current resolver sometimes duplicates deps #11604
Conversation
This documents the behavior described in rust-lang#9029, where sometimes Cargo will duplicate a dep when it doesn't have to.
r? @ehuss (rustbot has picked a reviewer for you, use r? to override) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this, this is a great example.
src/doc/src/reference/resolver.md
Outdated
@@ -7,8 +7,10 @@ result of the resolution is stored in the `Cargo.lock` file which "locks" the | |||
dependencies to specific versions, and keeps them fixed over time. | |||
|
|||
The resolver attempts to unify common dependencies while considering possibly | |||
conflicting requirements. The sections below provide some details on how these | |||
constraints are handled, and how to work with the resolver. | |||
conflicting requirements. Dependency resolution is, however, an NP-hard problem, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I don't think it's the NP-hard
nes of the problem that causes counterintuitive results. The heuristic comes in in deciding which solution, among the seemingly infinite, valid solutions to pick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd argue that they're strongly related if not strictly causative. Or, to put it another way, if the problem wasn't NP-hard, I suspect that resolution would produce counterintuitive results much more rarely, possibly never — but also it doesn't feel very helpful to speculate about a hypothetical different problem!
(addendum: In other words I feel like there's a relation between NP-hard-ness and "seemingly infinite" solution sets that do not admit a strict ranking, although I can think of counterexamples and am not very familiar with the relevant theory.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having thought about your comment for three days, and sorry for the slow response, I think you are technically correct. One of the definitions of NP-hard is "exponential to find a solution, polynomial to check a solution". And what want for our example is a problem where "the solutions are so simple that people don't think about the algorithm" but where "there are nearly infinite number of possible solutions". So yes, these seem to be isomorphic definitions.
Which still leaves us with question about how we best communicate with a Rust user. Who is probably reading this section because they are frustrated that cargo gave them the "wrong" solution to a simple problem. (Where "wrong" can be "why did you bloat my binary with more than one version" or "why did you give me this ancient version just because it's somewhere else in my dependency tree".) This user may very well never have heard of NP-hard, I certainly had not when I started using Rust. Even if they have, it may not be a clear connection, as I demonstrated by taking three days to understand your point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've rephrased this bit (in response to @ehuss's comments as well). I do think that it would be illuminating if we can find a good way to introduce the idea that dependency resolution is not as easy as one might naively think.
src/doc/src/reference/resolver.md
Outdated
@@ -91,6 +93,31 @@ at the time of this writing) and Package B will use the greatest `0.6` release | |||
(`0.6.5` for example). This can lead to potential problems, see the | |||
[Version-incompatibility hazards] section for more details. | |||
|
|||
The resolver algorithm favors the greatest available versions of dependencies | |||
and has limited support for backtracking in the solutions it attempts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"and has limited support for backtracking in the solutions it attempts." I don't think this is technically true. If there is no valid solution, it can backtrack as much as it needs to to prove that. Potentially trying literally every possible combination of versions. It does not do backtracking to find a "better" solution than the valid solution it currently has.
What are we trying to say with the sentence? And how can we say it more clearly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where your familiarity with the actual algorithm is probably needed to be more precise. The intention is to temper expectations set by an earlier sentence:
When multiple packages specify a dependency for a common package, the resolver attempts to ensure that they use the same version of that common package,
... which would make me expect the example to resolve to a non-duplicated solution. Would it be fair to say that "the resolver does not optimize to reduce crate duplications", "does not have a deduplication pass", or something along those lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. In effect were trying to elaborate on the part of the quote you left out:
When multiple packages specify a dependency for a common package, the resolver
attempts to ensure that they use the same version of that common package, as
long as they are within a SemVer compatibility range.
Specifically within open requirement cargo treated as a case of "If multiple packages have a common dependency with semver-incompatible versions..." instead of a case of "When multiple packages specify a dependency for a common package, the resolver attempts to ensure that they use the same version of that common package...", and it does not do something fancy like trying to solve for the second and then falling back to the first if there's an error. 🤔 about how best to say that.
Words are hard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've rephrase this piece fairly significantly.
src/doc/src/reference/resolver.md
Outdated
and so the resolver uses a heuristic algorithm that may yield counterintuitive | ||
results in some cases. The sections below provide some details on how |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be reluctant to start the chapter emphasizing that Cargo's primary job may be "counterintuitive". I personally would prefer to leave out this new sentence. Can you say what the intent of it is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that someone new to dependency management might naively think, "OK, Cargo gets all the requirements, finds the versions that work, easy enough" — I don't think it's obvious to a newcomer that dependency-resolution is, indeed, an NP-hard problem where there may be no clear "best" solution to converge to. My intent here was to raise that idea — "this might sound straightforward but actually it isn't always!"
I've updated the PR based on the current feedback. |
Thank you for the clear description of a confusion that is often brought up. I appreciate your follow through on making this documentation better. |
This looks great, thank you! @bors r+ |
☀️ Test successful - checks-actions |
18 commits in 3c5af6bed9a1a243a693e8e22ee2486bd5b82a6c..e84a7928d93a31f284b497c214a2ece69b4d7719 2023-01-24 15:48:15 +0000 to 2023-01-31 22:18:09 +0000 - chore: Add autolabel for `Command-*` labels (rust-lang/cargo#11664) - Update cross test instructions for aarch64-apple-darwin (rust-lang/cargo#11663) - Make cargo install report needed features (rust-lang/cargo#11647) - docs(contrib): Remove out-of-date process step (rust-lang/cargo#11662) - Do not error for `auth-required: true` without `-Z sparse-registry` (rust-lang/cargo#11661) - Warn on commits to non-default branches. (rust-lang/cargo#11655) - Avoid saving the same future_incompat warning multiple times (rust-lang/cargo#11648) - Mention current default value in `publish.timeout` docs (rust-lang/cargo#11652) - Make cargo aware of dwp files. (rust-lang/cargo#11572) - Reduce target info rustc query calls (rust-lang/cargo#11633) - Bump to 0.70.0; update changelog (rust-lang/cargo#11640) - Enable sparse protocol in CI (rust-lang/cargo#11632) - Fix split-debuginfo support detection (rust-lang/cargo#11347) - refactor(toml): Move `TomlWorkspaceDependency` out of `TomlDependency` (rust-lang/cargo#11565) - book: describe how the current resolver sometimes duplicates deps (rust-lang/cargo#11604) - `cargo add` check `[dependencies]` order without considering the dotted item (rust-lang/cargo#11612) - Link CoC to www.rust-lang.org/conduct.html (rust-lang/cargo#11622) - Add more labels to triagebot (rust-lang/cargo#11621)
Update cargo 18 commits in 3c5af6bed9a1a243a693e8e22ee2486bd5b82a6c..e84a7928d93a31f284b497c214a2ece69b4d7719 2023-01-24 15:48:15 +0000 to 2023-01-31 22:18:09 +0000 - chore: Add autolabel for `Command-*` labels (rust-lang/cargo#11664) - Update cross test instructions for aarch64-apple-darwin (rust-lang/cargo#11663) - Make cargo install report needed features (rust-lang/cargo#11647) - docs(contrib): Remove out-of-date process step (rust-lang/cargo#11662) - Do not error for `auth-required: true` without `-Z sparse-registry` (rust-lang/cargo#11661) - Warn on commits to non-default branches. (rust-lang/cargo#11655) - Avoid saving the same future_incompat warning multiple times (rust-lang/cargo#11648) - Mention current default value in `publish.timeout` docs (rust-lang/cargo#11652) - Make cargo aware of dwp files. (rust-lang/cargo#11572) - Reduce target info rustc query calls (rust-lang/cargo#11633) - Bump to 0.70.0; update changelog (rust-lang/cargo#11640) - Enable sparse protocol in CI (rust-lang/cargo#11632) - Fix split-debuginfo support detection (rust-lang/cargo#11347) - refactor(toml): Move `TomlWorkspaceDependency` out of `TomlDependency` (rust-lang/cargo#11565) - book: describe how the current resolver sometimes duplicates deps (rust-lang/cargo#11604) - `cargo add` check `[dependencies]` order without considering the dotted item (rust-lang/cargo#11612) - Link CoC to www.rust-lang.org/conduct.html (rust-lang/cargo#11622) - Add more labels to triagebot (rust-lang/cargo#11621) r? `@ghost`
Update cargo 18 commits in 3c5af6bed9a1a243a693e8e22ee2486bd5b82a6c..e84a7928d93a31f284b497c214a2ece69b4d7719 2023-01-24 15:48:15 +0000 to 2023-01-31 22:18:09 +0000 - chore: Add autolabel for `Command-*` labels (rust-lang/cargo#11664) - Update cross test instructions for aarch64-apple-darwin (rust-lang/cargo#11663) - Make cargo install report needed features (rust-lang/cargo#11647) - docs(contrib): Remove out-of-date process step (rust-lang/cargo#11662) - Do not error for `auth-required: true` without `-Z sparse-registry` (rust-lang/cargo#11661) - Warn on commits to non-default branches. (rust-lang/cargo#11655) - Avoid saving the same future_incompat warning multiple times (rust-lang/cargo#11648) - Mention current default value in `publish.timeout` docs (rust-lang/cargo#11652) - Make cargo aware of dwp files. (rust-lang/cargo#11572) - Reduce target info rustc query calls (rust-lang/cargo#11633) - Bump to 0.70.0; update changelog (rust-lang/cargo#11640) - Enable sparse protocol in CI (rust-lang/cargo#11632) - Fix split-debuginfo support detection (rust-lang/cargo#11347) - refactor(toml): Move `TomlWorkspaceDependency` out of `TomlDependency` (rust-lang/cargo#11565) - book: describe how the current resolver sometimes duplicates deps (rust-lang/cargo#11604) - `cargo add` check `[dependencies]` order without considering the dotted item (rust-lang/cargo#11612) - Link CoC to www.rust-lang.org/conduct.html (rust-lang/cargo#11622) - Add more labels to triagebot (rust-lang/cargo#11621) r? `@ghost`
What does this PR try to resolve?
This updates the book to document the behavior described in #9029, where sometimes Cargo will duplicate a dep when it doesn't have to.
How should we test and review this PR?
Doc-only change; someone with knowledge of the resolver should read and assess.