Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cargo clean -p package can clear the document of a single crates #9841

Closed
wants to merge 12 commits into from
7 changes: 7 additions & 0 deletions src/cargo/core/compiler/layout.rs
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,8 @@ pub struct Layout {
examples: PathBuf,
/// The directory for rustdoc output: `$root/doc`
doc: PathBuf,
/// The directory for rustdoc output: `$root/doc/src`
doc_src: PathBuf,
/// The directory for temporary data of integration tests and benches: `$dest/tmp`
tmp: PathBuf,
/// The lockfile for a build (`.cargo-lock`). Will be unlocked when this
Expand Down Expand Up @@ -172,6 +174,7 @@ impl Layout {
fingerprint: dest.join(".fingerprint"),
examples: dest.join("examples"),
doc: root.join("doc"),
doc_src: root.join("doc/src"),
tmp: root.join("tmp"),
root,
dest,
Expand Down Expand Up @@ -206,6 +209,10 @@ impl Layout {
pub fn doc(&self) -> &Path {
&self.doc
}
/// Fetch the doc/src path.
pub fn doc_src(&self) -> &Path {
&self.doc_src
}
/// Fetch the root path (`/…/target`).
pub fn root(&self) -> &Path {
&self.root
Expand Down
6 changes: 5 additions & 1 deletion src/cargo/ops/cargo_clean.rs
Original file line number Diff line number Diff line change
Expand Up @@ -136,9 +136,13 @@ pub fn clean(ws: &Workspace<'_>, opts: &CleanOptions<'_>) -> CargoResult<()> {
for pkg in packages {
let pkg_dir = format!("{}-*", pkg.name());

// Clean fingerprints.
for (_, layout) in &layouts_with_host {
// Clean fingerprints.
rm_rf_glob(&layout.fingerprint().join(&pkg_dir), config)?;
// Clean target/doc/pkg.
rm_rf(&layout.doc().join(pkg.name()), config)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't using the correct name here. The directory names are the crate names of the targets. I'm uncertain about the best way to handle this, but specifically:

  • Non-workspace members can only have library targets documented, so that would be something like pkg.library().map(|t| t.crate_name()). I think it could be dangerous to blindly remove directories for other targets like "examples" because there could be name collisions (for example, if something has an example called "serde", it would delete the serde directory, which is probably not what you want).
  • For workspace members, it is possible to document binaries (and now examples due to Adding the cargo doc --examples subcommand #9808). So to clean a package, it would need to delete all documentable targets. Unfortunately this runs afoul with the "serde" example, too, so maybe just deleting libraries is sufficient? I'm still not entirely clear on the motivation for cleaning these, so I'm uncertain if it is important or not.

Copy link
Contributor Author

@heisen-li heisen-li Sep 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be dangerous to blindly remove directories for other targets like "examples" because there could be name collisions (for example, if something has an example called "serde", it would delete the serde directory, which is probably not what you want).

I'm sorry, I don't quite understand your question. Could you explain it again?

For workspace members, it is possible to document binaries (and now examples due to Adding the cargo doc --examples subcommand #9808). So to clean a package, it would need to delete all documentable targets. Unfortunately this runs afoul with the "serde" example, too, so maybe just deleting libraries is sufficient? I'm still not entirely clear on the motivation for cleaning these, so I'm uncertain if it is important or not.

I'm sorry. I really didn't think of the question in advance. If the example name in the ../examples/ folder is the same as another crate name, this should be taken into account.

Also, I'm not sure I understood this correctly: Is the current question the same as you stated above (about the example name conflicts with the crate name)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, both points are related to one another.

The doc directories are based on a target's crate name. So, for example, if I have a package named "foo" with a "foo" library and a "foo-cli" binary, there would be two directories:

  • doc/foo
  • doc/foo_cli

To clean this properly, it would need to iterate over all documentable targets and get their crate names. Don't forget that the package name, the target name, and the crate name can all be different. That is, package "abc-rs" can have a library named "my-abc-lib" which has a crate name of "my_abc_lib".

However, I'd be concerned about indiscriminately iterating over a package's targets because of name collisions. Imagine my "foo" library had a dependency on "serde", and a dependency on another package named "bar". I run cargo doc and also get the documentation for doc/foo, doc/serde, and doc/bar. Now, imagine bar had an binary coincidentally named serde, if a user runs cargo clean -p bar, we don't want to blow away the serde directory just because bar had a serde binary.

So, I'm thinking, to avoid deleting the wrong things, one step would be to only look for library targets of non-workspace dependencies. That should limit the collateral damage.

Then, I'm wondering about limiting the collateral damage of workspace members, too. Perhaps including examples is overkill, and perhaps it should only delete library crate names? Or maybe libraries + binaries? Or maybe it is fine to collect all targets with documented() is true. That's where I'm uncertain, since I'm not really sure I understand the need for this functionality.

I think I would lean towards just deleting the entries from pkg.targets().iter().filter(|t| t.documented()) for workspace members, and just the lib for non-members.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need some time to think, please give me some time. i am sorry, huss.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, feel free to take your time, there's no rush. And let me know if you have any questions or if anything wasn't clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I just ended my wedding leave. It was too late to reply.

However, I'd be concerned about indiscriminately iterating over a package's targets because of name collisions. Imagine my "foo" library had a dependency on "serde", and a dependency on another package named "bar". I run cargo doc and also get the documentation for doc/foo, doc/serde, and doc/bar. Now, imagine bar had an binary coincidentally named serde, if a user runs cargo clean -p bar, we don't want to blow away the serde directory just because bar had a serde binary.

This scenario was tested based on my understanding.

PS D:\Rust\temp\foo> cat .\Cargo.toml
...
[dependencies]
serde = "1.0.130"
bar = { path = "./bar"}
PS D:\Rust\temp\foo\bar> cat .\Cargo.toml
...
[[bin]]
name = "serde"
path = "./serde/src/main.rs"

The directory after cargo doc looks like this:

doc/bar
doc/foo
doc/serde

The binary files of serde and bar seem to have nothing to do with each other. In other words, the binary files under the bar directory will not form a document.

Therefore, running cargo clean -p bar only needs to delete doc/bar, and does not need to consider the conflict between the serde binary file under bar and the dependent serde of foo.
Is my understanding correct?

That's where I'm uncertain, since I'm not really sure I understand the need for this functionality.

My understanding of this function is:
If you execute cargo clean -p foo, you need to delete all doc/foo,doc/example_name,doc/dependence_name.
And these documents are in pkg.targets().iter().filter(|t| t.documented()).

I think I would lean towards just deleting the entries from pkg.targets().iter().filter(|t| t.documented()) for workspace members, and just the lib for non-members.

What does workspace member mean here?
Is this in Cargo.toml?

PS D:\Rust\temp\foo> cat .\Cargo.toml
...
[workspace]
members = ["member_1", "member_2", ...]

Frankly, I don't seem to fully understand workspace members and lib for non-members.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does workspace member mean here?

Essentially, yea, it is the members list. It's a little more complicated, since additional members can be inferred.

My understanding of this function is:
If you execute cargo clean -p foo, you need to delete all doc/foo,doc/example_name,doc/dependence_name.
And these documents are in pkg.targets().iter().filter(|t| t.documented()).

I think my question was: why does someone want to delete directories from the doc directory? What is the drawback of leaving them there? What is the use case for needing to delete them? Is it OK for it to only partially delete a package's data?

The binary files of serde and bar seem to have nothing to do with each other. In other words, the binary files under the bar directory will not form a document.

Therefore, running cargo clean -p bar only needs to delete doc/bar, and does not need to consider the conflict between the serde binary file under bar and the dependent serde of foo. Is my understanding correct?

I think what I'm trying to get at is that when you document a package, it documents the libraries and its binaries. If you have a package mypackage with binaries bar and baz, there will be three directories in the doc directory. To properly clean mypackage, it will need to iterate over all documentable targets and delete those directories. My concern is that could be too aggressive in some cases, as it could delete too much.

When cargo doc is run, it will document all documentable targets for workspace members, and all libraries for non-workspace members.

So my thought is that it would look for pkg.targets().iter().filter(|t| t.documented()) for workspace members, and pkg.targets().iter().find(|t| t.is_lib()) for non-members maybe?

// Clean target/doc/src/pkg.
rm_rf(&layout.doc_src().join(pkg.name()), config)?;
}

for target in pkg.targets() {
Expand Down
41 changes: 41 additions & 0 deletions tests/testsuite/clean.rs
Original file line number Diff line number Diff line change
Expand Up @@ -549,3 +549,44 @@ fn clean_spec_reserved() {
)
.run();
}

#[cargo_test]
fn clean_spec_doc() {
heisen-li marked this conversation as resolved.
Show resolved Hide resolved
// `clean -p package` make target/doc/package clear
let p = project()
.file(
"Cargo.toml",
r#"
[package]
name = "foo"
version = "0.0.1"
authors = []

[dependencies.bar]
path = "bar"
"#,
)
.file("src/lib.rs", "extern crate bar; pub fn foo() {}")
.file("bar/Cargo.toml", &basic_manifest("bar", "0.0.1"))
.file("bar/src/lib.rs", "pub fn bar() {}")
.build();

p.cargo("doc").run();

let doc_path = &p.build_dir().join("doc");
let foo_doc_path = &doc_path.join("foo");
let bar_doc_path = &doc_path.join("bar");
assert!(doc_path.is_dir());
assert!(foo_doc_path.is_dir());
assert!(bar_doc_path.is_dir());

p.cargo("clean -p foo").run();

assert!(!foo_doc_path.is_dir());
assert!(bar_doc_path.is_dir());

assert!(!doc_path.join("src").join("foo").is_dir());
assert!(doc_path.join("src").join("bar").is_dir());

assert!(p.build_dir().is_dir());
}