Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Greatly speed up doctests by compiling compatible doctests in one file #126245

Merged
merged 48 commits into from
Aug 14, 2024

Conversation

GuillaumeGomez
Copy link
Member

@GuillaumeGomez GuillaumeGomez commented Jun 10, 2024

Fixes #75341.

Take 2 at #123974. It should now be much simpler to review since it's based on #125798.

As discussed, this feature will only be enabled starting with the 2024 edition.

I split the changes as much as possible to make it as easy as possible to review (even though it's a huge lot of code changes...).

The following tests are not included into the combined doctests:

  • compile_fail
  • If there are crate attributes (deny/allow/warn are ok)
  • have invalid AST
  • test_harness
  • no capture
  • --show-output (because the output is nicer without the extra code surrounding it)

Everything else is merged into one file. If this one file fails to compile, we go back to the current strategy: compile each doctest separately.

Because of the edition codeblock attribute, I couldn't make them all into one file directly so I grouped them by edition, but it should be pretty anecdotic.

In case the users want a specific doctest to be opted-out from this doctest merge, I added the standalone codeblock attribute:

/// ```rust,standalone
/// // This doctest will not be part of merged doctests!
/// ```

Now the interesting part, I ran it on a few crates and here are the results (with cargo test --doc to only include doctests):

crate nb doctests before this PR with this PR
sysinfo 227 4.6s 1.11s
geos 157 3.95s 0.45s
core 4604 54.08s 13.5s (merged: 0.9s, standalone: 12.6s)
std 1147 12s 3.56s (merged: 2.1s, standalone: 1.46s)

r? @camelid

try-job: x86_64-msvc
try-job: aarch64-apple

@rustbot rustbot added A-run-make Area: port run-make Makefiles to rmake.rs S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Jun 10, 2024
@rustbot

This comment was marked as resolved.

@camelid camelid added A-doctests Area: Documentation tests, run by rustdoc A-edition-2024 Area: The 2024 edition needs-fcp This change is insta-stable, so needs a completed FCP to proceed. labels Jun 11, 2024
Copy link
Member

@camelid camelid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round of review! Wondering if it'd make sense to run this on crater (with the edition-handling temporarily removed so it'd run unconditionally) to check breakage?

src/librustdoc/doctest.rs Outdated Show resolved Hide resolved

run_tests(
test_args,
nocapture,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Isn't nocapture now passed as part of opts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually part of RustdocOptions, good catch!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same goes for test_args, I love when code gets simpler like that. :3

src/librustdoc/doctest/make.rs Outdated Show resolved Hide resolved
src/librustdoc/doctest.rs Show resolved Hide resolved
src/librustdoc/doctest.rs Outdated Show resolved Hide resolved
src/librustdoc/doctest.rs Outdated Show resolved Hide resolved
src/librustdoc/doctest.rs Outdated Show resolved Hide resolved
tests/rustdoc-ui/doctest/wrong-ast-2024.rs Outdated Show resolved Hide resolved
tests/rustdoc-ui/doctest/wrong-ast.rs Outdated Show resolved Hide resolved
@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 11, 2024
@Kobzol
Copy link
Contributor

Kobzol commented Jun 11, 2024

The comment in documentation claims that the slowest part that limits doctests is the compilation time (not the runtime to actually run all the tests). Instead of merging all the doctests into a single program, so that it runs in the same process (which can cause issues mentioned in the previous PR), could we instead perform the compilation in a smarter way, to generate N binaries, but still invoking the compiler just once, to amortize the compilation costs (something like my rustc daemon experiments)? Do you have any expectation of how much time would be saved if compilation was approx. as fast as compiling a single program, but we still invoked N binaries to run the tests?

Or, as a different alternative, compile everything into a single program, but then execute it in N processes (each process would run just a single test), to avoid the issues?

@camelid
Copy link
Member

camelid commented Jun 11, 2024

Interesting. I was actually wondering if there was some way to daemonize rustc. However IIRC one of the slowest parts is linking, which would still be an issue with the approach you suggest :(

@Kobzol
Copy link
Contributor

Kobzol commented Jun 11, 2024

It would also be very non-trivial, implementation wise, I imagine. But what about the second approach? We could compile a single binary that looks something like this:

fn test_0() {}
fn test_1() {}

fn main() {
   let arg = args()[0];
   if arg == "test_0" { test_0(); }
   if arg == "test_1" { test_1(); }
}

And the rustdoc driver would then execute this binary N times (in parallel for all threads on the given machine), and run each test in a separate process.

Copy link
Member

@camelid camelid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to handle the fact that different doctests may use different #![feature)]s.

@camelid
Copy link
Member

camelid commented Jun 11, 2024

This doesn't seem to handle the fact that different doctests may use different #![feature)]s.

I think this would unfortunately break the second approach too @Kobzol

@Kobzol
Copy link
Contributor

Kobzol commented Jun 11, 2024

I think this would unfortunately break the second approach too @Kobzol

Sure, there are possibly additional issues with the "add everything to a single binary" approach, but as far as we can get that working somehow, it seems to me that moving from "run all tests within a single process" to "run each test in a separate process" should be relatively straightforward, and should resolve the issues with shared state between tests.

@camelid
Copy link
Member

camelid commented Jun 11, 2024

Agreed. So sounds like the idea would be:

  • merge doc tests into groups based on each one's combination of edition, no_std/no_core, enabled feature gates, etc.
  • create unified test for each merged group where only one test is run per execution, dependent on the args provided by rustdoc to the binary
  • run each instance (i.e. provided args) of each merged test in parallel

This should obviate the need for doing this over an edition boundary since it shouldn't break any code. It would also automatically work with code that configures global state, without needing to know a standalone attribute. We'd also skip introducing this attribute to rustdoc since it wouldn't be necessary.

@Scripter17
Copy link
Contributor

Scripter17 commented Jun 11, 2024

However IIRC one of the slowest parts is linking

Would nightly's new linker (LLD, I think) be a relevant thing to look through for insight?

@camelid
Copy link
Member

camelid commented Jun 12, 2024

I'm not sure. It'd be worth testing. We should also look into turning off debuginfo for doctests since it's unused.

@Kobzol
Copy link
Contributor

Kobzol commented Jun 12, 2024

Would nightly's new linker (LLD, I think) be a relevant thing to look through for insight?

LLD should help slightly, but it's not a panacea.

I tried LLD for doctesting core, and BFD (the default Linux linker) ran for ~42s, LLD ran for 40s. It seemed that most (maybe 90%) of the time was spent running the tests though, not compiling them (assuming that compilation and running isn't overlapped).

That would also sadly suggest that the overhead from spawning the test processes (even on Linux) is the dominant factor here, which means that my proposed approach wouldn't help that much.

@GuillaumeGomez
Copy link
Member Author

Agreed. So sounds like the idea would be:

* merge doc tests into groups based on each one's combination of edition, no_std/no_core, enabled feature gates, etc.

* create unified test for each merged group where only one test is run per execution, dependent on the args provided by rustdoc to the binary

* run each instance (i.e. provided args) of each merged test in parallel

This should obviate the need for doing this over an edition boundary since it shouldn't break any code. It would also automatically work with code that configures global state, without needing to know a standalone attribute. We'd also skip introducing this attribute to rustdoc since it wouldn't be necessary.

I agree but with some small changes: no_std tests should be on their own, however doctests with features should likely not be part of the merged doctests (we can always change that later on) and therefore no_core as well since it requires a feature to be enabled.

However I"m curious about:

* create unified test for each merged group where only one test is run per execution, dependent on the args provided by rustdoc to the binary

* run each instance (i.e. provided args) of each merged test in parallel

I generate merged doctests like rustc does with #[test], so I'm not sure it'd be worth to spawn a process for each test.

@Kobzol
Copy link
Contributor

Kobzol commented Jun 12, 2024

I generate merged doctests like rustc does with #[test], so I'm not sure it'd be worth to spawn a process for each test.

This approach would resolve the problems with sharing state between tests. But it's quite possible that it would also make the perf. gains much smaller.

@GuillaumeGomez
Copy link
Member Author

Forking on linux has a huge overhead, so the test code itself would likely take less time than spawning it. I'm really not convinced by this approach. Let me check locally the diff.

@Kobzol
Copy link
Contributor

Kobzol commented Jun 12, 2024

If you have a recent-ish glibc and kernel, forking is relatively fast (you could spawn thousands of doctests per second, if needed). Actually I'm pretty sure that forking on Linux would be much faster than on macOS and Windows, so I would worry about these more 😆

But based on some experiments that I have posted above, it seems to me that for crates that have a lot of doc-tests (e.g. core), the primary bottleneck is not compilation, but rather execution of the doctests. If that is indeed the case, then the process spawning approach will still probably be much slower than what you had here (>2x speedup on core) :(

But maybe it's only because the rustc test machinery has a lot of costs when starting. I wonder what would happen if we just literally did this, or, as an alternative, set up everything with libtest, and then fork the process just before executing each test.

@GuillaumeGomez
Copy link
Member Author

True, forking on linux should be pretty fast nowadays. ^^'

But maybe it's only because the rustc test machinery has a lot of costs when starting. I wonder what would happen if we just literally did this, or, as an alternative, set up everything with libtest, and then fork the process just before executing each test.

Hum... I'm definitely not convinced. Especially since you mentioned Windows and MacOS being slow to fork. I can provide a big source code for what rustdoc generates for core later once I'm done with what I'm doing and you can experiment with forking?

@bors
Copy link
Contributor

bors commented Aug 14, 2024

☀️ Test successful - checks-actions
Approved by: t-rustdoc
Pushing e5b3e68 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Aug 14, 2024
@bors bors merged commit e5b3e68 into rust-lang:master Aug 14, 2024
7 checks passed
@rustbot rustbot added this to the 1.82.0 milestone Aug 14, 2024
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (e5b3e68): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary -1.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.1% [-1.1%, -1.1%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -1.1% [-1.1%, -1.1%] 1

Cycles

Results (secondary 0.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.8% [2.8%, 2.8%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.5% [-2.5%, -2.5%] 1
All ❌✅ (primary) - - 0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 752.976s -> 752.981s (0.00%)
Artifact size: 341.40 MiB -> 341.50 MiB (0.03%)

@jyn514
Copy link
Member

jyn514 commented Aug 14, 2024

this is really cool. nice work all.

@Kobzol
Copy link
Contributor

Kobzol commented Aug 14, 2024

Awesome work! 🚀 This should probably receive relnotes or maybe relnotes-perf (not sure if that one is used a lot).

@GuillaumeGomez GuillaumeGomez deleted the doctest-improvement-v2 branch August 14, 2024 08:21
@jieyouxu jieyouxu added relnotes Marks issues that should be documented in the release notes of the next release. relnotes-perf Performance improvements that should be mentioned in the release notes. labels Aug 14, 2024
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Aug 15, 2024
@BurntSushi
Copy link
Member

I got an absolutely wild speed-up in Jiff. Before:

$ time cargo t --doc
[.. snip ..]
test result: ok. 903 passed; 0 failed; 2 ignored; 0 measured; 0 filtered out; finished in 12.56s


real    12.783
user    3:14.02
sys     1:25.99
maxmem  162 MB
faults  1425

After:

$ time cargo t --doc
[.. snip ..]
test result: ok. 903 passed; 0 failed; 2 ignored; 0 measured; 0 filtered out; finished in 0.25s


real    2.524
user    4.943
sys     2.355
maxmem  458 MB
faults  23

@GuillaumeGomez
Copy link
Member Author

GuillaumeGomez commented Aug 15, 2024

Going down from 4min39 to 7 secs is quite wild indeed. We did a good job here. :)

@kpreid
Copy link
Contributor

kpreid commented Aug 15, 2024

Note that this feature only takes effect if you set package.edition = "2024" (documented here). I mention this for the sake of the next person who, like me, wants to try it out and see the improvement, but didn’t recall that part of the design discussion. Just running cargo +nightly test --doc doesn’t suffice.

@ehuss ehuss added relnotes Marks issues that should be documented in the release notes of the next release. and removed relnotes Marks issues that should be documented in the release notes of the next release. labels Aug 20, 2024
@ehuss
Copy link
Contributor

ehuss commented Aug 20, 2024

Awesome work! 🚀 This should probably receive relnotes or maybe relnotes-perf (not sure if that one is used a lot).

Can you say more about why this should be relnotes? It's currently unstable (under the edition).

@Kobzol
Copy link
Contributor

Kobzol commented Aug 20, 2024

...because I forgot about that :) Sorry. It should definitely be mentioned in the 2024 Edition relnotes, as I imagine it will be :)

@ehuss ehuss removed relnotes Marks issues that should be documented in the release notes of the next release. relnotes-perf Performance improvements that should be mentioned in the release notes. labels Aug 20, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-doctests Area: Documentation tests, run by rustdoc A-edition-2024 Area: The 2024 edition A-run-make Area: port run-make Makefiles to rmake.rs disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. merged-by-bors This PR was explicitly merged by bors. needs-fcp This change is insta-stable, so needs a completed FCP to proceed. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Rustdoc should run all doctests in one binary