
Run merged doc-tests by default in the same process and add opt-out attribute #129119

Open
Voultapher opened this issue Aug 15, 2024 · 18 comments
Labels
C-discussion (Category: Discussion or questions that don't represent real issues.) · T-rustdoc (Relevant to the rustdoc team, which will review and decide on the PR/issue.)

Comments

@Voultapher
Contributor

Voultapher commented Aug 15, 2024

#126245 merged support in Edition 2024 for consolidating doctests into as few binaries as possible, with a standalone attribute as an opt-out.

I propose that, by default, doc-tests should be run in parallel in the same process.

What's the motivation for the proposed change? For example, when working on the Rust standard library, I usually run the tests for core and std/alloc, which takes ~84s incrementally on my Zen 3 5900X machine. Out of that, ~66s are spent running the ~4k core doc tests. That's on Linux; on Windows I expect this to be even slower, due to a higher process spawning overhead.

Merged doc tests are tied to the Rust 2024 edition because they can break behavior. So there is already precedent for having a new default that is better in the common case, but may break existing projects that opt into the new edition, requiring some work. I believe, and some code analysis via grep.app would support this, that the majority of doc tests can be run in parallel in the same binary. So that should be the default.

Cases that depend on some kind of singleton, log lib init, etc. can be marked with an attribute as requiring a standalone process. That would make for two possible attributes: build isolated and run isolated. If this is deemed too complex, it's imaginable that there is only a single attribute that implies both build and run isolated.
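For illustration, a minimal, hypothetical sketch of what such a per-doctest opt-out could look like, reusing the standalone code-fence attribute from #126245 (the attribute name and exact semantics are an assumption here, for illustration only):

/// Most doctests, like this one, have no process-global side effects and
/// could share a single merged test binary and process:
///
/// ```
/// assert_eq!(2 + 2, 4);
/// ```
///
/// A doctest that touches process-wide state would opt out via the fence
/// attribute and keep getting its own binary (and therefore its own process):
///
/// ```standalone
/// // e.g. installing a process-global hook, logger, allocator, ...
/// std::panic::set_hook(Box::new(|_| eprintln!("custom panic hook")));
/// ```
pub fn example_with_global_state() {}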

Tagging @GuillaumeGomez

@rustbot added the needs-triage label (This issue may need triage. Remove it if it has been sufficiently triaged.) on Aug 15, 2024
@Voultapher changed the title from "Run merged doc-tests by default in the same binary and add opt-out attribute" to "Run merged doc-tests by default in the same process and add opt-out attribute" on Aug 15, 2024
@GuillaumeGomez
Member

GuillaumeGomez commented Aug 15, 2024

Merged doc tests are tied to the Rust 2024 edition because they can break behavior.

That's not the only reason: it also changes the output and adds a new code example attribute.

I believe, and some code analysis via grep.app would support this, that the majority of doc tests can be run in parallel in the same binary.

I have to admit that I don't see how any script would make it possible to detect such cases. If it's code from a dependency, there is no way to detect it (and let's not even mention FFI).

Cases that depend on some kind of singleton, log lib init, etc. can be marked with an attribute as requiring a standalone process.

Again, I don't see how any script could detect that.

I think the only reasonable way to do it is to add an option or a crate doc attribute to say "run all tests in the same thread" which would be disabled by default.

And finally: did you compare the before/after of using this new feature? Are the numbers you provided from the current doctests or with the feature enabled?

@jieyouxu added the T-rustdoc and C-discussion labels and removed the needs-triage label on Aug 15, 2024
@Voultapher
Contributor Author

Again, I don't see how any script could detect that.

No script, I just looked at a hundred or so examples of doc-tests with my eyes. My argument is that it seems to be the more common case, so we should optimize for that and make it the default.

I think the only reasonable way to do it is to add an option or a crate doc attribute to say "run all tests in the same thread" which would be disabled by default.

Maybe I'm missing something, but why can't this be a per doc-test attribute?

And finally: did you compare the before/after of using this new feature? Are the numbers you provided from the current doctests or with the feature enabled?

I tried it out yesterday on master after the feature had been merged and saw no difference for my use-case, but maybe that's because the bootstrap wasn't updated yet?

@GuillaumeGomez
Member

No script, I just looked at a hundred or so examples of doc-tests with my eyes. My argument is that it seems to be the more common case, so we should optimize for that and make it the default.

That doesn't sound like a good metric. ^^'

Maybe I'm missing something, but why can't this be a per doc-test attribute?

You would prefer to mark all code examples manually instead?

I tried it out yesterday on master after the feature had been merged and saw no difference for my use-case, but maybe that's because the bootstrap wasn't updated yet?

rustc isn't compiled using the 2024 edition so the feature is not enabled there yet.

But again: before trying to build even more options on top of this feature, maybe it'd be better to first check the performance impact if all doctests were run in the same thread.

@Voultapher
Contributor Author

Voultapher commented Aug 15, 2024

That doesn't sound like a good metric. ^^'

It's not great, but better than nothing. We could enable it and do a crater run. I think it's worth a shot.

You would prefer to mark all code examples manually instead?

Not all, only those that need it. Rust is a lot about explicitness, and while for some projects every doctest will need it, I think that's the exception, and for 80+% of cases no attributes are needed at all.

But again: before trying to build even more options on top of this feature, maybe it'd be better to first check the performance impact if all doctests were run in the same thread.

I'm suggesting multiple threads inside one process. I'm all for giving it a try; how much effort do you estimate such an experiment would be for you? I can look into trying it myself, but I haven't touched the doctest code at all until now. Starting a process is usually 20+ million cycles of work before main even gets executed, which adds up.
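For reference, a rough way to put a number on that per-process overhead on a given machine is to time spawning a trivial program in a loop. This is only a sketch and assumes a Unix-like system with /bin/true available; it measures fork/exec plus process teardown, not just the work before main:

use std::process::Command;
use std::time::Instant;

fn main() {
    // Rough estimate of per-process overhead: spawn a trivial program many
    // times and average the wall time per spawn.
    let iterations: u32 = 100;
    let start = Instant::now();
    for _ in 0..iterations {
        Command::new("/bin/true")
            .status()
            .expect("failed to spawn /bin/true");
    }
    println!("average spawn + exit time: {:?}", start.elapsed() / iterations);
}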

@Voultapher
Contributor Author

Voultapher commented Aug 16, 2024

I ran an approximation of a best-case scenario, where all tests can be run in the same
process and the tests themselves are relatively lightweight. The same code with
minor variations is added as 1000 integration tests and as 1000 doctests. The
code in question is:

/// This is a demo function, that contains a doc example.
///
/// # Example
///
/// ```
/// let mut vec_example_fn_0 = vec![1, 2, 85590];
/// vec_example_fn_0.push(3);
/// assert_eq!(vec_example_fn_0, [1, 2, 85590, 3]);
/// ```
pub fn example_fn_0() {}

#[test]
fn example_fn_0() {
    let mut vec_example_fn_0 = vec![1, 2, 16996];
    vec_example_fn_0.push(3);
    assert_eq!(vec_example_fn_0, [1, 2, 16996, 3]);
}

Results Linux

Setup

Linux 6.10
rustc 1.82.0-nightly (2c93fabd9 2024-08-15)
AMD Ryzen 9 5900X 12-Core Processor (Zen 3 micro-architecture)
CPU boost enabled.

edition = "2021"

$ hyperfine --min-runs 3 --prepare 'cargo clean' 'cargo t'
  Time (mean ± σ):     10.545 s ±  0.022 s    [User: 104.292 s, System: 95.885 s]
  Range (min … max):   10.522 s … 10.565 s    3 runs

Breakdown: Clean compile ~1.4s, integration tests ~0.02s, doctests ~16.8s [1]

edition = "2024"

$ hyperfine --min-runs 3 --prepare 'cargo clean' 'cargo t'
  Time (mean ± σ):      2.846 s ±  0.015 s    [User: 3.548 s, System: 0.743 s]
  Range (min … max):    2.829 s …  2.859 s    3 runs

Breakdown: Clean compile ~1.4s, integration tests ~0.02s, doctests ~0.1s [1]

edition = "2024" and doctest = false

$ hyperfine --min-runs 3 --prepare 'cargo clean' 'cargo t'
  Time (mean ± σ):      1.470 s ±  0.001 s    [User: 1.552 s, System: 0.219 s]
  Range (min … max):    1.469 s …  1.471 s    3 runs

Breakdown: Clean compile ~1.4s, integration tests ~0.02s [1]

Results Windows

Setup

Windows 10
rustc 1.82.0-nightly (2c93fabd9 2024-08-15)
AMD Ryzen 9 5900X 12-Core Processor (Zen 3 micro-architecture)
CPU boost enabled.

edition = "2021"

$ hyperfine --min-runs 3 --prepare 'cargo clean' 'cargo t'
  Time (mean ± σ):     55.053 s ±  1.733 s    [User: 103.338 s, System: 108.279 s]
  Range (min … max):   53.398 s … 56.855 s    3 runs

Breakdown: Clean compile ~2.4s, integration tests ~0.08s, doctests ~51.3s [1]

edition = "2024"

$ hyperfine --min-runs 3 --prepare 'cargo clean' 'cargo t'
  Time (mean ± σ):      7.107 s ±  0.015 s    [User: 7.170 s, System: 8.052 s]
  Range (min … max):    7.097 s …  7.124 s    3 runs

Breakdown: Clean compile ~2.4s, integration tests ~0.08s, doctests ~2.3s [1]

edition = "2024" and doctest = false

$ hyperfine --min-runs 3 --prepare 'cargo clean' 'cargo t'
  Time (mean ± σ):      2.575 s ±  0.066 s    [User: 2.920 s, System: 0.677 s]
  Range (min … max):    2.518 s …  2.648 s    3 runs

Breakdown: Clean compile ~2.4s, integration tests ~0.08s [1]

[1] Numbers as reported by cargo; in reality the felt latency can be quite different, and very small numbers are inaccurate. E.g. the integration test binary alone takes ~60ms when measured with perf stat, which accounts for process startup, which is 3x what cargo reported. Doctests claim to run in 100ms, but that does not account for some other step beforehand, maybe compiling them. Without any code changes perf stat -d cargo t --test it is ~90ms. In contrast, running perf stat -d cargo t --doc the wall time rises to ~2750ms, for the exact same test coverage. I'm not sure how to further drill into how much of that is process startup and how much is compiling the doctests.

Key takeaways

  • #126245 (Greatly speed up doctests by compiling compatible doctests in one file) is a huge improvement!
  • The numbers reported by cargo and wall-time waited by users can show large
    discrepancies.
  • Given a sufficient quantity of doctests, iterative cargo t will become slow,
    no matter how little content is tested.
  • Windows is maybe not the best OS for software development.
  • Comparing cargo t --test it vs cargo t --doc, and assuming the doctest
    build doesn't get cached, and that compiling the doctests takes a similar
    amount of time as the integration test binary, we get a "run as individual
    processes" overhead of ~1s on Linux and ~3s on Windows for 1k doc-tests.

Repro

All code can be found here: https://github.com/Voultapher/doctest-parallel-experiment

@Kobzol
Contributor

Kobzol commented Aug 17, 2024

That overhead seems to be quite low (surprisingly, even on Windows). That being said, it would be nice if we could default to running doctests in the same process, especially since that's already the default mode of operation for normal unit tests. Unfortunately, that ship has kind of sailed because doctests have been run as separate processes since Rust 1.0.

In theory, we could switch the default in Edition 2024, but it might be too disruptive of a change; it would require people to manually mark problematic tests with an attribute to opt out of thread execution. I don't think that we would need a separate attribute though, we could reuse the standalone attribute for this, because it has similar semantics; when a doctest is compiled as a separate binary, it cannot be executed in the same process as the rest of the tests anyway, so standalone means separate binary + separate process.

Since some people actually have problems with running unit tests in threads, maybe a more general solution could be to add some option like --use-processes/threads, which would allow people to explicitly state how tests should be executed. The existing standalone attribute could then be used as an opt-out for thread execution. I even think that libtest already supports thread/process execution currently, but it's not exposed outside (but maybe I'm confusing this with compiletest).

@Voultapher
Contributor Author

Good point, standalone already implies running in a separate process.

Regarding the breaking change, isn't this feature already a breaking change? If so, I think it's reasonably easy to explain that from edition 2024 onwards doctests will be compiled and run the same way as unit and integration tests. My understanding is that the edition mechanism exists exactly to avoid the C++ situation of "yeah we'd like to change that but we can't".

Regarding a --use-processes option, cargo nextest fills that niche quite nicely; maybe we could do a better job of surfacing that information to users? In my experience, code that requires process-isolated testing usually shows little concern for re-entrancy and parallel usage, often out of ignorance rather than desire. I'd like to avoid promoting such use-cases by default for unit and integration tests. Reading the linked issue, the desire comes from having to interface with software that was written without such considerations. Rust is thread-safe by default, and that by necessity requires thoughtful usage of shared global state.
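To make the shared-global-state case concrete, here is a minimal, hypothetical example of the kind of code that only tests reliably with process isolation: a one-time global initialization that a second test in the same process would trip over.

use std::sync::OnceLock;

// Hypothetical process-wide singleton, standing in for a logger or global config.
static GLOBAL_CONFIG: OnceLock<String> = OnceLock::new();

fn init_config(value: &str) -> Result<(), String> {
    // Fails if something else in this process already initialized the global.
    GLOBAL_CONFIG
        .set(value.to_owned())
        .map_err(|_| "already initialized".to_owned())
}

fn main() {
    // In its own process the first init always succeeds. Merged into one
    // process with another test that also calls init_config, one of the two
    // initializations would fail.
    assert!(init_config("test-run").is_ok());
    assert!(init_config("second-init").is_err());
}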

@Voultapher
Contributor Author

Can we get this discussed by the appropriate team? I think edition 2024 would be the perfect moment to make this change, given we are already breaking doctest compatibility.

@Kobzol
Contributor

Kobzol commented Aug 24, 2024

Seems like it might be a bit late in the edition cycle, especially since there's no implementation ready atm. But @traviscross would know the procedure.

FWIW, I'm not sure if this is worth it, especially since the perf. benefits seem to be relatively small. I consider it to be a big success that Guillaume managed to speed up doctests so much while keeping the original process semantics. If we also changed the execution behavior, it would make it harder for some projects to make use of the merged compilation mode (and more importantly, harder to switch to the 2024 edition).

@GuillaumeGomez
Member

Just like @Kobzol said. I think it could eventually be an option to be enabled, but I'm not sure it's worth the code needed for it. But maybe I'm wrong; a comparison would be needed for that.

@Voultapher
Contributor Author

a comparison would be needed for that.

What do you mean by that?

@Voultapher
Contributor Author

Voultapher commented Aug 24, 2024

benefits seem to be relatively small.

Well I guess that comes down to personal preference, but for me even having to wait 500ms more during iterative development feels annoying. Usually I just turn off doc-tests because they also make the console output more noisy.

@GuillaumeGomez
Member

To measure the impact of using processes vs threads.

@Voultapher
Contributor Author

Maybe you missed my comment here #129119 (comment)

@GuillaumeGomez
Member

I missed it indeed.

@Kobzol
Contributor

Kobzol commented Aug 24, 2024

benefits seem to be relatively small.

Well I guess that comes down to personal preference, but for me even having to wait 500ms more during iterative development feels annoying. Usually I just turn off doc-tests because they also make the console output more noisy.

This is a bit orthogonal, but it's in fact wasteful to compile (and run) all doctests after most recompiles. One alternative to make doctests faster could be to make them incremental and only rebuild/rerun code that has changed.

@GuillaumeGomez
Member

It would require saving the doctest files, but I suppose nothing prevents it.
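A very rough sketch of the bookkeeping that saving doctest files could enable (purely illustrative, not how rustdoc works today): persist a hash of each extracted doctest's source and only rebuild/rerun the ones whose hash changed since the last run.

use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

fn hash_source(src: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    src.hash(&mut hasher);
    hasher.finish()
}

// Returns the doctests that need to be rebuilt/rerun, given the hashes
// recorded from the previous run (e.g. loaded from a saved file).
fn doctests_to_rerun<'a>(
    doctests: &'a [(String, String)],  // (test name, extracted source)
    previous: &HashMap<String, u64>,   // name -> source hash from last run
) -> Vec<&'a str> {
    doctests
        .iter()
        .filter(|(name, src)| previous.get(name) != Some(&hash_source(src)))
        .map(|(name, _)| name.as_str())
        .collect()
}

fn main() {
    let previous =
        HashMap::from([("foo::example".to_string(), hash_source("assert!(true);"))]);
    let doctests = vec![
        ("foo::example".to_string(), "assert!(true);".to_string()),    // unchanged, skip
        ("bar::example".to_string(), "assert_eq!(1, 1);".to_string()), // changed, rerun
    ];
    assert_eq!(doctests_to_rerun(&doctests, &previous), vec!["bar::example"]);
}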

@Voultapher
Contributor Author

Voultapher commented Aug 24, 2024

How about this path forward? As two separate tasks we tackle:

  • Make the doctest build incremental
  • Add an opt-in cargo flag that allows projects to run merged doctests in the same process

This way we are not tied to any edition timelines and we can improve the experience users have in an incremental fashion.
