Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calloop + async-process leads to hours-long test times #968

Closed
detly opened this issue Mar 8, 2022 · 7 comments
Closed

Calloop + async-process leads to hours-long test times #968

detly opened this issue Mar 8, 2022 · 7 comments
Assignees
Labels
bug Instrumentation Issues relating to ptrace and parsing of DWARF tables

Comments

@detly
Copy link

detly commented Mar 8, 2022

I've written a crate calloop-subproc that uses the Calloop event loop and async-process to manage subprocesses in an event loop. It's about 700 lines of code with 9 integration tests. In order to run the tests, the test binary has to build another test binary that can serve as the "thing to run as a subprocess" part of the tests.

Without tarpaulin, the tests take about 5s including the extra binary build. Under tarpaulin, they never finish. They don't seem to hang, however — I can gradually let more and more tests finish by increasing the timeout from 1m to 2m, to 10m, to an hour. So far, however, I have not found a long enough duration for the whole set of (9) tests to finish.

I have tried to create a minimal example, but without much luck. It really seems to be a result of the following factors used together:

  • the Calloop event loop
  • the async-process crate
  • test-time building

...and possibly other factors?

Nonetheless, my tarp-test project sort-of shows the problem, where a usually-20s test runs in 230s under tarpaulin (including the one-off build time). It certainly doesn't show the same magnitude of slowdown that the original project shows though.

I initially thought it was due to signals for subprocess management (kill, sigchld etc.) being blocked, but the tests themselves seem to actually work, so I doubt it's that after all.

I'll keep trying to see if I can identify a particular factor that's having such a disproportionate impact.

@detly
Copy link
Author

detly commented Mar 8, 2022

I've narrowed it down a bit and found that while some tests (in calloop-subproc) complete in 200s, others take more than an hour (I still haven't found how long):

200  test_chain_single_failure
200  test_chain_single_success
?    test_chain_multiple_success_with_failure_and_cleanup
?    test_chain_multiple_success_fail
?    test_chain_multiple_success_with_failure_and_cleanup_failure
?    test_chain_multiple_success_with_cleanup_failure
?    test_chain_multiple_success
202  test_listen_one_line
199  test_listen_three_lines

@xd009642
Copy link
Owner

xd009642 commented Mar 8, 2022

So, here is a chance the current PR I'm working on will greatly improve this #962 if you want to try it out. It changes that part of the code a lot and right now I'm just fighting with one intermittently failing test on nightly. If you're interested in some of the why then this comment explains a lot of the details #953 (comment)

Once I've got this PR done and merged I'll look at this issue and see what the difference is (or maybe before, just once I've got the flaky test sorted). And hopefully come back to this issue with some updates

@detly
Copy link
Author

detly commented Mar 8, 2022

Okay, I managed to substantially reduce the repro, it seems like having multiple async process spawns, even if they're sequential, just totally stops under tarpaulin. I get this output from the test repo linked above:

tarp-test ⚬ cargo tarpaulin --ignore-tests --timeout 120 --forward -- --nocapture
Mar 08 17:35:49.125  INFO cargo_tarpaulin::config: Creating config
Mar 08 17:35:49.151  INFO cargo_tarpaulin: Running Tarpaulin
Mar 08 17:35:49.151  INFO cargo_tarpaulin: Building project
Mar 08 17:35:49.151  INFO cargo_tarpaulin::cargo: Cleaning project
   Compiling [etc etc]
    Finished test [unoptimized + debuginfo] target(s) in 27.73s
Mar 08 17:36:16.969  INFO cargo_tarpaulin::process_handling::linux: Launching test
Mar 08 17:36:16.969  INFO cargo_tarpaulin::process_handling: running /home/jason/Code/tarp-test/target/debug/deps/test_one-036ad9d4be9c1c80

running 1 test
Spawning first process
Awaiting first process
test test_one has been running for over 60 seconds
Mar 08 17:38:21.793 ERROR cargo_tarpaulin: Failed to get test coverage! Error: Failed to run tests: Error: Timed out waiting for test response
Error: "Failed to get test coverage! Error: Failed to run tests: Error: Timed out waiting for test response"

@detly
Copy link
Author

detly commented Mar 8, 2022

Oops, crossed signals there. I'll give that branch a try.

@detly
Copy link
Author

detly commented Mar 8, 2022

Didn't seem to change things I'm afraid. I'm using 9543cb2 and get:

tarp-test ⚬ ../tarpaulin/target/release/cargo-tarpaulin tarpaulin --ignore-tests --forward --timeout 120 -- --nocapture
Mar 08 18:17:08.440  INFO cargo_tarpaulin::config: Creating config
Mar 08 18:17:08.547  INFO cargo_tarpaulin: Running Tarpaulin
Mar 08 18:17:08.547  INFO cargo_tarpaulin: Building project
Mar 08 18:17:08.547  INFO cargo_tarpaulin::cargo: Cleaning project
   Compiling libc v0.2.119
   Compiling futures-core v0.3.21
   Compiling log v0.4.14
   Compiling cfg-if v1.0.0
   Compiling memchr v2.4.1
   Compiling autocfg v1.1.0
   Compiling pin-project-lite v0.2.8
   Compiling futures-task v0.3.21
   Compiling slab v0.4.5
   Compiling signal-hook v0.3.13
   Compiling cache-padded v1.2.0
   Compiling waker-fn v1.1.0
   Compiling futures-util v0.3.21
   Compiling parking v2.0.0
   Compiling fastrand v1.7.0
   Compiling futures-io v0.3.21
   Compiling pin-utils v0.1.0
   Compiling once_cell v1.10.0
   Compiling bitflags v1.3.2
   Compiling event-listener v2.5.2
   Compiling concurrent-queue v1.2.2
   Compiling memoffset v0.6.5
   Compiling futures-lite v1.12.0
   Compiling socket2 v0.4.4
   Compiling signal-hook-registry v1.4.0
   Compiling polling v2.2.0
   Compiling nix v0.22.3
   Compiling async-io v1.6.0
   Compiling async-process v1.3.0
   Compiling calloop v0.9.3
   Compiling tarp-test v0.1.0 (/home/jason/Code/tarp-test)
    Finished test [unoptimized + debuginfo] target(s) in 25.74s
Mar 08 18:17:34.565  INFO cargo_tarpaulin::process_handling::linux: Launching test
Mar 08 18:17:34.565  INFO cargo_tarpaulin::process_handling: running /home/jason/Code/tarp-test/target/debug/deps/test_one-036ad9d4be9c1c80

running 1 test
Spawning first process
Awaiting first process
test test_one has been running for over 60 seconds
Mar 08 18:19:37.765 ERROR cargo_tarpaulin: Failed to get test coverage! Error: Failed to run tests: Error: Timed out waiting for test response
Error: "Failed to get test coverage! Error: Failed to run tests: Error: Timed out waiting for test response"

@detly detly changed the title Calloop + async-process + test time building leads to hours-long test times Calloop + async-process leads to hours-long test times Mar 11, 2022
@xd009642 xd009642 added bug Instrumentation Issues relating to ptrace and parsing of DWARF tables labels Mar 20, 2022
@xd009642
Copy link
Owner

xd009642 commented Feb 7, 2023

So this should now be a lot better with --engine llvm for llvm coverage instrumentation which was released during 2022. If not feel free to reopen and I'll look again

@xd009642 xd009642 closed this as completed Feb 7, 2023
@detly
Copy link
Author

detly commented Feb 8, 2023

Just confirming, all I had to do was add that flag and it works perfectly in my example project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Instrumentation Issues relating to ptrace and parsing of DWARF tables
Projects
None yet
Development

No branches or pull requests

2 participants