Skip collecting coverage for CLI tests #8930

Merged Nov 11, 2024 (1 commit)

fjetter (Member) commented Nov 8, 2024

We're frequently seeing weird deadlocks / stuck tests when running CLI tests. I've tuned the timeouts a couple of times, and for some tests this did help.

I've been looking into this again and saw an interesting traceback in https://github.com/dask/distributed/actions/runs/11736945020/job/32696964203:

E           subprocess.TimeoutExpired: Command '['/home/runner/miniconda3/envs/dask-distributed/bin/dask', 'worker', 'tcp://127.0.0.1:34103', '--nworkers=2', '--no-nanny']' timed out after 10 seconds

../../../miniconda3/envs/dask-distributed/lib/python3.10/subprocess.py:1198: TimeoutExpired
----------------------------- Captured stdout call -----------------------------
b'2024-11-08 06:03:36,069 - distributed.dask_worker - ERROR - Failed to launch worker.  You cannot use the --no-nanny argument when n_workers > 1.\n'
------ stdout: returncode -9, ['/home/runner/miniconda3/envs/dask-distributed/bin/dask', 'worker', 'tcp://127.0.0.1:34103', '--nworkers=2', '--no-nanny'] ------
Exception ignored in atexit callback: <function _python_exit at 0x7fb6909fde10>
Traceback (most recent call last):
  File "/home/runner/miniconda3/envs/dask-distributed/lib/python3.10/site-packages/coverage/collector.py", line 252, in lock_data
    self.data_lock.acquire()
KeyboardInterrupt: 

This suggests that coverage has a Python atexit hook that locks up for some reason, possibly because it cannot write the coverage data out quickly enough.
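To illustrate the general failure mode (not coverage's actual code), here is a minimal sketch of a child process whose atexit callback blocks on a lock that is never released. The names `CHILD`, `flush_on_exit` and `run_child` are made up for this example. The parent sees the same symptom as the CI log: the child prints its output, never exits, and `subprocess.run` raises `TimeoutExpired`.

```python
import subprocess
import sys
import textwrap

# Hypothetical child program: its main body finishes immediately, but the
# atexit callback blocks forever on a lock that is already held.
CHILD = textwrap.dedent(
    """
    import atexit
    import threading

    lock = threading.Lock()
    lock.acquire()  # never released; stands in for a stuck data lock

    def flush_on_exit():
        lock.acquire()  # blocks, so interpreter shutdown never completes

    atexit.register(flush_on_exit)
    print("main finished", flush=True)
    """
)


def run_child(timeout: float) -> str:
    """Run the child and report whether it exited or had to be killed."""
    try:
        subprocess.run(
            [sys.executable, "-c", CHILD],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired
        return "timed out"
    return "exited"
```

Calling `run_child(2.0)` returns `"timed out"` even though the child's main code completed, which is why raising the test timeout alone would not help in a case like this.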

I hope that a quick fix is to simply not collect coverage for the CLI tests.
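One possible way to do this, as a sketch only and not necessarily what this PR changes, is to launch CLI subprocesses with the coverage-related environment variables removed. `COVERAGE_PROCESS_START` is the variable coverage.py's subprocess startup hook looks for; the `COV_CORE_` prefix is an assumption about how pytest-cov passes its settings to child processes. The helper names `env_without_coverage` and `run_cli` are hypothetical.

```python
import os
import subprocess
import sys


def env_without_coverage(base=None):
    """Return a copy of the environment with coverage hooks for children removed."""
    env = dict(os.environ if base is None else base)
    # coverage.py starts measuring in a subprocess only if this is set
    env.pop("COVERAGE_PROCESS_START", None)
    # Assumption: pytest-cov forwards its configuration via COV_CORE_* variables
    for key in [k for k in env if k.startswith("COV_CORE_")]:
        del env[key]
    return env


def run_cli(args, timeout=10):
    """Run a CLI command without coverage measurement in the child process."""
    return subprocess.run(
        args,
        env=env_without_coverage(),
        capture_output=True,
        timeout=timeout,
    )
```

For example, `run_cli([sys.executable, "-c", "print('ok')"])` runs the child with the parent's environment minus the coverage variables. The trade-off is that code paths exercised only through the CLI no longer show up in the coverage report.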

fjetter changed the title from "Skip collecting coverage for CLI calls" to "Skip collecting coverage for CLI tests" Nov 8, 2024
fjetter (Member, Author) commented Nov 8, 2024

Same thing here: https://github.com/dask/distributed/actions/runs/11728636755/job/32672527284

I so hope this is it... Those failures have been driving me mad.

github-actions bot (Contributor) commented Nov 8, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    25 files  ±0      25 suites  ±0   10h 17m 11s ⏱️ -38s
 4 129 tests ±0   4 017 ✅ ±0    110 💤 ±0  1 ❌  - 1  1 🔥 +1 
47 681 runs  +1  45 559 ✅ +2  2 120 💤  - 1  1 ❌  - 1  1 🔥 +1 

For more details on these failures and errors, see this check.

Results for commit d9cbd6d. ± Comparison against base commit 26b1061.

fjetter (Member, Author) commented Nov 11, 2024

Well, none of the CLI tests crashed.

fjetter merged commit 6b7f187 into dask:main Nov 11, 2024
30 of 32 checks passed
fjetter deleted the skip_cli_coverage branch November 11, 2024 11:43