Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add timeouts for tests #7657

Merged
merged 9 commits into from
Mar 24, 2023
Merged

add timeouts for tests #7657

merged 9 commits into from
Mar 24, 2023

Conversation

keewis
Copy link
Collaborator

@keewis keewis commented Mar 22, 2023

The macos 3.11 CI seems stall very often at the moment, which makes it hit the 6 hours mark and get cancelled. Since our tests should never run that long (the ubuntu runners usually take between 15-20 minutes), I'm introducing a pretty generous timeout of 5 minutes. By comparison, we already have a timeout of 60 seconds in the upstream-dev CI, but that's on a ubuntu runner which usually is much faster than any of the macos / windows runners.

Tests that time out raise an error, which might help us with figuring out which test it is that stalls, and also if we can do anything about that.

keewis added 3 commits March 22, 2023 11:08
This is plenty for a test to run: we shouldn't have tests that take
more than 30 seconds, but the windows / macos runners tend to be
slower than the linux ones.
@github-actions github-actions bot added Automation Github bots, testing workflows, release automation CI Continuous Integration tools dependencies Pull requests that update a dependency file labels Mar 22, 2023
@keewis
Copy link
Collaborator Author

keewis commented Mar 22, 2023

apparently python 3.11 on windows is also pretty slow, which makes the timeouts appear there as well. But in any case, here's the affected tests:

FAILED xarray/tests/test_backends.py::TestDask::test_dask_roundtrip - Failed: Timeout >90.0s
FAILED xarray/tests/test_backends.py::TestDask::test_deterministic_names - Failed: Timeout >90.0s
FAILED xarray/tests/test_backends.py::TestDask::test_dataarray_compute - Failed: Timeout >90.0s
FAILED xarray/tests/test_backends.py::TestDask::test_load_dataset - Failed: Timeout >90.0s
FAILED xarray/tests/test_backends.py::TestDask::test_load_dataarray - Failed: Timeout >90.0s
FAILED xarray/tests/test_backends.py::TestDask::test_inline_array - Failed: Timeout >90.0s
FAILED xarray/tests/test_backends.py::TestPydap::test_cmp_local_file - Failed: Timeout >90.0s
FAILED xarray/tests/test_backends.py::TestPydap::test_compatible_to_netcdf - Failed: Timeout >90.0s
FAILED xarray/tests/test_backends.py::TestPydap::test_dask - Failed: Timeout >90.0s

which are the exact same tests that also fail for >300s, so I'm guessing those are the ones that stall.

Does anyone know why those take so long only on macos (and, partially, windows)? Does that have to do with the runner hardware?

@headtr1ck
Copy link
Collaborator

Does anyone know why those take so long only on macos (and, partially, windows)? Does that have to do with the runner hardware?

Some of there files are 10 numbers, so the netcdf should be ~50kb or so (mostly the header overhead).
On no hardware this should take >1min to read and write?

@keewis
Copy link
Collaborator Author

keewis commented Mar 23, 2023

I guess that means that the CPUs of the windows and macos runners are just slow, or there's other tasks that get prioritized, or something. All of this results in the tests being pretty flaky, so I'm not sure what we can do about it. I don't have access to any, but maybe someone with a mac could try to reproduce?

In any case, I'll increase the timeout a bit again since I think a timeout after ~1hour is better than the CI job being cancelled after 6 hours.

@keewis
Copy link
Collaborator Author

keewis commented Mar 23, 2023

since the macos 3.11 CI is not required, I'm tempted to merge this now, and continue to debug elsewhere.

@keewis keewis added the plan to merge Final call for comments label Mar 23, 2023
@keewis keewis enabled auto-merge (squash) March 24, 2023 12:48
@dcherian
Copy link
Contributor

Is it for a particular backend?

@keewis keewis merged commit 635b9d0 into pydata:main Mar 24, 2023
@keewis keewis deleted the test-timeouts branch March 24, 2023 15:49
@keewis
Copy link
Collaborator Author

keewis commented Mar 24, 2023

Is it for a particular backend?

Not sure I understand. Can you elaborate, please?

@dcherian
Copy link
Contributor

FAILED xarray/tests/test_backends.py::TestDask

I'm thinking this failures is some bad interaction between an I/O backend and dask threads. But I'm having trouble figuring out which backend.

@keewis
Copy link
Collaborator Author

keewis commented Mar 24, 2023

depends on the test, I guess. Most of them are related to one of the netcdf backends (not sure which, the tests don't specify that), test_dataarray_compute just checks _in_memory (no I/O involved), and the pydap tests use netcdf underneath. So I'd say the issue is with one of the netcdf backends (netcdf4, as that's first in the priority list).

I've also seen a drastic reduction in performance with HDF5 1.12.2 (both netcdf4 and h5netcdf) on one of my colleague's datasets, so maybe that's much more visible on a mac? That doesn't explain the slow test_dataarray_compute, though.

Do we isolate the dask scheduler in any way? I assume that makes use of the builtin (non-distributed) scheduler?

@dcherian
Copy link
Contributor

dcherian commented Mar 24, 2023

IIRC all distributed tests are in test_distributed.py.

I've also seen a drastic reduction in performance with HDF5 1.12.2 (both netcdf4 and h5netcdf) on one of my colleague's datasets,

The default scheduler is 'threads' and the new netcdf changed some things around thread locking

Maybe this is related to #7079. I restarted #7488

dcherian added a commit to dcherian/xarray that referenced this pull request Mar 26, 2023
* main: (40 commits)
  Faq pull request (According to pull request pydata#7604 & issue pydata#1285 (pydata#7638)
  add timeouts for tests (pydata#7657)
  Pull Request Labeler - Undo workaround sync-labels bug (pydata#7667)
  [pre-commit.ci] pre-commit autoupdate (pydata#7651)
  Allow all integer dtypes in `polyval` (pydata#7619)
  [skip-ci] dev whats-new (pydata#7660)
  Redo whats-new for 2023.03.0 (pydata#7659)
  Set copy=False when calling pd.Series (pydata#7642)
  Pin pandas < 2 (pydata#7650)
  Whats-new for release 2023.03.0 (pydata#7643)
  Bump pypa/gh-action-pypi-publish from 1.7.1 to 1.8.1 (pydata#7648)
  Use more descriptive link texts (pydata#7625)
  Fix missing 'dim' argument in _get_nan_block_lengths (pydata#7598)
  Fix `pcolormesh` with str coords (pydata#7612)
  [skip-ci] Fix groupby binary ops benchmarks (pydata#7603)
  Remove incomplete sentence in IO docs (pydata#7631)
  Allow indexing unindexed dimensions using dask arrays (pydata#5873)
  Bump pypa/gh-action-pypi-publish from 1.6.4 to 1.7.1 (pydata#7618)
  [pre-commit.ci] pre-commit autoupdate (pydata#7620)
  add a test for scatter colorbar extend (pydata#7616)
  ...
dcherian added a commit to kmsquire/xarray that referenced this pull request Mar 29, 2023
* upstream/main: (716 commits)
  Faq pull request (According to pull request pydata#7604 & issue pydata#1285 (pydata#7638)
  add timeouts for tests (pydata#7657)
  Pull Request Labeler - Undo workaround sync-labels bug (pydata#7667)
  [pre-commit.ci] pre-commit autoupdate (pydata#7651)
  Allow all integer dtypes in `polyval` (pydata#7619)
  [skip-ci] dev whats-new (pydata#7660)
  Redo whats-new for 2023.03.0 (pydata#7659)
  Set copy=False when calling pd.Series (pydata#7642)
  Pin pandas < 2 (pydata#7650)
  Whats-new for release 2023.03.0 (pydata#7643)
  Bump pypa/gh-action-pypi-publish from 1.7.1 to 1.8.1 (pydata#7648)
  Use more descriptive link texts (pydata#7625)
  Fix missing 'dim' argument in _get_nan_block_lengths (pydata#7598)
  Fix `pcolormesh` with str coords (pydata#7612)
  [skip-ci] Fix groupby binary ops benchmarks (pydata#7603)
  Remove incomplete sentence in IO docs (pydata#7631)
  Allow indexing unindexed dimensions using dask arrays (pydata#5873)
  Bump pypa/gh-action-pypi-publish from 1.6.4 to 1.7.1 (pydata#7618)
  [pre-commit.ci] pre-commit autoupdate (pydata#7620)
  add a test for scatter colorbar extend (pydata#7616)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Automation Github bots, testing workflows, release automation CI Continuous Integration tools dependencies Pull requests that update a dependency file plan to merge Final call for comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants