Timeouts with batched_dla_{cuda,hip} tests - reduce test time #1595

Closed · ndellingwood opened this issue Nov 17, 2022 · 2 comments · Fixed by #1608

@ndellingwood (Contributor)

The batched_dla unit test runtime is consistently near or at the 1500-second timeout in the Cuda and Hip nightly builds. It will need to be reduced before the 4.0 release for CI to pass reliably with the Trilinos snapshot.
A couple of options for reducing the time: split the tests in the cpp file into separate cpp files, and/or reduce problem sizes where feasible (a sketch of the split follows below). Other thoughts?
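As a rough illustration of the splitting option, a minimal CMake sketch; the file names, target names, and the `gemm`/`trsm`/`lu` grouping are hypothetical and not the actual KokkosKernels test layout:

```cmake
# Hypothetical sketch: split one monolithic batched-dense test
# executable into several smaller ones, so no single binary
# approaches the 1500 s limit.
find_package(KokkosKernels REQUIRED)  # provides Kokkos::kokkoskernels
find_package(GTest REQUIRED)          # provides GTest::gtest_main

foreach(part gemm trsm lu)  # hypothetical grouping of the tests
  add_executable(batched_dla_cuda_${part} Test_Batched_${part}.cpp)
  target_link_libraries(batched_dla_cuda_${part} PRIVATE
    Kokkos::kokkoskernels GTest::gtest_main)
  add_test(NAME batched_dla_cuda_${part}
           COMMAND batched_dla_cuda_${part})
endforeach()
```

Each smaller executable then compiles, links, and runs independently, which also lets parallel builds and `ctest -j` schedule them side by side.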

This is also occurring in Trilinos, e.g. trilinos/Trilinos#11235, though for those builds with 3.7.00 I hope the issue can be mitigated with the RUN_SERIAL option (assuming the tests are currently run in parallel).
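For reference, `RUN_SERIAL` is a standard CTest test property (TriBITS exposes a matching `RUN_SERIAL` option on its test macros); a minimal sketch, with a placeholder test name:

```cmake
# Keep this test from running concurrently with other tests under
# `ctest -j N`. It does not make the test faster; it only avoids
# timeouts caused by contention for the GPU. (Placeholder name.)
set_tests_properties(KokkosKernels_batched_dla_cuda
                     PROPERTIES RUN_SERIAL TRUE)
```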

Reproducer (Weaver rhel8 queue, Cuda build):

```bash
module load cuda/11.2.2/gcc/8.3.1 cmake/3.23.1

$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-cuda --with-serial \
  --compiler=$KOKKOS_PATH/bin/nvcc_wrapper --arch=Volta70,Power9 \
  --with-cuda-options=enable_lambda --with-scalars='double,complex_double' \
  --with-ordinals=int --with-offsets=int,size_t --with-layouts=LayoutLeft \
  --cxxstandard=17
```
@e10harvey (Contributor)

Aside from reducing the test input sizes or increasing the timeout, I think splitting the tests up is our only other option.
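Of the options mentioned here, the timeout increase would be the smallest change; a one-line CTest sketch, again with a placeholder test name and an arbitrary example value:

```cmake
# Raise the per-test limit above the current 1500 s. This hides
# the cost rather than reducing it, which is why it is the least
# attractive option. (Placeholder name; 3000 is arbitrary.)
set_tests_properties(KokkosKernels_batched_dla_cuda
                     PROPERTIES TIMEOUT 3000)
```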

@lucbv (Contributor) commented Nov 21, 2022

I would be in favor of splitting the test, as it would also reduce the compile/link time for parallel builds.
