Timeouts with batched_dla_{cuda,hip} tests - reduce test time #1595

Closed · ndellingwood opened this issue Nov 17, 2022 · 2 comments · Fixed by #1608

@ndellingwood (Contributor)

The batched_dla unit test runtime is consistently near or at the 1500-second timeout in the Cuda and Hip nightly builds. It will need to be reduced before the 4.0 release for CI to pass reliably with the Trilinos snapshot.
A couple of options for reducing the time: split the tests in the cpp file into separate cpp files, and/or reduce problem sizes where feasible (a sketch of the split follows below). Other thoughts?
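As a rough illustration of the splitting option, a minimal CMake sketch; the file names, target names, and the `gemm`/`trsm`/`lu` grouping are hypothetical and not the actual KokkosKernels test layout:

```cmake
# Hypothetical sketch: split one monolithic batched-dense test
# executable into several smaller ones, so no single binary
# approaches the 1500 s limit.
find_package(KokkosKernels REQUIRED)  # provides Kokkos::kokkoskernels
find_package(GTest REQUIRED)          # provides GTest::gtest_main

foreach(part gemm trsm lu)  # hypothetical grouping of the tests
  add_executable(batched_dla_cuda_${part} Test_Batched_${part}.cpp)
  target_link_libraries(batched_dla_cuda_${part} PRIVATE
    Kokkos::kokkoskernels GTest::gtest_main)
  add_test(NAME batched_dla_cuda_${part}
           COMMAND batched_dla_cuda_${part})
endforeach()
```

Each smaller executable then compiles, links, and runs independently, which also lets parallel builds and `ctest -j` schedule them side by side.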

This is also occurring in Trilinos, e.g. trilinos/Trilinos#11235, though for those builds with 3.7.00 I hope the issue can be mitigated with the RUN_SERIAL option (assuming the tests are currently run in parallel).
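For reference, `RUN_SERIAL` is a standard CTest test property (TriBITS exposes a matching `RUN_SERIAL` option on its test macros); a minimal sketch, with a placeholder test name:

```cmake
# Keep this test from running concurrently with other tests under
# `ctest -j N`. It does not make the test faster; it only avoids
# timeouts caused by contention for the GPU. (Placeholder name.)
set_tests_properties(KokkosKernels_batched_dla_cuda
                     PROPERTIES RUN_SERIAL TRUE)
```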

Reproducer (Weaver rhel8 queue, Cuda build):

```bash
module load cuda/11.2.2/gcc/8.3.1 cmake/3.23.1

$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-cuda --with-serial \
  --compiler=$KOKKOS_PATH/bin/nvcc_wrapper --arch=Volta70,Power9 \
  --with-cuda-options=enable_lambda --with-scalars='double,complex_double' \
  --with-ordinals=int --with-offsets=int,size_t --with-layouts=LayoutLeft \
  --cxxstandard=17
```
@e10harvey (Contributor)

Aside from reducing the test input sizes or increasing the timeout, I think splitting the tests up is our only other option.
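Of the options mentioned here, the timeout increase would be the smallest change; a one-line CTest sketch, again with a placeholder test name and an arbitrary example value:

```cmake
# Raise the per-test limit above the current 1500 s. This hides
# the cost rather than reducing it, which is why it is the least
# attractive option. (Placeholder name; 3000 is arbitrary.)
set_tests_properties(KokkosKernels_batched_dla_cuda
                     PROPERTIES TIMEOUT 3000)
```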

@lucbv (Contributor) commented Nov 21, 2022

I would be in favor of splitting the test, as it would also reduce the compile/link time for parallel builds.
