{serial,openmp}.sparse_spmv_mv failures with gcc/10 and gcc/10+armpl/21 #1331

Open
e10harvey opened this issue Feb 17, 2022 · 5 comments
@e10harvey
Contributor

e10harvey commented Feb 17, 2022

@lucbv, when standing up the A64FX CI testing I encountered this test failure. Can you investigate?

Snippet of ctest output:

4: [       OK ] serial.sparse_spmv_mv_struct_double_int_size_t_LayoutLeft_TestExecSpace (1 ms)
4: [ RUN      ] serial.sparse_spmv_mv_kokkos_complex_double_int_int_LayoutLeft_TestExecSpace
4: KokkosSparse::Test::spmv_mv: 200 errors of 200 for mv 12 (alpha=(2.5,0), beta=(0,0), mode = N)
4: /path/to/workspace/KokkosKernels_PullRequest_Tpls_ARMPL2110_Tpls_ARMPL2030_GCC1020/kokkos-kernels/unit_test/sparse/Test_Sparse_spmv.hpp:214: Failure
4: Value of: num_errors == 0
4:   Actual: false
4: Expected: true
4: KokkosSparse::Test::spmv_mv: 200 errors of 200 for mv 12 (alpha=(2.5,0), beta=(1,0), mode = N)
4: /path/to/workspace/KokkosKernels_PullRequest_Tpls_ARMPL2110_Tpls_ARMPL2030_GCC1020/kokkos-kernels/unit_test/sparse/Test_Sparse_spmv.hpp:214: Failure
4: Value of: num_errors == 0
4:   Actual: false
4: Expected: true
4: KokkosSparse::Test::spmv_mv: 200 errors of 200 for mv 12 (alpha=(2.5,0), beta=(-1,0), mode = N)
4: /path/to/workspace/KokkosKernels_PullRequest_Tpls_ARMPL2110_Tpls_ARMPL2030_GCC1020/kokkos-kernels/unit_test/sparse/Test_Sparse_spmv.hpp:214: Failure
4: Value of: num_errors == 0
4:   Actual: false
4: Expected: true
4: KokkosSparse::Test::spmv_mv: 200 errors of 200 for mv 12 (alpha=(2.5,0), beta=(2.5,0), mode = N)
4: /path/to/workspace/KokkosKernels_PullRequest_Tpls_ARMPL2110_Tpls_ARMPL2030_GCC1020/kokkos-kernels/unit_test/sparse/Test_Sparse_spmv.hpp:214: Failure
4: Value of: num_errors == 0
4:   Actual: false
4: Expected: true
4: KokkosSparse::Test::spmv_mv: 200 errors of 200 for mv 12 (alpha=(2.5,0), beta=(0,0), mode = C)
4: /path/to/workspace/KokkosKernels_PullRequest_Tpls_ARMPL2110_Tpls_ARMPL2030_GCC1020/kokkos-kernels/unit_test/sparse/Test_Sparse_spmv.hpp:214: Failure
4: Value of: num_errors == 0
4:   Actual: false
4: Expected: true
4: KokkosSparse::Test::spmv_mv: 200 errors of 200 for mv 12 (alpha=(2.5,0), beta=(1,0), mode = C)
4: /path/to/workspace/KokkosKernels_PullRequest_Tpls_ARMPL2110_Tpls_ARMPL2030_GCC1020/kokkos-kernels/unit_test/sparse/Test_Sparse_spmv.hpp:214: Failure
4: Value of: num_errors == 0
4:   Actual: false
4: Expected: true
4: KokkosSparse::Test::spmv_mv: 200 errors of 200 for mv 12 (alpha=(2.5,0), beta=(-1,0), mode = C)
4: /path/to/workspace/KokkosKernels_PullRequest_Tpls_ARMPL2110_Tpls_ARMPL2030_GCC1020/kokkos-kernels/unit_test/sparse/Test_Sparse_spmv.hpp:214: Failure
4: Value of: num_errors == 0
4:   Actual: false
4: Expected: true
4: KokkosSparse::Test::spmv_mv: 200 errors of 200 for mv 12 (alpha=(2.5,0), beta=(2.5,0), mode = C)
4: /path/to/workspace/KokkosKernels_PullRequest_Tpls_ARMPL2110_Tpls_ARMPL2030_GCC1020/kokkos-kernels/unit_test/sparse/Test_Sparse_spmv.hpp:214: Failure
4: Value of: num_errors == 0
4:   Actual: false
4: Expected: true
 4/17 Test  #4: sparse_serial ....................***Exception: SegFault 85.38 sec
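
The log above repeats the same failure for every (alpha, beta, mode) combination, which is easier to see once the summary lines are tabulated. The following is a minimal sketch for doing that, assuming the log has been saved as text. The regular expression is written against the line format shown in this snippet only, and the names `FAILURE_RE`, `summarize_failures`, and `SAMPLE` are hypothetical helpers, not part of Kokkos Kernels.

```python
import re

# Matches the summary lines printed by the failing test, for example:
# "4: KokkosSparse::Test::spmv_mv: 200 errors of 200 for mv 12 (alpha=(2.5,0), beta=(0,0), mode = N)"
# The format is inferred from the log in this issue (an assumption, not a documented interface).
FAILURE_RE = re.compile(
    r"spmv_mv: (?P<errors>\d+) errors of (?P<total>\d+) for mv (?P<mv>\d+) "
    r"\(alpha=\((?P<alpha>[^)]*)\), beta=\((?P<beta>[^)]*)\), mode = (?P<mode>\w+)\)"
)


def summarize_failures(log_text):
    """Return one dict per spmv_mv failure summary line found in log_text."""
    failures = []
    for line in log_text.splitlines():
        match = FAILURE_RE.search(line)
        if match is None:
            continue  # skip gtest bookkeeping lines such as "Actual: false"
        failures.append(
            {
                "errors": int(match["errors"]),
                "total": int(match["total"]),
                "mv": int(match["mv"]),
                "alpha": match["alpha"],
                "beta": match["beta"],
                "mode": match["mode"],
            }
        )
    return failures


# Two lines copied from the ctest output in this issue, used as sample input.
SAMPLE = """\
4: KokkosSparse::Test::spmv_mv: 200 errors of 200 for mv 12 (alpha=(2.5,0), beta=(0,0), mode = N)
4: Value of: num_errors == 0
4: KokkosSparse::Test::spmv_mv: 200 errors of 200 for mv 12 (alpha=(2.5,0), beta=(-1,0), mode = C)
"""

if __name__ == "__main__":
    for failure in summarize_failures(SAMPLE):
        print(failure)
```

Applied to the full snippet, this would show that every entry of the result is wrong (200 errors of 200) for all four beta values in both mode N and mode C, which points at the kernel path rather than at a particular scaling case.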

Reproducer instructions

cd kokkos/
git checkout -f 0d19eebfa26d076f551d5b7a43230f627887df21
cd ../kokkos-kernels/
git checkout -f f5d7490dee7751a5a3cff8242e7de9f6ad6fe5b2
cd ../
mkdir testing
cd testing/
../kokkos-kernels/scripts/cm_test_all_sandia --spot-check-tpls armpl/21.1.0 --with-tpls=armpl --kokkos-path=../kokkos --kokkoskernels-path=../kokkos-kernels

Note: This is reproducible with both OMP_NUM_THREADS=48 and 47.

Note that this only occurs in the spmv_mv_heavy test. See https://github.com/kokkos/kokkos-kernels/pull/1555/files#diff-451dcf2546f551c9894dd3e3820ba37ea8765ba3ae8de9fc31e04f248910fde2R568.
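
Since the failure was checked at two thread counts, a small driver can rerun the same command under each `OMP_NUM_THREADS` value and collect the exit codes. This is a sketch only: `run_with_thread_counts` is a hypothetical helper, and the command in the demo is a placeholder that prints the variable it sees. The actual test invocation depends on the local build directory and is not assumed here.

```python
import os
import subprocess
import sys


def run_with_thread_counts(command, thread_counts):
    """Run command once per thread count and return {count: exit code}."""
    results = {}
    for count in thread_counts:
        # Copy the current environment and override only OMP_NUM_THREADS.
        env = dict(os.environ, OMP_NUM_THREADS=str(count))
        completed = subprocess.run(command, env=env)
        results[count] = completed.returncode
    return results


if __name__ == "__main__":
    # Placeholder command (an assumption): replace with the real test invocation.
    demo_command = [
        sys.executable,
        "-c",
        "import os; print('OMP_NUM_THREADS =', os.environ['OMP_NUM_THREADS'])",
    ]
    print(run_with_thread_counts(demo_command, [47, 48]))
```

A nonzero exit code for a given count marks a failing run, so the returned dictionary gives a quick view of whether the failure depends on the thread count.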

e10harvey added the bug label Feb 17, 2022
e10harvey changed the title from "serial.sparse_spmv_mv failures with gcc/10+armpl/21" to "{serial,openmp}.sparse_spmv_mv failures with gcc/10+armpl/21" Feb 17, 2022
@e10harvey
Contributor Author

CC: @jgfouca

@e10harvey
Contributor Author

@lucbv: Are there any updates on this? Once this is resolved, we can enable Armpl CI checks and improve our code coverage.

@lucbv
Contributor

lucbv commented May 19, 2022

@e10harvey sorry this took so long. PR #1412 might have fixed these; at least I hope it did, although I did not build and test on Inouye. Let me know if you see any improvement.

@e10harvey
Contributor Author

Great! It's running now : )

@e10harvey
Contributor Author

e10harvey commented May 19, 2022

@lucbv: We're still seeing these errors with OMP_NUM_THREADS=48. See this console output for details. It may be worth testing with OMP_NUM_THREADS=47. I will disable spmv_mv on armpl for now so we can start protecting against regressions.

e10harvey changed the title from "{serial,openmp}.sparse_spmv_mv failures with gcc/10+armpl/21" to "{serial,openmp}.sparse_spmv_mv failures with gcc/10 and gcc/10+armpl/21" Sep 28, 2022