cuda.sparse_spmv_tensor_core failures with clang/7+cuda/9 #1314

Closed
ndellingwood opened this issue Feb 11, 2022 · 6 comments

@ndellingwood
Contributor

@cwpearson While testing on kokkos-dev-2 after the merge of #1307, I hit failures in the spmv tensor core tests. Can you investigate? I'm not sure whether the failures are related to #1307 or came in with separate changes.

Failure output (snippet):

4: [ RUN      ] cuda.sparse_spmv_tensor_core_double_double_double_int_size_t_LayoutLeft_TestExecSpace
4: KokkosSparse::Test::spmv_tc: 16 errors of 16 for mv 0 (alpha=1, beta=0, mode = N)
4: /ascldap/users/ndellin/kokkos-kernels/unit_test/sparse/Test_Sparse_spmv.hpp:1306: Failure
4: Value of: num_errors == 0
4:   Actual: false
4: Expected: true
4: KokkosSparse::Test::spmv_tc: 16 errors of 16 for mv 1 (alpha=1, beta=0, mode = N)
4: /ascldap/users/ndellin/kokkos-kernels/unit_test/sparse/Test_Sparse_spmv.hpp:1306: Failure
4: Value of: num_errors == 0
4:   Actual: false
4: Expected: true
...

Reproducer (kokkos-dev-2):

module load sems-archive-env sems-archive-cmake/3.17.1 sems-archive-gcc/7.2.0 sems-archive-cuda/9.2 sems-archive-clang/7.0.1

$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-devices=Cuda,OpenMP --arch=Volta70 --compiler=clang++ --cxxflags="-O3 -Wall -Wunused-parameter -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized " --cxxstandard="14" --kokkos-path=$KOKKOS_PATH --kokkoskernels-path=$KOKKOSKERNELS_PATH --with-scalars='double,complex_double' --with-ordinals=int --with-offsets=int,size_t --with-layouts=LayoutLeft
@cwpearson cwpearson self-assigned this Feb 11, 2022
@cwpearson
Contributor

First, run:

source /projects/sems/modulefiles/utils/sems-archive-modules-init.sh

@cwpearson
Contributor

I think this happens because of this guard:

#if defined(KOKKOS_HALF_T_IS_FLOAT) && !KOKKOS_HALF_T_IS_FLOAT

Kokkos::ArithTraits for half is only properly defined when KOKKOS_HALF_T_IS_FLOAT is 0; otherwise it uses float. As a result, the half-precision tolerances are too tight.
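
To illustrate why a float-derived tolerance would reject correct half-precision results, here is a minimal standalone sketch. It does not use Kokkos or the actual test code: `round_to_half_precision`, `within_tolerance`, `kHalfEps`, and `kFloatEps` are hypothetical names, the half rounding is emulated in double (ignoring fp16 range limits), and the relative-error check is an assumed form of the comparison, not the formula in Test_Sparse_spmv.hpp.

```cpp
#include <cmath>
#include <limits>

// Machine epsilon of IEEE binary16 (10 stored mantissa bits): 2^-10.
constexpr double kHalfEps = 1.0 / 1024.0;
// Machine epsilon of float (23 stored mantissa bits): 2^-23.
constexpr double kFloatEps = std::numeric_limits<float>::epsilon();

// Emulate rounding to half precision by keeping 11 significant bits
// (10 stored bits plus the implicit leading bit). Hypothetical helper:
// it ignores fp16 overflow, underflow, and subnormals.
double round_to_half_precision(double x) {
  if (x == 0.0) return 0.0;
  int exp = 0;
  const double m = std::frexp(x, &exp);        // x = m * 2^exp, 0.5 <= |m| < 1
  const double scaled = std::ldexp(m, 11);     // shift 11 bits into the integer part
  return std::ldexp(std::nearbyint(scaled), exp - 11);
}

// Assumed form of the check: relative error must not exceed eps.
bool within_tolerance(double expected, double actual, double eps) {
  return std::fabs(expected - actual) <= eps * std::fabs(expected);
}
```

With these definitions, `within_tolerance(1.0 / 3.0, round_to_half_precision(1.0 / 3.0), kHalfEps)` holds, while the same call with `kFloatEps` does not: the rounding error of a half value is several orders of magnitude larger than float's epsilon, so a tolerance built from float's precision flags every entry as an error.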

@cwpearson
Contributor

Yes, prec() for half is actually float in CUDA 9 on the host.

@cwpearson
Contributor

Hopefully fixed in #1322

@cwpearson
Contributor

A more comprehensive fix for the root issue is part of #1329
