batched_vector_arithmatic_simd_dcomplex{2,4} failures w/ GCC 10.2.0 & c++17, with or without blas tpl enabled #1512

Closed
e10harvey opened this issue Aug 31, 2022 · 2 comments

Comments

@e10harvey
Contributor

Arithmetic failures with GCC 10.2.0 and C++17 in the serial backend:

[ RUN      ] serial.batched_vector_arithmatic_simd_dcomplex2
/path/to/kokkos-kernels/unit_test/batched/dense/Test_Batched_VectorArithmatic.hpp:90: Failure
The difference between ats::abs(c[k]) and ats::abs(alpha + b[k]) is 1.1944432591914926, which exceeds eps * ats::abs(c[k]), where
ats::abs(c[k]) evaluates to 1.9116719714609276,
ats::abs(alpha + b[k]) evaluates to 0.71722871226943496, and
eps * ats::abs(c[k]) evaluates to 4.2447644764929739e-13.
<snip>
[ RUN      ] serial.batched_vector_arithmatic_simd_dcomplex4
/path/to/kokkos-kernels/unit_test/batched/dense/Test_Batched_VectorArithmatic.hpp:90: Failure
The difference between ats::abs(c[k]) and ats::abs(alpha + b[k]) is 0.58011900740885669, which exceeds eps * ats::abs(c[k]), where
ats::abs(c[k]) evaluates to 1.9116719714609276,
ats::abs(alpha + b[k]) evaluates to 1.3315529640520709, and
eps * ats::abs(c[k]) evaluates to 4.2447644764929739e-13.
<snip>

Similar arithmetic failures occur in the threads and openmp backends.
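
For readers unfamiliar with the message format: it is what GoogleTest's EXPECT_NEAR prints, so the check at line 90 compares the two magnitudes against a tolerance of eps * ats::abs(c[k]). Below is a minimal sketch of that comparison using std::complex rather than the Kokkos types. The helper name magnitudes_agree is hypothetical and is not part of kokkos-kernels; the eps used in the usage note (about 2.22e-13) is only inferred from the log values above.

```cpp
#include <cmath>
#include <complex>

// Hypothetical helper (not from kokkos-kernels): returns true when the
// magnitudes of `actual` and `expected` differ by no more than
// eps * |actual|, mirroring the tolerance printed in the failure message.
inline bool magnitudes_agree(const std::complex<double>& actual,
                             const std::complex<double>& expected,
                             double eps) {
  const double a = std::abs(actual);    // plays the role of ats::abs(c[k])
  const double e = std::abs(expected);  // plays the role of ats::abs(alpha + b[k])
  return std::abs(a - e) <= eps * a;
}
```

With the dcomplex2 values from the log, magnitudes 1.9116719714609276 and 0.71722871226943496 differ by roughly 1.19, far above a tolerance of roughly 4.2e-13, so the helper returns false. This points to a wrong result rather than a rounding issue.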

Reproducer

source /etc/profile.d/modules.sh
module purge
module load cmake/3.19.3 gcc/10.2.0

$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-devices=Threads,Serial --arch=SKX --compiler=$CXX --cxxflags="-O3 -Wall -Wunused-parameter -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wignored-qualifiers -Wempty-body -Wclobbered -Wuninitialized " --cxxstandard="17" --ldflags=""   --kokkos-path=$KOKKOS_PATH --kokkoskernels-path=$KOKKOSKERNELS_PATH --with-scalars='double,complex_double' --with-ordinals=int --with-offsets=int,size_t --with-layouts=LayoutLeft --with-tpls=    --with-options= --with-cuda-options=   --no-examples

Note that batched_vector_arithmatic_simd_dcomplex3 passes in the serial, openmp, and threads backends.

@ndellingwood
Contributor

Cross-referencing #969 (I think it is the same test failure with gcc/10.2.0)

e10harvey added a commit to e10harvey/kokkos-kernels that referenced this issue Sep 8, 2022
  - A timing bug was hidden by the lengthier epilogue generated by
  gcc 7.2.0. This commit adds a memory barrier for GNU compilers after
  avx512 broadcast intrinsics to ensure the broadcasted writes land before
  the memory locations are read from. Fixes kokkos#1512.
e10harvey added a commit to e10harvey/kokkos-kernels that referenced this issue Sep 12, 2022
  - A timing bug was hidden by the lengthier epilogue generated by
  gcc 7.2.0. This commit adds a memory barrier for GNU compilers after
  avx512 broadcast intrinsics to ensure the broadcasted writes land before
  the memory locations are read from. Fixes kokkos#1512.
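
The commit messages above describe the fix only in words, and the actual change is not shown in this thread. As a rough illustration of the idea, here is a sketch under stated assumptions: a plain loop stands in for the AVX-512 broadcast intrinsic so the code runs on any machine, and an empty asm statement with a "memory" clobber is used as one common way to express a compiler-level barrier on GNU-compatible compilers. The names Lanes, broadcast_store, and sum_lanes are hypothetical, and the real fix may use a different form of barrier.

```cpp
#include <cstddef>

// Hypothetical stand-in for a SIMD vector of N doubles.
template <std::size_t N>
struct Lanes {
  double data[N];
};

// Writes `value` into every lane, then (on GNU-compatible compilers) emits a
// compiler-level barrier so the stores are not reordered past later reads.
template <std::size_t N>
inline void broadcast_store(Lanes<N>& v, double value) {
  for (std::size_t i = 0; i < N; ++i) v.data[i] = value;
#if defined(__GNUC__)
  // Assumption: an empty asm with a "memory" clobber. It emits no
  // instruction; it only tells the compiler that memory may have changed.
  __asm__ __volatile__("" ::: "memory");
#endif
}

// Reads the lanes back after the barrier.
template <std::size_t N>
inline double sum_lanes(const Lanes<N>& v) {
  double total = 0.0;
  for (std::size_t i = 0; i < N; ++i) total += v.data[i];
  return total;
}
```

For example, after broadcast_store on a Lanes<4> with the value 1.5, sum_lanes returns 6.0. The sketch does not reproduce the original bug; it only shows where such a barrier would sit relative to the writes and the later reads.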
@ndellingwood
Contributor

Can this be closed with merged PR #1520?
