Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix fallback gemm for complex scalar types #550

Merged
merged 1 commit into from
Dec 18, 2019

Conversation

seheracer
Copy link
Contributor

@seheracer seheracer commented Dec 18, 2019

Description:
This PR fixes the fallback gemm resulting in large diffs with complex scalar type when the entries in the input random matrices have nonzero imaginary parts.

This defect was exposed when the upper bound of the random number generator (for A, B, and C matrices) in KokkosKernels gemm unit test was changed from 10 to Kokkos::Random_XorShift64_Pool<...>::max(), which is equal to 1+1j for complex<double>. This PR includes this change too.

Testing:
[sacer@white29 testing]$ ../scripts/test_all_sandia --spot-check --arch=Power8,Pascal60

Running on machine: white
Going to test compilers:  gcc/6.4.0 gcc/7.2.0 gcc/7.4.0 ibm/16.1.0 cuda/9.2.88 cuda/10.0.130
Testing compiler gcc/6.4.0
Testing compiler gcc/7.2.0
  Starting job gcc-6.4.0-OpenMP_Serial-release
  Starting job gcc-7.2.0-OpenMP-release
  PASSED gcc-7.2.0-OpenMP-release
  Starting job gcc-7.2.0-Serial-release
  PASSED gcc-6.4.0-OpenMP_Serial-release
Testing compiler gcc/7.4.0
  Starting job gcc-7.2.0-OpenMP_Serial-release
  PASSED gcc-7.2.0-Serial-release
Testing compiler ibm/16.1.0
  Starting job gcc-7.4.0-OpenMP-release
  PASSED gcc-7.4.0-OpenMP-release
Testing compiler cuda/9.2.88
  Starting job ibm-16.1.0-Serial-release
  PASSED gcc-7.2.0-OpenMP_Serial-release
  PASSED ibm-16.1.0-Serial-release
Testing compiler cuda/10.0.130
  Starting job cuda-9.2.88-Cuda_OpenMP-release
  PASSED cuda-9.2.88-Cuda_OpenMP-release
  Starting job cuda-10.0.130-Cuda_Serial-release
  PASSED cuda-10.0.130-Cuda_Serial-release
#######################################################
PASSED TESTS
#######################################################
cuda-10.0.130-Cuda_Serial-release build_time=1203 run_time=1057
cuda-9.2.88-Cuda_OpenMP-release build_time=1269 run_time=685
gcc-6.4.0-OpenMP_Serial-release build_time=587 run_time=1048
gcc-7.2.0-OpenMP-release build_time=444 run_time=299
gcc-7.2.0-OpenMP_Serial-release build_time=605 run_time=1359
gcc-7.2.0-Serial-release build_time=276 run_time=638
gcc-7.4.0-OpenMP-release build_time=385 run_time=300
ibm-16.1.0-Serial-release build_time=1439 run_time=968
#######################################################
FAILED TESTS
#######################################################

Copy link
Contributor

@ndellingwood ndellingwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch, thanks for reintroducing the test improvements!

@ndellingwood ndellingwood merged commit e63acda into kokkos:develop Dec 18, 2019
@seheracer seheracer deleted the fix_fallback_gemm branch December 18, 2019 21:58
@vqd8a
Copy link
Contributor

vqd8a commented Dec 18, 2019

@ndellingwood @seheracer I have just noticed that Kokkos::rand<typename Kokkos::Random_XorShift64_Pool<execution_space>::generator_type,ScalarA>::max()= 1 when ScalarA is Kokkos::complex<double>. Am I correct? So you just did reduce from 10 to 1.

@ndellingwood
Copy link
Contributor

@vqd8a in the code @seheracer is using Kokkos::Random_XorShift64_Pool<execution_space>::generator_type,ScalarA>::max(); in the fill_random calls, which were previously using 10 (cast to the scalar type). Where is the value 1 you are concerned about?

@vqd8a
Copy link
Contributor

vqd8a commented Dec 18, 2019

@ndellingwood It was Kokkos::fill_random(A, rand_pool,ScalarA(10)).
Now it is Kokkos::fill_random(A, rand_pool, Kokkos::rand<Kokkos::Random_XorShift64<execution_space>,Kokkos::complex<double> >::max()) where Kokkos::rand<Kokkos::Random_XorShift64<execution_space>,Kokkos::complex<double> >::max()=1
So, I think we actually reduce from 10 to 1, not increasing. Correct me if I am wrong.

@seheracer
Copy link
Contributor Author

seheracer commented Dec 18, 2019

@vqd8a This is very surprising to me! Changing it from 10 to max results in larger diffs so that several bugs were exposed. That's why I thought max should be larger than 10.

Vinh, can you point us to the code where you saw "1" as the returned value for max? Thanks!

@vqd8a
Copy link
Contributor

vqd8a commented Dec 18, 2019

@seheracer I just printed out Kokkos::rand<Kokkos::Random_XorShift64<execution_space>,Kokkos::complex<double> >::max() and saw that value. Actually it is 1+1j in the complex case.

@ndellingwood
Copy link
Contributor

@crtrott can you confirm what Vinh is seeing is as expected?

Kokkos::rand<Kokkos::Random_XorShift64<execution_space>,Kokkos::complex<double> >::max() and saw that value. Actually it is 1+1j in the complex case.

@vqd8a
Copy link
Contributor

vqd8a commented Dec 18, 2019

I think it is 1+1j. See the file https://github.com/kokkos/kokkos/blob/develop/algorithms/src/Kokkos_Random.hpp, around line 511 to 515

@crtrott
Copy link
Member

crtrott commented Dec 18, 2019

That is correct. So probably the additional exposure happens because you now have actual complex values, instead of complex values where the imaginary part is zero.

@seheracer
Copy link
Contributor Author

Thanks @vqd8a @crtrott @ndellingwood !!

I think having the upper bound this way (1+1j) is better than 10 to catch errors concerning the imaginary parts. In addition, this is how Tpetra is testing gemm, so the tests are consistent now. I think we should keep ::max() instead of 10.

@mhoemmen
Copy link
Contributor

@seheracer wrote:

In addition, this is how Tpetra is testing gemm, so the tests are consistent now.

Wait, Tpetra is calling a hand-rolled GEMM ? Tpetra only tests for types that the BLAS and cuBLAS supports. It's great that we have a fall-back implementation, but why aren't we calling cuBLAS?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants