Dot based gemm #490

Merged
merged 4 commits into from
Nov 6, 2019

seheracer
Contributor

Description:

This PR adds DotBasedGEMM, an improved version of KokkosBlas::gemm for CUDA. DotBasedGEMM optimizes C = beta*C + alpha*A^T*B for the case where A and B are both tall and skinny. The C matrix is then presumably small, so each entry of C is computed as the dot product of the corresponding columns of A and B. Because these dot products run over very long vectors, each one is distributed among multiple teams.
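
To make the idea concrete, here is a minimal host-side sketch in plain C++ (no Kokkos or CUDA). It is not the code in this PR: the `Matrix` helper, the `dot_based_gemm` name, and the `chunks` parameter are assumptions introduced for illustration, with each chunk standing in for one team's share of a dot product.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical column-major dense matrix used only for this illustration.
struct Matrix {
  std::size_t rows, cols;
  std::vector<double> data;
  Matrix(std::size_t r, std::size_t c, double v = 0.0)
      : rows(r), cols(c), data(r * c, v) {}
  double& operator()(std::size_t i, std::size_t j) { return data[i + j * rows]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i + j * rows]; }
};

// Computes C = beta*C + alpha*A^T*B, where A is k x m, B is k x n, C is m x n.
// Entry C(i,j) is the dot product of column i of A with column j of B.
// Each dot product is split into `chunks` contiguous pieces; on a GPU each
// piece would be handled by a separate team (an assumption of this sketch).
void dot_based_gemm(double alpha, const Matrix& A, const Matrix& B,
                    double beta, Matrix& C, std::size_t chunks) {
  assert(A.rows == B.rows && C.rows == A.cols && C.cols == B.cols);
  assert(chunks > 0);
  const std::size_t k = A.rows;
  const std::size_t chunk_len = (k + chunks - 1) / chunks;  // ceiling division

  // Apply beta first; beta == 0 overwrites C, following the BLAS convention.
  for (double& c : C.data) c = (beta == 0.0) ? 0.0 : beta * c;

  for (std::size_t i = 0; i < C.rows; ++i) {
    for (std::size_t j = 0; j < C.cols; ++j) {
      for (std::size_t t = 0; t < chunks; ++t) {
        const std::size_t begin = t * chunk_len;
        const std::size_t end = std::min(k, begin + chunk_len);
        double partial = 0.0;  // this chunk's contribution to the dot product
        for (std::size_t p = begin; p < end; ++p) partial += A(p, i) * B(p, j);
        C(i, j) += alpha * partial;  // on a GPU this would be an atomic add
      }
    }
  }
}
```

The sketch runs the chunks one after another, so the accumulation into `C(i, j)` is safe as written. A parallel version would need an atomic update or a second reduction pass to combine the partial results.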

When the matrices are tall and skinny and the operation has the form alpha*A^T*B, DotBasedGEMM is called instead of CUBLAS' gemm. It is intended as an improvement over CUBLAS' gemm, so DotBasedGEMM is never used if CUBLAS is not enabled.
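
The selection rule described above could be sketched roughly as follows. This is only an illustration of the stated conditions: the `GemmPath` enum, the `choose_gemm_path` function, and the `max_dots` threshold are hypothetical, and the actual size test used in the PR is not given in this description.

```cpp
#include <cstddef>

enum class GemmPath { DotBased, Cublas, Native };

// Hypothetical dispatch helper. C is m x n and the dot products have length k.
// `max_dots` is an assumed tuning threshold on the number of entries of C.
GemmPath choose_gemm_path(bool cublas_enabled, char transA, char transB,
                          std::size_t m, std::size_t n, std::size_t k,
                          std::size_t max_dots) {
  // Without CUBLAS the dot-based path is never taken.
  if (!cublas_enabled) return GemmPath::Native;

  const bool a_transposed = (transA == 'T' || transA == 't');
  const bool b_plain = (transB == 'N' || transB == 'n');
  // Assumed meaning of "tall and skinny": few entries in C, long dot products.
  const bool tall_skinny = (m * n <= max_dots) && (k > m) && (k > n);

  if (a_transposed && b_plain && tall_skinny) return GemmPath::DotBased;
  return GemmPath::Cublas;
}
```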

Output of test_all_sandia on white (without and with cublas):

../scripts/test_all_sandia --spot-check --arch=Power8,Pascal60

Running on machine: white
Going to test compilers:  gcc/6.4.0 gcc/7.2.0 ibm/16.1.0 cuda/9.2.88 cuda/10.0.130
Testing compiler gcc/6.4.0
Testing compiler gcc/7.2.0
  Starting job gcc-7.2.0-OpenMP-release
  Starting job gcc-6.4.0-OpenMP_Serial-release
  PASSED gcc-7.2.0-OpenMP-release
  Starting job gcc-7.2.0-Serial-release
  PASSED gcc-6.4.0-OpenMP_Serial-release
Testing compiler ibm/16.1.0
  Starting job gcc-7.2.0-OpenMP_Serial-release
  PASSED gcc-7.2.0-Serial-release
Testing compiler cuda/9.2.88
  Starting job ibm-16.1.0-Serial-release
  PASSED gcc-7.2.0-OpenMP_Serial-release
  PASSED ibm-16.1.0-Serial-release
Testing compiler cuda/10.0.130
  Starting job cuda-9.2.88-Cuda_OpenMP-release
  PASSED cuda-9.2.88-Cuda_OpenMP-release
  Starting job cuda-10.0.130-Cuda_Serial-release
  PASSED cuda-10.0.130-Cuda_Serial-release
#######################################################
PASSED TESTS
#######################################################
cuda-10.0.130-Cuda_Serial-release build_time=1124 run_time=283
cuda-9.2.88-Cuda_OpenMP-release build_time=1034 run_time=222
gcc-6.4.0-OpenMP_Serial-release build_time=555 run_time=290
gcc-7.2.0-OpenMP-release build_time=390 run_time=107
gcc-7.2.0-OpenMP_Serial-release build_time=623 run_time=368
gcc-7.2.0-Serial-release build_time=233 run_time=182
ibm-16.1.0-Serial-release build_time=1336 run_time=262
#######################################################
FAILED TESTS
#######################################################

../scripts/test_all_sandia cuda --spot-check --with-cuda-options=enable_lambda --with-tpls=cublas --arch=Power8,Pascal60

Running on machine: white
Going to test compilers:  cuda/9.2.88 cuda/10.0.130
Testing compiler cuda/9.2.88
Testing compiler cuda/10.0.130
  Starting job cuda-9.2.88-Cuda_OpenMP-release
  PASSED cuda-9.2.88-Cuda_OpenMP-release
  Starting job cuda-10.0.130-Cuda_Serial-release
  PASSED cuda-10.0.130-Cuda_Serial-release
#######################################################
PASSED TESTS
#######################################################
cuda-10.0.130-Cuda_Serial-release build_time=1177 run_time=287
cuda-9.2.88-Cuda_OpenMP-release build_time=1206 run_time=223
#######################################################
FAILED TESTS
#######################################################

@seheracer seheracer requested a review from srajama1 November 5, 2019 00:14
@seheracer seheracer self-assigned this Nov 5, 2019
@ndellingwood
Contributor

@seheracer please edit the PR to change the base to the develop branch instead of master, that should clean up a lot of the commit history.

@seheracer seheracer changed the base branch from master to develop November 5, 2019 15:14
Contributor

@srajama1 srajama1 left a comment

@seheracer Thanks a lot !

@ndellingwood
Contributor

Thanks @seheracer !

@jhux2

jhux2 commented Nov 6, 2019

Thank you, @seheracer. Could this also be applied to Trilinos dev so that EMPIRE can use it soon?

@seheracer
Contributor Author

> Thank you, @seheracer. Could this also be applied to Trilinos dev so that EMPIRE can use it soon?

@jhux2, I pushed the same PR into Trilinos develop too: trilinos/Trilinos#6226


6 participants