Dot based gemm #490
@seheracer, please edit the PR to change the base to the develop branch instead of master; that should clean up a lot of the commit history.
@seheracer Thanks a lot!
… in the initializer list.
Thanks @seheracer!
Thank you, @seheracer. Could this also be applied to Trilinos dev so that EMPIRE can use it soon?
@jhux2, I pushed the same PR into Trilinos develop too: trilinos/Trilinos#6226
Description:
This PR contains an improved version of KokkosBlas::gemm for CUDA: DotBasedGEMM. DotBasedGEMM implements an optimization for C = beta*C + alpha*A^T*B when the A and B matrices are both tall and skinny. The C matrix is assumed to be small, so each entry of C is computed as the dot product of the corresponding columns of A and B. Because these dot products run over very long vectors, each one is distributed among multiple teams.
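The dot-product formulation above can be illustrated with a minimal serial sketch in plain C++. It is not the code from this PR: the Matrix helper, the dot_based_gemm function name, and the chunk parameter are hypothetical names introduced for illustration, and the chunked inner loop only stands in for the partial sums that separate teams would compute on the GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major dense matrix; a hypothetical helper type, not a Kokkos::View.
struct Matrix {
  std::size_t rows, cols;
  std::vector<double> data;
  Matrix(std::size_t r, std::size_t c, double v = 0.0)
      : rows(r), cols(c), data(r * c, v) {}
  double& operator()(std::size_t i, std::size_t j) { return data[i + j * rows]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i + j * rows]; }
};

// Computes C = beta*C + alpha*A^T*B, where A is n x m and B is n x k
// (n is assumed to be much larger than m and k).
// C(i,j) is the dot product of column i of A with column j of B.
// The long dot product is cut into chunk-sized pieces; each piece plays the
// role of the partial sum that one team would own in a parallel version.
void dot_based_gemm(double alpha, const Matrix& A, const Matrix& B,
                    double beta, Matrix& C, std::size_t chunk = 1024) {
  assert(A.rows == B.rows && C.rows == A.cols && C.cols == B.cols);
  assert(chunk > 0);
  const std::size_t n = A.rows;
  for (std::size_t j = 0; j < C.cols; ++j) {
    for (std::size_t i = 0; i < C.rows; ++i) {
      double dot = 0.0;
      for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, n);
        double partial = 0.0;  // partial sum for this chunk ("team")
        for (std::size_t r = begin; r < end; ++r) {
          partial += A(r, i) * B(r, j);
        }
        dot += partial;  // combine partial sums into the full dot product
      }
      C(i, j) = beta * C(i, j) + alpha * dot;
    }
  }
}
```

For example, with A of size 5 x 2 filled with 1.0, B of size 5 x 3 filled with 2.0, and C of size 2 x 3 filled with 1.0, calling dot_based_gemm(0.5, A, B, 3.0, C, 2) gives every entry of C the value 3.0*1.0 + 0.5*10.0 = 8.0. The sketch runs the chunks sequentially; how the actual kernel assigns teams and combines their partial results is not shown here.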
When these conditions hold (tall and skinny matrices in the form alpha*A^T*B), DotBasedGEMM is called instead of CUBLAS' gemm. It is intended as an improvement over CUBLAS' gemm, so DotBasedGEMM is never used if CUBLAS is not enabled.
Output of test_all_sandia on white (without and with cublas):
../scripts/test_all_sandia --spot-check --arch=Power8,Pascal60
../scripts/test_all_sandia cuda --spot-check --with-cuda-options=enable_lambda --with-tpls=cublas --arch=Power8,Pascal60