KokkosBlas::gemm should use a gemv kernel if the RHS has only 1 column #929

Closed
brian-kelley opened this issue Apr 6, 2021 · 1 comment

@brian-kelley
Contributor

See discussion at trilinos/Trilinos#8923

@jennloe saw a substantial speedup in the MultiVector * MultiVector product by calling gemv when the RHS has only one column. This could be implemented easily in KokkosKernels by adding a special path in KokkosBlas::gemm() that calls gemv instead (outside of the unification layer). We just have to find the right heuristics for when to take that path. The things to consider are TPLs (Jennifer's results were with cuBLAS gemm), the layout of the LHS matrix, and the dimensions.
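
To make the proposed special path concrete, here is a minimal sketch of the dispatch idea in plain C++. It does not use Kokkos or the KokkosBlas API: `Matrix`, `gemv_kernel`, `gemm_kernel`, and `Path` are hypothetical names for illustration, the storage is assumed to be column-major (LayoutLeft-like), and the only heuristic shown is "B has exactly one column". A real implementation would presumably also look at TPLs, layouts, and dimensions as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical column-major dense matrix (stand-in for a rank-2 view).
struct Matrix {
  std::size_t rows, cols;
  std::vector<double> data;  // element (i, j) is stored at data[i + j * rows]
  Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
  double& operator()(std::size_t i, std::size_t j) { return data[i + j * rows]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i + j * rows]; }
};

// Matrix-vector kernel: y = alpha * A * x + beta * y, with x and y contiguous.
void gemv_kernel(double alpha, const Matrix& A, const double* x, double beta, double* y) {
  for (std::size_t i = 0; i < A.rows; ++i) y[i] *= beta;
  for (std::size_t j = 0; j < A.cols; ++j)
    for (std::size_t i = 0; i < A.rows; ++i) y[i] += alpha * A(i, j) * x[j];
}

// General matrix-matrix kernel: C = alpha * A * B + beta * C.
void gemm_kernel(double alpha, const Matrix& A, const Matrix& B, double beta, Matrix& C) {
  for (std::size_t j = 0; j < C.cols; ++j)
    for (std::size_t i = 0; i < C.rows; ++i) {
      double sum = 0.0;
      for (std::size_t k = 0; k < A.cols; ++k) sum += A(i, k) * B(k, j);
      C(i, j) = alpha * sum + beta * C(i, j);
    }
}

enum class Path { Gemv, Gemm };

// Front end: take the gemv path when the RHS has a single column.
// Returns which path ran so that callers (and tests) can observe the choice.
Path gemm(double alpha, const Matrix& A, const Matrix& B, double beta, Matrix& C) {
  assert(A.cols == B.rows && C.rows == A.rows && C.cols == B.cols);
  if (B.cols == 1) {
    // With one column, B and C are just contiguous vectors of length n and m.
    gemv_kernel(alpha, A, B.data.data(), beta, C.data.data());
    return Path::Gemv;
  }
  gemm_kernel(alpha, A, B, beta, C);
  return Path::Gemm;
}
```

Calling `gemm` with a 3 x 1 right-hand side takes the `Path::Gemv` branch, while any wider right-hand side falls through to `gemm_kernel`; both produce the same result for the shared column.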

@brian-kelley brian-kelley self-assigned this Apr 6, 2021
@brian-kelley brian-kelley changed the title from "KokkosBlas::gemm should use a gemv kernel if the 2nd arg has only 1 column" to "KokkosBlas::gemm should use a gemv kernel if the RHS has only 1 column" Apr 6, 2021
@brian-kelley
Contributor Author

Performance measurements on a V100, double precision. A (LHS) is m x n and B (RHS) is n x 1, with 1 <= n <= 50 in the ICGS orthogonalization use case.
Here m = 1 million, and only n = 1 and n = 50 were tested.

| n | Layouts | Flops (KK GEMM) | Flops (cuBLAS GEMM) | Flops (KK GEMV) | Flops (cuBLAS GEMV) |
|---|---|---|---|---|---|
| 1 | A LayoutLeft, B LayoutLeft | 9.005e+08 | 2.878e+10 | 3.111e+10 | 5.425e+10 |
| 1 | A LayoutLeft, B LayoutRight | 1.017e+09 | 1.007e+09 | | |
| 1 | A LayoutRight, B LayoutLeft | 9.144e+08 | 9.140e+08 | 1.263e+09 | 1.263e+09 |
| 1 | A LayoutRight, B LayoutRight | 1.051e+09 | 1.051e+09 | | |
| 50 | A LayoutLeft, B LayoutLeft | 1.443e+10 | 1.978e+11 | 1.143e+11 | 1.978e+11 |
| 50 | A LayoutLeft, B LayoutRight | 1.675e+10 | 1.675e+10 | | |
| 50 | A LayoutRight, B LayoutLeft | 1.501e+10 | 1.500e+10 | 6.315e+10 | 6.314e+10 |
| 50 | A LayoutRight, B LayoutRight | 1.794e+10 | 1.794e+10 | | |

This suggests that cuBLAS GEMM isn't even getting called except in the LayoutLeft/LayoutLeft case, and that using GEMV instead gives a significant improvement in all the cases (except for cuBLAS, n = 50, left/left, where it seems cuBLAS is already using GEMV in the k=1 case).
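
As a rough way to quantify "significant improvement", the sketch below computes the GEMV/GEMM ratio for the four configurations where both kernels were measured. The numbers are copied from the measurements above; the `Row` struct and the function names are hypothetical helpers, not part of KokkosKernels.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One measured configuration with all four rates (values from the table above).
struct Row {
  int n;
  std::string layouts;
  double kk_gemm, cublas_gemm, kk_gemv, cublas_gemv;
};

// Only the rows that have GEMV measurements (B LayoutLeft).
std::vector<Row> measured_rows() {
  return {
      {1, "A LayoutLeft, B LayoutLeft", 9.005e+08, 2.878e+10, 3.111e+10, 5.425e+10},
      {1, "A LayoutRight, B LayoutLeft", 9.144e+08, 9.140e+08, 1.263e+09, 1.263e+09},
      {50, "A LayoutLeft, B LayoutLeft", 1.443e+10, 1.978e+11, 1.143e+11, 1.978e+11},
      {50, "A LayoutRight, B LayoutLeft", 1.501e+10, 1.500e+10, 6.315e+10, 6.314e+10},
  };
}

// Ratio of the native KokkosKernels GEMV rate to the native GEMM rate.
double kk_speedup(const Row& r) {
  assert(r.kk_gemm > 0.0);
  return r.kk_gemv / r.kk_gemm;
}

// Ratio of the GEMV rate to the GEMM rate with the cuBLAS TPL enabled.
double cublas_speedup(const Row& r) {
  assert(r.cublas_gemm > 0.0);
  return r.cublas_gemv / r.cublas_gemm;
}
```

With these inputs the native ratio ranges from about 1.4x (n = 1, A LayoutRight) to about 35x (n = 1, A LayoutLeft), and the cuBLAS ratio for n = 50, left/left is exactly 1, which matches the observation that cuBLAS gains nothing in that case.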
