KokkosBlas::gemm should use a gemv kernel if the RHS has only 1 column #929

brian-kelley · 2021-04-06T20:24:53Z

See discussion at trilinos/Trilinos#8923

@jennloe saw a substantial speedup in the MultiVector * MultiVector product by calling gemv when the RHS has only one column. This could be implemented easily in KokkosKernels by adding a special path in KokkosBlas::gemm() that calls gemv instead (outside of the unification layer). We just have to find the right heuristics for when this should be done. The things to consider are TPLs (Jennifer's results were with cuBLAS gemm), the layout of the LHS matrix, and the dimensions.
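A minimal sketch of the dispatch idea, not the KokkosKernels implementation: it uses plain `std::vector` storage instead of Kokkos Views, and the names `Matrix`, `gemv_sketch`, `gemm_general_sketch` and `gemm_sketch` are hypothetical. The only condition checked is that the RHS has one column; the TPL, layout and dimension heuristics discussed above are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical column-major (LayoutLeft-style) dense matrix, standing in for a Kokkos::View.
struct Matrix {
  std::size_t rows, cols;
  std::vector<double> data;  // element (i, j) is stored at data[i + j * rows]
  Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
  double& operator()(std::size_t i, std::size_t j) { return data[i + j * rows]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i + j * rows]; }
};

// y = A * x, with x of length A.cols and y of length A.rows.
void gemv_sketch(const Matrix& A, const double* x, double* y) {
  for (std::size_t i = 0; i < A.rows; ++i) y[i] = 0.0;
  for (std::size_t j = 0; j < A.cols; ++j)
    for (std::size_t i = 0; i < A.rows; ++i) y[i] += A(i, j) * x[j];
}

// C = A * B for a general RHS (naive triple loop).
void gemm_general_sketch(const Matrix& A, const Matrix& B, Matrix& C) {
  for (std::size_t j = 0; j < B.cols; ++j)
    for (std::size_t i = 0; i < A.rows; ++i) {
      double sum = 0.0;
      for (std::size_t k = 0; k < A.cols; ++k) sum += A(i, k) * B(k, j);
      C(i, j) = sum;
    }
}

// C = A * B. Returns true if the gemv path was taken.
bool gemm_sketch(const Matrix& A, const Matrix& B, Matrix& C) {
  assert(A.cols == B.rows && C.rows == A.rows && C.cols == B.cols);
  if (B.cols == 1) {
    // A single-column, column-major B and C are contiguous vectors,
    // so they can be passed to gemv directly.
    gemv_sketch(A, B.data.data(), C.data.data());
    return true;
  }
  gemm_general_sketch(A, B, C);
  return false;
}
```

Calling `gemm_sketch` with a single-column `B` takes the gemv path, and any wider `B` falls through to the general routine. A real implementation would also need to account for transpose modes and alpha/beta scaling.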


brian-kelley · 2021-04-30T00:51:04Z

Performance measurements on V100, double precision. A (LHS) is m x n, B (RHS) is n x 1, with 1 <= n <= 50 in the ICGS orthogonalization use case.
m = 1 million; only n = 1 and n = 50 were tested.

| n = 1 | Flops (KK GEMM) | Flops (cuBLAS GEMM) | Flops (KK GEMV) | Flops (cuBLAS GEMV) |
| --- | --- | --- | --- | --- |
| A LayoutLeft, B LayoutLeft | 9.005e+08 | 2.878e+10 | 3.111e+10 | 5.425e+10 |
| A LayoutLeft, B LayoutRight | 1.017e+09 | 1.007e+09 | | |
| A LayoutRight, B LayoutLeft | 9.144e+08 | 9.140e+08 | 1.263e+09 | 1.263e+09 |
| A LayoutRight, B LayoutRight | 1.051e+09 | 1.051e+09 | | |

| n = 50 | Flops (KK GEMM) | Flops (cuBLAS GEMM) | Flops (KK GEMV) | Flops (cuBLAS GEMV) |
| --- | --- | --- | --- | --- |
| A LayoutLeft, B LayoutLeft | 1.443e+10 | 1.978e+11 | 1.143e+11 | 1.978e+11 |
| A LayoutLeft, B LayoutRight | 1.675e+10 | 1.675e+10 | | |
| A LayoutRight, B LayoutLeft | 1.501e+10 | 1.500e+10 | 6.315e+10 | 6.314e+10 |
| A LayoutRight, B LayoutRight | 1.794e+10 | 1.794e+10 | | |

This suggests that cuBLAS GEMM isn't even getting called except in the LayoutLeft/LayoutLeft case, and that using GEMV instead gives a significant improvement in all cases (except for cuBLAS, n = 50, left/left, where it seems cuBLAS is already using GEMV in the k=1 case).

brian-kelley added the enhancement label Apr 6, 2021

brian-kelley self-assigned this Apr 6, 2021

brian-kelley changed the title ~~KokkosBlas::gemm should use a gemv kernel if the 2nd arg has only 1 column~~ KokkosBlas::gemm should use a gemv kernel if the RHS has only 1 column Apr 6, 2021

brian-kelley mentioned this issue Apr 30, 2021

GEMM: call GEMV instead in certain cases #948

Merged

brian-kelley added the InDevelop label May 3, 2021

brian-kelley mentioned this issue May 3, 2021

KokkosKernels: gemv/gemm perf improvements trilinos/Trilinos#9083

Merged

ndellingwood closed this as completed Oct 11, 2022
