Non-transpose gemv should use TeamPolicy on GPUs #926

brian-kelley · 2021-04-01T21:06:37Z

Our implementation of dense mat-vec (gemv) uses a RangePolicy (a single level of parallelism) on all execution spaces. On GPUs it should use a TeamPolicy instead, since the sum within each row is trivial to parallelize.
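To make the difference concrete, here is a minimal sketch in plain standard C++ that emulates the two decompositions serially. It is not the KokkosKernels implementation and does not call Kokkos. The names `gemv_range`, `gemv_team`, and `team_size` are hypothetical, and the row-major layout is an assumption chosen for illustration. In real Kokkos code the outer loop would map to the league of a `Kokkos::TeamPolicy` and the inner strided loop to a `Kokkos::TeamThreadRange` reduction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes y = A * x for a dense row-major m x n matrix A (layout is an assumption).

// One-level version: one work item per row, each summing its whole row serially.
// This mirrors a range policy over rows: parallelism is limited to m work items.
std::vector<double> gemv_range(const std::vector<double>& A,
                               const std::vector<double>& x,
                               std::size_t m, std::size_t n) {
  assert(A.size() == m * n && x.size() == n);
  std::vector<double> y(m, 0.0);
  for (std::size_t i = 0; i < m; ++i) {  // would be the parallel loop
    double sum = 0.0;
    for (std::size_t j = 0; j < n; ++j) sum += A[i * n + j] * x[j];
    y[i] = sum;
  }
  return y;
}

// Two-level version: one "team" per row; the row sum is split across
// team_size "threads" that each take a strided subset of the columns,
// and the per-thread partial sums are then reduced to one value.
std::vector<double> gemv_team(const std::vector<double>& A,
                              const std::vector<double>& x,
                              std::size_t m, std::size_t n,
                              std::size_t team_size) {
  assert(A.size() == m * n && x.size() == n && team_size > 0);
  std::vector<double> y(m, 0.0);
  std::vector<double> partial(team_size);
  for (std::size_t i = 0; i < m; ++i) {            // league rank (one team per row)
    for (std::size_t t = 0; t < team_size; ++t) {  // team rank (thread within the team)
      double p = 0.0;
      for (std::size_t j = t; j < n; j += team_size) p += A[i * n + j] * x[j];
      partial[t] = p;
    }
    double sum = 0.0;                              // team-level reduction
    for (std::size_t t = 0; t < team_size; ++t) sum += partial[t];
    y[i] = sum;
  }
  return y;
}
```

Both functions compute the same product. The two-level form exposes roughly `m * team_size` independent pieces of work instead of `m`, which is the motivation for using a team policy when the number of rows alone is too small to fill a GPU. In floating point the two versions may differ by rounding, because the summation order changes.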

brian-kelley · 2021-04-01T21:29:48Z

This should be benchmarked on local matrices of about 1e4 x 1e5, since gemv is being used on large row-partitioned square matrices.

brian-kelley added the enhancement label Apr 1, 2021

brian-kelley self-assigned this Apr 1, 2021

brian-kelley mentioned this issue Apr 20, 2021

Add fast two-level mode N GEMV (#926) #939

Merged

brian-kelley added a commit that referenced this issue Apr 27, 2021

Merge pull request #939 from brian-kelley/TeamGEMV

73fbbc8

Add fast two-level mode N GEMV (#926)

brian-kelley added the InDevelop label May 3, 2021

ndellingwood closed this as completed Oct 11, 2022

kokkos-devops-admin mentioned this issue Feb 20, 2024

CMake: error out in certain case #2115

Merged

kokkos-devops-admin mentioned this issue May 24, 2024

Interface for LAPACK geqrf() #2205

Open
