Support for partition_spaces and separate execution space instances (GPU streams) in Kokkos Kernels. #1119

Open
wlruys opened this issue Sep 29, 2021 · 1 comment

wlruys commented Sep 29, 2021

Opening an issue for this after a conversation on Slack (feature request).

Now that CUDA/HIP/SYCL stream support and partition_space are more mature and stable in Kokkos Core, it would be great to have the same support in Kokkos Kernels.

This would allow dispatching BLAS and other kernels of 'medium' size: too large for a single-block thread team, but too small to be worth locking the whole device.

For instance something like:

ExecSpace spaces[N];
partition_space(ExecSpace(),N,spaces);
KokkosBlas::GEMM(spaces[0], "N", "N", one, A0, B0, one, C0);
KokkosBlas::GEMM(spaces[1], "N", "N", one, A1, B1, one, C1);

to dispatch the two kernels asynchronously.
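
To make the intended semantics concrete without depending on Kokkos, here is a host-only analogy in plain C++. It is only a sketch under stated assumptions: `Matrix`, `gemm`, `gemm_async`, and `run_two_gemms` are hypothetical names invented for illustration, and `std::async` merely stands in for an execution space instance (stream). None of this is Kokkos or Kokkos Kernels API. The point is the calling pattern: each call returns immediately, the two products run independently, and the caller fences each one only when it needs the result.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device matrix: dense, square, row-major.
struct Matrix {
  std::size_t n;
  std::vector<double> data;
  Matrix(std::size_t n_, double fill) : n(n_), data(n_ * n_, fill) {}
  double& at(std::size_t i, std::size_t j) { return data[i * n + j]; }
  double at(std::size_t i, std::size_t j) const { return data[i * n + j]; }
};

// Blocking reference GEMM for the "N", "N" case: C = alpha * A * B + beta * C.
void gemm(double alpha, const Matrix& A, const Matrix& B, double beta, Matrix& C) {
  assert(A.n == B.n && B.n == C.n);
  for (std::size_t i = 0; i < C.n; ++i) {
    for (std::size_t j = 0; j < C.n; ++j) {
      double acc = 0.0;
      for (std::size_t k = 0; k < C.n; ++k) acc += A.at(i, k) * B.at(k, j);
      C.at(i, j) = alpha * acc + beta * C.at(i, j);
    }
  }
}

// Assumption: one std::async task models one execution space instance.
// The returned future plays the role of that instance's fence().
std::future<void> gemm_async(double alpha, const Matrix& A, const Matrix& B,
                             double beta, Matrix& C) {
  return std::async(std::launch::async,
                    [alpha, beta, &A, &B, &C] { gemm(alpha, A, B, beta, C); });
}

// Launches two independent GEMMs, waits for both, and returns the
// top-left entries of C0 and C1.
std::pair<double, double> run_two_gemms(std::size_t n) {
  const double one = 1.0;
  Matrix A0(n, 1.0), B0(n, 2.0), C0(n, 1.0);
  Matrix A1(n, 3.0), B1(n, 1.0), C1(n, 0.0);

  auto done0 = gemm_async(one, A0, B0, one, C0);  // analogous to spaces[0]
  auto done1 = gemm_async(one, A1, B1, one, C1);  // analogous to spaces[1]

  done0.get();  // "fence" the first instance
  done1.get();  // "fence" the second instance
  return {C0.at(0, 0), C1.at(0, 0)};
}
```

With these fill values, `run_two_gemms(4)` yields 9 for C0 (2*4 + 1) and 12 for C1 (3*4). The two calls share no operands, which is what makes it safe to run them concurrently; the same independence would be required of kernels dispatched on separate execution space instances.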

lucbv (Contributor) commented Oct 25, 2021

@dialecticDolt
I merged the work on this feature in PR #1131. Let me know if that meets your requirements.
If so, we can probably close this issue; otherwise, let's discuss what more is needed.
