Opening up an issue for this after a conversation on the Slack. (feature-request)
Now that CUDA/HIP/SYCL stream support and partition_spaces are developed and more stable in Kokkos Core, it would be great to have this support in Kokkos Kernels as well.
This would allow dispatching BLAS and other kernels of 'medium' size: too large for a single thread team (one block), yet too small to be worth occupying the whole device.
@dialecticDolt
I merged the work on this feature in PR #1131. Let me know if that meets your requirements.
If so we can probably close this issue, otherwise let's discuss what more is needed.
For instance, two independent kernels could each be handed their own execution space instance, so that the two are dispatched asynchronously.