task2: interface #3

Open
1 of 8 tasks
mzuzek opened this issue May 25, 2022 · 1 comment

mzuzek commented May 25, 2022

Scope

Try to give both interfaces, 2a and 2b, a common "feel".

2a. Execution Space

BLAS kernels to cover:

The objective is to enable calls like:

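auto exec = ExecSpace(); // execution space instance

ViewType A, B, C; // operand views; ViewType is a placeholder, not a real type

// proposed overload: template<class ExecSpace, ... >
KokkosBlas::gemm( exec, A, B, C );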
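To make the proposed call shape concrete, here is a minimal, library-free sketch. Everything in it is a hypothetical stand-in: `HostSerialExec`, `Matrix`, and `exec_sketch::gemm` are not Kokkos or KokkosBlas types, and the real interface may take further arguments (transpose flags, scalars). It only illustrates passing the execution space instance as the first, templated argument.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace exec_sketch {

// Hypothetical stand-in for an execution space type; NOT a Kokkos class.
struct HostSerialExec {};

// Minimal row-major dense matrix, used only for this sketch.
struct Matrix {
  std::size_t rows, cols;
  std::vector<double> data;
  Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
  double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Sketch of the call shape: the execution space instance comes first.
// Computes C = A * B on the host; a real kernel would launch its work
// on `exec` instead of ignoring it.
template <class ExecSpace>
void gemm(const ExecSpace& /*exec*/, const Matrix& A, const Matrix& B, Matrix& C) {
  assert(A.cols == B.rows && C.rows == A.rows && C.cols == B.cols);
  for (std::size_t i = 0; i < A.rows; ++i) {
    for (std::size_t j = 0; j < B.cols; ++j) {
      double sum = 0.0;
      for (std::size_t k = 0; k < A.cols; ++k) sum += A(i, k) * B(k, j);
      C(i, j) = sum;
    }
  }
}

}  // namespace exec_sketch
```

A caller would create one instance, for example `exec_sketch::HostSerialExec exec;`, and pass it to every kernel call, so that the same instance can be reused across kernels.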

2b. Parallelization level dispatch

Take the parallelization level (serial, team, or team-vector) as a parameter, like ArgMode in https://github.com/kokkos/kokkos-kernels/blob/develop/src/batched/dense/KokkosBatched_Gemm_Decl.hpp#L98-L119 (used as inspiration):

  • The Serial implementation should use no parallelism and should not dispatch to TPLs, so that it can be called from within any parallel context;
  • The Team implementation should take a team member argument, be callable inside a functor (wherever TeamThreadRange can be used), and use a TeamThreadRange + ThreadVectorRange combination;
  • TeamVector (awaiting confirmation): should it use ThreadVectorRange only and be callable from inside TeamThreadRange? It should NOT use TeamVectorRange!
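The mode-as-parameter idea can be sketched with tag types and template specialization, loosely in the spirit of the ArgMode parameter linked above. This is a library-free illustration under stated assumptions: `Mode`, `FakeMember`, and `Axpy` are hypothetical names, the kernel is a simple `y += a * x` chosen for brevity, and the team variant only mimics how a team member would stride over entries. The TeamVector variant is left out because its semantics are still awaiting confirmation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace mode_sketch {

// Tag types selecting the parallelization level (hypothetical names).
struct Mode {
  struct Serial {};
  struct Team {};
};

// Hypothetical stand-in for a team member handle; NOT a Kokkos type.
struct FakeMember {
  std::size_t team_rank;
  std::size_t team_size;
};

// Primary template; each mode provides its own specialization.
template <class ArgMode>
struct Axpy;

// Serial: a plain loop with no parallelism, callable from any context.
template <>
struct Axpy<Mode::Serial> {
  static void invoke(double a, const std::vector<double>& x, std::vector<double>& y) {
    for (std::size_t i = 0; i < x.size(); ++i) y[i] += a * x[i];
  }
};

// Team: takes a member argument; each team thread handles a strided
// subset of the entries, imitating a team-level range.
template <>
struct Axpy<Mode::Team> {
  static void invoke(const FakeMember& m, double a, const std::vector<double>& x,
                     std::vector<double>& y) {
    for (std::size_t i = m.team_rank; i < x.size(); i += m.team_size) y[i] += a * x[i];
  }
};

}  // namespace mode_sketch
```

With this layout the caller picks the level at compile time, for example `Axpy<Mode::Serial>::invoke(a, x, y)` inside an already parallel region, or `Axpy<Mode::Team>::invoke(member, a, x, y)` where a team member handle is available.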

BLAS kernels to cover:

@mzuzek mzuzek self-assigned this Jun 9, 2022

mzuzek commented Jun 9, 2022

Parallel Contexts

Note: ThreadVectorRange can also be called at the same level as TeamThreadRange (not nested inside it), in which case it works like TeamVectorRange (which is probably the better choice; TODO: learn what the difference is).

Levels

See also HierarchicalParallelism (8.4) in the Kokkos Wiki.

The Kokkos hierarchy maps to hardware threads slightly differently on CPU and GPU:

| Kokkos level | CPU               | GPU                  |
|--------------|-------------------|----------------------|
| Team         | thread group      | thread group (block) |
| Thread       | CPU thread        | thread group (warp)  |
| Vector       | SIMD / intrinsics | GPU thread           |
