Move {Team,TeamVector}Gemv to KokkosBlas #1435
Status Flag 'Pre-Test Inspection' - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
95e2c84 to 5a76079
const auto run = [&](auto mode) {
  using algo = KokkosBatched::Algo::Gemv::Default;
  using impl = KokkosBatched::TeamVectorGemv<TeamType, decltype(mode), algo>;
  impl::invoke(team, alpha, A, x, beta, y);
similar comment as in #1433
Please see my serial gemv answer. In the case of team-vector gemv I started with the batched implementation, because it was recently extended with a unit-batch specialization that - if I understand correctly - aims to have a non-batched overload (see PR#1392).
As agreed, I'll move the non-batched variant to blas and dispatch the optimized batched calls to it.
If time permits, I'll address mixed scalar types and the missing Conjugate/ConjTranspose modes.
Ok, I'll put it on hold until we get TeamVector clarified.
Please note that batched TeamVector gemv and Team gemv both call TeamThreadRange (and thus both are "team-level" or "functor-level" routines): the difference is that the Team variant does not use ThreadVectorRange.
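To make the Team vs. TeamVector distinction concrete, here is a minimal host-only sketch in plain C++. It is not Kokkos code and not the implementation in this PR: the function names, the `team_size`/`vector_length` parameters and the sequential simulation of threads and lanes are all assumptions made for illustration. The outer loop plays the role of TeamThreadRange in both variants; only the second variant splits each dot product across lanes, which is the role ThreadVectorRange would play.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only model, NOT Kokkos code: "threads" and "lanes" are simulated
// sequentially just to show how the work would be partitioned.
using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Team-style gemv (hypothetical name): rows are split across team threads
// (the TeamThreadRange role); each thread computes its dot product serially.
void team_gemv_model(std::size_t team_size, double alpha, const Matrix& A,
                     const Vector& x, double beta, Vector& y) {
  assert(team_size > 0 && A.size() == y.size());
  for (std::size_t thread = 0; thread < team_size; ++thread) {
    for (std::size_t i = thread; i < A.size(); i += team_size) {
      assert(A[i].size() == x.size());
      double dot = 0.0;
      for (std::size_t j = 0; j < x.size(); ++j) dot += A[i][j] * x[j];
      y[i] = beta * y[i] + alpha * dot;
    }
  }
}

// TeamVector-style gemv (hypothetical name): same row split, but each dot
// product is additionally split across vector lanes (the ThreadVectorRange
// role) and the per-lane partial sums are reduced afterwards.
void team_vector_gemv_model(std::size_t team_size, std::size_t vector_length,
                            double alpha, const Matrix& A, const Vector& x,
                            double beta, Vector& y) {
  assert(team_size > 0 && vector_length > 0 && A.size() == y.size());
  for (std::size_t thread = 0; thread < team_size; ++thread) {
    for (std::size_t i = thread; i < A.size(); i += team_size) {
      assert(A[i].size() == x.size());
      std::vector<double> partial(vector_length, 0.0);
      for (std::size_t lane = 0; lane < vector_length; ++lane)
        for (std::size_t j = lane; j < x.size(); j += vector_length)
          partial[lane] += A[i][j] * x[j];
      double dot = 0.0;
      for (double p : partial) dot += p;  // lane reduction
      y[i] = beta * y[i] + alpha * dot;
    }
  }
}
```

Both functions compute y = beta*y + alpha*A*x; they differ only in how the inner loop is partitioned, which mirrors the point made above.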
4bb4954 to 5fb9ea6
After moving
- move implementations
- refactor/merge unit tests
- support different scalar types for A, x and y
- support arbitrary types for alpha and beta
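As an illustration of what "different scalar types for A, x and y" and "arbitrary types for alpha and beta" can mean at the interface level, here is a small plain C++ sketch. It is not the code from this commit: `gemv_mixed` is a hypothetical name, the nested `std::vector` storage stands in for Kokkos views, and accumulating in the output scalar type is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mixed-type gemv: y = beta * y + alpha * A * x, where every
// operand may have its own arithmetic type.
template <class Alpha, class AScalar, class XScalar, class Beta, class YScalar>
void gemv_mixed(const Alpha& alpha, const std::vector<std::vector<AScalar>>& A,
                const std::vector<XScalar>& x, const Beta& beta,
                std::vector<YScalar>& y) {
  assert(A.size() == y.size());
  for (std::size_t i = 0; i < A.size(); ++i) {
    assert(A[i].size() == x.size());
    // Accumulate in the output scalar type (an illustrative choice only).
    YScalar dot{};
    for (std::size_t j = 0; j < x.size(); ++j)
      dot += static_cast<YScalar>(A[i][j] * x[j]);
    y[i] = static_cast<YScalar>(beta * y[i] + alpha * dot);
  }
}
```

For example, A may hold `float`, x may hold `int`, y may hold `double`, and alpha and beta may be an `int` and a `float`; template argument deduction picks up each type independently.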
5fb9ea6 to 4bc8797
Status Flag 'Pre-Test Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED by label AT: PRE-TEST INSPECTED! Autotester is Removing Label; This inspection will remain valid until a new commit to source branch is performed.
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Test Names:
- KokkosKernels_PullRequest_GCC720_Light_Tpls_GCC720_GCC740
- KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight
- KokkosKernels_PullRequest_GCC720
- KokkosKernels_PullRequest_GCC720_Light_LayoutRight
- KokkosKernels_PullRequest_Tpls_GCC720
- KokkosKernels_PullRequest_Tpls_INTEL18
- KokkosKernels_PullRequest_CLANG1001
Pull Request Author: mzuzek
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED
Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.
Pull Request Auto Testing has FAILED
Test Names:
- KokkosKernels_PullRequest_GCC720_Light_Tpls_GCC720_GCC740 # 26
- KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight # 356
- KokkosKernels_PullRequest_GCC720 # 24
- KokkosKernels_PullRequest_GCC720_Light_LayoutRight # 24
- KokkosKernels_PullRequest_Tpls_GCC720 # 24
- KokkosKernels_PullRequest_Tpls_INTEL18 # 1145
- KokkosKernels_PullRequest_CLANG1001 # 24
Status Flag 'Pull Request AutoTester' - Failure: Timed out waiting for job KokkosKernels_PullRequest_GCC720 to start: Total Wait = 3603
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing.
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Test Names:
- KokkosKernels_PullRequest_GCC720_Light_Tpls_GCC720_GCC740
- KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight
- KokkosKernels_PullRequest_GCC720
- KokkosKernels_PullRequest_GCC720_Light_LayoutRight
- KokkosKernels_PullRequest_Tpls_GCC720
- KokkosKernels_PullRequest_Tpls_INTEL18
- KokkosKernels_PullRequest_CLANG1001
Pull Request Author: mzuzek
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED
Pull Request Auto Testing has PASSED
Test Names:
- KokkosKernels_PullRequest_GCC720_Light_Tpls_GCC720_GCC740
- KokkosKernels_PullRequest_Tpls_CUDA10_Tpls_CUDA10_LayoutRight
- KokkosKernels_PullRequest_GCC720
- KokkosKernels_PullRequest_GCC720_Light_LayoutRight
- KokkosKernels_PullRequest_Tpls_GCC720
- KokkosKernels_PullRequest_Tpls_INTEL18
- KokkosKernels_PullRequest_CLANG1001
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ lucbv ]!
Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge
Adds the 12-arg overload of invoke, see kokkos#1435 Potential (short term?) solution to kokkos#1537
Compatibility update corresponding to kokkos/kokkos-kernels#1435 Additional changes in PR kokkos/kokkos-kernels#1561
@lucbv @fnrizzi @kliegeois
Similar to #1433, this PR moves TeamGemv and TeamVectorGemv from KokkosBatched to KokkosBlas - except for the batch processing (rank-3 A matrix) variants.
Status
- [...] KokkosBlas;
- [...] ConjTranspose mode and add new ConjNoTranspose mode (also to serial) - used in BSR spmv;
- [...] TeamGEMV and use batched routines in BSR spmv;
- [...] TeamGemv and TeamVectorGemv on CUDA;
- [...] ThreadVector implementation (which contains only ThreadVectorRange and can be executed from within TeamThreadRange);
- [...] Gemv<Mode, Algo> selective interface with serial/team/team-vector dispatch;
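For readers unfamiliar with the mode names in the list above, the following plain C++ sketch shows one way a mode-templated gemv can cover all four combinations of transpose and conjugate. It is an illustration under stated assumptions, not the KokkosBlas interface: the tag structs, `ModeTraits` and this `gemv` signature are hypothetical, and ConjNoTranspose is assumed here to mean "apply conj(A) without transposing".

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Scalar = std::complex<double>;
using Matrix = std::vector<std::vector<Scalar>>;
using Vector = std::vector<Scalar>;

// Hypothetical mode tags for this sketch (not the library's own types).
struct NoTranspose {};
struct Transpose {};
struct ConjNoTranspose {};
struct ConjTranspose {};

// Each mode is a combination of two independent switches.
template <class Mode> struct ModeTraits;
template <> struct ModeTraits<NoTranspose> {
  static constexpr bool transpose = false;
  static constexpr bool conjugate = false;
};
template <> struct ModeTraits<Transpose> {
  static constexpr bool transpose = true;
  static constexpr bool conjugate = false;
};
template <> struct ModeTraits<ConjNoTranspose> {
  static constexpr bool transpose = false;
  static constexpr bool conjugate = true;
};
template <> struct ModeTraits<ConjTranspose> {
  static constexpr bool transpose = true;
  static constexpr bool conjugate = true;
};

// y = beta * y + alpha * op(A) * x, with op selected at compile time.
template <class Mode>
void gemv(Scalar alpha, const Matrix& A, const Vector& x, Scalar beta,
          Vector& y) {
  constexpr bool tr = ModeTraits<Mode>::transpose;
  constexpr bool cj = ModeTraits<Mode>::conjugate;
  const std::size_t rows = A.size();
  const std::size_t cols = rows ? A[0].size() : 0;
  const std::size_t m = tr ? cols : rows;  // length of y
  const std::size_t n = tr ? rows : cols;  // length of x
  assert(y.size() == m && x.size() == n);
  for (std::size_t i = 0; i < m; ++i) {
    Scalar dot(0.0, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
      const Scalar a = tr ? A[j][i] : A[i][j];
      dot += (cj ? std::conj(a) : a) * x[j];
    }
    y[i] = beta * y[i] + alpha * dot;
  }
}
```

A call such as `gemv<ConjNoTranspose>(alpha, A, x, beta, y)` then selects the variant at compile time; a serial/team/team-vector dispatch could be layered on top with a second tag in the same fashion.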