-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BLAS and cuBLAS TPL interface for TRSM #532
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it OK to merge this if the fall-back case is not implemented? I think it's OK to have a sequential fall-back, even if that means copying to host and running sequentially (e.g., for CudaSpace Views of element types that cuBLAS does not support).
@mhoemmen Thanks for your comments. I think the user is OK to have only TPL interfaces in the 3.0 release (#513). Add @iyamazaki for this. I can cover the sequential fall-back in another PR. |
@iyamazaki wrote:
Sure, that's fine. I already have cuSOLVER wrapped for GEQRF in my branch; it's almost ready. I actually have a CholQR implementation in there. I haven't tried it in a while, but it should be straightforward to revive. |
It would be great to have have GEQRF interface in Kokkos Kernels as well. Looking forward to it ! |
I would be happy if kokkos-kernels imitated the design. In particular, I was careful not to expose any TPL headers or types in TSQR header files. That's important especially because of the cuBLAS v1 vs. v2 header compatibility problem that we've seen before. It's also important not to put |
@mhoemmen I am working on adding the sequential fall-back implementation now. Almost done. Needs to change unit tests though since I used BLAS TRMM to generate data. |
@srajama1 I can work on this GEQRF interface after this PR. |
KOKKOSBLAS3_TRSM_TPL_SPEC_AVAIL_BLAS( Kokkos::complex<double>, Kokkos::LayoutLeft, Kokkos::LayoutLeft, Kokkos::HostSpace) | ||
#endif | ||
#if defined (KOKKOSKERNELS_INST_KOKKOS_COMPLEX_FLOAT_) \ | ||
&& defined (KOKKOSKERNELS_INST_LAYOUTLEFT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to figure out which order this PR and PR #526 should go in. This would affect how this file gets implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jjwilke This file is only for TPL. I do not think it will be affected by the new ETI implementation.
@vqd8a is this PR ready for merge? |
@ndellingwood I am still checking on the implementation of conjugate transpose. |
@mhoemmen @ndellingwood I added the sequential fall-back implementation. If it is okay, let me know and I can run spotcheck. |
@vqd8a doesn't hurt to start a spot-check, please also run tests with tpls enabled and disabled to confirm behavior is as expected. Thanks! |
@ndellingwood one question: this PR now have fall-back implementation, should I need to generate ETI files and add them too or should I wait for @jjwilke new ETI implementation? |
@ndellingwood If I run tests, I have to generate these ETI files using the old script. and do I need to push those files to this PR? |
Running spotchecks now. |
Author addressed my concern about __restrict__
.
One spotcheck (with BLAS TPL) passed. Two more are running
|
@vqd8a just to check, does the spot-check include the fallback or just the tpls, and if the fallback is tested does this work with/without ETI files? |
@ndellingwood This one was just with BLAS TPL. I have two others running (one with CUBLAS and one with the fallback). For the fallback, I have to generate the ETI files. So, for merging, should I push the ETI files here too? |
Thanks @ndellingwood @jjwilke Spotcheck with cuBLAS passed:
|
Spotcheck with fall-back implementation passed. ETI files were pushed in.
|
@ndellingwood Spotchecks passed. If possible, can you merge it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spot-checks have passed, I think @mhoemmen 's comments have been addressed.
Thanks @ndellingwood for merging. |
Address the user request in #513 . Only call cuBLAS for CUDA and BLAS for non-CUDA. The KK implementation might be added in another future PR.