Add two BLAS/LAPACK calls needed by: Sptrsv supernode #552 #589

Closed
e10harvey opened this issue Jan 30, 2020 · 15 comments

Comments

@e10harvey
Contributor

e10harvey commented Jan 30, 2020

This issue is to address a subset of feedback from #552.

The following items should be addressed:

  1. KokkosSparse_sptrsv_aux.hpp definitions of {for,back}wardP_supernode should be replaced with the correct KokkosBLAS / LAPACK / KokkosKernels routines. Will KokkosKernels::Impl::permute_vector or KokkosKernels::Impl::permute_block_vector work?
  2. KokkosSparse_sptrsv_aux.hpp definition of print_crsmat should be replaced with a new routine (print_2Dview) in KokkosKernels::Impl::print_2Dview that uses KokkosKernels::Impl::print_1Dview
  3. EDIT: We should have two specializations for trmm: ETI and TPL_CBLAS. Similarly we should have two specializations for trtri: ETI and TPL_LAPACKE (this is HostBlas). If I restructure this code correctly, the following behavior (in general) should be realized by adding KokkosBlas support for trmm and trtri:
  • For KokkosBlas3::trmm:
    • If cblas is NOT available at compile time, the eti specialization (that uses Kokkos routines) will be selected.
    • If cblas is available at compile time, both eti and tpl specializations can be made available at compile time.
  • Similarly for KokkosBlas3::trtri:
    • If lapacke is NOT available at compile time, the eti specialization will be selected. Otherwise, both the eti and tpl specializations can be made available.
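
The selection behavior described in item 3 can be illustrated with a minimal C++ sketch. This is not the KokkosKernels implementation: the namespace `demo`, the macro `DEMO_HAVE_CBLAS`, and the templates `trmm_tpl_avail` and `Trmm` are hypothetical names invented for this illustration. The sketch assumes a build flag decides whether a CBLAS-backed specialization is compiled in; without it, the plain-loop fallback is selected.

```cpp
#include <string>
#include <vector>

#ifdef DEMO_HAVE_CBLAS  // hypothetical build flag, not a real KokkosKernels macro
#include <cblas.h>
#endif

namespace demo {  // hypothetical namespace for illustration only

// Trait: is a TPL-backed specialization available for scalar type T?
template <class T>
struct trmm_tpl_avail { static constexpr bool value = false; };

#ifdef DEMO_HAVE_CBLAS
template <>
struct trmm_tpl_avail<double> { static constexpr bool value = true; };
#endif

// Fallback: B := A * B, with A lower-triangular (n x n) and B (n x m), row-major.
template <class T, bool UseTpl = trmm_tpl_avail<T>::value>
struct Trmm {
  static std::string backend() { return "eti"; }
  static void apply(int n, int m, const std::vector<T>& A, std::vector<T>& B) {
    // Walk rows bottom-up so each row only reads rows not yet overwritten.
    for (int i = n - 1; i >= 0; --i) {
      for (int j = 0; j < m; ++j) {
        T sum = T(0);
        for (int k = 0; k <= i; ++k) sum += A[i * n + k] * B[k * m + j];
        B[i * m + j] = sum;
      }
    }
  }
};

#ifdef DEMO_HAVE_CBLAS
// TPL specialization: forwards to CBLAS when it was found at build time.
template <>
struct Trmm<double, true> {
  static std::string backend() { return "tpl"; }
  static void apply(int n, int m, const std::vector<double>& A, std::vector<double>& B) {
    cblas_dtrmm(CblasRowMajor, CblasLeft, CblasLower, CblasNoTrans, CblasNonUnit,
                n, m, 1.0, A.data(), n, B.data(), m);
  }
};
#endif

}  // namespace demo
```

Callers write `demo::Trmm<double>::apply(n, m, A, B)` in both configurations; only the build flag changes which body runs.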

EDIT: @srajama1: is no. 3 correct?

@srajama1, please let me know if this is what you had in mind.

@srajama1
Contributor

@e10harvey : Yes for no. 1. I don't understand the question for permute_vector.

  2. I don't see how a 2D view can print a CRS matrix. It is sparse. Nevertheless, we have CRS printing in many places, so it would be good to have it in a Util.
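
To make the distinction concrete, here is a minimal sketch of a CRS printing helper. A CRS matrix is stored as three 1D arrays (row offsets, column indices, values) rather than as a dense 2D block, so a printer has to walk the row offsets. The function name `format_crs` and the use of `std::vector` are assumptions made for this illustration; this is not an existing KokkosKernels utility.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: format a CRS matrix as one text line per row,
// listing only the stored (column, value) pairs.
std::string format_crs(const std::vector<int>& row_ptr,
                       const std::vector<int>& col_idx,
                       const std::vector<double>& values) {
  std::ostringstream out;
  const std::size_t num_rows = row_ptr.empty() ? 0 : row_ptr.size() - 1;
  for (std::size_t i = 0; i < num_rows; ++i) {
    out << "row " << i << ":";
    // Entries of row i occupy positions [row_ptr[i], row_ptr[i + 1]).
    for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
      out << " (" << col_idx[k] << ", " << values[k] << ")";
    }
    out << "\n";
  }
  return out.str();
}
```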

@ndellingwood
Contributor

Is part of this issue to also provide the blas and lapack capabilities necessary to replace the cblas and lapacke calls in the supernodal sptrsv, or will that be discussed and addressed by a different issue?

@srajama1
Contributor

@ndellingwood This issue basically does that.

@e10harvey
Contributor Author

> I don't understand the question for permute_vector.

This is mainly just a note to myself (which I should have made clear) -- I'm looking into which internal KokkosKernels routines we can use instead of these {for,back}wardP_supernode routines.

> 2. I don't see how a 2D view can print a CRS matrix. It is sparse. Nevertheless, we have CRS printing in many places, so it would be good to have it in a Util.

Got it -- I'll have to look into this one further as well.

@e10harvey
Contributor Author

@srajama1, @ndellingwood: I am confused. This issue is to address items no. 1 and no. 2 above, which were gleaned from the PR feedback left by Mark.

What specifically in the supernodal sptrsv code needs to be replaced?

@e10harvey
Contributor Author

> What specifically in the supernodal sptrsv code needs to be replaced?

@iyamazaki Can you elaborate?

@e10harvey
Contributor Author

@iyamazaki: Do you have your Matrix Market formatted test files for the perf_test KokkosKernels_sparse_sptrsv.exe? If not, can you share how to generate those test files?

@iyamazaki
Contributor

Thank you so much for looking at this, @e10harvey !! In the current code, I use xTRMM and xTRTRI to setup our solver. Since the setup is currently on the host (sequential), I am calling cblas_xtrmm and LAPACKE_xtrtri.
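
For readers unfamiliar with the routine names: xTRTRI computes the inverse of a triangular matrix, and xTRMM multiplies a matrix by a triangular matrix. The following is a minimal, unoptimized sketch of what a sequential host-side triangular inverse computes. The function name `lower_tri_inverse`, the row-major layout, and the restriction to a lower-triangular matrix with a non-unit diagonal are assumptions made for this illustration; it is not the code used in the solver setup.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference routine: returns X = L^{-1} for a lower-triangular,
// non-unit-diagonal n x n matrix L stored row-major. Assumes L[i][i] != 0.
std::vector<double> lower_tri_inverse(int n, const std::vector<double>& L) {
  std::vector<double> X(static_cast<std::size_t>(n) * n, 0.0);
  for (int j = 0; j < n; ++j) {
    // Diagonal entry of the inverse.
    X[j * n + j] = 1.0 / L[j * n + j];
    // Forward substitution down column j: row i of L times column j of X must be 0.
    for (int i = j + 1; i < n; ++i) {
      double sum = 0.0;
      for (int k = j; k < i; ++k) sum += L[i * n + k] * X[k * n + j];
      X[i * n + j] = -sum / L[i * n + i];
    }
  }
  return X;
}
```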

@iyamazaki
Contributor

Hi, @e10harvey, again. I need to check with @ndellingwood . I have enabled my code only through make-script (e.g., compileKokkosKernelsSimple.sh), and have not tried using cmake.

@e10harvey
Contributor Author

> Thank you so much for looking at this, @e10harvey !! In the current code, I use xTRMM and xTRTRI to setup our solver. Since the setup is currently on the host (sequential), I am calling cblas_xtrmm and LAPACKE_xtrtri.

@iyamazaki: No problem, thanks for the swift responses! Would you please point me to where in the code you're using "xTRMM and xTRTRI to setup our solver"? Also, where are "cblas_xtrmm and LAPACKE_xtrtri" called?

> Hi, @e10harvey, again. I need to check with @ndellingwood . I have enabled my code only through make-script (e.g., compileKokkosKernelsSimple.sh), and have not tried using cmake.

Hi, @iyamazaki. Does compileKokkosKernelsSimple.sh generate the Matrix Market formatted test files for the perf_test KokkosKernels_sparse_sptrsv.exe?

@iyamazaki
Contributor

@e10harvey: The cblas and lapacke functions are called in src/sparse/KokkosSparse_sptrsv_supernode.hpp and src/sparse/KokkosSparse_sptrsv_superlu.hpp.

That is for the general sparse-triangular solve. The perf_tests for the supernodal triangular solve are KokkosSparse_sptrsv_superlu.exe and KokkosSparse_sptrsv_cholmod.exe.

@e10harvey
Contributor Author

@iyamazaki: Thanks! I have:

$ ./KokkosKernels_sparse_sptrsv.exe --help
Options:
  --test [OPTION] : Use different kernel implementations
                    Options:
                      lvlrp, lvltp1, lvltp2

                      cusparse           (Vendor Libraries)

  -lf [file]       : Read in Matrix Market formatted text file 'file'.
  -uf [file]       : Read in Matrix Market formatted text file 'file'.
  --offset [O]    : Subtract O from every index.
                    Useful in case the matrix market file is not 0 based.

  -rpt [K]        : Number of Rows assigned to a thread.
  -ts [T]         : Number of threads per team.
  -vl [V]         : Vector-length (i.e. how many Cuda threads are a Kokkos 'thread').
  --loop [LOOP]       : How many spmv to run to aggregate average time. 

How do I run KokkosKernels_sparse_sptrsv.exe? Are there sets of input files for the -lf and -uf arguments that you used when testing?
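
As background on the --offset option in the help text: Matrix Market coordinate files use 1-based row and column indices, so they need shifting before use with 0-based arrays. The following minimal sketch shows that adjustment for a single "row col value" line. The `Entry` struct and `parse_entry` function are hypothetical names for this illustration, and whether the perf_test reader already performs this shift on its own is not established in this thread.

```cpp
#include <sstream>
#include <string>

// Hypothetical container for one coordinate-format entry.
struct Entry {
  int row;
  int col;
  double value;
};

// Parse one "row col value" line and subtract `offset` from both indices.
// Returns false (leaving `out` untouched) if the line does not parse.
bool parse_entry(const std::string& line, int offset, Entry& out) {
  std::istringstream in(line);
  int r = 0;
  int c = 0;
  double v = 0.0;
  if (!(in >> r >> c >> v)) return false;
  out = Entry{r - offset, c - offset, v};
  return true;
}
```

With an offset of 1, the line `3 1 2.5` becomes the 0-based entry (2, 0) with value 2.5.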

@iyamazaki
Contributor

Hi, @e10harvey. That perf_test is for a general sparse-triangular solve by @ndellingwood. If you want to run the supernodal version (that uses CBLAS and LAPACKE), then you want to run KokkosSparse_sptrsv_superlu.exe or KokkosSparse_sptrsv_cholmod.exe.

@srajama1
Contributor

srajama1 commented Feb 8, 2020

@e10harvey The SuperLU TPL changes for CMake are not in develop.

@jjwilke has a PR, but it has conflicts. See #546.

Use the make system for now.

@e10harvey changed the title from "Add three BLAS/LAPACK calls needed by: Sptrsv supernode #552" to "Add two BLAS/LAPACK calls needed by: Sptrsv supernode #552" on Feb 19, 2020
@ndellingwood added this to the 3.2 Release milestone on Mar 10, 2020
@e10harvey
Contributor Author

Addressed via #622 and #697.
