Reference matrix conversion of Csr and Hybrid #302

Merged (12 commits) on May 21, 2019

@yhmtsai (Member) commented May 8, 2019

Reference matrix conversion of Csr and Hybrid

@tcojean tcojean requested review from thoasm, pratikvn and tcojean and removed request for thoasm and pratikvn May 9, 2019 10:20
@tcojean tcojean added the labels is:enhancement (An improvement of an existing feature.), type:matrix-format (This is related to the Matrix formats), 1:ST:ready-for-review (This PR is ready for review), and mod:reference (This is related to the reference module.) May 9, 2019
@hartwiganzt (Collaborator) left a comment

I did not find anything to complain about.

@thoasm (Member) left a comment

Looks good to me.

Just 2 small nits (adding a const), and one general question about relying on the CSR row_ptrs for nnz_per_row instead of counting, which I would like to discuss before merging.
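
To make the question concrete, here is a minimal sketch of the two ways to obtain per-row counts from CSR data. The function names `stored_per_row` and `nonzeros_per_row` are hypothetical and not part of Ginkgo, and plain `std::vector` stands in for Ginkgo's arrays. Taking differences of `row_ptrs` counts every stored entry, explicit zeros included, while scanning the values counts only entries that differ from zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stored entries per row, taken directly from the CSR row pointers.
// Explicit zeros are included because they occupy a storage slot.
std::vector<std::size_t> stored_per_row(const std::vector<int>& row_ptrs)
{
    assert(!row_ptrs.empty());
    std::vector<std::size_t> counts(row_ptrs.size() - 1);
    for (std::size_t row = 0; row < counts.size(); ++row) {
        counts[row] =
            static_cast<std::size_t>(row_ptrs[row + 1] - row_ptrs[row]);
    }
    return counts;
}

// Entries per row whose value differs from zero, found by scanning values.
std::vector<std::size_t> nonzeros_per_row(const std::vector<int>& row_ptrs,
                                          const std::vector<double>& vals)
{
    assert(!row_ptrs.empty());
    std::vector<std::size_t> counts(row_ptrs.size() - 1, 0);
    for (std::size_t row = 0; row < counts.size(); ++row) {
        for (int idx = row_ptrs[row]; idx < row_ptrs[row + 1]; ++idx) {
            if (vals[idx] != 0.0) {
                ++counts[row];
            }
        }
    }
    return counts;
}
```

The two results differ exactly when the matrix stores explicit zeros, so whichever count is used to size the target format has to match what the conversion actually copies.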

@thoasm (Member) left a comment

LGTM!

@tcojean (Member) left a comment

LGTM, other than the question of whether or not to preserve explicit zeros.

size_type ell_idx = 0;
while (csr_idx < csr_row_ptrs[row + 1]) {
    const auto val = csr_vals[csr_idx];
    if (val != zero<ValueType>()) {
Member

Is that test correct? To count max_nnz you also count zeros. Doesn't that mean you should also store explicit zeros here? In general, what is our policy on these issues? For some conversions, I think we preserve zeros (CSR <-> COO, CSR->SELLP), but for some others that does not seem to be the case (ELL <-> CSR). The lists are non-exhaustive.

Member Author

You are right. The result is not the expected hybrid matrix.

// Ell part
for (IndexType col = 0; col < max_nnz_per_row; col++) {
    const auto val = ell->val_at(row, col);
    if (val != zero<ValueType>()) {
Member

Same thing with keeping explicit zeros or not. And further down also.

Member Author

How about we keep explicit zeros for all matrix conversions?
We would then also need to implement a function that removes all explicit zeros,
so users can decide whether they need to do it.
When reading a matrix file, we currently seem to delete all zero values, so maybe we should keep them there as well.
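
A function that removes explicit zeros could look roughly like the sketch below. `SimpleCsr` and `remove_explicit_zeros` are hypothetical names used only for illustration; this is not an existing Ginkgo API, and it works on plain `std::vector` data with `double` values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR container for illustration only; not Ginkgo's matrix::Csr.
struct SimpleCsr {
    std::vector<int> row_ptrs;
    std::vector<int> col_idxs;
    std::vector<double> vals;
};

// Returns a copy of `in` with every explicitly stored zero dropped.
SimpleCsr remove_explicit_zeros(const SimpleCsr& in)
{
    assert(!in.row_ptrs.empty());
    SimpleCsr out;
    out.row_ptrs.push_back(0);
    for (std::size_t row = 0; row + 1 < in.row_ptrs.size(); ++row) {
        for (int idx = in.row_ptrs[row]; idx < in.row_ptrs[row + 1]; ++idx) {
            if (in.vals[idx] != 0.0) {
                out.col_idxs.push_back(in.col_idxs[idx]);
                out.vals.push_back(in.vals[idx]);
            }
        }
        // Close the row: the next row starts after the entries kept so far.
        out.row_ptrs.push_back(static_cast<int>(out.vals.size()));
    }
    return out;
}
```

With such a helper, conversions could keep explicit zeros by default and users could call the helper when they want a compacted matrix.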

Member

As far as I know, we should not put the explicit zeros in the ELL format, since we might need even more storage that way, and I am not sure if we already have the SpMV improvement in our code that stops as soon as a zero is found.

Actually, for the Hybrid, I thought it should be fine to ignore all explicit zeros since COO is only used for the parts that don't properly fit into the ELL.

Member Author

For the ELL part, it stops when a zero is found. In the CUDA kernel, that check depends on the column index, not on the value.
Hybrid uses nnz_per_row to decide the number of columns of the ELL part, so we do not get the expected hybrid matrix when we skip the explicit zeros.

@hartwiganzt (Collaborator) commented

Can this be merged? Thanks @tcojean!

@tcojean (Member) commented May 17, 2019

@hartwiganzt There is an ongoing discussion. Actually, if you have time, could you give your input?

Summary:
Currently, the zeros are taken into account when computing nnz_per_row, but they are then removed in the conversion, which means that you do not get the actual hybrid matrix that you created in the beginning.

We are also wondering in general whether and when we should ignore zeros during conversions (for Dense <-> anything it is obvious, but for the rest?).

Currently, we preserve zeros in some cases (CSR <-> COO, CSR->SELLP), but in some others that does not seem to be the case (ELL <-> CSR, all read methods from files).

@hartwiganzt (Collaborator) commented

This is a difficult question. As pointed out, sometimes it can be helpful to have explicit zeros stored. At the same time, if converting ELL->CSR you want to have the zeros removed, obviously. I don't think I have an overall best solution, but maybe different routines handle this differently. Obviously, the documentation should be explicit about it. Is that a variant?

@tcojean (Member) commented May 17, 2019

I guess for this PR we can focus only on making the Hybrid <-> CSR version correct in terms of correctly using the nnz_per_row. You are right that this should be a per-case thing anyway, maybe with a loose default policy of keeping explicit zeros whenever possible or whenever it makes sense.

@yhmtsai yhmtsai force-pushed the reference_csr_hybrid_converter branch from bd480e8 to 1b69ea9 Compare May 17, 2019 17:28
@yhmtsai (Member Author) commented May 17, 2019

Csr -> Hybrid: keep the explicit zeros
Hybrid -> Csr: delete the explicit zeros in the COO or ELL part.
I also added another test for them.
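
The agreed behaviour can be sketched as follows. This is an illustration under simplifying assumptions, not the code merged in this PR: `ToyCsr`, `ToyHybrid`, `csr_to_hybrid` and `hybrid_to_csr` are hypothetical names, the ELL width is passed in by the caller instead of being chosen by a strategy, ELL padding slots hold value 0, and the COO entries are assumed to be sorted by row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy containers for illustration; these are not Ginkgo's classes.
struct ToyCsr {
    std::vector<int> row_ptrs, col_idxs;
    std::vector<double> vals;
};

struct ToyHybrid {
    std::size_t num_rows = 0;
    std::size_t ell_width = 0;     // stored slots per row in the ELL part
    std::vector<int> ell_cols;     // row-major, num_rows * ell_width
    std::vector<double> ell_vals;  // padding slots hold value 0
    std::vector<int> coo_rows, coo_cols;
    std::vector<double> coo_vals;
};

// CSR -> Hybrid: every stored entry is copied, explicit zeros included.
// The first ell_width entries of a row go to ELL, the rest to COO.
ToyHybrid csr_to_hybrid(const ToyCsr& csr, std::size_t ell_width)
{
    assert(!csr.row_ptrs.empty());
    ToyHybrid hyb;
    hyb.num_rows = csr.row_ptrs.size() - 1;
    hyb.ell_width = ell_width;
    hyb.ell_cols.assign(hyb.num_rows * ell_width, 0);
    hyb.ell_vals.assign(hyb.num_rows * ell_width, 0.0);
    for (std::size_t row = 0; row < hyb.num_rows; ++row) {
        std::size_t ell_idx = 0;
        for (int idx = csr.row_ptrs[row]; idx < csr.row_ptrs[row + 1]; ++idx) {
            if (ell_idx < ell_width) {
                hyb.ell_cols[row * ell_width + ell_idx] = csr.col_idxs[idx];
                hyb.ell_vals[row * ell_width + ell_idx] = csr.vals[idx];
                ++ell_idx;
            } else {
                hyb.coo_rows.push_back(static_cast<int>(row));
                hyb.coo_cols.push_back(csr.col_idxs[idx]);
                hyb.coo_vals.push_back(csr.vals[idx]);
            }
        }
    }
    return hyb;
}

// Hybrid -> CSR: zero values in the ELL and COO parts are skipped, which
// drops ELL padding and explicit zeros alike.
ToyCsr hybrid_to_csr(const ToyHybrid& hyb)
{
    ToyCsr csr;
    csr.row_ptrs.push_back(0);
    std::size_t coo_idx = 0;
    for (std::size_t row = 0; row < hyb.num_rows; ++row) {
        for (std::size_t i = 0; i < hyb.ell_width; ++i) {
            const double val = hyb.ell_vals[row * hyb.ell_width + i];
            if (val != 0.0) {
                csr.col_idxs.push_back(hyb.ell_cols[row * hyb.ell_width + i]);
                csr.vals.push_back(val);
            }
        }
        while (coo_idx < hyb.coo_rows.size() &&
               hyb.coo_rows[coo_idx] == static_cast<int>(row)) {
            if (hyb.coo_vals[coo_idx] != 0.0) {
                csr.col_idxs.push_back(hyb.coo_cols[coo_idx]);
                csr.vals.push_back(hyb.coo_vals[coo_idx]);
            }
            ++coo_idx;
        }
        csr.row_ptrs.push_back(static_cast<int>(csr.vals.size()));
    }
    return csr;
}
```

In this sketch a padding slot and an explicit zero look the same in the ELL part, which is one reason the Hybrid -> Csr direction drops zeros while the Csr -> Hybrid direction keeps them so that the layout matches the row counts used to size it.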

@yhmtsai (Member Author) commented May 20, 2019

@tcojean I see it failed in the pipeline.
It says `/usr/bin/ld: final link failed: No space left on device`.
Is the workstation full, or is there something wrong in the code?

@tcojean (Member) commented May 20, 2019

@yhmtsai There was indeed a space problem. Now it builds, but you have a problem in a kernel.

@yhmtsai (Member Author) commented May 20, 2019

Fixed it

@thoasm (Member) left a comment

LGTM!
Just some minor style suggestions.

@tcojean tcojean added the label 1:ST:do-not-merge (Please do not merge this PR yet.) and removed the label 1:ST:ready-for-review (This PR is ready for review) May 20, 2019
@tcojean tcojean added the labels 1:ST:ready-for-review (This PR is ready for review) and 1:ST:ready-to-merge (This PR is ready to merge.) and removed the labels 1:ST:do-not-merge (Please do not merge this PR yet.) and 1:ST:ready-for-review (This PR is ready for review) May 21, 2019
@tcojean tcojean merged commit c9be444 into ginkgo-project:develop May 21, 2019
tcojean added a commit that referenced this pull request Oct 20, 2019
The Ginkgo team is proud to announce the new minor release of Ginkgo version
1.1.0. This release brings several performance improvements, adds Windows support,
adds support for factorizations inside Ginkgo and a new ILU preconditioner
based on the ParILU algorithm, among other things. For detailed information, check the respective issue.

Supported systems and requirements:
+ For all platforms, cmake 3.9+
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, 8.1+
  + clang: 3.9+
  + Intel compiler: 2017+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
+ Windows
  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, 8.1+
  + Microsoft Visual Studio: VS 2017 15.7+
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.


The current known issues can be found in the [known issues
page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues).


Additions:
+ Upper and lower triangular solvers ([#327](#327), [#336](#336), [#341](#341), [#342](#342)) 
+ New factorization support in Ginkgo, and addition of the ParILU
  algorithm ([#305](#305), [#315](#315), [#319](#319), [#324](#324))
+ New ILU preconditioner ([#348](#348), [#353](#353))
+ Windows MinGW and Cygwin support ([#347](#347))
+ Windows Visual Studio support ([#351](#351))
+ New example showing how to use ParILU as a preconditioner ([#358](#358))
+ New example on using loggers for debugging ([#360](#360))
+ Add two new 9pt and 27pt stencil examples ([#300](#300), [#306](#306))
+ Allow benchmarking CuSPARSE spmv formats through Ginkgo's benchmarks ([#303](#303))
+ New benchmark for sparse matrix format conversions ([#312](https://github.com/ginkgo-project/ginkgo/issues/312), [#317](https://github.com/ginkgo-project/ginkgo/issues/317))
+ Add conversions between CSR and Hybrid formats ([#302](#302), [#310](#310))
+ Support for sorting rows in the CSR format by column indices ([#322](#322))
+ Addition of a CUDA COO SpMM kernel for improved performance ([#345](#345))
+ Addition of a LinOp to handle perturbations of the form (identity + scalar *
  basis * projector) ([#334](#334))
+ New sparsity matrix representation format with Reference and OpenMP
  kernels ([#349](#349), [#350](#350))

Fixes:
+ Accelerate GMRES solver for CUDA executor ([#363](#363))
+ Fix BiCGSTAB solver convergence ([#359](#359))
+ Fix CGS logging by reporting the residual for every sub iteration ([#328](#328))
+ Fix CSR,Dense->Sellp conversion's memory access violation ([#295](#295))
+ Accelerate CSR->Ell,Hybrid conversions on CUDA ([#313](#313), [#318](#318))
+ Fixed slowdown of COO SpMV on OpenMP ([#340](#340))
+ Fix gcc 6.4.0 internal compiler error ([#316](#316))
+ Fix compilation issue on Apple clang++ 10 ([#322](#322))
+ Make Ginkgo able to compile on Intel 2017 and above ([#337](#337))
+ Make the benchmarks spmv/solver use the same matrix formats ([#366](#366))
+ Fix self-written isfinite function ([#348](#348))
+ Fix Jacobi issues shown by cuda-memcheck

Tools and ecosystem:
+ Multiple improvements to the CI system and tools ([#296](#296), [#311](#311), [#365](#365))
+ Multiple improvements to the Ginkgo containers ([#328](#328), [#361](#361))
+ Add sonarqube analysis to Ginkgo ([#304](#304), [#308](#308), [#309](#309))
+ Add clang-tidy and iwyu support to Ginkgo ([#298](#298))
+ Improve Ginkgo's support of xSDK M12 policy by adding the `TPL_` arguments
  to CMake ([#300](#300))
+ Add support for the xSDK R7 policy ([#325](#325))
+ Fix examples in html documentation ([#367](#367))
tcojean added a commit that referenced this pull request Oct 21, 2019
Related PR: #370