
Loosen SparseMatrixCSC colptr and rowval integer type restriction #31661

Closed

Conversation

@Pbellive (Contributor) commented Apr 9, 2019

This fixes #31435; please see that issue for a full explanation. @KlausC, any objection to this change?

…restrictive. The integer type needs to be wide enough to store a matrix with the given number of non-zeros, not any sparse matrix of the same size.
@ararslan added the bugfix (This change fixes an existing bug) and sparse (Sparse arrays) labels Apr 9, 2019
@ararslan added this to the 1.2 milestone Apr 9, 2019
@ViralBShah (Member) commented:

I noted a potential objection in #31435.

@KlausC (Contributor) left a comment:


The check should be nnzval < typemax(Ti) for correct handling of the corner case Ti = Int, nnzval = typemax(Ti), where nnzval + 1 overflows.

The error text "Number .." should be lowercase for consistency with the other messages.

@@ -42,7 +42,7 @@ function sparse_check_Ti(m::Integer, n::Integer, Ti::Type)
         n < 0 && throwsz("columns", 'n', n)
         !isbitstype(Ti) || m ≤ typemax(Ti) || throwTi("number of rows", "m", m)
         !isbitstype(Ti) || n ≤ typemax(Ti) || throwTi("number of columns", "n", n)
-        !isbitstype(Ti) || n*m+1 ≤ typemax(Ti) || throwTi("maximal nnz+1", "m*n+1", n*m+1)
+        !isbitstype(Ti) || nnzval+1 ≤ typemax(Ti) || throwTi("Number of nonzeros", "nnz+1", nnzval+1)

Suggested change:
-        !isbitstype(Ti) || nnzval+1 ≤ typemax(Ti) || throwTi("Number of nonzeros", "nnz+1", nnzval+1)
+        !isbitstype(Ti) || nnzval < typemax(Ti) || throwTi("number of nonzeros", "nnz+1", nnzval+1)
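
For illustration, a minimal sketch of the corner case behind this suggestion (not code from this PR, assuming Ti = Int on a 64-bit machine): Julia integer arithmetic wraps on overflow, so the +1 form of the check silently passes exactly when the count no longer fits.

    Ti = Int
    nnzval = typemax(Ti)        # corner case: nnz fills the index type exactly

    nnzval + 1 <= typemax(Ti)   # true  -- nnzval + 1 wraps to typemin(Int), so the check passes spuriously
    nnzval < typemax(Ti)        # false -- the suggested form correctly triggers throwTi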

@KlausC (Contributor) commented Apr 10, 2019

As nnz(A) may change during the lifetime of A, it would be wise to assert that the value we want to store in A.colptr[A.n+1] (that is, the new nnz(A)+1) fits in a Ti.

I agree that the proposed check of length(A.nzval) also covers #31024, which was fixed by #31118. But why not also check length(A.rowval)? The idea behind #31118 was to check only immutable values in the constructor.

I would be happy if we could ensure full consistency of the data during the whole lifetime of A. That includes:

  1. 0 < A.colptr[k] <= A.colptr[k+1] for k = 1:A.n (that would cover Bound violation in SparseArrays - segfault #31024 also)
  2. min(length(A.rowval), length(A.nzval)) >= nnz(A)
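
For illustration only, a minimal Julia sketch of the two invariants listed above, written against the SparseMatrixCSC fields of the time (A.colptr, A.rowval, A.nzval, A.n); the function name check_csc_consistency is hypothetical and not part of this PR.

    using SparseArrays

    function check_csc_consistency(A::SparseMatrixCSC)
        colptr, n = A.colptr, A.n
        # Invariant 1: column pointers are positive and non-decreasing.
        for k in 1:n
            0 < colptr[k] <= colptr[k+1] ||
                throw(ArgumentError("invalid colptr at column $k"))
        end
        # Invariant 2: the index/value buffers can hold all stored entries.
        min(length(A.rowval), length(A.nzval)) >= nnz(A) ||
            throw(ArgumentError("rowval/nzval shorter than nnz(A)"))
        return true
    end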

@@ -23,7 +23,7 @@ struct SparseMatrixCSC{Tv,Ti<:Integer} <: AbstractSparseMatrix{Tv,Ti}
     function SparseMatrixCSC{Tv,Ti}(m::Integer, n::Integer, colptr::Vector{Ti}, rowval::Vector{Ti},
                                     nzval::Vector{Tv}) where {Tv,Ti<:Integer}

-        sparse_check_Ti(m, n, Ti)
+        sparse_check_Ti(m, n, length(nzval), Ti)

Suggested change:
-        sparse_check_Ti(m, n, length(nzval), Ti)
+        sparse_check_Ti(m, n, max(length(nzval), length(rowval)), Ti)
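
As a hedged aside, the point of the loosened check (see #31435) is to permit index types that can hold the number of stored entries even when they cannot hold m*n. A sketch of what the relaxed constructor check would allow, assuming the change proposed in this PR (the example values are illustrative):

    using SparseArrays

    # m*n = 10^10 exceeds typemax(Int32), but an empty matrix has zero stored
    # entries, so Int32 indices are perfectly adequate for it.
    m, n = 10^5, 10^5
    colptr = ones(Int32, n + 1)     # all columns empty
    rowval = Int32[]
    nzval  = Float64[]
    A = SparseMatrixCSC{Float64,Int32}(m, n, colptr, rowval, nzval)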

@@ -594,7 +594,7 @@ function sparse!(I::AbstractVector{Ti}, J::AbstractVector{Ti},
                  csccolptr::Vector{Ti}, cscrowval::Vector{Ti}, cscnzval::Vector{Tv}) where {Tv,Ti<:Integer}

     require_one_based_indexing(I, J, V)
-    sparse_check_Ti(m, n, Ti)
+    sparse_check_Ti(m, n, length(V), Ti)

Suggested change:
-    sparse_check_Ti(m, n, length(V), Ti)
+    sparse_check_Ti(m, n, length(I), Ti)

@ViralBShah (Member) commented:

Can't setindex! potentially grow the size of nzval and rowval, thus requiring the check to be the way it is right now (but being careful about the overflow)?
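
A small illustration of this point (not from the PR): assigning to a structurally zero position inserts a stored entry and grows rowval and nzval, so nnz can exceed whatever was validated at construction time.

    using SparseArrays

    A = spzeros(Float64, Int32, 3, 3)   # no stored entries yet
    length(A.nzval)                      # 0
    A[2, 2] = 1.0                        # inserting a new stored entry grows rowval and nzval
    length(A.nzval)                      # 1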

@mbauman (Member) commented Apr 10, 2019

Yes, but:

So these sorts of "it'd be nice to guarantee"s are non sequiturs. We just need to get things back to the old status quo, and then we can discuss how to approach the next steps.

It's either merge this patch or wholesale revert #31118.

@KristofferC (Member) commented:

The discussion here shows that there are intricacies we don't really have time to deal with now, considering we want to branch for 1.3 about now. I suggest just reverting the breaking PR (#31667) and then the discussion can take as long as it wants to add it back in a non-breaking way.

@KlausC (Contributor) commented Apr 10, 2019

Sorry, but without a replacement for #31118 the software is breaking because of #31024. I propose merging this PR with the proposed changes.

@KristofferC (Member) commented:

the software is breaking because of

What code would reverting the PR break? I see an error being replaced with another one.

@mbauman (Member) commented Apr 10, 2019

Note that we're using breaking with a very specific meaning here: it's not that Julia is "broken" (e.g., with a segfault in some instances), but rather that a particular change caused a regression that broke someone's workflow that previously worked. That's significantly worse than an existing bug that has been a bug for a long time.

@Pbellive (Contributor, Author) commented:

I'm in favour of @KristofferC's suggestion to revert #31118 and continue the discussion of a long-term fix here. @ViralBShah's and @KlausC's comments that setindex! can grow the internal arrays are of course important points. To address that, it seems like it would be sufficient to add a check inside the setindex! methods to make sure that A.colptr[n+1] < typemax(Ti) remains true after new entries are added to the matrix. I haven't tested that, but I wouldn't think it would be much of a performance hit.
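
A hedged sketch of the kind of guard described above (the helper name and its placement are hypothetical, not code from this PR): before inserting a new stored entry, verify that the updated colptr[n+1], i.e. the new nnz+1, still fits in Ti.

    using SparseArrays

    # Hypothetical helper; a real implementation would be called from the
    # insertion path of setindex! before growing rowval/nzval.
    function _check_insert_fits(A::SparseMatrixCSC{Tv,Ti}) where {Tv,Ti}
        # A.colptr[A.n+1] == nnz(A) + 1; after one insertion it becomes nnz(A) + 2,
        # so require strict headroom below typemax(Ti).
        if isbitstype(Ti) && A.colptr[A.n+1] >= typemax(Ti)
            throw(ArgumentError("number of nonzeros would exceed typemax($Ti)"))
        end
        return nothing
    end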

@mauro3 (Contributor) commented Apr 10, 2019

Sorry, but without a replacement for #31118 the software is breaking because of #31024

@KlausC: until it is fixed you can monkey-patch it: https://discourse.julialang.org/t/difficulties-with-recent-julia-releases/22969/2

@ararslan removed this from the 1.2 milestone Apr 10, 2019
@KlausC (Contributor) commented Apr 11, 2019

@mbauman

Note that we're using breaking with a very specific meaning here: ... a particular change caused a regression that broke someone's workflow that previously worked

Does that mean we don't introduce data consistency checks because someone's program worked, even though the data it operated on was partially corrupted?

In our case, I agree that the introduced check was too restrictive, but in general...?

@mbauman (Member) commented Apr 11, 2019

It just means that we need to be careful about doing so. We flag all such changes as minor changes, we ensure they appear in NEWS.md, and such changes have a higher bar to cross to get merged.

In cases where the previous behavior resulted in corrupted or buggy answers, then that's not breaking — it's a bug. Admittedly sometimes the line between the two can be blurry.

In this case, I still really want to see a resurrected #31118+#31661 that fixes the segfault and protects against overfilling such arrays. Had we not been right on a branching threshold, we probably would have let things percolate a little longer without a direct revert.

@ViralBShah (Member) commented Apr 11, 2019

We always want correctness, but this was a tricky situation as already discussed above. Reverting and redoing is certainly the right thing.

@Pbellive (Contributor, Author) commented:

I'm happy to update and expand this PR to make an attempt at addressing the concerns raised in this thread. I should be able to get to it in the next 4-5 days.

@KlausC (Contributor) commented Apr 14, 2019

@Pbellive, sorry for taking over the job. I hope you can live with this solution.

@ViralBShah closed this Apr 14, 2019
Labels: bugfix (This change fixes an existing bug), sparse (Sparse arrays)

Successfully merging this pull request may close these issues:
SparseMatrixCSC type check is too restrictive (regression)

8 participants