Argument checks for SparseMatrixCSC constructors #31724

KlausC · 2019-04-14T20:42:44Z

This PR combines the reverted breaking #31118 and the unfinished #31661.

To allow n * m >= typemax(Ti), the following restrictions must be put in place in order to avoid segmentation violations:

* m, n <= typemax(Ti)
* length(colptr) >= n+1
* 1 <= colptr[i] <= colptr[i+1] for i in 1:n

These are in addition to the preexisting checks.
All uses of SparseMatrixCSC(m, n, colptr, ....) with uninitialized colptr have been modified.
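
The three restrictions listed above can be sketched as a standalone check. This is only an illustration and not the code from this PR: the function name `check_csc_args` and the wording of the error messages are hypothetical.

```julia
# Hypothetical helper (not part of SparseArrays): validates the dimension
# and colptr restrictions described in this PR.
function check_csc_args(m::Integer, n::Integer, colptr::AbstractVector{Ti}) where {Ti<:Integer}
    m >= 0 || throw(ArgumentError("number of rows (m = $m) must be >= 0"))
    n >= 0 || throw(ArgumentError("number of columns (n = $n) must be >= 0"))
    # typemax is only meaningful for fixed-width integer types
    if isbitstype(Ti)
        m <= typemax(Ti) || throw(ArgumentError("m = $m exceeds typemax($Ti)"))
        n <= typemax(Ti) || throw(ArgumentError("n = $n exceeds typemax($Ti)"))
    end
    length(colptr) >= n + 1 ||
        throw(ArgumentError("length(colptr) = $(length(colptr)) must be >= n + 1 = $(n + 1)"))
    # 1 <= colptr[i] <= colptr[i+1] for i in 1:n
    for i in 1:n
        1 <= colptr[i] <= colptr[i+1] ||
            throw(ArgumentError("colptr must start at >= 1 and be nondecreasing; violated at index $i"))
    end
    return true
end
```

The loop performs about n comparisons, so the cost grows linearly with the number of columns.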

Fixes #31024.

Pbellive · 2019-04-15T16:34:57Z

Thanks for getting this done @KlausC! This solution looks good to me.

stdlib/SparseArrays/src/sparsematrix.jl

KlausC · 2019-04-15T17:40:14Z

I wanted to add the following text into NEWS.md:

* `SparseMatrixCSC(m,n,colptr,rowval,nzval)` and `sparse(I, J, V)` perform consistency checks
  for some arguments. `colptr` must be properly populated and lengths of `colptr`, `rowval`,
  and `nzval` must be compatible with `m`, `n`, and `eltype(colptr)`.

Also: # add tests for length(I) > typemax(Ti) when n*m < typemax(Ti)

mbauman · 2019-04-15T17:47:00Z

Go for it! You can just tag the PR as WIP in the title to prevent someone from prematurely merging it before you're done.

KlausC · 2019-04-25T08:30:49Z

bumpi

StefanKarpinski · 2019-04-25T13:44:33Z

Both failures seem unrelated (Win32: FileWatching; macOS: can't reach github.com).

StefanKarpinski · 2019-04-25T13:45:05Z

@mbauman or @KristofferC, please merge if this looks good to you.

mbauman · 2019-04-25T14:11:17Z

Edit: whoops, wrong comment on the wrong PR. Sorry about that.

KlausC · 2019-05-02T16:41:34Z

bump often!

KlausC · 2019-05-14T10:15:27Z

bump :-)

jebej · 2019-05-14T14:36:49Z

How much slower does this make the matrix construction?

jebej · 2019-05-14T14:58:18Z

In one particular use case for me, this makes the matrix construction 7x slower. If we want to add this, please make it an outer constructor. Unless I missed something about the original issue, I don't think this is desirable. People who use the direct constructor should know what they are doing.

jebej · 2019-05-14T15:27:55Z

If I understand properly, it seems that the real issue originally is due to the sparse constructor giving invalid arguments to SparseMatrixCSC. If that's the case, could we fix the issue there instead and leave SparseMatrixCSC alone?

KlausC · 2019-05-23T19:29:25Z

@jebej, I am trying to understand your argument. Actually, I do not agree with

> People who use the direct constructor should know what they are doing

I think everybody should know what they are doing in all situations, but that 'should' is only a weak guarantee. There is also no hint in the documentation like 'if you use this constructor, you must guarantee the consistency of all arguments, otherwise a segfault might occur'.

I am in favor of input argument checks if they do not add disproportionate cost.
The checks added in this PR add approximately n integer comparisons, where n is the number of columns of the matrix. The initial setting of the vector colptr with valid entries typically requires n write operations, which could be at least as expensive. Taking this into account, I would expect at worst a doubling of the construction time. I would appreciate seeing your '7x' use case.

Of course, it would be possible to move the check from the constructor to sparse and all other uses of the constructor within the standard libraries. I think it is a design decision.

KlausC · 2019-05-29T21:36:39Z

bump :-)

ViralBShah · 2019-06-17T02:56:57Z

@jebej Any thoughts here?

jebej · 2019-06-17T14:31:15Z

I think what I had said earlier still applies, the original issue is with the sparse function giving bad inputs to the constructor SparseMatrixCSC, and is not with the constructor itself.

I would consider the SparseMatrixCSC constructor an "experts-only" constructor, given that you need to understand how to set the column pointers properly. It would be sad to have that constructor make all these extraneous (again, for "experts") checks without a way to turn them off. If we still want to add checks to this constructor, there should be a way to bypass them.

Regarding the test I made, it was the creation of an annihilation operator:

function destroy_checked(::Type{T}, N::Integer) where {T<:Number}
    rowval = Vector{Int}(undef,N-1); for i=1:N-1; @inbounds rowval[i]=i; end
    colptr = Vector{Int}(undef,N+1); colptr[1]=1; for i=2:N+1; @inbounds colptr[i]=i-1; end
    nzval  = Vector{T}(undef,N-1); for i=1:N-1; @inbounds nzval[i]=√(T(i)); end
    return SparseMatrixCSC_checked(N,N,colptr,rowval,nzval)
end

which gives, for the checked and unchecked cases:

julia> @benchmark destroy_checked(Float64,10)
BenchmarkTools.Trial:
  memory estimate:  560 bytes
  allocs estimate:  5
  --------------
  minimum time:     791.657 ns (0.00% GC)
  median time:      854.485 ns (0.00% GC)
  mean time:        988.225 ns (8.41% GC)
  maximum time:     490.635 μs (99.70% GC)
  --------------
  samples:          10000
  evals/sample:     99

julia> @benchmark destroy(Float64,10)
BenchmarkTools.Trial:
  memory estimate:  544 bytes
  allocs estimate:  4
  --------------
  minimum time:     148.063 ns (0.00% GC)
  median time:      165.921 ns (0.00% GC)
  mean time:        199.120 ns (12.07% GC)
  maximum time:     59.564 μs (99.27% GC)
  --------------
  samples:          10000
  evals/sample:     836

So more like 5x.

jebej · 2019-06-17T14:40:45Z

I would of course support adding a blurb to the documentation saying that the constructor does not check inputs, and to use sparse if input checking is needed.

KlausC · 2019-06-18T17:33:44Z

I improved the PR to address the above objections:

* Performance of the checks was improved. The time penalty is now about 15 + 0.7 * N nanoseconds, where N is the array size in the example.
* The constructor SparseMatrixCSC{Tv,Ti}(m, n, ...) is now the unchecked "expert" version. It may be used if checks are not wanted. The constructor SparseMatrixCSC(m, n, ...) has the additional features.
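
The split between an unchecked parametric constructor and a checked non-parametric one can be illustrated on a toy type. This is a minimal sketch, not the SparseArrays implementation: `ToyCSC` and its error messages are hypothetical, and only the colptr checks are shown.

```julia
# Hypothetical toy type mirroring the field layout of a CSC matrix.
struct ToyCSC{Tv,Ti<:Integer}
    m::Int
    n::Int
    colptr::Vector{Ti}
    rowval::Vector{Ti}
    nzval::Vector{Tv}
    # Inner constructor: stores the arguments as given, without validation.
    # This plays the role of the unchecked "expert" path.
    ToyCSC{Tv,Ti}(m, n, colptr, rowval, nzval) where {Tv,Ti<:Integer} =
        new{Tv,Ti}(m, n, colptr, rowval, nzval)
end

# Outer constructor: infers Tv and Ti, validates colptr, then delegates
# to the unchecked inner constructor.
function ToyCSC(m::Integer, n::Integer, colptr::Vector{Ti},
                rowval::Vector{Ti}, nzval::Vector{Tv}) where {Tv,Ti<:Integer}
    length(colptr) >= n + 1 ||
        throw(ArgumentError("length(colptr) must be at least n + 1 = $(n + 1)"))
    for i in 1:n
        1 <= colptr[i] <= colptr[i+1] ||
            throw(ArgumentError("colptr is not a valid nondecreasing pointer array at index $i"))
    end
    return ToyCSC{Tv,Ti}(m, n, colptr, rowval, nzval)
end
```

With this layout, `ToyCSC(2, 2, [3, 1, 2], [1, 2], [1.0, 2.0])` throws, while `ToyCSC{Float64,Int}(2, 2, [3, 1, 2], [1, 2], [1.0, 2.0])` silently accepts the same invalid data.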

KlausC · 2019-06-18T17:34:05Z

Benchmarks:

julia> function destroy(::Type{T}, ::Type{Ti}, N::Integer, check=true) where {T<:Number,Ti<:Integer}
           rowval = Vector{Ti}(undef,N-1); for i=1:N-1; @inbounds rowval[i]=i; end
           colptr = Vector{Ti}(undef,N+1); colptr[1]=1; for i=2:N+1; @inbounds colptr[i]=i-1; end
           nzval  = Vector{T}(undef,N-1); for i=1:N-1; @inbounds nzval[i]=√(T(i)); end
           check ? SparseMatrixCSC(N,N,colptr,rowval,nzval) : SparseMatrixCSC{T,Ti}(N,N,colptr,rowval,nzval)
       end
julia> function benches(::Type{Ti}) where Ti
           bf = []; bt = []
           for n in 0:6
               push!(bf, @benchmark destroy(Float64, $Ti, $(10^n), false))
               push!(bt, @benchmark destroy(Float64, $Ti, $(10^n), true))
           end
           bf, bt
       end
benches (generic function with 1 method)

julia> bf, bt = benches(Int);             

julia> bt[2]
BenchmarkTools.Trial: 
  memory estimate:  560 bytes
  allocs estimate:  5
  --------------
  minimum time:     151.764 ns (0.00% GC)
  median time:      159.621 ns (0.00% GC)
  mean time:        252.140 ns (34.91% GC)
  maximum time:     4.746 μs (95.96% GC)
  --------------
  samples:          10000
  evals/sample:     832

julia> bf[2]
BenchmarkTools.Trial: 
  memory estimate:  544 bytes
  allocs estimate:  4
  --------------
  minimum time:     128.658 ns (0.00% GC)
  median time:      137.627 ns (0.00% GC)
  mean time:        223.338 ns (36.60% GC)
  maximum time:     4.196 μs (96.33% GC)
  --------------
  samples:          10000
  evals/sample:     898


julia> times(f, b) = getfield.(f.(b), :time)
times (generic function with 1 method)

julia> timediff(f, bf, bt) = times(f, bt) .- times(f, bf)
timediff (generic function with 1 method)

julia> timequot(f, bf, bt) = times(f, bf) ./ times(f, bt)
timequot (generic function with 1 method)

# median time differences in Nanoseconds:
julia> b = timediff(median, bf, bt)
7-element Array{Float64,1}:
     14.459207056651152
     21.993844494175107
     75.44275081162402 
    654.5              
   6104.5              
  61135.0              
 640836.0              

julia> A = [ones(7) [1,10,100,1000,10^4,10^5,10^6]]
julia> D = inv(diagm([1,10,100,1000,10^4,10^5,10^6]));

# linear fitting with weights:
julia> (A'D*A) \ (A'D*b)
2-element Array{Float64,1}:
 13.918559381172859 
  0.6378700830850331
# the time values of b are regenerated with good accuracy:
julia> A * ((A'D*A) \ (A'D*b))
7-element Array{Float64,1}:
     14.556429464257892
     20.29726021202319 
     77.70556768967617 
    651.7886424662059  
   6392.6193902315035  
  63800.92686788448    
 637884.0016444143 

# time increase is between about 8 and 16 %:
julia> timequot(median, bt, bf)
7-element Array{Float64,1}:
 1.1403098489801728
 1.159807688028621 
 1.1149221696978133
 1.0931805239179955
 1.0946441445282522
 1.095640148368556 
 1.0756124851287872
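
For readers who want to reproduce the weighted linear fit from the transcript above without rerunning the benchmarks, here is a self-contained sketch. It copies the median time differences verbatim from the output above and uses `Diagonal` in place of `inv(diagm(...))`; the variable names `N`, `W`, `coef` and `fitted` are my own.

```julia
using LinearAlgebra

# Problem sizes and measured median time differences (ns), copied from above.
N = [1, 10, 100, 1000, 10^4, 10^5, 10^6]
b = [14.459207056651152, 21.993844494175107, 75.44275081162402,
     654.5, 6104.5, 61135.0, 640836.0]

# Model: penalty(N) = c0 + c1 * N, fitted with weights 1/N so that the
# small sizes are not swamped by the largest measurement.
A = [ones(length(N)) N]
W = Diagonal(1 ./ N)

# Weighted normal equations: (A' W A) c = A' W b
coef = (A' * W * A) \ (A' * W * b)
fitted = A * coef
```

`coef[1]` is the fixed overhead and `coef[2]` the per-column cost in nanoseconds; the thread reports roughly 13.9 and 0.64 for these data.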

KlausC · 2019-06-19T08:22:02Z

@jebej, do you still see room for improvements?

KristofferC · 2019-06-19T08:53:01Z

Personally, I don't think these things should be checked at construction time since there are cases where you intermediately might want to have an invalid sparse matrix.
It also seems odd to me to have SparseMatrixCSC{} do no checking but SparseMatrixCSC do checking. These feel like they should do the same thing, with the exception that in the first case you get the types you explicitly specify. In addition, the SparseMatrixCSC(...) constructor is, from what I can see, not even documented (https://docs.julialang.org/en/v1/stdlib/SparseArrays/index.html#Sparse-Vector-and-Matrix-Constructors-1), so adding too much logic to it does not seem very useful.

I would personally prefer something like #22529 (which is similar to https://docs.scipy.org/doc/scipy-1.2.0/reference/generated/scipy.sparse.csr_matrix.check_format.html#scipy.sparse.csr_matrix.check_format) where you can do a check to see that the matrix is valid before you start to use it. This also allows you to check that the matrix is valid after performing mutating functions on the fields, and not only at construction time.
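
A validity check that can be called at any time, including after mutating the fields, might look roughly like the following. This is a sketch under my own assumptions, not the function proposed in #22529: the name `is_valid_csc` is hypothetical, and it accesses the `colptr`, `rowval` and `nzval` fields directly.

```julia
using SparseArrays

# Hypothetical standalone validity check for an existing SparseMatrixCSC.
# Returns false instead of throwing, so it can be used as a cheap guard.
function is_valid_csc(A::SparseMatrixCSC)
    m, n = size(A)
    colptr, rowval, nzval = A.colptr, A.rowval, A.nzval
    length(colptr) >= n + 1 || return false
    colptr[1] == 1 || return false
    for j in 1:n
        colptr[j] <= colptr[j+1] || return false
    end
    # Number of stored entries implied by colptr must fit in both arrays.
    stored = colptr[n+1] - 1
    (length(rowval) >= stored && length(nzval) >= stored) || return false
    # Row indices must be in 1:m and strictly increasing within a column.
    for j in 1:n
        prev = 0
        for k in colptr[j]:(colptr[j+1] - 1)
            r = rowval[k]
            (prev < r <= m) || return false
            prev = r
        end
    end
    return true
end

A = sparse([1, 2], [1, 2], [1.0, 2.0])
ok_before = is_valid_csc(A)   # freshly built by sparse
A.colptr[2] = 5               # corrupt the column pointers in place
ok_after = is_valid_csc(A)    # the check now reports the corruption
```

The point of the design is that construction stays cheap and the caller decides when to pay for validation, for example right before handing the matrix to code that uses `@inbounds`.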

KlausC · 2019-06-19T11:41:21Z

> Personally, I don't think these things should be checked at construction time since there are cases where you intermediately might want to have an invalid sparse matrix.

That is contrary to my opinion: I think every exported constructor, whether documented or not, should return a valid object of that type.
There should be only a few exceptions to this rule, and the consequences of leaking an invalid object to general usage should be (invalid state) exceptions, but neither segfaults nor silent failure.
In the case of SparseMatrixCSC there are @inbounds statements and silent assumptions about the correctness of the data in many places in the SparseArrays package. Applying those functions to invalid matrices would in the worst case silently produce wrong results.

> since there are cases where you intermediately might want to have an invalid sparse matrix

In such cases it is easy to postpone construction until the integrity of the components has been established. At least in Julia Base and the stdlib that could be easily achieved.

> seems odd to me to have SparseMatrixCSC{} do no checking but SparseMatrixCSC do checking

That was just a proposal to provide an unchecked version of the constructor besides the checked version, as was suggested by @jebej.

> not even documented

If it is exported it should be documented.

> ... so adding too much logic to it seems not very useful

The logic added to the constructor is:

* check m, n <= typemax(Ti) (even better, the field types of m, n would be Ti). That is no more logic than the current m, n > 0.
* checking the monotonicity of colptr is more effort, but not much compared to building the component arrays
* resizing the vectors to the maximal potentially valid required lengths is just reasonable and typically a no-op

I don't think that is much added logic.

> also allows you to check that the matrix is valid after performing mutating functions on the fields, and not only at construction time

Adding an (exported?) function to perform a validity check is a good idea, and it should share code with the checks of this PR.

KristofferC · 2019-06-19T13:32:47Z

> If it is exported it should be documented.

The type SparseMatrixCSC is documented. The constructor SparseMatrixCSC(...) is not.

KlausC · 2019-06-19T13:42:34Z

That looks like missing documentation, not like "don't use the constructor". What I mean is: if the type SparseMatrixCSC is exported and its constructors SparseMatrixCSC(....) are unsupported, fragile, or not to be used by "non-experts", there should be a hint to that effect, because it is unexpected.

KlausC · 2019-06-19T14:06:42Z

To be constructive, I want to outline a way to save the essence of this PR without touching the constructors.

* revert all constructors to the previous state.
* collect all additional checks of this PR into a separate function checkvalid, which also integrates all checks from #22529 (implement a checkvalid function for sparse matrices and use it in show).
* ~~create a new method sparse(m, n, colptr, rowval, nzval) which is documented as an alternative way of creating a valid SparseMatrixCSC and with the features of this commit d545e34~~
* use validation within SparseArrays in addition to the constructor, where checking seems useful; that includes sparse(I, J, V, ...).

ViralBShah · 2019-06-21T16:41:32Z

The challenge here comes from the fact that SparseMatrixCSC was not originally designed to be user facing. It was imagined that if anyone used it, they would need to fully take ownership of getting it right. I think it would be fine to implement checks 1 and 2 in the constructor since those are cheap and won't hurt anything. The colptr checks are best not done, because people expect the constructors to be cheap (historically) and this would surprise them.

I am not convinced about extending the sparse method signature to add the low-level matrix constructor. For now, let's do the checkvalid thing and get this one merged. We can plan other convenient ways of constructing validated CSC structures separately. Could even be in a package.

I feel that this whole thing needs to move out of stdlib so we can iterate faster and do more, but that needs stdlib versioning and a bunch more stuff.

KlausC · 2019-06-21T19:41:28Z

I adapted my outline according to your proposals.

ViralBShah

The rest of the PR looks good to me. So hopefully we can merge this soon.

ViralBShah · 2019-06-26T19:40:47Z

I am willing to give this PR a shot, with the SparseMatrixCSC{Tv,Ti} as the expert version and SparseMatrixCSC checking the colptr and a few more things.

Just a note - the checking of the colptr especially in SparseMatrixCSC is an issue after someone has written all over it - so you want to check it at that point before it is handed over to other code.

KristofferC · 2019-08-26T16:34:25Z

stdlib/SparseArrays/src/sparsematrix.jl

-
+    Tj = Ti
+    while isbitstype(Tj) && coolen >= typemax(Tj)
+        Tj = widen(Tj)


This made Tj type unstable and caused the regression in #32985. Only matters for smallish arrays but would be good to fix nonetheless. Can't we just always use Int for this or something?

Ideally this would just be an error. This segfaulted in 1.2. Let's just make it an error instead.
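
To see why the widening loop quoted above is type unstable, it helps to isolate it. In the sketch below the function name `pick_index_type` is hypothetical; the loop body is the one from the diff. The returned type depends on the runtime value of `coolen`, so the compiler cannot infer a single concrete type for `Tj`.

```julia
# Hypothetical isolation of the widening loop from the diff above.
# The result is a type chosen at run time, which is what makes any code
# that allocates with element type Tj afterwards type unstable.
function pick_index_type(::Type{Ti}, coolen::Integer) where {Ti<:Integer}
    Tj = Ti
    while isbitstype(Tj) && coolen >= typemax(Tj)
        Tj = widen(Tj)
    end
    return Tj
end

small = pick_index_type(Int8, 10)      # no widening needed
medium = pick_index_type(Int8, 200)    # one widening step
large = pick_index_type(Int8, 40_000)  # two widening steps
```

Always using `Int`, or throwing an error when `coolen` does not fit in `Ti`, keeps the type known at compile time, which is the direction suggested in the comment.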

KristofferC · 2019-08-26T16:42:04Z

The error messages here leave something to be desired. For example

julia> SparseMatrixCSC(5, 1, Int8[1,2], fill(Int8(1),127), Int[1,2,3])
ERROR: ArgumentError: 127 == length(rowval) >= 127

gives very little information about what went wrong and how it can be fixed.
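
As an illustration of what a more helpful message could contain, here is a sketch of the same length check with the offending value, the limit, and a possible remedy spelled out. The function name `check_rowval_length` and the exact wording are hypothetical; this is not the message that was eventually adopted.

```julia
# Hypothetical check with a descriptive error message.
function check_rowval_length(::Type{Ti}, rowval::AbstractVector) where {Ti<:Integer}
    len = length(rowval)
    if isbitstype(Ti) && len >= typemax(Ti)
        throw(ArgumentError(
            "length(rowval) = $len is too large for index type $Ti: " *
            "it must be less than typemax($Ti) = $(typemax(Ti)). " *
            "Shorten rowval or use a wider index type."))
    end
    return nothing
end

# Same situation as in the example above: 127 entries with Int8 indices.
msg = try
    check_rowval_length(Int8, fill(Int8(1), 127))
    ""
catch e
    e.msg
end
```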

KlausC added 3 commits April 11, 2019 12:25

reconstruct PR #31118

892cded

reconstruct PR 31118 2

802ada9

Check arguments of SparseMatrixCSC #31024 #31435

cfc2470

ViralBShah requested a review from mbauman April 14, 2019 21:43

ViralBShah added the sparse Sparse arrays label Apr 14, 2019

mbauman requested changes Apr 15, 2019

fix SuiteSparse test

dd64f7f

mbauman approved these changes Apr 15, 2019

KlausC changed the title ~~Argument checks for SparseMatrixCSC constructors~~ WIP - Argument checks for SparseMatrixCSC constructors Apr 15, 2019

KlausC and others added 2 commits April 15, 2019 22:11

added NEWS, fixed tests

7abf9c4

Merge branch 'master' into krc/sparsecheckTi

2243bcf

KlausC changed the title ~~WIP - Argument checks for SparseMatrixCSC constructors~~ Argument checks for SparseMatrixCSC constructors Apr 15, 2019

KlausC added 3 commits April 17, 2019 16:42

loosen restrictions - resize to useful length

5ae6492

Merge branch 'krc/sparsecheckTi' of https://github.com/KlausC/julia i…

52072d2

…nto krc/sparsecheckTi

cleaned up NEWS, revert minor change

c5fc16e

KlausC closed this Apr 25, 2019

KlausC reopened this Apr 25, 2019

KlausC and others added 2 commits June 18, 2019 18:44

add non-checking and checking constructor - improve check performance

d545e34

Merge branch 'master' into krc/sparsecheckTi

2412532

ViralBShah requested changes Jun 26, 2019

ViralBShah merged commit b32c1b8 into JuliaLang:master Jun 26, 2019

KristofferC mentioned this pull request Jul 10, 2019

Inconsistent errors with SparseArray construction (new in 1.2) #32548

Closed

This was referenced Jul 18, 2019

sparse array of BigInts #32539

Closed

Avoid typemax checks for non-standard integers #32631

Merged

JeffBezanson mentioned this pull request Aug 1, 2019

SparseMatrixCSC type check is too restrictive (regression) #31435

Closed

This was referenced Aug 20, 2019

remove trim option to fkeep! and related functions #32972

Merged

Benchmark regressions of 1.3 vs 1.2 #32985

Closed

tkf mentioned this pull request Aug 24, 2019

RFC: AbstractSparseMatrixCSC interface #33054

Closed

KristofferC reviewed Aug 26, 2019

KristofferC mentioned this pull request Feb 2, 2020

Aliasing problem with SparseMatrixCSC and SparseVector #34630

Closed
