
sprand and SparseMatrixCSC constructor simplifications #30617

Merged · 6 commits · Jan 22, 2019

Conversation

@abraunst (Contributor) commented Jan 6, 2019

This PR implements some simplifications to the sparse code, made easy by #30494. There, sparse random matrices were effectively constructed from a vector of linear indices (which, for sprand, come from randsubseq). So I factored the sparse-matrix construction out of sprand so that it can now be used both in sprand and in the fallback sparse constructor from AbstractMatrix (which does not catch e.g. Matrix, because there is a more specialized constructor from StridedMatrix). I checked against Transpose and found no performance regression; in fact there seems to be a small performance gain, because (a) there is no need to check for duplicates and (b) using linear indices avoids allocating an extra vector of indices. Is there a way to check for performance regressions more systematically?
(Net outcome of this PR: -60 lines.)

using Random, SparseArrays, LinearAlgebra, BenchmarkTools

for n=(10,100,1000), p=(1.0,1e-1,1e-2,1e-3,1e-4)
    @show(n,p)
    Random.seed!(1); x=Transpose(Matrix(sprand(n,n,p))); @show @benchmark sparse($x)
end

MASTER:

| n \ p  | 1.0        | 0.1        | 0.01       | 0.001      | 0.0001     |
|--------|------------|------------|------------|------------|------------|
| n=10   | 1.481 μs   | 517.677 ns | 379.346 ns | 350.789 ns | 353.089 ns |
| n=100  | 121.878 μs | 37.005 μs  | 20.916 μs  | 18.598 μs  | 18.476 μs  |
| n=1000 | 20.618 ms  | 6.356 ms   | 3.295 ms   | 2.892 ms   | 2.936 ms   |

PR:

| n \ p  | 1.0        | 0.1        | 0.01       | 0.001      | 0.0001     |
|--------|------------|------------|------------|------------|------------|
| n=10   | 1.357 μs   | 445.061 ns | 332.215 ns | 305.885 ns | 307.341 ns |
| n=100  | 113.133 μs | 37.731 μs  | 20.636 μs  | 18.667 μs  | 18.342 μs  |
| n=1000 | 18.671 ms  | 6.013 ms   | 3.290 ms   | 2.924 ms   | 2.939 ms   |
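
For context, the factored-out construction can be sketched as follows. This is a minimal sketch, not the PR's actual implementation; it assumes I holds strictly increasing, 1-based, column-major linear indices:

using SparseArrays

# Build CSC structure directly from sorted linear indices. Because the
# indices are strictly increasing, no duplicate check or sorting pass is
# needed (point (a) above), and no separate (row, col) index vectors are
# allocated (point (b)).
function sparse_sortedlinearindices_sketch(I::Vector{Int}, V::Vector{Tv}, m::Int, n::Int) where Tv
    rowval = Vector{Int}(undef, length(I))
    colptr = zeros(Int, n + 1)
    colptr[1] = 1
    for (k, idx) in enumerate(I)
        col, row = divrem(idx - 1, m) .+ 1  # recover (col, row) from the linear index
        rowval[k] = row
        colptr[col + 1] += 1                # count entries per column
    end
    for c in 1:n
        colptr[c + 1] += colptr[c]          # turn per-column counts into column pointers
    end
    return SparseMatrixCSC(m, n, colptr, rowval, V)
end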

@ViralBShah (Member)

I don't believe that the nanosoldier perf benchmarks have anything relevant here.

@ViralBShah ViralBShah added the sparse Sparse arrays label Jan 6, 2019
@ViralBShah ViralBShah changed the title Sparse code simplifications sprand code simplifications Jan 7, 2019
@abraunst (Contributor, Author) commented Jan 7, 2019

@ViralBShah I am OK with the title change, but note that this is not exclusively about sprand: it introduces a constructor of SparseMatrixCSC from a sorted vector of linear indices and uses it in both sprand and sparse.
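
Schematically, both call sites then feed sorted linear indices to the same helper. A sketch, assuming the PR-internal (non-exported) helper name used in this thread and arbitrary m, n, p:

using Random, SparseArrays

m, n, p = 100, 100, 0.1
I = randsubseq(1:m*n, p)   # sorted random linear indices, as sprand obtains them
V = rand(length(I))
# sparse_sortedlinearindices! is the PR-internal, non-exported helper:
A = SparseArrays.sparse_sortedlinearindices!(I, V, m, n)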

@ViralBShah (Member)

I did mean to update the title to reflect that. Will do shortly.

@ViralBShah ViralBShah changed the title sprand code simplifications sprand and SparseMatrixCSC constructor simplifications Jan 7, 2019
@stevengj (Member) commented Jan 7, 2019

With this new code, can we get rid of the specialized StridedMatrix constructor, or does the latter still have a substantial performance advantage?

@abraunst (Contributor, Author) commented Jan 7, 2019

> With this new code, can we get rid of the specialized StridedMatrix constructor, or does the latter still have a substantial performance advantage?

Unfortunately it is still faster:

julia> using SparseArrays, BenchmarkTools

julia> x=Matrix(sprand(1000,1000,0.1));

master:

julia> @benchmark sparse($x)
BenchmarkTools.Trial: 
  memory estimate:  1.53 MiB
  allocs estimate:  6
  --------------
  minimum time:     1.534 ms (0.00% GC)
  median time:      1.587 ms (0.00% GC)
  mean time:        1.613 ms (1.24% GC)
  maximum time:     27.612 ms (94.07% GC)
  --------------
  samples:          3091
  evals/sample:     1

PR:

julia> @benchmark sparse($x)
BenchmarkTools.Trial: 
  memory estimate:  4.01 MiB
  allocs estimate:  36
  --------------
  minimum time:     2.333 ms (0.00% GC)
  median time:      2.386 ms (0.00% GC)
  mean time:        2.454 ms (2.57% GC)
  maximum time:     31.432 ms (92.45% GC)
  --------------
  samples:          2036
  evals/sample:     1

Noting the extra allocation, I tried the following:

function SparseMatrixCSC{Tv,Ti}(M::AbstractMatrix) where {Tv,Ti}
    @assert !has_offset_axes(M)   # the linear indexing below assumes 1-based axes
    nz = count(!iszero, M)        # first pass: count nonzeros to size the buffers exactly
    I = Vector{Ti}(undef, nz)
    V = Vector{Tv}(undef, nz)
    i, j = 0, 0

    for v in M                    # second pass: record linear index and value of each nonzero
        i += 1
        if !iszero(v)
            I[j += 1] = i
            V[j] = v
        end
    end
    return sparse_sortedlinearindices!(I, V, size(M)...)
end

And we get much closer! Will try to benchmark better tomorrow.

julia> @benchmark sparse($x)
BenchmarkTools.Trial: 
  memory estimate:  1.54 MiB
  allocs estimate:  6
  --------------
  minimum time:     1.615 ms (0.00% GC)
  median time:      1.664 ms (0.00% GC)
  mean time:        1.703 ms (1.72% GC)
  maximum time:     30.738 ms (94.42% GC)
  --------------
  samples:          2930
  evals/sample:     1

@stevengj (Member) commented Jan 8, 2019

This is probably fast enough to get rid of the specialized Matrix method. Performance-critical code will probably never use these methods anyway—you always make large sparse matrices directly, not from dense matrices. These conversions are just here for convenience for small interactive problems, I think.

@abraunst (Contributor, Author) commented Jan 8, 2019

> This is probably fast enough to get rid of the specialized Matrix method. Performance-critical code will probably never use these methods anyway—you always make large sparse matrices directly, not from dense matrices. These conversions are just here for convenience for small interactive problems, I think.

The situation is not clear-cut. Maybe it would be wiser to keep the M::StridedArray constructor and leave the PR as is?

The above approach is faster for very dense matrices but substantially slower for sparser ones in the Transpose test:

master (AbstractArray constructor):

julia> [ begin Random.seed!(1); x=Transpose(Matrix(sprand(n,n,p)));  b=@benchmark sparse($x); minimum(b.times); end for n=(10,100,1000), p=(1.0,1e-1,1e-2,1e-3,1e-4)]
3×5 Array{Float64,2}:
   1461.6          486.513        366.928        348.009        347.897    
 121178.0        38327.0        20727.0        18619.0        18414.0      
      2.09966e7      6.31002e6      3.36006e6      2.96888e6      2.94113e6

above proposal:

julia> [ begin Random.seed!(1); x=Transpose(Matrix(sprand(n,n,p)));  b=@benchmark sparse($x); minimum(b.times); end for n=(10,100,1000), p=(1.0,1e-1,1e-2,1e-3,1e-4)]
3×5 Array{Float64,2}:
   470.23         393.356        392.965        382.365        381.788    
 38097.0        40642.0        32357.0        31666.0        31449.0      
     6.36208e6      6.15465e6      5.11586e6      5.11357e6      5.00486e6

PR:

julia> [ begin Random.seed!(1); x=Transpose(Matrix(sprand(n,n,p)));  b=@benchmark sparse($x); minimum(b.times); end for n=(10,100,1000), p=(1.0,1e-1,1e-2,1e-3,1e-4)]
3×5 Array{Float64,2}:
   1262.9          386.683        273.415        236.594        235.64     
 113408.0        38788.0        19654.0        17685.0        17404.0      
      1.70226e7      5.28324e6      3.07331e6      2.79147e6      2.78202e6

In the Matrix test, by contrast, it is the opposite:

master (M::StridedArray constructor)

julia> [ begin Random.seed!(1); x=Matrix(sprand(n,n,p));  b=@benchmark sparse($x); minimum(b.times); end for n=(10,100,1000), p=(1.0,1e-1,1e-2,1e-3,1e-4)]
3×5 Array{Float64,2}:
   188.217        133.582        128.394     138.25     138.061
 10362.0        11663.0         7455.0      6531.4     6473.6  
     1.68884e6      1.5584e6  799494.0    721780.0   705629.0  

above proposal:

julia> [ begin Random.seed!(1); x=Matrix(sprand(n,n,p));  b=@benchmark sparse($x); minimum(b.times); end for n=(10,100,1000), p=(1.0,1e-1,1e-2,1e-3,1e-4)]
3×5 Array{Float64,2}:
   246.682        139.053        131.625     132.598     130.24
 15989.0        13112.0         6817.2      5970.83     5806.5 
     2.29853e6      1.6286e6  799516.0    714622.0    696401.0 

PR:

julia> [ begin Random.seed!(1); x=Matrix(sprand(n,n,p));  b=@benchmark sparse($x); minimum(b.times); end for n=(10,100,1000), p=(1.0,1e-1,1e-2,1e-3,1e-4)]
3×5 Array{Float64,2}:
  1112.3          271.748         155.67     120.776     120.347
 95653.0        22313.0          7301.25    5423.17     5190.67 
     9.82539e6      2.33503e6  729518.0   540351.0    520473.0  

@abraunst (Contributor, Author) commented Jan 8, 2019

The failed check is a timeout:

> The job exceeded the maximum time limit for jobs, and has been terminated.

@abraunst (Contributor, Author) commented Jan 8, 2019

BTW how would people feel about exporting sparse(I::Vector{Int}, V::Vector, m::Int, n::Int) = sparse_linearindices!(copy(I), V, m, n) ?

EDIT: added full qualifications and fixed a mistake
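
If such a method were exported, usage would look something like this (hypothetical, since the method is only proposed here):

using SparseArrays

m, n = 3, 3
I = [1, 5, 9]            # sorted column-major linear indices of the nonzeros
V = [1.0, 2.0, 3.0]
S = sparse(I, V, m, n)   # hypothetical method: builds a 3x3 diagonal sparse matrix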

@ViralBShah (Member)

I've restarted the failing build, but it shouldn't really matter since we have enough greens.

@abraunst (Contributor, Author) commented Jan 9, 2019

> The situation is not clear-cut. Maybe it would be wiser to keep the M::StridedArray constructor and leave the PR as is?

Just to recap: with the PR as is, we get "PR" for the first set and "master" for the second set (because we would keep the existing constructor from StridedArray). We could choose essentially any combination of the two, but this one is the only choice that is uniformly better than master (in these tests). As @stevengj said, this is probably not performance-critical code in any case.

Besides the simplification and the performance improvement, the PR fixes the bug reported in #30627 (type instability and a wrong return type for non-0x0 matrices). Let me know how to proceed!

EDIT: rebased.

@KlausC (Contributor) commented Jan 15, 2019

> These conversions are just here for convenience for small interactive problems, I think.

The SparseMatrixCSC(M::AbstractMatrix) constructors are also used as the AbstractMatrix fallback, for example when converting a Symmetric{Tv,SparseMatrixCSC}. There is a PR under review, #30552, which substantially improves performance for those cases.
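
For illustration, a minimal sketch of the dispatch involved (not code from either PR):

using SparseArrays, LinearAlgebra

S = Symmetric(sprand(100, 100, 0.1))
# Symmetric is an AbstractMatrix but not a StridedMatrix, so converting
# it back to sparse goes through the generic AbstractMatrix fallback:
A = sparse(S)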

@abraunst (Contributor, Author)

bump?

@ViralBShah (Member)

@stevengj Are you ok with this PR?
