rename SparseMatrixCSC to SparseMatrix #25151

fredrikekre · 2017-12-17T23:43:28Z

Makes the constructor much friendlier to type (SparseMatrix(I, 3, 3) etc...)

CC @ViralBShah and @Sacha0 who previously discussed this.

oscardssmith · 2017-12-18T00:03:20Z

Given how many different types of sparse matrices there are, and how different their performance characteristics are, I think it probably makes more sense to specify and allow people to alias if they want to.

Sacha0

Fantastic! Thanks @fredrikekre! Modulo CI approval, lgtm :).

Sacha0 · 2017-12-18T00:17:34Z

Given how many different types of sparse matrices there are, and how different their performance characteristics are, I think it probably makes more sense to specify and allow people to alias if they want to.

The same could be said of Dict, Set, Array, and so on; should Dict, Set, Array, etc have qualifications added to their names? SparseArrays implements one and only one sparse matrix format, and that de facto standard format having a nice name does not preclude implementation and use of other formats in SparseArrays or elsewhere. Best!

StefanKarpinski · 2017-12-18T00:46:03Z

Hmm... I'm not sure how I feel about this. Both sides make sense to me.

Sacha0 · 2017-12-18T00:48:16Z

Re. @fredrikekre's opening comment, for what it's worth both Viral and I were quite fond of this rename as heavy users of SparseArrays. Best!

StefanKarpinski · 2017-12-18T01:25:09Z

I can see that. How about leaving SparseMatrixCSC as an alias for SparseMatrix so that people using SparseMatrixCSC in addition to e.g. SparseMatrixCSR can use the full names. Otherwise the lack of symmetry is a little disconcerting.

Sacha0 · 2017-12-18T01:28:36Z

Sounds good on this end :). (Though the partial Julia CSR implementation I am aware of has been unmaintained for a few years.)

StefanKarpinski · 2017-12-18T01:30:10Z

Right, but obviously we'd like to have a complete CSR implementation at some point :)

Sacha0 · 2017-12-18T01:31:02Z

Right, but obviously we'd like to have a complete CSR implementation at some point :)

You touch on my secret plans... 😎

tkelman · 2017-12-18T12:45:35Z

Having other formats implemented here or elsewhere would however make code start looking very confusing if only CSC gets the privilege of not being mentioned by name. CSC does not actually work the way many people naively assume sparse matrices to behave, many APIs give people the impression COO is how the storage should work. Dropping the disambiguation doesn't seem worth saving 3 characters.

Sacha0 · 2017-12-18T15:14:41Z

Having other formats implemented here or elsewhere would however make code start looking very confusing if only CSC gets the privilege of not being mentioned by name. CSC does not actually work the way many people naively assume sparse matrices to behave, many APIs give people the impression COO is how the storage should work.

Again the same could be said of Dict, Set, Array, et cetera, and yet... :)

tknopp · 2017-12-18T15:37:02Z

Again the same could be said of Dict, Set, Array, et cetera, and yet... :)

not really. Dicts, Sets, Arrays always will have the same interface independently of their implementation. On the other hand, Sparse matrix formats are not an implementation detail but the storage structure is part of their public interface.

Tim Holy once suggested to merge CRS and CSC into the same type and just use an additional type parameter to encode which type it is. This would IMHO make a lot of sense.

StefanKarpinski · 2017-12-18T15:45:48Z

@tknopp's suggestion is compelling. I do think the analogy to Array and Dict is misleading since there is one standard, most fundamental way to implement an array – as chunk of contiguous memory. Similarly, for Dict sure there are other implementation methods, but the hash table one is the standard. Set is a little closer since there are many relatively reasonable ways to implement set functionality.

Sacha0 · 2017-12-18T15:51:58Z

there is one standard, most fundamental way to implement an array – as chunk of contiguous memory.

Row-major, column-major? Contiguous chunk of memory, or array of pointers to arrays for higher dimensional arrays? :)

ViralBShah · 2017-12-18T16:05:01Z

If we were to leave there sparse matrix implementation in base, it makes sense to rename since it will be the default implementation.

Given that we want to move sparse out of Base, I wonder if it makes sense to leave the name as is without taking over a more generic name, since then the current implementation shouldn't take over something that sounds like a default name.

ViralBShah · 2017-12-18T16:07:06Z

Fwiw, it used to sort of be this way but I changed it a long time back to add csc in anticipation of supporting more formats in base.

ViralBShah · 2017-12-18T20:57:39Z

I think I am onboard to merge this. Even SparseVector does not have a qualifier. Like it or not, this is going to be the most fully featured implementation, and others will follow and have to use different names.

StefanKarpinski · 2017-12-18T21:28:20Z

Since we're move sparse stuff out of Base, it can evolve in the future and if we realize we were wrong, we don't have to wait until Julia 2.0 to fix it :)

tkelman · 2017-12-18T21:37:59Z

SparseVector doesn't have a qualifier because there aren't really multiple storage schemes you might choose.

Sacha0 · 2017-12-18T23:04:36Z

Expanding a bit on my earlier response:

I do think the analogy to Array and Dict is misleading since there is one standard, most fundamental way to implement an array – as chunk of contiguous memory. Similarly, for Dict sure there are other implementation methods, but the hash table one is the standard. Set is a little closer since there are many relatively reasonable ways to implement set functionality.

Array implementations vary considerably across languages: Some are column-major / first-index-fastest while others are row-major / last-index-fastest. Some are contiguous blocks of memory while others are lists of lists or arrays of pointers to arrays. Dictionary implementations also vary considerably across languages: While most are backed by hash tables, hash tables are themselves implemented in a variety of ways with different performance characteristics in practice, and dictionaries are also sometimes implemented via other data structures e.g. trees. And as you say, set implementations come in all shapes and sizes :). This volume of implementation variation meets or exceeds that among the common sparse matrix formats.

Moreover, among the three common sparse matrix formats (CSC, CSR, and COO), CSC is (far and away?) the most common. Consider that the venerable Harwell-Boeing sparse matrix collection's storage format is CSC, its successor the Rutherford-Boeing sparse matrix collection's default storage format is Harwell-Boeing's CSC (with COO a secondary alternative, and certain FEM systems provided in a clique form), and the UF/SuiteSparse sparse matrix collection's default storage format is also Harwell/Rutherford-Boeing's CSC format (again with COO a secondary alternative). These collections are the three major publicly available sparse matrix collections. Additionally, consider that, e.g., general sparse linear solvers code against CSC as a de facto standard input format with higher frequency than both COO and CSR (and CSC over CSR by a wide margin I believe). (Here by general I mean, e.g, not tailored for solving systems that are sets of large cliques or treated as densely banded as in certain specialized FEM solvers.) In fact, CSC may appear similarly consistently across the sparse linear algebra world to column-major, contiguous-memory implementations in the dense array world.

CSC is about as close to a de facto standard implementation as it comes :).

not really. Dicts, Sets, Arrays always will have the same interface independently of their implementation. On the other hand, Sparse matrix formats are not an implementation detail but the storage structure is part of their public interface.

Yes, Dicts satisfy the AbstractDict interface, Sets the AbstractSet interface, and Arrays the AbstractArray. Similarly, being AbstractArrays, sparse arrays also satisfy the AbstractArray interface. Of course when you want to do something custom and performant with an Array, you must know that Arrays are column-major and contiguously stored in memory, and likewise you must know that SparseMatrix is CSC given the same desire. In these senses the storage structure of a sparse matrix is no more a part of its public interface than is the storage structure of an Array part of its public interface.

Consider that most end users do not know what compressed sparse column format is, and they never touch the internals of the sparse types. Instead they use the high level sparse construction methods / the AbstractArray interface / higher order functions to build and superficially manipulate their sparse objects, and high level linear algebraic operations such as matrix/vector adds, muls, and l/r-divs to work with those sparse objects. Relatively advanced end users also exercise high level interfaces to factorizations and solves. Not too many of us regularly venture beyond that point.

True, the constituent fields of sparse matrices/vectors are more visible / accessible than those of Arrays. And for those of us who do venture into the sparse depths, coding against those fields is so far fairly common given we lack a zero-performance-cost, higher-level interface for sparse matrices. But as has been / is being discussed elsewhere, we may well remedy that lack. Additionally, the constituent fields of sparse matrices/vectors are likely to change (also here), and abstractions over the constituent fields and storage structure are already beginning to congeal.

Tim Holy once suggested to merge CRS and CSC into the same type and just use an additional type parameter to encode which type it is. This would IMHO make a lot of sense.

Yes, everyone working with CSC/CSR at one point or another has this lovely idea :). Certainly Andreas and Viral have as well. That those parties most likely to carry out such an idea nonetheless favor this change should be telling. Best! :)

simonbyrne · 2017-12-18T23:32:53Z

Alternatively, we could define

const SparseMatrixCSR = PermutedDimsArray{T,2,(2, 1),(2, 1),SparseMatrixCSC{T,I}} where {T,I}

The downside of course is that SparseMatrixCSR won't be a subtype of AbstractSparseMatrix.

tkelman · 2017-12-18T23:36:15Z

In these senses the storage structure of a sparse matrix is no more a part of its public interface than is the storage structure of an Array part of its public interface.

Given nzrange and how often code that works on sparse matrices actually needs to access the fields to efficiently iterate over the stored elements, this is not true - the storage structure is very much a part of the public interface that many kinds of sparse code need to work on directly. The implementation details of Array and Dict in julia aren't user visible to anywhere near the same extent.

The still-most-recommended API for creating a sparse matrix, the sparse function, presents more of a coordinate-wise interface to users rather than exposing the sorted/compressed aspects of the storage implementation. Not including the storage scheme explicitly as part of the returned type would make this distinction even less intuitive. People are somewhat frequently confused by the exact memory requirements of sparse matrices (e.g. why does a 1x1000 sparse matrix with no nonzeros take more memory than a 1000x1 sparse matrix with no nonzeros), or the widely differing performance characteristics of row slicing vs column slicing. Dense arrays don't suffer from these problems anywhere near as badly.

As a counterpoint, scipy.sparse implements a wide variety of formats without granting any one a favored status of leaving the format out of its name, and presents to users more directly the common workflow of constructing in coordinate form then converting to CSR or CSC for doing linear algebra as actually doing a conversion.

Sacha0 · 2017-12-18T23:44:33Z

Given nzrange and how often code that works on sparse matrices actually needs to access the fields to efficiently iterate over the stored elements, this is not true - the storage structure is very much a part of the public interface that many kinds of sparse code need to work on directly. The implementation details of Array and Dict in julia aren't user visible to anywhere near the same extent.

You might've missed the critical "In these senses..." qualification on the statement to which you responded, and perhaps also the later paragraph beginning as below :).

True, the constituent fields of sparse matrices/vectors are more visible / accessible than those of Arrays. And for those of us who do venture into the sparse depths, coding against those fields is so far fairly common [...]

tkelman · 2017-12-19T00:09:11Z

Not really, we're disagreeing on whether "in these senses" that you laid out are practically relevant to the code people write. Any nzrange loop or field access written against SparseMatrixCSC would be completely incorrect for a different storage format - most pure-Julia code written against the Array or Dict API's might be slower if the storage were discontiguous or used a different storage order, but it wouldn't be visible in the behavior of the type to anywhere near an extent (other-language interop being the notable exception).

The possibility of adding more generalizable API's in the indeterminate future doesn't seem to have much bearing on whether a type should have important clarifying information about its behavior removed from its name right now. If anything new API's would make adding alternate formats simpler, which would make this name change a worse idea.

Sacha0 · 2017-12-19T00:28:40Z

The possibility of adding more formats in the indeterminate future doesn't seem to have much bearing on whether a type should have its original, simpler, more elegant name as the canonical instance of its class restored.

ftfy 😉

StefanKarpinski · 2017-12-21T19:16:25Z

Once Sparse is in the stdlib, we can rename this whenever, so this is not a Base 1.0 issue.

Sacha0 · 2017-12-21T19:22:20Z

Additionally, triage was happy with introducing the SparseMatrix alias to SparseMatrixCSC.

Sacha0 · 2018-01-06T23:35:34Z

Perhaps rebase, drop the SparseMatrixCSC type rename and add the SparseMatrix type alias as per triage, ping me for a review, and merge? :)

tkelman · 2018-01-07T00:56:53Z

simpler, more elegant

That's clearly debatable. I don't know of any sparse matrix libraries that implement, user-visibly (so this does not include matlab, where things being csc is not visible in the api), only one of CSC and CSR but not the other and give only one of them a preferred canonical status by leaving format details out of the name. It would be disappointing for Julia to set a bad precedent by doing that merely to save 3 characters. It also strikes me as ill-advised to reserve a new name right now with the same existing behavior, which would make some implementations that use a more general parameterized type breaking.

general sparse linear solvers code against CSC as a de facto standard input format with higher frequency than both COO and CSR (and CSC over CSR by a wide margin I believe)

Every sparse solver I'm aware of, and I've used dozens, implements both.

StefanKarpinski · 2018-01-07T02:18:45Z

I have to say, I'm not super amped about this rename. We can always introduce an alias later if we want to.

simonbyrne · 2018-01-07T03:00:19Z

I would prefer SparseCSCMatrix: all the other matrix types are sufficed with Matrix.

ViralBShah · 2018-01-07T07:46:20Z

This is a tough one (and I have changed my mind a few times). On the one hand, we do not name the data structure in any other Julia Type. We don't say DenseColMatrix, for example. Also, even though people have spoken about different sparse matrix storage types and generic implementations for a really long time now, are there reasonably complete implementations as packages (although people do have their own custom sparse representations for their own packages)?

If it were to live in Base, I think naming it SparseMatrix made sense (to go along with Dict, Matrix, etc.). But if it is going to Stdlib, and perhaps later into its own package, we probably shouldn't cement it as the default implementation.

I do like the SparseCSCMatrix suggestion.

ViralBShah · 2018-01-07T07:47:51Z

Maybe we should do this in two steps? Excise first and rename afterwards?

StefanKarpinski · 2018-01-07T15:21:12Z

Yes, please, since the excision is both more important and less controversial. This probably would have been merged weeks ago if it had only included the excision.

Sacha0 · 2018-01-07T19:54:28Z

(Re. excising SparseArrays, @fredrikekre's fantastic work on that front is more or less ready to go, pending merge of #25364.)

ViralBShah · 2018-12-17T07:37:32Z

What are folks feeling like about this now? I feel that the right thing at this point is to create a clean new sparse implementation in a package (borrowing bits and pieces as necessary from the current one) and replace the one in stdlib with an external package.

ViralBShah · 2018-12-17T08:12:03Z

Closing for now (but can always reopen).

fredrikekre added the sparse Sparse arrays label Dec 17, 2017

fredrikekre requested a review from Sacha0 December 17, 2017 23:43

fredrikekre force-pushed the fe/sparsematrix branch from 4badec4 to 7d7f599 Compare December 17, 2017 23:44

Sacha0 reviewed Dec 18, 2017

View reviewed changes

rename SparseMatrixCSC -> SparseMatrix

a676cc6

fredrikekre force-pushed the fe/sparsematrix branch from 7d7f599 to a676cc6 Compare December 18, 2017 12:20

Sacha0 added the triage This should be discussed on a triage call label Dec 21, 2017

StefanKarpinski added the stdlib Julia's standard library label Dec 21, 2017

StefanKarpinski removed the triage This should be discussed on a triage call label Dec 21, 2017

tkf mentioned this pull request Dec 1, 2018

Request: Allow arbitrary array type for nzval of sparse matrix #30173

Closed

ViralBShah closed this Dec 17, 2018

rename SparseMatrixCSC to SparseMatrix #25151

rename SparseMatrixCSC to SparseMatrix #25151

Conversation

fredrikekre commented Dec 17, 2017

oscardssmith commented Dec 18, 2017

Sacha0 left a comment

Choose a reason for hiding this comment

Sacha0 commented Dec 18, 2017

StefanKarpinski commented Dec 18, 2017

Sacha0 commented Dec 18, 2017

StefanKarpinski commented Dec 18, 2017

Sacha0 commented Dec 18, 2017 • edited Loading

StefanKarpinski commented Dec 18, 2017

Sacha0 commented Dec 18, 2017

tkelman commented Dec 18, 2017

Sacha0 commented Dec 18, 2017

tknopp commented Dec 18, 2017

StefanKarpinski commented Dec 18, 2017

Sacha0 commented Dec 18, 2017

ViralBShah commented Dec 18, 2017

ViralBShah commented Dec 18, 2017

ViralBShah commented Dec 18, 2017

StefanKarpinski commented Dec 18, 2017

tkelman commented Dec 18, 2017

Sacha0 commented Dec 18, 2017 • edited Loading

simonbyrne commented Dec 18, 2017

tkelman commented Dec 18, 2017

Sacha0 commented Dec 18, 2017

tkelman commented Dec 19, 2017

Sacha0 commented Dec 19, 2017

StefanKarpinski commented Dec 21, 2017

Sacha0 commented Dec 21, 2017

Sacha0 commented Jan 6, 2018

tkelman commented Jan 7, 2018 • edited Loading

StefanKarpinski commented Jan 7, 2018

simonbyrne commented Jan 7, 2018 via email

ViralBShah commented Jan 7, 2018 • edited Loading

ViralBShah commented Jan 7, 2018

StefanKarpinski commented Jan 7, 2018

Sacha0 commented Jan 7, 2018

ViralBShah commented Dec 17, 2018

ViralBShah commented Dec 17, 2018

Sacha0 commented Dec 18, 2017 •

edited

Loading

Sacha0 commented Dec 18, 2017 •

edited

Loading

tkelman commented Jan 7, 2018 •

edited

Loading

ViralBShah commented Jan 7, 2018 •

edited

Loading