Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rename SparseMatrixCSC to SparseMatrix #25151

Closed
wants to merge 1 commit into from

Conversation

fredrikekre
Copy link
Member

Makes the constructor much friendlier to type (SparseMatrix(I, 3, 3) etc...)

CC @ViralBShah and @Sacha0 who previously discussed this.

@fredrikekre fredrikekre added the sparse Sparse arrays label Dec 17, 2017
@fredrikekre fredrikekre requested a review from Sacha0 December 17, 2017 23:43
@oscardssmith
Copy link
Member

Given how many different types of sparse matrices there are, and how different their performance characteristics are, I think it probably makes more sense to specify and allow people to alias if they want to.

Copy link
Member

@Sacha0 Sacha0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic! Thanks @fredrikekre! Modulo CI approval, lgtm :).

@Sacha0
Copy link
Member

Sacha0 commented Dec 18, 2017

Given how many different types of sparse matrices there are, and how different their performance characteristics are, I think it probably makes more sense to specify and allow people to alias if they want to.

The same could be said of Dict, Set, Array, and so on; should Dict, Set, Array, etc have qualifications added to their names? SparseArrays implements one and only one sparse matrix format, and that de facto standard format having a nice name does not preclude implementation and use of other formats in SparseArrays or elsewhere. Best!

@StefanKarpinski
Copy link
Member

Hmm... I'm not sure how I feel about this. Both sides make sense to me.

@Sacha0
Copy link
Member

Sacha0 commented Dec 18, 2017

Re. @fredrikekre's opening comment, for what it's worth both Viral and I were quite fond of this rename as heavy users of SparseArrays. Best!

@StefanKarpinski
Copy link
Member

I can see that. How about leaving SparseMatrixCSC as an alias for SparseMatrix so that people using SparseMatrixCSC in addition to e.g. SparseMatrixCSR can use the full names. Otherwise the lack of symmetry is a little disconcerting.

@Sacha0
Copy link
Member

Sacha0 commented Dec 18, 2017

Sounds good on this end :). (Though the partial Julia CSR implementation I am aware of has been unmaintained for a few years.)

@StefanKarpinski
Copy link
Member

Right, but obviously we'd like to have a complete CSR implementation at some point :)

@Sacha0
Copy link
Member

Sacha0 commented Dec 18, 2017

Right, but obviously we'd like to have a complete CSR implementation at some point :)

You touch on my secret plans... 😎

@tkelman
Copy link
Contributor

tkelman commented Dec 18, 2017

Having other formats implemented here or elsewhere would however make code start looking very confusing if only CSC gets the privilege of not being mentioned by name. CSC does not actually work the way many people naively assume sparse matrices to behave, many APIs give people the impression COO is how the storage should work. Dropping the disambiguation doesn't seem worth saving 3 characters.

@Sacha0
Copy link
Member

Sacha0 commented Dec 18, 2017

Having other formats implemented here or elsewhere would however make code start looking very confusing if only CSC gets the privilege of not being mentioned by name. CSC does not actually work the way many people naively assume sparse matrices to behave, many APIs give people the impression COO is how the storage should work.

Again the same could be said of Dict, Set, Array, et cetera, and yet... :)

@tknopp
Copy link
Contributor

tknopp commented Dec 18, 2017

Again the same could be said of Dict, Set, Array, et cetera, and yet... :)

not really. Dicts, Sets, Arrays always will have the same interface independently of their implementation. On the other hand, Sparse matrix formats are not an implementation detail but the storage structure is part of their public interface.

Tim Holy once suggested to merge CRS and CSC into the same type and just use an additional type parameter to encode which type it is. This would IMHO make a lot of sense.

@StefanKarpinski
Copy link
Member

@tknopp's suggestion is compelling. I do think the analogy to Array and Dict is misleading since there is one standard, most fundamental way to implement an array – as chunk of contiguous memory. Similarly, for Dict sure there are other implementation methods, but the hash table one is the standard. Set is a little closer since there are many relatively reasonable ways to implement set functionality.

@Sacha0
Copy link
Member

Sacha0 commented Dec 18, 2017

there is one standard, most fundamental way to implement an array – as chunk of contiguous memory.

Row-major, column-major? Contiguous chunk of memory, or array of pointers to arrays for higher dimensional arrays? :)

@ViralBShah
Copy link
Member

If we were to leave there sparse matrix implementation in base, it makes sense to rename since it will be the default implementation.

Given that we want to move sparse out of Base, I wonder if it makes sense to leave the name as is without taking over a more generic name, since then the current implementation shouldn't take over something that sounds like a default name.

@ViralBShah
Copy link
Member

Fwiw, it used to sort of be this way but I changed it a long time back to add csc in anticipation of supporting more formats in base.

@ViralBShah
Copy link
Member

I think I am onboard to merge this. Even SparseVector does not have a qualifier. Like it or not, this is going to be the most fully featured implementation, and others will follow and have to use different names.

@StefanKarpinski
Copy link
Member

Since we're move sparse stuff out of Base, it can evolve in the future and if we realize we were wrong, we don't have to wait until Julia 2.0 to fix it :)

@tkelman
Copy link
Contributor

tkelman commented Dec 18, 2017

SparseVector doesn't have a qualifier because there aren't really multiple storage schemes you might choose.

@Sacha0
Copy link
Member

Sacha0 commented Dec 18, 2017

Expanding a bit on my earlier response:

I do think the analogy to Array and Dict is misleading since there is one standard, most fundamental way to implement an array – as chunk of contiguous memory. Similarly, for Dict sure there are other implementation methods, but the hash table one is the standard. Set is a little closer since there are many relatively reasonable ways to implement set functionality.

Array implementations vary considerably across languages: Some are column-major / first-index-fastest while others are row-major / last-index-fastest. Some are contiguous blocks of memory while others are lists of lists or arrays of pointers to arrays. Dictionary implementations also vary considerably across languages: While most are backed by hash tables, hash tables are themselves implemented in a variety of ways with different performance characteristics in practice, and dictionaries are also sometimes implemented via other data structures e.g. trees. And as you say, set implementations come in all shapes and sizes :). This volume of implementation variation meets or exceeds that among the common sparse matrix formats.

Moreover, among the three common sparse matrix formats (CSC, CSR, and COO), CSC is (far and away?) the most common. Consider that the venerable Harwell-Boeing sparse matrix collection's storage format is CSC, its successor the Rutherford-Boeing sparse matrix collection's default storage format is Harwell-Boeing's CSC (with COO a secondary alternative, and certain FEM systems provided in a clique form), and the UF/SuiteSparse sparse matrix collection's default storage format is also Harwell/Rutherford-Boeing's CSC format (again with COO a secondary alternative). These collections are the three major publicly available sparse matrix collections. Additionally, consider that, e.g., general sparse linear solvers code against CSC as a de facto standard input format with higher frequency than both COO and CSR (and CSC over CSR by a wide margin I believe). (Here by general I mean, e.g, not tailored for solving systems that are sets of large cliques or treated as densely banded as in certain specialized FEM solvers.) In fact, CSC may appear similarly consistently across the sparse linear algebra world to column-major, contiguous-memory implementations in the dense array world.

CSC is about as close to a de facto standard implementation as it comes :).

not really. Dicts, Sets, Arrays always will have the same interface independently of their implementation. On the other hand, Sparse matrix formats are not an implementation detail but the storage structure is part of their public interface.

Yes, Dicts satisfy the AbstractDict interface, Sets the AbstractSet interface, and Arrays the AbstractArray. Similarly, being AbstractArrays, sparse arrays also satisfy the AbstractArray interface. Of course when you want to do something custom and performant with an Array, you must know that Arrays are column-major and contiguously stored in memory, and likewise you must know that SparseMatrix is CSC given the same desire. In these senses the storage structure of a sparse matrix is no more a part of its public interface than is the storage structure of an Array part of its public interface.

Consider that most end users do not know what compressed sparse column format is, and they never touch the internals of the sparse types. Instead they use the high level sparse construction methods / the AbstractArray interface / higher order functions to build and superficially manipulate their sparse objects, and high level linear algebraic operations such as matrix/vector adds, muls, and l/r-divs to work with those sparse objects. Relatively advanced end users also exercise high level interfaces to factorizations and solves. Not too many of us regularly venture beyond that point.

True, the constituent fields of sparse matrices/vectors are more visible / accessible than those of Arrays. And for those of us who do venture into the sparse depths, coding against those fields is so far fairly common given we lack a zero-performance-cost, higher-level interface for sparse matrices. But as has been / is being discussed elsewhere, we may well remedy that lack. Additionally, the constituent fields of sparse matrices/vectors are likely to change (also here), and abstractions over the constituent fields and storage structure are already beginning to congeal.

Tim Holy once suggested to merge CRS and CSC into the same type and just use an additional type parameter to encode which type it is. This would IMHO make a lot of sense.

Yes, everyone working with CSC/CSR at one point or another has this lovely idea :). Certainly Andreas and Viral have as well. That those parties most likely to carry out such an idea nonetheless favor this change should be telling. Best! :)

@simonbyrne
Copy link
Contributor

Alternatively, we could define

const SparseMatrixCSR = PermutedDimsArray{T,2,(2, 1),(2, 1),SparseMatrixCSC{T,I}} where {T,I}

The downside of course is that SparseMatrixCSR won't be a subtype of AbstractSparseMatrix.

@tkelman
Copy link
Contributor

tkelman commented Dec 18, 2017

In these senses the storage structure of a sparse matrix is no more a part of its public interface than is the storage structure of an Array part of its public interface.

Given nzrange and how often code that works on sparse matrices actually needs to access the fields to efficiently iterate over the stored elements, this is not true - the storage structure is very much a part of the public interface that many kinds of sparse code need to work on directly. The implementation details of Array and Dict in julia aren't user visible to anywhere near the same extent.

The still-most-recommended API for creating a sparse matrix, the sparse function, presents more of a coordinate-wise interface to users rather than exposing the sorted/compressed aspects of the storage implementation. Not including the storage scheme explicitly as part of the returned type would make this distinction even less intuitive. People are somewhat frequently confused by the exact memory requirements of sparse matrices (e.g. why does a 1x1000 sparse matrix with no nonzeros take more memory than a 1000x1 sparse matrix with no nonzeros), or the widely differing performance characteristics of row slicing vs column slicing. Dense arrays don't suffer from these problems anywhere near as badly.

As a counterpoint, scipy.sparse implements a wide variety of formats without granting any one a favored status of leaving the format out of its name, and presents to users more directly the common workflow of constructing in coordinate form then converting to CSR or CSC for doing linear algebra as actually doing a conversion.

@Sacha0
Copy link
Member

Sacha0 commented Dec 18, 2017

Given nzrange and how often code that works on sparse matrices actually needs to access the fields to efficiently iterate over the stored elements, this is not true - the storage structure is very much a part of the public interface that many kinds of sparse code need to work on directly. The implementation details of Array and Dict in julia aren't user visible to anywhere near the same extent.

You might've missed the critical "In these senses..." qualification on the statement to which you responded, and perhaps also the later paragraph beginning as below :).

True, the constituent fields of sparse matrices/vectors are more visible / accessible than those of Arrays. And for those of us who do venture into the sparse depths, coding against those fields is so far fairly common [...]

@tkelman
Copy link
Contributor

tkelman commented Dec 19, 2017

Not really, we're disagreeing on whether "in these senses" that you laid out are practically relevant to the code people write. Any nzrange loop or field access written against SparseMatrixCSC would be completely incorrect for a different storage format - most pure-Julia code written against the Array or Dict API's might be slower if the storage were discontiguous or used a different storage order, but it wouldn't be visible in the behavior of the type to anywhere near an extent (other-language interop being the notable exception).

The possibility of adding more generalizable API's in the indeterminate future doesn't seem to have much bearing on whether a type should have important clarifying information about its behavior removed from its name right now. If anything new API's would make adding alternate formats simpler, which would make this name change a worse idea.

@Sacha0
Copy link
Member

Sacha0 commented Dec 19, 2017

The possibility of adding more formats in the indeterminate future doesn't seem to have much bearing on whether a type should have its original, simpler, more elegant name as the canonical instance of its class restored.

ftfy 😉

@Sacha0 Sacha0 added the triage This should be discussed on a triage call label Dec 21, 2017
@StefanKarpinski StefanKarpinski added the stdlib Julia's standard library label Dec 21, 2017
@StefanKarpinski
Copy link
Member

Once Sparse is in the stdlib, we can rename this whenever, so this is not a Base 1.0 issue.

@StefanKarpinski StefanKarpinski removed the triage This should be discussed on a triage call label Dec 21, 2017
@Sacha0
Copy link
Member

Sacha0 commented Dec 21, 2017

Additionally, triage was happy with introducing the SparseMatrix alias to SparseMatrixCSC.

@Sacha0
Copy link
Member

Sacha0 commented Jan 6, 2018

Perhaps rebase, drop the SparseMatrixCSC type rename and add the SparseMatrix type alias as per triage, ping me for a review, and merge? :)

@tkelman
Copy link
Contributor

tkelman commented Jan 7, 2018

simpler, more elegant

That's clearly debatable. I don't know of any sparse matrix libraries that implement, user-visibly (so this does not include matlab, where things being csc is not visible in the api), only one of CSC and CSR but not the other and give only one of them a preferred canonical status by leaving format details out of the name. It would be disappointing for Julia to set a bad precedent by doing that merely to save 3 characters. It also strikes me as ill-advised to reserve a new name right now with the same existing behavior, which would make some implementations that use a more general parameterized type breaking.

general sparse linear solvers code against CSC as a de facto standard input format with higher frequency than both COO and CSR (and CSC over CSR by a wide margin I believe)

Every sparse solver I'm aware of, and I've used dozens, implements both.

@StefanKarpinski
Copy link
Member

I have to say, I'm not super amped about this rename. We can always introduce an alias later if we want to.

@simonbyrne
Copy link
Contributor

simonbyrne commented Jan 7, 2018 via email

@ViralBShah
Copy link
Member

ViralBShah commented Jan 7, 2018

This is a tough one (and I have changed my mind a few times). On the one hand, we do not name the data structure in any other Julia Type. We don't say DenseColMatrix, for example. Also, even though people have spoken about different sparse matrix storage types and generic implementations for a really long time now, are there reasonably complete implementations as packages (although people do have their own custom sparse representations for their own packages)?

If it were to live in Base, I think naming it SparseMatrix made sense (to go along with Dict, Matrix, etc.). But if it is going to Stdlib, and perhaps later into its own package, we probably shouldn't cement it as the default implementation.

I do like the SparseCSCMatrix suggestion.

@ViralBShah
Copy link
Member

Maybe we should do this in two steps? Excise first and rename afterwards?

@StefanKarpinski
Copy link
Member

Yes, please, since the excision is both more important and less controversial. This probably would have been merged weeks ago if it had only included the excision.

@Sacha0
Copy link
Member

Sacha0 commented Jan 7, 2018

(Re. excising SparseArrays, @fredrikekre's fantastic work on that front is more or less ready to go, pending merge of #25364.)

@ViralBShah
Copy link
Member

What are folks feeling like about this now? I feel that the right thing at this point is to create a clean new sparse implementation in a package (borrowing bits and pieces as necessary from the current one) and replace the one in stdlib with an external package.

@ViralBShah
Copy link
Member

Closing for now (but can always reopen).

@ViralBShah ViralBShah closed this Dec 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sparse Sparse arrays stdlib Julia's standard library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants