-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rename SparseMatrixCSC to SparseMatrix #25151
Conversation
4badec4
to
7d7f599
Compare
Given how many different types of sparse matrices there are, and how different their performance characteristics are, I think it probably makes more sense to specify and allow people to alias if they want to. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic! Thanks @fredrikekre! Modulo CI approval, lgtm :).
The same could be said of |
Hmm... I'm not sure how I feel about this. Both sides make sense to me. |
Re. |
I can see that. How about leaving |
Sounds good on this end :). (Though the partial Julia CSR implementation I am aware of has been unmaintained for a few years.) |
Right, but obviously we'd like to have a complete CSR implementation at some point :) |
You touch on my secret plans... 😎 |
7d7f599
to
a676cc6
Compare
Having other formats implemented here or elsewhere would however make code start looking very confusing if only CSC gets the privilege of not being mentioned by name. CSC does not actually work the way many people naively assume sparse matrices to behave, many APIs give people the impression COO is how the storage should work. Dropping the disambiguation doesn't seem worth saving 3 characters. |
Again the same could be said of |
not really. Dicts, Sets, Arrays always will have the same interface independently of their implementation. On the other hand, Sparse matrix formats are not an implementation detail but the storage structure is part of their public interface. Tim Holy once suggested to merge CRS and CSC into the same type and just use an additional type parameter to encode which type it is. This would IMHO make a lot of sense. |
@tknopp's suggestion is compelling. I do think the analogy to |
Row-major, column-major? Contiguous chunk of memory, or array of pointers to arrays for higher dimensional arrays? :) |
If we were to leave there sparse matrix implementation in base, it makes sense to rename since it will be the default implementation. Given that we want to move sparse out of Base, I wonder if it makes sense to leave the name as is without taking over a more generic name, since then the current implementation shouldn't take over something that sounds like a default name. |
Fwiw, it used to sort of be this way but I changed it a long time back to add csc in anticipation of supporting more formats in base. |
I think I am onboard to merge this. Even SparseVector does not have a qualifier. Like it or not, this is going to be the most fully featured implementation, and others will follow and have to use different names. |
Since we're move sparse stuff out of Base, it can evolve in the future and if we realize we were wrong, we don't have to wait until Julia 2.0 to fix it :) |
SparseVector doesn't have a qualifier because there aren't really multiple storage schemes you might choose. |
Expanding a bit on my earlier response:
Array implementations vary considerably across languages: Some are column-major / first-index-fastest while others are row-major / last-index-fastest. Some are contiguous blocks of memory while others are lists of lists or arrays of pointers to arrays. Dictionary implementations also vary considerably across languages: While most are backed by hash tables, hash tables are themselves implemented in a variety of ways with different performance characteristics in practice, and dictionaries are also sometimes implemented via other data structures e.g. trees. And as you say, set implementations come in all shapes and sizes :). This volume of implementation variation meets or exceeds that among the common sparse matrix formats. Moreover, among the three common sparse matrix formats (CSC, CSR, and COO), CSC is (far and away?) the most common. Consider that the venerable Harwell-Boeing sparse matrix collection's storage format is CSC, its successor the Rutherford-Boeing sparse matrix collection's default storage format is Harwell-Boeing's CSC (with COO a secondary alternative, and certain FEM systems provided in a clique form), and the UF/SuiteSparse sparse matrix collection's default storage format is also Harwell/Rutherford-Boeing's CSC format (again with COO a secondary alternative). These collections are the three major publicly available sparse matrix collections. Additionally, consider that, e.g., general sparse linear solvers code against CSC as a de facto standard input format with higher frequency than both COO and CSR (and CSC over CSR by a wide margin I believe). (Here by general I mean, e.g, not tailored for solving systems that are sets of large cliques or treated as densely banded as in certain specialized FEM solvers.) In fact, CSC may appear similarly consistently across the sparse linear algebra world to column-major, contiguous-memory implementations in the dense array world. CSC is about as close to a de facto standard implementation as it comes :).
Yes, Consider that most end users do not know what compressed sparse column format is, and they never touch the internals of the sparse types. Instead they use the high level sparse construction methods / the True, the constituent fields of sparse matrices/vectors are more visible / accessible than those of
Yes, everyone working with CSC/CSR at one point or another has this lovely idea :). Certainly Andreas and Viral have as well. That those parties most likely to carry out such an idea nonetheless favor this change should be telling. Best! :) |
Alternatively, we could define
The downside of course is that |
Given The still-most-recommended API for creating a sparse matrix, the As a counterpoint, scipy.sparse implements a wide variety of formats without granting any one a favored status of leaving the format out of its name, and presents to users more directly the common workflow of constructing in coordinate form then converting to CSR or CSC for doing linear algebra as actually doing a conversion. |
You might've missed the critical "In these senses..." qualification on the statement to which you responded, and perhaps also the later paragraph beginning as below :).
|
Not really, we're disagreeing on whether "in these senses" that you laid out are practically relevant to the code people write. Any The possibility of adding more generalizable API's in the indeterminate future doesn't seem to have much bearing on whether a type should have important clarifying information about its behavior removed from its name right now. If anything new API's would make adding alternate formats simpler, which would make this name change a worse idea. |
ftfy 😉 |
Once Sparse is in the stdlib, we can rename this whenever, so this is not a Base 1.0 issue. |
Additionally, triage was happy with introducing the |
Perhaps rebase, drop the |
That's clearly debatable. I don't know of any sparse matrix libraries that implement, user-visibly (so this does not include matlab, where things being csc is not visible in the api), only one of CSC and CSR but not the other and give only one of them a preferred canonical status by leaving format details out of the name. It would be disappointing for Julia to set a bad precedent by doing that merely to save 3 characters. It also strikes me as ill-advised to reserve a new name right now with the same existing behavior, which would make some implementations that use a more general parameterized type breaking.
Every sparse solver I'm aware of, and I've used dozens, implements both. |
I have to say, I'm not super amped about this rename. We can always introduce an alias later if we want to. |
I would prefer SparseCSCMatrix: all the other matrix types are sufficed
with Matrix.
|
This is a tough one (and I have changed my mind a few times). On the one hand, we do not name the data structure in any other Julia Type. We don't say DenseColMatrix, for example. Also, even though people have spoken about different sparse matrix storage types and generic implementations for a really long time now, are there reasonably complete implementations as packages (although people do have their own custom sparse representations for their own packages)? If it were to live in Base, I think naming it SparseMatrix made sense (to go along with Dict, Matrix, etc.). But if it is going to Stdlib, and perhaps later into its own package, we probably shouldn't cement it as the default implementation. I do like the SparseCSCMatrix suggestion. |
Maybe we should do this in two steps? Excise first and rename afterwards? |
Yes, please, since the excision is both more important and less controversial. This probably would have been merged weeks ago if it had only included the excision. |
(Re. excising |
What are folks feeling like about this now? I feel that the right thing at this point is to create a clean new sparse implementation in a package (borrowing bits and pieces as necessary from the current one) and replace the one in |
Closing for now (but can always reopen). |
Makes the constructor much friendlier to type (
SparseMatrix(I, 3, 3)
etc...)CC @ViralBShah and @Sacha0 who previously discussed this.