Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename fields of SparseMatrixCSC #52

Open
simonbyrne opened this issue Dec 15, 2017 · 24 comments
Open

Rename fields of SparseMatrixCSC #52

simonbyrne opened this issue Dec 15, 2017 · 24 comments

Comments

@simonbyrne
Copy link
Contributor

Currently the 5 fields are named:

  • m
  • n
  • colptr
  • rowval
  • nzval

My complaints are:

  1. "row" and "col" are used in two of the fields, but not their respective sizes (m and n)
  2. rowval is confusing (what are "row values"?)
  3. colptr: isn't a pointer a C thing? we don't use it anywhere else in the language
  4. nzval: don't we now allow zeros in the non-sparse elements?

I would propose

  • nrow
  • ncol
  • coloffsets
  • rowindices
  • values
@Sacha0
Copy link
Member

Sacha0 commented Dec 15, 2017

I wholeheartedly sympathize. Could we postpone this discussion till after the release though? I have been meaning to write up some thoughts on the matter, but cannot spare the time for the foreseeable future :). Best!

@simonbyrne
Copy link
Contributor Author

Yes, once we have getfield overloading, any changes here should be non-breaking.

@Sacha0
Copy link
Member

Sacha0 commented Dec 15, 2017

Yes, once we have getfield overloading, any changes here should be non-breaking.

That upside of getfield overloading had not occurred to me. Fantastic! 😄

@StefanKarpinski
Copy link
Contributor

That upside of getfield overloading had not occurred to me. Fantastic! 😄

Oh man, that's one of the main benefits – it decouples "virtual fields" from actual fields.

@dpsanders
Copy link

nrow-> rows?

@oscardssmith
Copy link

why not row_num and col_num?

@StefanKarpinski
Copy link
Contributor

I would call it nrows; this is a fairly common naming pattern for us already, ndims, etc.

@Sacha0
Copy link
Member

Sacha0 commented Dec 16, 2017

Linking JuliaLang/julia#24910 (comment). Best!

@tknopp
Copy link

tknopp commented Dec 17, 2017

Are these fields meant to be public or private? Before this very essential question is answered its difficult to discuss.

m and n are definitely private fields. One uses size to get the sizes. IMHO also all other fields are private. There is a publicrowvals function, i.e. it is clear that .rowvals is private. But it seems that a corresponding method for colptr is missing.

That upside of getfield overloading had not occurred to me. Fantastic!

In principle yes, but I am pretty sure that you don't want to use getfield overloading to maintain compatibility of internal interfaces.

@KristofferC
Copy link
Member

Currently, you need to use the fields to do almost any efficient sparse matrix manipulations. So they are de facto public.

@tknopp
Copy link

tknopp commented Dec 17, 2017

Could you make this please more precise? Which fields are public? -> We really need a clear communication (in the documentation) here.

@andreasnoack
Copy link
Member

I'm fine with changing the names but it would better if we could generally use getters as in https://github.com/JuliaLang/julia/blob/292a3b4c73b91bcb3dbe2d9d7c56f8d747ce5207/base/sparse/sparsematrix.jl#L48-L53 since it would allow many algorithms to work efficiently for views of full columns of sparse matrices.

@Sacha0
Copy link
Member

Sacha0 commented Dec 17, 2017

The sparse higher order functions code takes a similar approach https://github.com/JuliaLang/julia/blob/292a3b4c73b91bcb3dbe2d9d7c56f8d747ce5207/base/sparse/higherorderfns.jl#L31 to provide a uniform interface to sparse vectors and matrices, and perhaps other (e.g. structured) objects as well going forward. My thinking on this front has been motivated by cleaning that interface up. Best!

@simonbyrne
Copy link
Contributor Author

That would be fine, but those functions are neither exported nor documented.

@StefanKarpinski
Copy link
Contributor

I think that for now we should consider these fields to be private even though library code has been effectively forced to use them in order to write faster sparse code. Change the names to better names and also provide an official API for accessing what one needs to write fast sparse code. This may cause some breakage; we can either use the new getproperty feature to make the old field names work (possibly with deprecations) or we can just proactively fix broken uses of these fields.

@tkf
Copy link
Contributor

tkf commented Aug 24, 2019

Hi, I'm proposing a new AbstractSparseMatrixCSC interface in PR JuliaLang/julia#33054. We need to document the accessor functions so people here may be interested in the discussion there.

@ViralBShah
Copy link
Member

I am going to close this issue and request discussion in JuliaLang/julia#33054. This is an opportunity to use better names for the accessors.

@tkf
Copy link
Contributor

tkf commented Aug 25, 2019

Let me summarize the recent changes I made related to this issue:

In JuliaLang/julia#30173, @ViralBShah and I were discussing the way to expose SparseMatrixCSC functionality in such a way that it is easily extensible by the users. We came to the agreement that defining SparseMatrixCSC functions in terms of accessor methods and abstract type is the way to go.

In JuliaLang/julia#32953 and JuliaLang/julia#33039 I did the refactoring so that most of the code touching SparseMatrixCSC is now defined in terms of AbstractSparseMatrixCSC and its accessor functions. At this point, nothing is breaking (or at least that's my intention) and pre-existing accessor functions (e.g. getcolptr and rowvals) are used as-is.

Now in JuliaLang/julia#33054 we are discussing how to document the interface of AbstractSparseMatrixCSC (e.g., accessor functions and assumed invariance of the underlying data). As getcolptr is not exported and documented, it is a good chance to use a better name. Accessor function coloffsets(::AbstractSparseMatrixCSC) sounds like a very good option.

A more controversial part of the discussion is that if we should add new accessor functions like rowindices and declare the old existing accessor functions like rowvals to be aliases of them (JuliaLang/julia#33054 (comment)). It is totally backward compatible change. The benefit of doing so now is that the code written after Julia 1.4 can be forward compatible with Julia 2.0 if we do this now (especially the subtypes of AbstractSparseMatrixCSC). Of course, we need to consider the balance of the pros and cons. I can imagine that there will be some confusions due to the coexistence of old and new APIs. IIUC, that's the main concern @ViralBShah has.

@tkf
Copy link
Contributor

tkf commented Aug 25, 2019

Questions I'd like to ask here are:

  1. What accessor function should we use for colptr? Should it be coloffsets? Should we just document and export getcolptr?

  2. Should we introduce new accessors for rowval and nzval? They are called rowvals and nonzeros now but they can be better. Alternative new APIs are rowindices proposed by @simonbyrne in the OP and storedvals proposed by @ViralBShah RFC: AbstractSparseMatrixCSC interface JuliaLang/julia#33054 (comment).

Note that m and n are accessible with size(A) and size(A, i). I suppose there will be no argument against it.

@Sacha0
Copy link
Member

Sacha0 commented Aug 26, 2019

@tkf
Copy link
Contributor

tkf commented Aug 27, 2019

Thanks. So @Sacha0, @ViralBShah, and I all seem to think that it makes sense to use storedindices or storeinds for SparseVector if nonzeros is renamed to storedvals.

@Sacha0's comment also mentions SparseMatrixCSR. I don't know if we want to take immediate action but we can also rename nzrange to something to be consistent with the notion of "stored" vs "zeros" and also maybe make sure to use a name that indicates that the first argument is CSC. ref: #50

@ViralBShah
Copy link
Member

That has been the direction we have been moving in.

@ViralBShah
Copy link
Member

So should we adopt the new names, and support the old names with properties (until 2.0 rolls around)? The accessors can also be appropriately named then - and the deprecations can be carried out whenever 2.0 comes around.

@tkf
Copy link
Contributor

tkf commented Nov 22, 2019

The only absolute blocker for AbstractSparseMatrixCSC JuliaLang/julia#33054 is the name of accessor function for field .colptr. Other accessor functions already exist as public API so we have to keep (at least) aliases anyway.

Can this be discussed in triage? I think the desired outcome is one of the following:

  • Triage decides the name of the public accessor function for field .colptr. Candidates are:

    • coloffsets
    • getcolptr (existing but undocumented internal function)
  • Triage delegates the decision to members whoever active in this issue.

@KristofferC KristofferC transferred this issue from JuliaLang/julia Jan 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants