Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Cov for sparse matrices #447

Closed
jeffwong opened this issue Jul 10, 2017 · 2 comments
Closed

Proposal: Cov for sparse matrices #447

jeffwong opened this issue Jul 10, 2017 · 2 comments
Labels
sparse Sparse arrays

Comments

@jeffwong
Copy link

Hi, I am a former R developer and have recently switched to Julia. I wanted to get help on this issue and propose a solution:

I would like to propose an optimized function for cov on sparse matrices. It is motivated by this example:

using Distributions

n = 50000
p = 500

x_dense = rand(Uniform(-1, 2), n, p)
@time cov1 = cov(x_dense)

x_sparse = sprand(n, p, .05)
@time cov1 = cov(x_sparse)

On the sparse matrix it takes ~ 27 seconds while on the dense matrix it takes only ~5 seconds. Coming from R, I am guessing that the sparse matrix is going through some kind of densification step, which is generating a ton of overhead. Since column means are optimized for sparse matrices, it is possible to do cov in a smart way. The idea will be to expand terms in the definition of cov(X)

cov(X) = E[(X - E[X]) (X - E[X])]

= X'X - X'\mu - \mu'X + \mu'\mu

where X is still sparse, and \mu is the vector of column means. Each component of this expansion can be built around functions optimized for sparse matrices/vectors.

My implementation takes about 0.5 seconds for the sparse case. I would like to contribute it to Julia, but I am still new to the language and don't know where the function should go, nor do I know where tests/docs go.

The function right now looks like

function myspcov(X::SparseMatrixCSC) ...

It was recommended that this function go into base, instead of JuliaStats. I would love to submit a PR and become a contributor if someone can give me pointers

Here are some references:
JuliaStats/StatsBase.jl#282
JuliaLang/julia#7788

@ararslan
Copy link
Member

Hey @jeffwong, welcome! I'm glad to hear we've been able to steal you from R. 😉 It'd be great to have you contribute this functionality to Julia. Here are my recommendations:

  • Define your function in base/sparse/linalg.jl
  • Add tests for it in test/sparse/sparse.jl

Since it's an optimization rather than new functionality (since one can already compute the covariance with a sparse matrix, it's just slow), I don't think you'd need to add any extra documentation. Never hurts to document your code with comments though. 🙂

When in doubt, I recommend consulting CONTRIBUTING.md, then simply opening a PR. We can always answer specific questions and make suggestions in the PR.

@fredrikekre
Copy link
Member

JuliaLang/julia#22735

@KristofferC KristofferC transferred this issue from JuliaLang/julia Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sparse Sparse arrays
Projects
None yet
Development

No branches or pull requests

3 participants