You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi, I am a former R developer and have recently switched to Julia. I wanted to get help on this issue and propose a solution:
I would like to propose an optimized function for cov on sparse matrices. It is motivated by this example:
using Distributions
n = 50000
p = 500
x_dense = rand(Uniform(-1, 2), n, p)
@time cov1 = cov(x_dense)
x_sparse = sprand(n, p, .05)
@time cov1 = cov(x_sparse)
On the sparse matrix it takes ~ 27 seconds while on the dense matrix it takes only ~5 seconds. Coming from R, I am guessing that the sparse matrix is going through some kind of densification step, which is generating a ton of overhead. Since column means are optimized for sparse matrices, it is possible to do cov in a smart way. The idea will be to expand terms in the definition of cov(X)
cov(X) = E[(X - E[X]) (X - E[X])]
= X'X - X'\mu - \mu'X + \mu'\mu
where X is still sparse, and \mu is the vector of column means. Each component of this expansion can be built around functions optimized for sparse matrices/vectors.
My implementation takes about 0.5 seconds for the sparse case. I would like to contribute it to Julia, but I am still new to the language and don't know where the function should go, nor do I know where tests/docs go.
The function right now looks like
function myspcov(X::SparseMatrixCSC) ...
It was recommended that this function go into base, instead of JuliaStats. I would love to submit a PR and become a contributor if someone can give me pointers
Hey @jeffwong, welcome! I'm glad to hear we've been able to steal you from R. 😉 It'd be great to have you contribute this functionality to Julia. Here are my recommendations:
Define your function in base/sparse/linalg.jl
Add tests for it in test/sparse/sparse.jl
Since it's an optimization rather than new functionality (since one can already compute the covariance with a sparse matrix, it's just slow), I don't think you'd need to add any extra documentation. Never hurts to document your code with comments though. 🙂
When in doubt, I recommend consulting CONTRIBUTING.md, then simply opening a PR. We can always answer specific questions and make suggestions in the PR.
Hi, I am a former R developer and have recently switched to Julia. I wanted to get help on this issue and propose a solution:
I would like to propose an optimized function for cov on sparse matrices. It is motivated by this example:
On the sparse matrix it takes ~ 27 seconds while on the dense matrix it takes only ~5 seconds. Coming from R, I am guessing that the sparse matrix is going through some kind of densification step, which is generating a ton of overhead. Since column means are optimized for sparse matrices, it is possible to do cov in a smart way. The idea will be to expand terms in the definition of cov(X)
cov(X) = E[(X - E[X]) (X - E[X])]
= X'X - X'\mu - \mu'X + \mu'\mu
where X is still sparse, and \mu is the vector of column means. Each component of this expansion can be built around functions optimized for sparse matrices/vectors.
My implementation takes about 0.5 seconds for the sparse case. I would like to contribute it to Julia, but I am still new to the language and don't know where the function should go, nor do I know where tests/docs go.
The function right now looks like
function myspcov(X::SparseMatrixCSC) ...
It was recommended that this function go into base, instead of JuliaStats. I would love to submit a PR and become a contributor if someone can give me pointers
Here are some references:
JuliaStats/StatsBase.jl#282
JuliaLang/julia#7788
The text was updated successfully, but these errors were encountered: