Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cov(x, w::AbstractWeights) dispatches on cov(X, Y) fallback #409

Open
yanncalec opened this issue Sep 7, 2018 · 8 comments
Open

cov(x, w::AbstractWeights) dispatches on cov(X, Y) fallback #409

yanncalec opened this issue Sep 7, 2018 · 8 comments
Labels

Comments

@yanncalec
Copy link

yanncalec commented Sep 7, 2018

Let for example

x=1:100
w=StatsBase.weights(rand(100))

Then the statement

StatsBase.var(x)==StatsBase.cov(x)

is true, however the weighted variance of the two functions are (very) different, i.e.

StatsBase.var(x, w)==StatsBase.cov(x, w)

is false.

Also the interface to cov (as well as var and mean) seems incoherent: if weight is passed it won't take the keyword argument dims:

X=rand(100,10)
StatBase.cov(X, dims=1) # this works
StatBase.cov(X, w, 1) # this works also
StatBase.cov(X, w, dims=1) # this fails
@yanncalec yanncalec changed the title var() and cov() give inconsistent values for weighted variance in Julia >= 0.6 var and cov give inconsistent values for weighted variance in Julia >= 0.6 Sep 7, 2018
@nalimilan
Copy link
Member

The apparent inconsistency between cov and var is due to a dispatch issue: w is interpreted as the second array, not as a weights vector. We need to ensure the method taking weights is more specific than the fallback.

The inconsistency with the dims arguments is #408, combined with the dispatch problem.

@nalimilan nalimilan changed the title var and cov give inconsistent values for weighted variance in Julia >= 0.6 cov(x, w::AbstractWeights) dispatches on cov(X, Y) fallback Sep 7, 2018
@ararslan
Copy link
Member

ararslan commented Sep 7, 2018

A bit tangential, but ideally I'd like to see weights specified as a keyword argument. That would require defining it in the Statistics stdlib package though.

@nalimilan
Copy link
Member

Actually I don't think there's a bug here: x is a vector, and we only provide methods for matrices. Indeed the covariance of a vector doesn't make a lot of sense. The problem is just that the call treats w as the second vector rather than as weights. The best fix to that is to move these functions to the stdlib and make weights a keyword argument.

@yanncalec
Copy link
Author

@nalimilan : Yes I can confirm cov gives correct reweighted covariance matrix when applied on matrices, and it's a bit confusing of taking a vector as input. Also it would be nice if cov can compute the reweighted covariance between two different matrices as follows

(X.-mean(X,w,1))' * ((Y.-mean(Y,w,1)) .* w) / sum(w)

@baggepinnen
Copy link

baggepinnen commented Apr 19, 2019

It is at least a big issue that this fails silently now, I just ran into it as well. Since cov is defined for a vector in Statistics, it would be nice to have cov(v,w) act similarily, or at least throw an error.

@nalimilan
Copy link
Member

Will be fixed by JuliaLang/julia#31395.

@MaximilianJHuber
Copy link

Is the above pull request a remedy for this issue too, or is is a different issue?

using Statistics, StatsBase
G = rand(10000, 4); 
StatsBase.cov(G, weights(rand(10000).^2)) # fine!
StatsBase.cov(@view(G[:,:]), weights(rand(10000).^2)) # yields only a column vector!

The @view changes behavior and yields a wrong object!

@nalimilan
Copy link
Member

That's semi-related AFAICT. The method for weighted covariance doesn't support views, so it dispatches to the method which interprets weights as a second vector. If this issue was fixed you'd get an error, which is better, but still not great.

To fix the deeper issue, you'd have to change cov to accept views of dense arrays. Maybe that's just a matter of adapting the method's signature. Feel free to make a PR or file a separate issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants