Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cor sometimes returns values > 1 #17420

Closed
maximsch2 opened this issue Jul 14, 2016 · 7 comments
Closed

Cor sometimes returns values > 1 #17420

maximsch2 opened this issue Jul 14, 2016 · 7 comments
Assignees
Labels
bug Indicates an unexpected problem or unintended behavior maths Mathematical functions

Comments

@maximsch2
Copy link
Contributor

julia> cor(1:100, 1:100) > 1
true

This happens due to FP precision, but it can potentially break downstream functions that rely on -1 <= cor <= 1

@andreasnoack andreasnoack added the bug Indicates an unexpected problem or unintended behavior label Jul 14, 2016
@kshyatt kshyatt added the maths Mathematical functions label Jul 14, 2016
@StefanKarpinski
Copy link
Member

We should probably throw some clamp(x, -1, 1) calls in there somewhere, but where? The machinery is somewhat complex at this point.

@andreasnoack
Copy link
Member

andreasnoack commented Jul 15, 2016

I think we can be smarter than that and I have a fix (soon). At least the fix will fix the cases where the two series are exactly identical.

The present version also doesn't vectorize so we can even get a nice speedup.

@andreasnoack andreasnoack self-assigned this Jul 15, 2016
@maximsch2
Copy link
Contributor Author

Note that sequences being identical is not required for the bug, and it happens with floats as well.

@show cor(1:100, 101:200) > 1 # true
@show cor(1:100, 2*(1:100)) > 1 # true
@show cor(linspace(1, 85, 100), linspace(1, 85, 100)) > 1 # true

@andreasnoack
Copy link
Member

These cases are also fixed with my new version but it's hard to tell if it will be that case for all possible vectors.

@maximsch2
Copy link
Contributor Author

maximsch2 commented Jul 15, 2016

I agree with Stefan, as long as you can't guarantee that the result will be in [-1, 1] (not sure how would that guarantee work without clamping anyway), putting clamp there is needed to avoid rounding issue.

Another interesting test case that produces different results (all previous ones were giving just nextfloat(1.0)):

a = linspace(1, 85, 100)
b = collect(a)
c = Vector{Float32}(b)
@show cor(a,a) # = 1.0000000000000002
@show cor(a,c) # = 1.0000000885771385
@show cor(c,c) # = 1.0000001f0

@StefanKarpinski
Copy link
Member

We should probably have both the smarter, faster algorithm and stick a clamp at the end.

@maximsch2
Copy link
Contributor Author

There is still an issue when cor goes through cov2cor. E.g.:

cor(repmat(1:17, 1, 17))[2] <= 1.0

I've made a PR to fix it by adding clamps and also a couple of tests to verify that it is working.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior maths Mathematical functions
Projects
None yet
Development

No branches or pull requests

4 participants