Cor sometimes returns values > 1 #17420

maximsch2 · 2016-07-14T19:19:51Z

julia> cor(1:100, 1:100) > 1
true

This happens because of floating-point rounding, but it can break downstream functions that rely on -1 <= cor <= 1.

StefanKarpinski · 2016-07-15T18:39:19Z

We should probably throw some clamp(x, -1, 1) calls in there somewhere, but where? The machinery is somewhat complex at this point.

andreasnoack · 2016-07-15T19:04:26Z

I think we can be smarter than that, and I will have a fix soon. At minimum, the fix will handle the cases where the two series are exactly identical.

The present version also doesn't vectorize so we can even get a nice speedup.

maximsch2 · 2016-07-15T19:12:51Z

Note that sequences being identical is not required for the bug, and it happens with floats as well.

@show cor(1:100, 101:200) > 1 # true
@show cor(1:100, 2*(1:100)) > 1 # true
@show cor(linspace(1, 85, 100), linspace(1, 85, 100)) > 1 # true

andreasnoack · 2016-07-15T19:16:40Z

These cases are also fixed with my new version, but it's hard to tell whether that will be the case for all possible vectors.

maximsch2 · 2016-07-15T19:27:53Z

I agree with Stefan: as long as you can't guarantee that the result will be in [-1, 1] (and I'm not sure how such a guarantee would work without clamping anyway), a clamp is needed there to avoid rounding issues.

Another interesting test case that produces different results (all previous ones were giving just nextfloat(1.0)):

a = linspace(1, 85, 100)
b = collect(a)
c = Vector{Float32}(b)
@show cor(a,a) # = 1.0000000000000002
@show cor(a,c) # = 1.0000000885771385
@show cor(c,c) # = 1.0000001f0

StefanKarpinski · 2016-07-15T19:29:02Z

We should probably have both the smarter, faster algorithm and stick a clamp at the end.
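As a rough illustration of the "clamp at the end" part, here is a minimal sketch in current Julia syntax. `clamped_cor` is a hypothetical helper, not the implementation in Base and not the fix from the PR; it uses a plain two-pass formula and only shows where a final `clamp` could sit.

```julia
using Statistics  # provides mean in current Julia versions

# Hypothetical helper (an assumption for illustration, not Base's code):
# Pearson correlation with the final result clamped into [-1, 1].
function clamped_cor(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    length(x) == length(y) ||
        throw(DimensionMismatch("x and y must have the same length"))
    mx, my = mean(x), mean(y)
    sxy = sxx = syy = 0.0
    for i in eachindex(x, y)
        dx = x[i] - mx
        dy = y[i] - my
        sxy += dx * dy   # accumulated co-deviation
        sxx += dx * dx   # accumulated squared deviation of x
        syy += dy * dy   # accumulated squared deviation of y
    end
    r = sxy / sqrt(sxx * syy)
    # Rounding can push r slightly outside [-1, 1]; clamp it back.
    # A NaN (e.g. from zero variance) passes through unchanged.
    return clamp(r, -1.0, 1.0)
end
```

With this sketch, a call such as `clamped_cor(1:100, 101:200)` cannot return a value above 1, whatever rounding happens in the accumulation.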

maximsch2 · 2016-07-25T19:11:55Z

There is still an issue when cor goes through cov2cor. E.g.:

cor(repmat(1:17, 1, 17))[2] <= 1.0

I've made a PR to fix it by adding clamps and also a couple of tests to verify that it is working.
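The idea of clamping on the covariance-to-correlation path can be sketched as follows, in current Julia syntax. This is only an assumption about the general shape of such a fix, not the code from the PR; `cov_to_cor_clamped` is a hypothetical name.

```julia
# Hypothetical sketch (not the PR's code): turn a covariance matrix into a
# correlation matrix, clamping each off-diagonal entry into [-1, 1].
function cov_to_cor_clamped(C::AbstractMatrix{<:Real})
    n = size(C, 1)
    n == size(C, 2) ||
        throw(DimensionMismatch("covariance matrix must be square"))
    s = [sqrt(C[i, i]) for i in 1:n]   # standard deviations from the diagonal
    R = Matrix{Float64}(undef, n, n)
    for j in 1:n, i in 1:n
        # The diagonal is set to exactly 1; off-diagonal ratios are clamped
        # because rounding can leave them marginally outside [-1, 1].
        R[i, j] = i == j ? 1.0 : clamp(C[i, j] / (s[i] * s[j]), -1.0, 1.0)
    end
    return R
end
```

For example, a covariance matrix whose off-diagonal entry is `nextfloat(1.0)` with unit variances yields an off-diagonal correlation of exactly `1.0` instead of a value above 1.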

andreasnoack added the bug label Jul 14, 2016

kshyatt added the maths label Jul 14, 2016

andreasnoack self-assigned this Jul 15, 2016

andreasnoack mentioned this issue Jul 17, 2016

Avoid that cor(x,x) != 1... #17464

Merged

simonbyrne closed this as completed in #17464 Jul 25, 2016

maximsch2 mentioned this issue Jul 25, 2016

Fix correlation sometimes being > 1 #17617

Merged

Liozou mentioned this issue Jun 7, 2019

Statistics.cor throws domain error with specific data #32264

Closed
