Floating point bug in heterozygosity_expected resulting in nans #274
Comments
Hi Tim, thanks a lot for looking into this and for submitting a PR, much
appreciated. I haven't looked at that piece of code for some time but will
refresh my memory and take a look at the PR as soon as I can.
On Wed, 10 Jul 2019, 15:21 Tim Millar, ***@***.***> wrote:
Description
Hi, I think there's a float precision error in heterozygosity_expected
caused by the comparison af_sum < 1 here
<https://github.com/cggh/scikit-allel/blob/master/allel/stats/hw.py#L98>.
Some sums that are effectively 1 (e.g. 0.99999999...) still evaluate as
less than 1, and as a result the corresponding values get nan-filled.
Example
import numpy as np
import allel
g = allel.GenotypeArray([
[[0, 0], [0, 0], [0, 0]],
[[0, 0], [0, 1], [1, 1]],
[[0, 1], [2, 3], [4, 5]]
])
af = g.count_alleles().to_frequencies()
allel.heterozygosity_expected(af, ploidy=2)
returns
array([0. , 0.5, nan])
The expected result is
array([0. , 0.5 , 0.83333333])
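The expected values can be checked by hand. This is a minimal sketch, assuming heterozygosity_expected computes one minus the sum of each allele frequency raised to the power of the ploidy (an assumption about the implementation, although it agrees with the expected values quoted above). expected_het is a hypothetical helper, not part of scikit-allel, and exact fractions are used so that rounding plays no part:

```python
from fractions import Fraction


def expected_het(freqs, ploidy):
    """Hypothetical helper: 1 minus the sum of frequencies ** ploidy."""
    return 1 - sum(p ** ploidy for p in freqs)


# Second variant: two alleles at frequency 1/2 each.
second = [Fraction(1, 2)] * 2
# Third variant: six distinct alleles, each seen once among six calls.
third = [Fraction(1, 6)] * 6

het_second = expected_het(second, ploidy=2)
het_third = expected_het(third, ploidy=2)
print(het_second, het_third)  # 1/2 5/6
```

5/6 is 0.8333..., which matches the third expected value.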
The third value is nan-filled because
af_sum = np.sum(af, axis=1)
print(af_sum[0], af_sum[1], af_sum[2])
1.0 1.0 0.9999999999999999
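The same effect can be reproduced without NumPy or scikit-allel. The sketch below uses plain Python left-to-right summation as a stand-in for the row sum (NumPy's summation order can differ in general, so this is an illustration rather than a trace of np.sum):

```python
import math

# Six alleles at frequency 1/6 each, as in the third variant.
freqs = [1 / 6] * 6

# Plain left-to-right accumulation in double precision.
total = sum(freqs)

print(total)                     # 0.9999999999999999
print(total < 1)                 # True, so a strict comparison flags this row
print(math.isclose(total, 1.0))  # True, the sum is 1 within rounding error
```

The sum is one unit in the last place below 1.0, which is enough for a strict < 1 comparison to treat the row as incomplete.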
This actually occurs in the polyploid test case, but the same check is used
in the reference implementation in the test suite.
Fix
I've got a PR ready to fix it by rounding to a suitable precision based on
the dtype.
precision = np.finfo(af_sum.dtype).precision
af_sum = np.round(np.sum(af, axis=1), decimals=precision)
print(af_sum[0], af_sum[1], af_sum[2])
1.0 1.0 1.0
Though I'm not sure what case af_sum < 1 is actually checking for?
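An alternative to rounding would be a tolerance-based comparison. The sketch below is not the change in the PR; it swaps in np.isclose with its default tolerances, and the frequency array is written out by hand (including a made-up fourth row that genuinely sums to less than 1) so that it runs without scikit-allel:

```python
import numpy as np

# Allele frequencies for the three variants in the example, plus a
# hypothetical fourth row whose frequencies really do fall short of 1.
af = np.array([
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
    [1 / 6] * 6,
    [0.25, 0.25, 0.0, 0.0, 0.0, 0.0],
])
af_sum = np.sum(af, axis=1)

# Flag a row only if its sum is below 1 AND not within tolerance of 1.
short = (af_sum < 1) & ~np.isclose(af_sum, 1.0)
print(short)
```

With this mask only the fourth row is flagged. The default tolerances of np.isclose are much looser than float64 precision, so whether they are appropriate depends on what the af_sum < 1 check is meant to catch.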