Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Floating point bug in heterozygosity_expected resulting in nans #274

Open
timothymillar opened this issue Jul 10, 2019 · 1 comment
Open

Comments

@timothymillar
Copy link
Contributor

Description

Hi, I think there's a float precision error in heterozygosity_expected caused by the comparison statement af_sum < 1 here.
Some values that are effectively 1 will return True as a result they get filled e.g.
0.99999999....

Example

import numpy as np
import allel

g = allel.GenotypeArray([
    [[0, 0], [0, 0], [0, 0]],
    [[0, 0], [0, 1], [1, 1]],
    [[0, 1], [2, 3], [4, 5]]
])

af = g.count_alleles().to_frequencies()

allel.heterozygosity_expected(af, ploidy=2)

returns

array([0. , 0.5, nan])

The expected result is

array([0.        , 0.5       , 0.83333333])

The third value is nan-filled because

af_sum = np.sum(af, axis=1)
print(af_sum[0], af_sum[1], af_sum[2])

1.0 1.0 0.9999999999999999

This actually occurs in the polyploid test case but the same check is used in the reference implementation in the test suit.

Fix

I've got a PR ready to fix it by rounding to a suitable precision based on the dtype.

precision = np.finfo(af_sum.dtype).precision
af_sum = np.round(np.sum(af, axis=1), decimals=precision)
print(af_sum[0], af_sum[1], af_sum[2])

1.0 1.0 1.0

Though I'm not sure what case af_sum < 1 is actually checking for?

@alimanfoo
Copy link
Contributor

alimanfoo commented Jul 10, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants