Floating point bug in heterozygosity_expected resulting in nans #274

timothymillar · 2019-07-10T20:21:50Z

Description

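Hi, I think there's a float precision error in heterozygosity_expected caused by the comparison af_sum < 1 here (https://github.com/cggh/scikit-allel/blob/master/allel/stats/hw.py#L98).
Row sums that are effectively 1 (e.g. 0.99999999....) evaluate as less than 1, so the comparison returns True and the corresponding results get nan-filled.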

Example

import numpy as np
import allel

g = allel.GenotypeArray([
    [[0, 0], [0, 0], [0, 0]],
    [[0, 0], [0, 1], [1, 1]],
    [[0, 1], [2, 3], [4, 5]]
])

af = g.count_alleles().to_frequencies()

allel.heterozygosity_expected(af, ploidy=2)

returns

array([0. , 0.5, nan])

The expected result is

array([0.        , 0.5       , 0.83333333])

The third value is nan-filled because

af_sum = np.sum(af, axis=1)
print(af_sum[0], af_sum[1], af_sum[2])

1.0 1.0 0.9999999999999999

This actually occurs in the polyploid test case, but the same check is used in the reference implementation in the test suite.
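To make the mechanism concrete, here is a minimal sketch that uses the row sums printed above as literals. The fill step is a hypothetical stand-in written for illustration, not the library's actual code, and the names strict_mask, het and filled are assumptions.

```python
import numpy as np

# Row sums as printed in the report; the last one is just below 1.
af_sum = np.array([1.0, 1.0, 0.9999999999999999])

# The strict comparison treats the third row as if its frequencies
# did not sum to 1.
strict_mask = af_sum < 1

# Expected heterozygosity for the three diploid rows (1 - sum(p**2)),
# written out by hand for this example.
het = np.array([0.0, 0.5, 1 - 6 * (1 / 6) ** 2])

# Hypothetical fill step: any row flagged by the mask becomes nan.
filled = np.where(strict_mask, np.nan, het)

print(strict_mask)
print(filled)
```

Only the third row is flagged, even though its frequencies are six copies of 1/6 and are valid.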

Fix

I've got a PR ready to fix it by rounding to a suitable precision based on the dtype.

precision = np.finfo(af_sum.dtype).precision
af_sum = np.round(np.sum(af, axis=1), decimals=precision)
print(af_sum[0], af_sum[1], af_sum[2])

1.0 1.0 1.0

Though I'm not sure what case af_sum < 1 is actually checking for?
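For comparison, here is a rough sketch of the rounding approach next to a tolerance-based check. The np.isclose variant is an alternative I'm assuming would behave the same here; it is not what the PR does, and the extra test values 0.5 and nan are made up for illustration.

```python
import numpy as np

# The three sums from the example plus two made-up cases:
# a genuinely incomplete row (0.5) and a missing row (nan).
af_sum = np.array([1.0, 1.0, 0.9999999999999999, 0.5, np.nan])

# Approach described above: round to the dtype's decimal precision.
precision = np.finfo(af_sum.dtype).precision  # 15 for float64
rounded = np.round(af_sum, decimals=precision)
mask_round = (rounded < 1) | np.isnan(rounded)

# Alternative (assumption, not from the PR): keep the raw sums and
# ignore differences from 1 that fall within np.isclose's default tolerance.
mask_close = ((af_sum < 1) & ~np.isclose(af_sum, 1.0)) | np.isnan(af_sum)

print(mask_round)
print(mask_close)
```

Both masks leave the near-1 row alone while still flagging the 0.5 and nan rows. The rounding version ties the tolerance to the dtype, whereas np.isclose uses fixed default tolerances unless rtol and atol are passed explicitly.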


alimanfoo · 2019-07-10T20:56:25Z

Hi Tim, thanks a lot for looking into this and for submitting a PR, much appreciated. I haven't looked at that piece of code for some time but will refresh my memory and take a look at the PR as soon as I can.


timothymillar mentioned this issue Jul 10, 2019

Fix for floating point bug in heterozygosity_expected #274 #275

Open

timothymillar mentioned this issue Sep 2, 2020

Not the typical expected heterozygosity computation #145

Open
