This repository has been archived by the owner on Nov 9, 2023. It is now read-only.

Morisita Horn metric from beta diversity is not compatible with principal coordinates #1933

Closed
wasade opened this issue Feb 16, 2015 · 14 comments · Fixed by #2028

Comments

@wasade
Member

wasade commented Feb 16, 2015

The metric is producing distance matrices whose diagonal is not guaranteed to be zero. This violates the hollow requirement of the skbio DissimilarityMatrix.

Should all methods from beta diversity produce matrices that are compatible with principal coordinates?

Not labeling this as a bug, as I'm not sure whether this behavior is valid for the metric.

cc @clozupone, who I believe is the original implementer (method here).

@gregcaporaso
Contributor

This is a similarity metric, so the diagonal is supposed to be non-zero. It shouldn't be a requirement of a beta_diversity.py metric that it be compatible with PCoA. There are other metrics that don't produce distance matrices that are compatible with PCoA (e.g., UniFrac Gain matrices are not symmetric).
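To make the two properties mentioned here concrete, this is a minimal sketch of what "hollow" and "symmetric" mean for a square matrix. The helper names `is_hollow` and `is_symmetric` and the example matrices are hypothetical, made up for illustration; they are not part of QIIME or scikit-bio.

```python
def is_hollow(matrix, atol=0.0):
    """True if every diagonal entry is within atol of zero."""
    return all(abs(matrix[i][i]) <= atol for i in range(len(matrix)))


def is_symmetric(matrix, atol=0.0):
    """True if matrix[i][j] and matrix[j][i] agree within atol for all i, j."""
    n = len(matrix)
    return all(abs(matrix[i][j] - matrix[j][i]) <= atol
               for i in range(n) for j in range(i + 1, n))


# A similarity matrix: symmetric, but the diagonal is 1, so it is not hollow.
similarity = [[1.0, 0.7],
              [0.7, 1.0]]

# An asymmetric matrix (the kind of shape described for Gain): hollow, not symmetric.
asymmetric = [[0.0, 0.2],
              [0.5, 0.0]]

# A matrix that satisfies both properties.
distance = [[0.0, 0.3],
            [0.3, 0.0]]

for name, m in [("similarity", similarity),
                ("asymmetric", asymmetric),
                ("distance", distance)]:
    print(name, "hollow:", is_hollow(m), "symmetric:", is_symmetric(m))
```

Only the last matrix passes both checks, which is the shape of input a distance-based ordination expects.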

This would really be more of a semantic type that we'd want to associate with a resulting matrix (i.e., is it based on a distance matrix). That would fit in the framework we're thinking about for QIIME 2.

Does that answer the question?

@wdwvt1
Contributor

wdwvt1 commented Feb 16, 2015

I didn't think anything non-metric was compatible with pcoa. Gain
is non-metric, and I am not sure how pcoa on a gain matrix would work -
couldn't you get a very misleading embedding because of the relaxed symmetry?

@wasade
Member Author

wasade commented Feb 16, 2015

@gregcaporaso, from what I saw, the values along the diagonal are mostly 0,
but some are only near zero, and those are throwing things off. I was
surprised to see the zeros given the description of the method.

That does answer the question though, thanks!


@wasade
Member Author

wasade commented Feb 16, 2015

Morisita Horn was used in PCoA in figure 4 here. @justin212k, are you able to comment by chance?

@gregcaporaso
Contributor

I am not sure how pcoa on a gain matrix would work

@wdwvt1, that's what I said:

There are other metrics that don't produce distance matrices that are compatible with PCoA (e.g., UniFrac Gain matrices are not symmetric).

@wdwvt1
Contributor

wdwvt1 commented Feb 16, 2015

@gregcaporaso - I parsed the sentence incorrectly. I read it as: there are
other metrics that don't produce distance matrices but that happen to be
compatible with PCoA. Apologies.


@gregcaporaso
Contributor

No problem @wdwvt1, that's what I figured.

@wasade
Member Author

wasade commented Feb 17, 2015

I guess to rephrase the issue given the thread so far: Morisita Horn used to work with PCoA, but now it does not. Is this a bug?

@justin212k
Contributor

Yes, I think it is a bug. I think we're using the definition 1 - C_H (here), which should be zero on the diagonal and thus usable with PCoA.
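For reference, here is a minimal sketch of the 1 - C_H form, assuming the standard Morisita-Horn overlap index C_H = 2 Σ x_i y_i / ((Σ x_i² / X² + Σ y_i² / Y²) · X · Y), where X and Y are the sample totals. The function name is hypothetical and this is not the pycogent implementation; it only illustrates why a sample compared with itself should give zero, up to floating-point rounding.

```python
def morisita_horn_dissimilarity(x, y):
    """Return 1 - C_H for two count vectors of equal length (illustrative sketch)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    total_x = float(sum(x))
    total_y = float(sum(y))
    if total_x == 0 or total_y == 0:
        raise ValueError("each sample must contain at least one count")
    # Simpson-style dominance terms for each sample.
    d_x = sum(v * v for v in x) / (total_x ** 2)
    d_y = sum(v * v for v in y) / (total_y ** 2)
    cross = sum(a * b for a, b in zip(x, y))
    c_h = 2.0 * cross / ((d_x + d_y) * total_x * total_y)
    return 1.0 - c_h


sample = [10, 3, 0, 7, 125]
# Mathematically zero; in floating point it may come out as a tiny non-zero value.
print(morisita_horn_dissimilarity(sample, sample))
# Samples with no shared features are maximally dissimilar.
print(morisita_horn_dissimilarity([1, 0], [0, 1]))
```

Because the self-comparison is only zero up to rounding, an exact-zero check on the diagonal can fail even when the metric itself is behaving as defined.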

@wasade wasade added the bug label Feb 17, 2015
@wasade
Member Author

wasade commented Feb 17, 2015

Thanks, @justin212k. The source for the metric is in pycogent, so the bug may originate there, but I'm labeling it as a bug in qiime for the time being since this is where the issue was noticed.

@jairideout jairideout added this to the QIIME 1.9.1 milestone Apr 17, 2015
@jairideout jairideout self-assigned this May 1, 2015
@jairideout
Member

@wasade do you have the input files handy that produce this error? @gregcaporaso and I are going to look into fixing this for 1.9.1 and it'd be helpful to test using your data.

@wasade
Member Author

wasade commented May 4, 2015

I don't recall what data were being used, likely some PICRUSt output. I suggest just trying any OTU table you have on hand; given the nature of the issue, I suspect any of them will trigger it. If not, I'll dig something back up.

@gregcaporaso
Contributor

@wasade, we tried with four different tables and aren't able to reproduce. Would you be able to dig a little for the files that generated this for you?

Two test tables we tried with were this and this. We also tried with two non-test tables that we have locally (88 soils, and another soil meta-analysis) and couldn't reproduce with those either.

@wasade
Member Author

wasade commented May 4, 2015

Providing a link via email in a second. The distance matrix was produced by qiime 1.9. The full command executed was:

echo "cd `pwd`; source ~/.bash_profile;workon qiime-1.9; parallel_beta_diversity.py -i HMPv35_closedref_gg138/otu_table_pred_l3_even325k.biom -o HMPv35_closedref_gg138/bdiv -m bray_curtis,morisita_horn -O 50" | qsub -o sub.oe -e sub.oe -N sub_w4cmIk -q route -l nodes=1:ppn=1

Here is the result of trying to load the resulting distance matrix into a DistanceMatrix object:

10:02:10 (mcdonadt@pando-3):/Users/mcdonadt/AGBT/HMPv35_closedref_gg138/bdiv
> from skbio import read, DistanceMatrix

10:02:26 (mcdonadt@pando-3):/Users/mcdonadt/AGBT/HMPv35_closedref_gg138/bdiv
> dm = read('morisita_horn_otu_table_pred_even325k.txt', into=DistanceMatrix)
---------------------------------------------------------------------------
DissimilarityMatrixError                  Traceback (most recent call last)
<ipython-input-2-f4c77a34f1fb> in <module>()
----> 1 dm = read('morisita_horn_otu_table_pred_even325k.txt', into=DistanceMatrix)

/Users/mcdonadt/.virtualenvs/qiime-1.9/lib/python2.7/site-packages/skbio/io/_registry.pyc in read(fp, format, into, verify, mode, **kwargs)
    618                                                   if into is not None
    619                                                   else 'generator'))
--> 620     return reader(fp, mode=mode, **kwargs)
    621
    622

/Users/mcdonadt/.virtualenvs/qiime-1.9/lib/python2.7/site-packages/skbio/io/_registry.pyc in wrapped_reader(fp, mode, mutate_fh, **kwargs)
    247                     for key, fh in zip(file_keys, fhs[1:]):
    248                         kwargs[key] = fh
--> 249                     return reader(fhs[0], **kwargs)
    250
    251         wrapped_reader.__doc__ = reader.__doc__

/Users/mcdonadt/.virtualenvs/qiime-1.9/lib/python2.7/site-packages/skbio/io/lsmat.pyc in _lsmat_to_distance_matrix(fh, delimiter)
    108 @register_reader('lsmat', DistanceMatrix)
    109 def _lsmat_to_distance_matrix(fh, delimiter='\t'):
--> 110     return _lsmat_to_matrix(DistanceMatrix, fh, delimiter)
    111
    112

/Users/mcdonadt/.virtualenvs/qiime-1.9/lib/python2.7/site-packages/skbio/io/lsmat.pyc in _lsmat_to_matrix(cls, fh, delimiter)
    175                                (num_ids, row_idx + 1))
    176
--> 177     return cls(data, ids)
    178
    179

/Users/mcdonadt/.virtualenvs/qiime-1.9/lib/python2.7/site-packages/skbio/stats/distance/_base.pyc in __init__(self, data, ids)
    186         ids = tuple(ids)
    187
--> 188         self._validate(data, ids)
    189
    190         self._data = data

/Users/mcdonadt/.virtualenvs/qiime-1.9/lib/python2.7/site-packages/skbio/stats/distance/_base.pyc in _validate(self, data, ids)
    790
    791         """
--> 792         super(DistanceMatrix, self)._validate(data, ids)
    793
    794         if (data.T != data).any():

/Users/mcdonadt/.virtualenvs/qiime-1.9/lib/python2.7/site-packages/skbio/stats/distance/_base.pyc in _validate(self, data, ids)
    669                                            "point values.")
    670         elif np.trace(data) != 0:
--> 671             raise DissimilarityMatrixError("Data must be hollow (i.e., the "
    672                                            "diagonal can only contain zeros).")
    673         elif num_ids != len(set(ids)):

DissimilarityMatrixError: Data must be hollow (i.e., the diagonal can only contain zeros).
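The traceback shows the validation failing on `np.trace(data) != 0`, which is an exact comparison, so a diagonal entry on the order of 1e-16 is enough to trigger it. As a possible stopgap (not the fix that was merged for this issue), here is a minimal sketch that zeroes a diagonal only when every entry is already within a small tolerance of zero. The function name, the `atol` default, and the example matrix are assumptions made up for illustration.

```python
import numpy as np


def zero_near_zero_diagonal(data, atol=1e-12):
    """Return a copy of data with the diagonal set to exactly 0.

    Raises ValueError if any diagonal entry is farther than atol from zero,
    since that would suggest a similarity matrix rather than rounding noise.
    """
    data = np.array(data, dtype=float)  # np.array copies, so the input is untouched
    if np.any(np.abs(np.diag(data)) > atol):
        raise ValueError("diagonal contains values that are not near zero")
    np.fill_diagonal(data, 0.0)
    return data


# Hypothetical matrix with rounding noise on the diagonal.
raw = np.array([[0.0, 0.25],
                [0.25, 2.2e-16]])

print(np.trace(raw) != 0)       # the exact check from the traceback trips on the noise
cleaned = zero_near_zero_diagonal(raw)
print(np.trace(cleaned) != 0)   # after cleaning, the trace is exactly zero
```

The cleaned array could then be passed, together with the sample IDs, to the `DistanceMatrix(data, ids)` constructor seen in the traceback. Whether silently correcting the values is appropriate depends on confirming that the non-zero entries really are rounding noise.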
