Morisita Horn metric from beta diversity is not compatible with principal coordinates #1933
Comments
This is a similarity metric, so the diagonal is supposed to be non-zero. It shouldn't be a requirement of a This would really be more of a semantic type that we'd want to associate with a resulting matrix (i.e., is it based on a distance matrix). That would fit in the framework we're thinking about for QIIME 2. Does that answer the question? |
I didn't think anything that was non-metric was compatible with PCoA.
|
@gregcaporaso, the values are for the most part 0 along the diagonal from That does answer the question though, thanks! On Mon, Feb 16, 2015 at 1:45 PM, Will Van Treuren notifications@github.com
|
Morisita Horn was used in PCoA in figure 4 here, @justin212k, able to comment by chance? |
@wdwvt1, that's what I said:
|
@gregcaporaso - I parsed the sentence incorrectly - I read it as there are On Mon, Feb 16, 2015 at 12:53 PM, Greg Caporaso notifications@github.com
|
No problem @wdwvt1, that's what I figured. |
I guess to rephrase the issue given the thread so far: Morisita Horn used to work with PCoA, but now it does not. Is this a bug? |
Yes, I think it is a bug. I think we're using the definition 1 - C_H (here), which should be zero on the diagonal and thus usable with PCoA. |
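For reference, a minimal sketch of the 1 - C_H definition mentioned above, written in plain Python. It assumes the standard Morisita-Horn overlap index and is not the pycogent or QIIME implementation; the function names and the sample counts are hypothetical and exist only for illustration.

```python
def morisita_horn_dissimilarity(x, y):
    """Return 1 - C_H for two count vectors of equal length.

    Assumes the standard Morisita-Horn overlap:
        C_H = 2 * sum(x_i * y_i) / ((d_x + d_y) * X * Y)
    with X = sum(x), Y = sum(y), d_x = sum(x_i^2) / X^2, d_y = sum(y_i^2) / Y^2.
    """
    x_total = float(sum(x))
    y_total = float(sum(y))
    d_x = sum(v * v for v in x) / x_total ** 2
    d_y = sum(v * v for v in y) / y_total ** 2
    cross = sum(a * b for a, b in zip(x, y))
    c_h = 2.0 * cross / ((d_x + d_y) * x_total * y_total)
    return 1.0 - c_h


def pairwise_matrix(samples):
    """Build the full sample-by-sample dissimilarity matrix (hypothetical helper)."""
    n = len(samples)
    return [
        [morisita_horn_dissimilarity(samples[i], samples[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical counts: three samples, four observations each.
samples = [
    [10, 0, 5, 3],
    [2, 8, 0, 1],
    [4, 4, 4, 4],
]
dm = pairwise_matrix(samples)

# Comparing a sample with itself gives C_H = 1 algebraically, so the diagonal
# should be zero up to floating-point round-off.
print([dm[i][i] for i in range(len(dm))])
```

Under this definition a sample compared with itself yields C_H = 1, so the diagonal is zero in exact arithmetic. Whether floating-point evaluation returns exactly 0.0 depends on the input values.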
Thanks, @justin212k. The source for the metric is in pycogent, so it may be the case that the origin of the bug is there but labeling as a bug in qiime for the time being as this is where the issue has been noticed. |
@wasade do you have the input files handy that produce this error? @gregcaporaso and I are going to look into fixing this for 1.9.1 and it'd be helpful to test using your data. |
I don't recall what data were being used, likely some PICRUSt output. I suggest just trying any OTU table you have on hand, which given the nature of the issue, I suspect will trigger it. If not, I'll dig something back up |
@wasade, we tried with four different tables and aren't able to reproduce. Would you be able to dig a little for the files that generated this for you? Two test tables we tried with were this and this. We also tried with two non-test tables that we have locally (88 soils, and another soil meta-analysis) and couldn't reproduce with those either. |
Providing a link via email in a second. The distance matrix was produced by qiime 1.9. The full command executed was:
echo "cd `pwd`; source ~/.bash_profile;workon qiime-1.9; parallel_beta_diversity.py -i HMPv35_closedref_gg138/otu_table_pred_l3_even325k.biom -o HMPv35_closedref_gg138/bdiv -m bray_curtis,morisita_horn -O 50" | qsub -o sub.oe -e sub.oe -N sub_w4cmIk -q route -l nodes=1:ppn=1
Here is the result of trying to load the resulting distance matrix in to a
10:02:10 (mcdonadt@pando-3):/Users/mcdonadt/AGBT/HMPv35_closedref_gg138/bdiv
> from skbio import read, DistanceMatrix
10:02:26 (mcdonadt@pando-3):/Users/mcdonadt/AGBT/HMPv35_closedref_gg138/bdiv
> dm = read('morisita_horn_otu_table_pred_even325k.txt', into=DistanceMatrix)
---------------------------------------------------------------------------
DissimilarityMatrixError Traceback (most recent call last)
<ipython-input-2-f4c77a34f1fb> in <module>()
----> 1 dm = read('morisita_horn_otu_table_pred_even325k.txt', into=DistanceMatrix)
/Users/mcdonadt/.virtualenvs/qiime-1.9/lib/python2.7/site-packages/skbio/io/_registry.pyc in read(fp, format, into, verify, mode, **kwargs)
618 if into is not None
619 else 'generator'))
--> 620 return reader(fp, mode=mode, **kwargs)
621
622
/Users/mcdonadt/.virtualenvs/qiime-1.9/lib/python2.7/site-packages/skbio/io/_registry.pyc in wrapped_reader(fp, mode, mutate_fh, **kwargs)
247 for key, fh in zip(file_keys, fhs[1:]):
248 kwargs[key] = fh
--> 249 return reader(fhs[0], **kwargs)
250
251 wrapped_reader.__doc__ = reader.__doc__
/Users/mcdonadt/.virtualenvs/qiime-1.9/lib/python2.7/site-packages/skbio/io/lsmat.pyc in _lsmat_to_distance_matrix(fh, delimiter)
108 @register_reader('lsmat', DistanceMatrix)
109 def _lsmat_to_distance_matrix(fh, delimiter='\t'):
--> 110 return _lsmat_to_matrix(DistanceMatrix, fh, delimiter)
111
112
/Users/mcdonadt/.virtualenvs/qiime-1.9/lib/python2.7/site-packages/skbio/io/lsmat.pyc in _lsmat_to_matrix(cls, fh, delimiter)
175 (num_ids, row_idx + 1))
176
--> 177 return cls(data, ids)
178
179
/Users/mcdonadt/.virtualenvs/qiime-1.9/lib/python2.7/site-packages/skbio/stats/distance/_base.pyc in __init__(self, data, ids)
186 ids = tuple(ids)
187
--> 188 self._validate(data, ids)
189
190 self._data = data
/Users/mcdonadt/.virtualenvs/qiime-1.9/lib/python2.7/site-packages/skbio/stats/distance/_base.pyc in _validate(self, data, ids)
790
791 """
--> 792 super(DistanceMatrix, self)._validate(data, ids)
793
794 if (data.T != data).any():
/Users/mcdonadt/.virtualenvs/qiime-1.9/lib/python2.7/site-packages/skbio/stats/distance/_base.pyc in _validate(self, data, ids)
669 "point values.")
670 elif np.trace(data) != 0:
--> 671 raise DissimilarityMatrixError("Data must be hollow (i.e., the "
672 "diagonal can only contain zeros).")
673 elif num_ids != len(set(ids)):
DissimilarityMatrixError: Data must be hollow (i.e., the diagonal can only contain zeros). |
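The validation at the bottom of the traceback compares the trace against zero exactly, so any nonzero diagonal entry, however small, is enough to raise the error. The sketch below contrasts an exact check with a tolerance-based one in plain Python. All function names, the tolerance, and the example matrix are hypothetical, none of this is skbio API, and the thread does not establish that round-off is the cause here.

```python
def is_hollow_exact(matrix):
    """Stand-in for the exact test in the traceback: the trace must equal 0."""
    return sum(matrix[i][i] for i in range(len(matrix))) == 0


def is_hollow_tolerant(matrix, abs_tol=1e-12):
    """Accept diagonal entries within abs_tol of zero (tolerance is an assumption)."""
    return all(abs(matrix[i][i]) <= abs_tol for i in range(len(matrix)))


def snap_diagonal(matrix, abs_tol=1e-12):
    """Return a copy with a near-zero diagonal set to exactly 0.0.

    Raises ValueError if any diagonal entry is larger than abs_tol, since that
    would indicate a genuinely non-hollow matrix rather than round-off.
    """
    cleaned = [list(row) for row in matrix]
    for i in range(len(cleaned)):
        if abs(cleaned[i][i]) > abs_tol:
            raise ValueError("diagonal entry %d is not close to zero: %r"
                             % (i, cleaned[i][i]))
        cleaned[i][i] = 0.0
    return cleaned


# Hypothetical matrix whose second diagonal entry is off by round-off only.
example = [
    [0.0, 0.4],
    [0.4, 2.2e-16],
]

print(is_hollow_exact(example))                 # exact check on the raw matrix
print(is_hollow_tolerant(example))              # tolerance-based check
print(is_hollow_exact(snap_diagonal(example)))  # exact check after snapping
```

Snapping the diagonal only makes sense once the metric is confirmed to be a dissimilarity. If the output were a similarity, the diagonal would be near 1 and `snap_diagonal` would raise instead of hiding the problem.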
The metric is producing distance matrices in which the diagonal is not assured to be zero. This violates the hollow requirement of the skbio DissimilarityMatrix. Should all methods from beta diversity produce matrices that are compatible with principal coordinates?
Not labeling as a bug as I'm not sure if this is valid by the metric or not.
cc @clozupone, who I believe is the original implementer (method here).