ancom should report mean/std abundance of OTUs #1293

gregcaporaso · 2016-02-26T21:14:35Z

This information helps in interpreting results, and computing it outside of ANCOM isn't good, as it requires the user to reproduce the grouping. For example, a dataframe like this would be ideal (building on the docstring example):

I'll share some code that will help with this as I'm adapting ANCOM for an analysis that I'm working on. (See the "proof-of-concept" commit below.)

mortonjt · 2016-03-01T05:49:11Z

If I'm understanding this correctly, there are some preprocessing steps within ancom that you want to take advantage of, correct? Otherwise you could just do something like

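import pandas as pd
# 'a' holds the abundances, 'b' holds the group labels
x = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6], 'b': [0, 0, 0, 1, 1, 1]})
# per-group mean and std, labelled so the two rows can be told apart
pd.concat((x.groupby('b').mean().T, x.groupby('b').std().T), keys=['mean', 'std'])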

But now that we are adding additional statistics, why limit them to mean and std? How about the 25% and 75% quartiles, min, and max?

The point is, I'm not completely convinced that this sort of information should be coupled specifically with ANCOM. It may be more appropriate to have a separate preprocessing module for statistics on dataframes.

But I do agree, interpreting ANCOM isn't the most straightforward at the moment. That is still active research.

gregcaporaso · 2016-03-01T12:30:29Z

Yes, the issue is that I want to make sure that the abundances that we're looking at are the same as the ones that ancom is comparing. Five-number summary, like you're suggesting, is a better idea.
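For illustration, a per-group five-number summary along the lines discussed above could be sketched with pandas as follows. This is only a rough sketch under stated assumptions: the function name `percentile_summary`, the toy table, and the output layout are hypothetical and are not taken from the scikit-bio code.

```python
import pandas as pd

# Hypothetical toy data: rows are samples, columns are features (OTUs).
table = pd.DataFrame(
    {"otu1": [1, 2, 3, 4, 5, 6], "otu2": [10, 12, 11, 30, 28, 35]},
    index=["s1", "s2", "s3", "s4", "s5", "s6"],
)
# Group label for each sample, aligned on the sample IDs.
grouping = pd.Series(["a", "a", "a", "b", "b", "b"], index=table.index)


def percentile_summary(table, grouping, quantiles=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return per-group quantiles of each feature (hypothetical helper).

    Rows of the result are features; columns are (group, quantile) pairs.
    """
    summary = table.groupby(grouping).quantile(list(quantiles))
    summary.index.names = ["group", "quantile"]
    return summary.T


summary = percentile_summary(table, grouping)
print(summary)
```

Computing the summary from the same table and grouping that are handed to the test is what keeps the reported abundances consistent with the ones being compared.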

This will be really important for QIIME 2, so I'm putting this in the 0.5.0 milestone.

jairideout · 2016-03-01T16:41:24Z

I agree with @mortonjt that this info shouldn't be coupled with ANCOM output. For now I think it's okay to add the summary stats you guys are suggesting (since there is an immediate use-case, and it is an experimental API), but we'll need to revisit this when we have the contingency table class. I think those stats make more sense there.

gregcaporaso · 2016-03-01T18:24:20Z

That makes sense about getting the stats from contingency table. ANCOM should get the distributions of feature abundances by group from there (they'd need to be computed there to get the stats on them), which would address my concern about making sure the summaries are of the same distributions that ANCOM is operating on.

jairideout · 2016-06-01T20:26:54Z

@gregcaporaso is this something you're still wanting for 0.5.0, and if so do you have bandwidth to work on it?

gregcaporaso · 2016-06-02T12:44:47Z

Yes, this would be good to get in since it's small and will be important for the QIIME 2 alpha release.

jairideout · 2016-06-02T17:38:15Z

Sounds good, let me know if you need me or someone else to work on it (you're currently assigned to it).

gregcaporaso · 2016-06-07T15:44:19Z

@mortonjt, @jairideout - is there a specific reason why we default to no multiple comparison correction? It seems like a better option would be to default to Holm-Bonferroni (which is currently the only option other than None).

jairideout · 2016-06-07T16:22:41Z

I don't think there's a specific reason. It should be fine to change the default since it's an experimental API (just be sure to note it in the changelog).
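For readers unfamiliar with the correction being discussed, here is a minimal pure-Python sketch of the standard Holm-Bonferroni step-down adjustment. The function name `holm_bonferroni` is chosen for illustration, and this is not necessarily how scikit-bio implements it.

```python
def holm_bonferroni(pvalues):
    """Return Holm-Bonferroni adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k starting at 0) is scaled by (m - k).
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Adjusted values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


print(holm_bonferroni([0.01, 0.04, 0.03]))
```

A hypothesis is then rejected when its adjusted p-value falls below the chosen significance level.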

gregcaporaso · 2016-06-07T16:51:25Z

@mortonjt, is there a p-value that would make sense to report from ANCOM?

mortonjt · 2016-06-07T16:53:11Z

Nope. ANCOM has no p-values yet

gregcaporaso · 2016-06-07T17:00:21Z

Ok, thanks. And am I right that the input should be raw (un-rarefied) counts (not relative abundances)? That's the case in the docstring example, but just want to confirm that that's the expectation as QIIME 2 will type-check for both of those requirements.

mortonjt · 2016-06-07T17:04:19Z

I don't think that should be type checked. It should not care about the sequencing depth. Rarefied vs not rarefied shouldn't matter.

The only thing that should matter is that there are no zeros, which is currently being checked.

gregcaporaso · 2016-06-07T17:13:35Z

Ok, so that'd mean no requirement for rarefied, but we specifically don't want them to be relative abundances, right? They should be counts.

ebolyen · 2016-06-07T17:22:09Z

Wouldn't a composition-based method basically treat everything as relative (relative is just the unit simplex, right)?

gregcaporaso · 2016-06-07T17:23:54Z

Yes, but it does that conversion internally, which is why we wouldn't want to pass it relative abundance data.

jairideout · 2016-06-07T17:35:26Z

Even though the method "works" (i.e. doesn't raise an error) on rarefied input, is the researcher using the method correctly? Probably not necessary for scikit-bio to make this distinction, but we'll be tracking that kind of semantic information in QIIME 2 and can warn (at least) if the user is doing something that's likely a mistake.

gregcaporaso · 2016-06-07T17:37:12Z

If I understand correctly, they are not using the method correctly if they pass rarefied data or relative frequency data (hence wanting to type check in QIIME 2, but agree that skbio doesn't need to check for this).

mortonjt · 2016-06-07T17:39:49Z

Exactly. It shouldn't even matter if they are counts or not.
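The scale-invariance argument in this exchange can be checked numerically. The sketch below assumes the method operates on log-ratios of feature abundances; the helper names `closure` and `log_ratios` are illustrative and are not the scikit-bio API. It shows that raw counts and their relative abundances give the same log-ratios, and why zeros have to be rejected.

```python
import math


def closure(sample):
    """Rescale a sample so its components sum to 1 (the unit simplex)."""
    total = sum(sample)
    return [v / total for v in sample]


def log_ratios(sample):
    """Log of each component relative to the first one.

    Zeros (or negatives) are rejected because the log-ratio is undefined.
    """
    if any(v <= 0 for v in sample):
        raise ValueError("sample must contain only positive values")
    return [math.log(v / sample[0]) for v in sample[1:]]


counts = [10, 20, 70]
relative = closure(counts)

# Rescaling a sample leaves every ratio, and so every log-ratio, unchanged.
print(log_ratios(counts))
print(log_ratios(relative))
```

This only addresses whether the arithmetic changes; whether passing rarefied or relative data is a sensible use of the method is the separate semantic question raised above.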

gregcaporaso added a commit to gregcaporaso/scikit-bio that referenced this issue Feb 26, 2016

ENH: proof-of-concept for scikit-bio#1293

698f072

jairideout added API enhancement labels Feb 27, 2016

gregcaporaso added this to the 0.5.0: Python 3 support only milestone Mar 1, 2016

gregcaporaso self-assigned this Jun 2, 2016

gregcaporaso added a commit to gregcaporaso/scikit-bio that referenced this issue Jun 7, 2016

MAINT/API: percentile abundances now returned in separate dataframe

2792afb

fixes scikit-bio#1293

gregcaporaso mentioned this issue Jun 7, 2016

ANCOM returns percentile abundances of each feature in each group #1374

Merged

gregcaporaso added a commit to gregcaporaso/scikit-bio that referenced this issue Jun 8, 2016

MAINT/API: percentile abundances now returned in separate dataframe

b0f13f4

fixes scikit-bio#1293

gregcaporaso added a commit to gregcaporaso/scikit-bio that referenced this issue Jun 8, 2016

MAINT/API: percentile abundances now returned in separate dataframe

e4f8c0e

fixes scikit-bio#1293 fixes scikit-bio#1375

gregcaporaso added a commit to gregcaporaso/scikit-bio that referenced this issue Jun 8, 2016

MAINT/API: percentile abundances now returned in separate dataframe

561560d

fixes scikit-bio#1293 fixes scikit-bio#1375

gregcaporaso added a commit to gregcaporaso/scikit-bio that referenced this issue Jun 9, 2016

MAINT/API: percentile abundances now returned in separate dataframe

1ab9eba

fixes scikit-bio#1293 fixes scikit-bio#1375

jairideout closed this as completed in #1374 Jun 9, 2016
