ancom should report mean/std abundance of OTUs #1293

Closed
gregcaporaso opened this issue Feb 26, 2016 · 19 comments · Fixed by #1374

Comments

@gregcaporaso
Contributor

This information helps in interpreting results, and computing it outside of ANCOM is error-prone, as it requires the user to reproduce the grouping. For example, a dataframe like this would be ideal (building on the docstring example):

[Image: screenshot 2016-02-26 14 13 47]

I'll share some code that will help with this as I'm adapting ANCOM for an analysis that I'm working on. (See the "proof-of-concept" commit below.)

gregcaporaso added a commit to gregcaporaso/scikit-bio that referenced this issue Feb 26, 2016
@mortonjt
Collaborator

mortonjt commented Mar 1, 2016

If I'm understanding this correctly, there are some preprocessing steps within ancom that you want to take advantage of, correct? Otherwise you could just do something like:

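import pandas as pd

# 'a' holds the abundances, 'b' the group labels
x = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6], 'b': [0, 0, 0, 1, 1, 1]})
# per-group mean and std, labelled so the two rows can be told apart
pd.concat((x.groupby('b').mean().T, x.groupby('b').std().T), keys=['mean', 'std'])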

But now since we are adding in additional statistics, why limit to mean and std? How about 25%, 75% quartiles, min and max?

The point is, I'm not completely convinced that this sort of information should be coupled specifically with ANCOM. It may be more appropriate to have a separate preprocessing module for statistics on dataframes.

But I do agree, interpreting ANCOM isn't the most straightforward at the moment. That is still active research.

@gregcaporaso
Contributor Author

Yes, the issue is that I want to make sure that the abundances that we're looking at are the same as the ones that ancom is comparing. A five-number summary, like you're suggesting, is a better idea.

This will be really important for QIIME 2, so I'm putting this in the 0.5.0 milestone.
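
A minimal sketch of the per-group five-number summary discussed above, using plain pandas rather than anything in the ANCOM API. The feature table, the OTU and sample names, and the helper `five_number_summary` are all hypothetical and only illustrate the idea of summarizing exactly the same grouped distributions that a test would compare.

```python
import pandas as pd

# Hypothetical feature table: rows are samples, columns are OTUs.
table = pd.DataFrame(
    {"OTU1": [10, 12, 11, 30, 33, 31], "OTU2": [5, 6, 7, 5, 6, 7]},
    index=["s1", "s2", "s3", "s4", "s5", "s6"],
)
# Hypothetical grouping: one label per sample, aligned on the sample IDs.
grouping = pd.Series(["a", "a", "a", "b", "b", "b"], index=table.index)


def five_number_summary(table, grouping):
    """Return min, quartiles, and max of each feature within each group."""
    quantiles = [0.0, 0.25, 0.5, 0.75, 1.0]
    summary = table.groupby(grouping).quantile(quantiles)
    # Rows are indexed by (group, quantile); columns are the features.
    summary.index.names = ["group", "quantile"]
    return summary


summary = five_number_summary(table, grouping)
print(summary)
```

Passing the same `grouping` object to both the summary and the test is what guards against the two drifting apart.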

@jairideout
Contributor

I agree with @mortonjt that this info shouldn't be coupled with ANCOM output. For now I think it's okay to add the summary stats you guys are suggesting (since there is an immediate use-case, and it is an experimental API), but we'll need to revisit this when we have the contingency table class. I think those stats make more sense there.

@gregcaporaso
Contributor Author

That makes sense about getting the stats from contingency table. ANCOM should get the distributions of feature abundances by group from there (they'd need to be computed there to get the stats on them), which would address my concern about making sure the summaries are of the same distributions that ANCOM is operating on.

@jairideout
Contributor

@gregcaporaso is this something you're still wanting for 0.5.0, and if so do you have bandwidth to work on it?

@gregcaporaso gregcaporaso self-assigned this Jun 2, 2016
@gregcaporaso
Contributor Author

Yes, this would be good to get in since it's small and will be important for the QIIME 2 alpha release.

@jairideout
Contributor

Sounds good, let me know if you need me or someone else to work on it (you're currently assigned to it).

@gregcaporaso
Contributor Author

@mortonjt, @jairideout - is there a specific reason why we default to no multiple comparison correction? It seems like a better option would be to default to Holm-Bonferroni (which is currently the only option other than None).
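
For readers unfamiliar with the correction being proposed as the default, here is a minimal sketch of the generic Holm-Bonferroni step-down adjustment. It is written from the textbook definition and is not scikit-bio's implementation; the function name `holm_bonferroni` and the example p-values are hypothetical.

```python
import numpy as np


def holm_bonferroni(pvalues):
    """Return Holm-Bonferroni adjusted p-values, in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The k-th smallest p-value (k = 0, 1, ...) is multiplied by (m - k).
    scaled = p[order] * (m - np.arange(m))
    # Enforce monotonicity so a larger raw p-value never receives a smaller
    # adjusted value, then cap everything at 1.
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # made-up p-values
adjusted = holm_bonferroni(raw)
print(adjusted)
```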

@jairideout
Contributor

I don't think there's a specific reason. It should be fine to change the default since it's an experimental API (just be sure to note it in the changelog).

@gregcaporaso
Contributor Author

@mortonjt, is there a p-value that would make sense to report from ANCOM?

@mortonjt
Collaborator

mortonjt commented Jun 7, 2016

Nope. ANCOM has no p-values yet

@gregcaporaso
Contributor Author

Ok, thanks. And am I right that the input should be raw (un-rarefied) counts (not relative abundances)? That's the case in the docstring example, but just want to confirm that that's the expectation as QIIME 2 will type-check for both of those requirements.

@mortonjt
Collaborator

mortonjt commented Jun 7, 2016

I don't think that should be type checked. It should not care about the sequencing depth. Rarefied vs not rarefied shouldn't matter.

The only thing that should matter is that there are no zeros, which is currently being checked.
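
A minimal sketch of the kind of zero check described above, assuming the input is a samples-by-features array. The helper `check_no_zeros` is hypothetical and is not the check that ancom actually runs; adding a pseudocount of 1 is likewise only one possible workaround, not something this thread prescribes.

```python
import numpy as np


def check_no_zeros(counts):
    """Raise ValueError if any abundance is exactly zero."""
    mat = np.asarray(counts, dtype=float)
    if (mat == 0).any():
        # Log-ratios are undefined when a component is zero.
        raise ValueError("Abundances must not contain zeros.")
    return mat


# Hypothetical table containing one zero.
raw = np.array([[0, 4, 6], [2, 3, 5]])
# Assumed remedy for illustration only: shift every count by 1.
shifted = check_no_zeros(raw + 1)
```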

@gregcaporaso
Contributor Author

Ok, so that'd mean no requirement for rarefied, but we specifically don't want them to be relative abundances, right? They should be counts.

@ebolyen
Contributor

ebolyen commented Jun 7, 2016

Wouldn't a composition-based method basically treat everything as relative (relative is just the unit simplex, right)?


@gregcaporaso
Contributor Author

Yes, but it does that conversion internally, which is why we wouldn't want to pass it relative abundance data.


@jairideout
Contributor

Even though the method "works" (i.e. doesn't raise an error) on rarefied input, is the researcher using the method correctly? Probably not necessary for scikit-bio to make this distinction, but we'll be tracking that kind of semantic information in QIIME 2 and can warn (at least) if the user is doing something that's likely a mistake.

@gregcaporaso
Contributor Author

If I understand correctly, they are not using the method correctly if they pass rarefied data or relative frequency data (hence wanting to type check in QIIME 2, but agree that skbio doesn't need to check for this).

@mortonjt
Collaborator

mortonjt commented Jun 7, 2016

Exactly. It shouldn't even matter if they are counts or not.
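
The scale-invariance argument made in the last few comments can be checked numerically. The sketch below assumes a samples-by-features layout and uses a hand-written `closure` helper with made-up counts; it says nothing about what ancom does internally. It only shows that closing data onto the unit simplex is idempotent and that within-sample log-ratios are identical for counts and proportions. Rarefaction is subsampling rather than rescaling, so this check does not cover it.

```python
import numpy as np


def closure(mat):
    """Scale each row (sample) so that its components sum to 1."""
    mat = np.asarray(mat, dtype=float)
    return mat / mat.sum(axis=1, keepdims=True)


counts = np.array([[10, 30, 60], [5, 5, 10]])  # made-up counts
relative = closure(counts)

# Closing data that are already relative changes nothing.
assert np.allclose(closure(relative), relative)

# The log-ratio of two features is the same for counts and proportions.
log_ratio_counts = np.log(counts[:, 0] / counts[:, 1])
log_ratio_relative = np.log(relative[:, 0] / relative[:, 1])
assert np.allclose(log_ratio_counts, log_ratio_relative)
```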
