ancom should report mean/std abundance of OTUs #1293
Comments
If I'm understanding this correctly. There are some preprocessing steps within

    import pandas as pd
    x = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6], 'b': [0, 0, 0, 1, 1, 1]})
    pd.concat((x.groupby('b').mean().T, x.groupby('b').std().T))

But now, since we are adding in additional statistics, why limit to mean and std? How about the 25% and 75% quartiles, min and max? The point is, I'm not completely convinced that this sort of information should be coupled specifically with ANCOM. It may be more appropriate to have a separate preprocessing module for statistics on dataframes. But I do agree, interpreting ANCOM isn't the most straightforward at the moment. That is still active research.
Yes, the issue is that I want to make sure that the abundances that we're looking at are the same as the ones that ancom is comparing. Five-number summary, like you're suggesting, is a better idea. This will be really important for QIIME 2, so I'm putting this in the 0.5.0 milestone.
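As a rough illustration of the five-number summary suggested here, the sketch below reuses the toy dataframe from the earlier comment and computes per-group min, quartiles, median and max with pandas. The column labels (`min`, `q25`, `median`, `q75`, `max`) are made up for this example, and nothing here is meant to describe how ANCOM itself reports statistics.

```python
import pandas as pd

# Toy data from the earlier comment: one feature 'a' and a grouping column 'b'.
x = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6], 'b': [0, 0, 0, 1, 1, 1]})

# Quantiles that make up a five-number summary.
quantiles = [0.0, 0.25, 0.5, 0.75, 1.0]

# One row per group, one column per quantile.
summary = x.groupby('b')['a'].quantile(quantiles).unstack()

# Hypothetical, human-readable column labels.
summary.columns = ['min', 'q25', 'median', 'q75', 'max']
print(summary)
```

Computing the summary from the same grouped data that feeds the test is what addresses the concern above: the statistics and the comparison then describe the same distributions.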
I agree with @mortonjt that this info shouldn't be coupled with ANCOM output. For now I think it's okay to add the summary stats you guys are suggesting (since there is an immediate use-case, and it is an experimental API), but we'll need to revisit this when we have the contingency table class. I think those stats make more sense there.
That makes sense about getting the stats from the contingency table. ANCOM should get the distributions of feature abundances by group from there (they'd need to be computed there to get the stats on them), which would address my concern about making sure the summaries are of the same distributions that ANCOM is operating on.
@gregcaporaso is this something you're still wanting for 0.5.0, and if so do you have bandwidth to work on it?
Yes, this would be good to get in since it's small and will be important for the QIIME 2 alpha release.
Sounds good, let me know if you need me or someone else to work on it (you're currently assigned to it).
@mortonjt, @jairideout - is there a specific reason why we default to no multiple comparison correction? It seems like a better option would be to default to
I don't think there's a specific reason, should be fine to change the default since it's an experimental API (just be sure to note in the changelog).
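For readers unfamiliar with multiple comparison correction, here is a minimal sketch of one common procedure, Holm-Bonferroni, written with NumPy only. The thread as captured does not say which correction was proposed as the default, so this choice is purely illustrative, and the function name `holm_bonferroni` is hypothetical rather than a scikit-bio API.

```python
import numpy as np

def holm_bonferroni(pvalues):
    """Return Holm-Bonferroni adjusted p-values (illustrative helper)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                    # indices of ascending p-values
    # The k-th smallest p-value (k = 0..m-1) is multiplied by (m - k).
    scaled = (m - np.arange(m)) * p[order]
    # Enforce monotonicity along the sorted order, then cap at 1.
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted        # restore the input order
    return adjusted

adjusted = holm_bonferroni([0.01, 0.04, 0.03])
print(adjusted)
```

With no correction at all, each of many pairwise tests is judged against the same threshold, which inflates the chance of false positives as the number of features grows.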
@mortonjt, is there a p-value that would make sense to report from ANCOM?
Nope. ANCOM has no p-values yet.
Ok, thanks. And am I right that the input should be raw (un-rarefied) counts (not relative abundances)? That's the case in the docstring example, but just want to confirm that that's the expectation as QIIME 2 will type-check for both of those requirements.
I don't think that should be type checked. It should not care about the The only thing that should matter is that there are no zeros, which is
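To make the "no zeros" requirement concrete, the sketch below shows one way zeros might be removed before a log-ratio method is applied: a simple multiplicative replacement written with NumPy. The helper name `replace_zeros` and the default `delta` are assumptions chosen for this example; they are not taken from the thread, and a library routine would normally be preferred over this hand-rolled version.

```python
import numpy as np

def replace_zeros(counts, delta=None):
    """Replace zeros with a small value and rescale so each row sums to 1.

    Illustrative helper only; `delta` defaults to (1 / n_features) ** 2,
    which is an assumption made for this sketch.
    """
    mat = np.atleast_2d(np.asarray(counts, dtype=float))
    mat = mat / mat.sum(axis=1, keepdims=True)      # convert rows to proportions
    zeros = mat == 0
    if delta is None:
        delta = (1.0 / mat.shape[1]) ** 2
    # Total mass handed to the zero entries in each row.
    zero_mass = zeros.sum(axis=1, keepdims=True) * delta
    # Zeros become delta; non-zeros shrink so the row still sums to 1.
    return np.where(zeros, delta, mat * (1.0 - zero_mass))

table = np.array([[10, 0, 30],
                  [5, 5, 0]])
no_zeros = replace_zeros(table)
print(no_zeros)
```

Zeros matter because log-ratios are undefined when any component is zero, regardless of whether the input started as counts or proportions.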
Ok, so that'd mean no requirement for rarefied, but we specifically don't want them to be relative abundances, right? They should be counts.
Wouldn't a composition-based method basically treat everything as relative
On Tue, Jun 7, 2016 at 10:13 AM, Greg Caporaso notifications@github.com
Yes, but it does that conversion internally which is why we wouldn't want
On Tue, Jun 7, 2016 at 10:22 AM, Evan Bolyen notifications@github.com
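The point about a compositional method treating its input as relative can be sketched numerically. The example below uses the centered log-ratio (CLR) transform as a stand-in for "a composition-based method" and checks that raw counts, rescaled counts and relative abundances all give the same result. The helpers `closure` and `clr` are written here for illustration and are not claimed to be what ANCOM runs internally.

```python
import numpy as np

def closure(mat):
    """Scale each row to sum to 1 (illustrative helper)."""
    mat = np.atleast_2d(np.asarray(mat, dtype=float))
    return mat / mat.sum(axis=1, keepdims=True)

def clr(mat):
    """Centered log-ratio transform of each row (illustrative helper)."""
    logs = np.log(closure(mat))
    return logs - logs.mean(axis=1, keepdims=True)

counts = np.array([[10, 20, 70],
                   [5, 15, 30]])
relative = closure(counts)          # relative abundances
scaled = counts * 1000              # same composition, different total

same_as_relative = np.allclose(clr(counts), clr(relative))
same_as_scaled = np.allclose(clr(counts), clr(scaled))
print(same_as_relative, same_as_scaled)
```

This only demonstrates invariance to rescaling. Rarefying is a different operation, since it subsamples the data rather than rescaling it, so the sketch says nothing about whether rarefied input is appropriate.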
Even though the method "works" (i.e. doesn't raise an error) on rarefied input, is the researcher using the method correctly? Probably not necessary for scikit-bio to make this distinction, but we'll be tracking that kind of semantic information in QIIME 2 and can warn (at least) if the user is doing something that's likely a mistake.
If I understand correctly, they are not using the method correctly if they pass rarefied data or relative frequency data (hence wanting to type check in QIIME 2, but agree that skbio doesn't need to check for this).
Exactly. It shouldn't even matter if they are counts or not.
This information helps in interpreting results, and computing it outside of ANCOM isn't good, as it requires the user to reproduce the grouping. For example, a dataframe like this would be ideal (building on the docstring example):
I'll share some code that will help with this as I'm adapting ANCOM for an analysis that I'm working on. (See the "proof-of-concept" commit below.)