Inquiry on Converting MAF to EAF Using MungeSumstats #189
Comments
Hey Daxuan,

Issues around minor allele frequencies (MAF) and effect allele frequencies (EAF) are quite common in the field, and they come down to differing definitions. This goes back to the mess around the differing meanings of allele labels such as A1/A2, effect/non-effect and minor/major alleles. We try to handle all cases in MSS, but since this can be quite ambiguous, there are cases where we have to assume the effect and frequency columns relate to the minor allele (i.e. not the allele on the reference genome) to enable standardised formatting. See infer_effect_column and my answer here for more info on this.

I'll try my best to clarify the problem and explain how MSS deals with it. Overall though, it is a good idea to set up some test summary stats to check whether MSS behaves as you expect (as I do below), or to consult our documentation and related publication.
Yes, so we actually align our output with both in that we process so:
It depends. The defaults are sensible, but the other parameter choices are there to make it flexible to the user's needs without compromising the standardisation principles of MSS. Have a look at the parameters of the format_sumstats function and feel free to ask about a specific use case, but I can't really answer such a vague question without more input from you on what you are trying to do.
MSS has quite a robust approach to figuring out the effect from the non-effect allele with multiple checks to interpret the column headers inputted and also using reference genomes. Read through our documentation and come back to me with any specific questions.
So there are quite a few things going on in your question here.

Firstly, how would you know to convert MAF to EAF? The minor allele is more commonly the effect allele: in a GWAS you would typically be measuring the effect of the allele that isn't on the reference genome. I know this isn't always the case, though, and MSS will try to infer when it isn't, as discussed in my first paragraph.

Secondly, you can only change the noted MAF by flipping it, which should only be done if the SNP is bi-allelic, i.e. there is only one known alternative allele in the population, so the frequency can be flipped with 1-MAF. Note that MSS does check for this and will flip MAFs at a SNP level if they look different to the rest in the sumstats. Have a look at our documentation for more info on this.

And just an example of how to test how MSS would handle certain cases. Imagine we have a sumstats that looks like this:
Here A2 is the effect allele and FRQ/Beta relates to it. If we run MSS, it will pick up on this:
But now imagine that the effect and FRQ actually related to the major, reference Allele (A1) instead:
Now if we run MSS we get:
Specifically note the message:
So MSS assumes the effect measurement and FRQ relate to A2, as how could it know they relate to A1? It warns the user that the frequency looks weird but leaves it up to the user to address this and rerun if necessary. However, if we rename A1 and A2 as effect and non-effect, which MSS can more easily interpret since they aren't as ambiguous as A1/A2, look what happens:
Specifically, note:
So here, it flipped the effect and FRQ so they relate to the minor allele (A2) to standardise it. Testing like this should give you an intuition for the checks MSS is doing, how it will work for your case, and what you would need to do with your data.

Alan.
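The flipping arithmetic described in this reply can be sketched in a few lines. This is a minimal, standalone Python illustration and not MungeSumstats code: the function names `maf_to_eaf`, `flip_effect` and `frq_looks_like_major` are hypothetical, and the sketch assumes a bi-allelic SNP for which you already know which allele is the minor one (for example from a reference panel).

```python
def maf_to_eaf(maf, effect_allele, minor_allele):
    """Effect allele frequency for a bi-allelic SNP (hypothetical helper).

    If the effect allele is the minor allele, EAF equals MAF;
    otherwise the effect allele is the major allele and EAF is 1 - MAF.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie in [0, 0.5]")
    if effect_allele.upper() == minor_allele.upper():
        return maf
    return 1.0 - maf


def flip_effect(beta, frq):
    """Re-express an effect and its frequency relative to the other allele.

    Swapping the effect allele of a bi-allelic SNP negates the effect
    size and turns the frequency into 1 - frq.
    """
    return -beta, 1.0 - frq


def frq_looks_like_major(frq):
    """True when a frequency above 0.5 suggests it refers to the major allele."""
    return frq > 0.5


# Example: the reported frequency of 0.8 looks like a major-allele frequency,
# so re-express the effect and frequency relative to the minor allele.
beta, frq = 0.1, 0.8
if frq_looks_like_major(frq):
    beta, frq = flip_effect(beta, frq)
```

None of this replaces the checks MSS performs against the reference genome; it only shows why the flip is valid when exactly two alleles are present and why the sign of the effect changes along with the frequency.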
Closing due to inactivity, reopen if you want to discuss further.
I hope this message finds you well. I am currently working on a project involving GWAS summary statistics. Your MungeSumstats package has been incredibly useful for standardizing these datasets, and I appreciate the effort your team has put into developing it.

I have encountered a specific challenge related to the conversion of minor allele frequencies (MAF) to effect allele frequencies (EAF) in my dataset. Given the diverse nature of the summary statistics files and the potential discrepancies in allele definitions, I am seeking your guidance on how best to handle this conversion within the framework of your package.
Could you please advise on the following:
Standard Approach: Does MungeSumstats provide any built-in functionality or recommended practices for converting MAF to EAF, particularly when the effect and minor alleles differ?
Handling Allele Discrepancies: How does the package address discrepancies between the effect allele and the reference allele in standard GWAS formats? Are there specific functions or steps you would recommend for ensuring accurate EAF computation?
Integration with Other Packages: Is there any synergy with other packages, such as gwasvcf or ieugwasr, that can facilitate this conversion process while maintaining data integrity?
User Customization: If manual adjustments are needed for unique cases, what is your advice on implementing these within MungeSumstats without compromising the package's standardization strengths?

I appreciate any insights you can provide and look forward to your expert advice on these matters. Thank you for your time and for creating a tool that significantly enhances data analysis in genetic research.
Best regards,
Daxuan