Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem in running MungeSumstats::format_sumstats with bi_allelic_filter = FALSE #137

Closed
rootze opened this issue Dec 13, 2022 · 2 comments

Comments

@rootze
Copy link

rootze commented Dec 13, 2022

Thanks for developing this awesome tool. I have a question regarding dropping a large number of SNPs in GWAS summary statistics. I used the AD GWAS (PubMed ID: 35379992)
https://www.ebi.ac.uk/gwas/publications/35379992

Version of MungeSumstats -- MungeSumstats_1.5.18

AD_NG2022_GWAS_Munged <- MungeSumstats::format_sumstats(
                                                        path = AD_NG2022_GWAS,  
                                                        ref_genome="GRCh38", 
                                                        save_path = paste0(AD_NG2022_PATH, 'formatted/', 'AD_NG2022_GWAS_Munged_12122022.txt'), 
                                                        bi_allelic_filter = FALSE)

When running with bi_allelic_filter = FALSE as something I can try after reading: Issue Large number of non-biallelic SNPs #111, I get an error as follow:

Error in check_allele_flip(sumstats_dt = sumstats_return$sumstats_dt,  :
  Certain SNPs need to be flipped along with their effect columns and frequency column. However, to flip the
FRQ column, only bi-allelic SNPs can be considered. It is recommended to set bi_allelic_filter to TRUE so
non-bi-allelic SNPs are removed. Otherwise, set allele_flip_frq to FALSE to not flip the FRQ column but note
this could lead to incorrect FRQ values.

With the above error suggested setting allele_flip_frq to FALSE. (with bi_allelic_filter = FALSE, allele_flip_drop = FALSE,)

AD_NG2022_GWAS_Munged <- MungeSumstats::format_sumstats(path = AD_NG2022_GWAS, ref_genome="GRCh38", save_path = paste0(AD_NG2022_PATH, 'formatted/', 'AD_NG2022_GWAS_Munged_12122022.txt'), bi_allelic_filter = FALSE, allele_flip_drop = FALSE, allele_flip_frq = FALSE)
Summary statistics report:
   - 19,779,968 rows (93.7% of original 21,101,114 rows)
   - 19,741,483 unique variants
   - 5,418 genome-wide significant variants (P<5e-8)
   - 22 chromosomes
Done munging in 16.921 minutes.
Successfully finished preparing sumstats file, preview:
Reading header.:

Although 93.7% of original SNPs were kept, the FRQ is not all MAF, as I understand anymore. I thought MAF should be minor allele frequency. But the following message suggested there are a lot of FRQ with values > 0.5. I am a bit confused right now and what I should use.

Warning: When method is an integer, must be >0.
8,476,457 SNPs (42.9%) have FRQ values > 0.5. Conventionally the FRQ column is intended to show the minor/effect allele frequency.

Just giving it a try, when I set the bi_allelic_filter = TRUE and I also set allele_flip_drop = FALSE
About 40% of the SNPs were dropped, including some lead SNPs.

Summary statistics report:
    - 13,051,151 rows (61.9% of original 21,101,114 rows)
    - 13,051,151 unique variants
    - 2,975 genome-wide significant variants (P<5e-8)
    - 22 chromosomes
 Done munging in 27.315 minutes.
 Successfully finished preparing sumstats file, preview:

I am in a dilemma. And would be much appreciated your suggestion. Thank you!

@rootze rootze changed the title Problem in running MungeSumstats::format_sumstats without losing a large number of SNPs Problem in running MungeSumstats::format_sumstats with bi_allelic_filter = FALSE Dec 13, 2022
@Al-Murphy
Copy link
Owner

Al-Murphy commented Dec 13, 2022

Hey!

Great to hear you are finding the package useful!

Firstly, I would highly recommend installing R 4.2 and Bioconductor 3.16 to install MSS >=v1.6.0. This version has far more functionality than 1.5 that you are using - including a choice of dbSNP versions so you can use the latest version (155) as well as 144.

On your specific problem - the issue you are seeing with check_allele_flip is as expected, we can't flip the effect of a SNP if the direction is incorrect if there is another alternative allele i.e. 1-effect would not be correct in this instance. You correctly state to use allele_flip_frq to avoid this error which will stop this check and leave flip but like you say then the MAF value may not be correct. Another option would be to use allele_flip_drop which will drop any SNPs that are non-bi-allelic and that need to be flipped. I would suggest this approach if you need to use the MAF values downstream.

A broader question is to whether you want to remove non-bi-allelic SNPs all together. This is discussed more here but in short, general advice for downstream analysis is to remove them. Whether the large drop in remaining SNPs will make a difference in downstream analysis is something we are currently testing.

Hope this helps,
Alan.

@rootze
Copy link
Author

rootze commented Dec 14, 2022

@Al-Murphy Thank you for your help. That made sense.

@rootze rootze closed this as completed Dec 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants