You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
> > No observable difference with standard GTC input.
I don't observe any difference in the QC_Report.docx. But we do see differences in the corresponding Plink files created (BED/BIM/FAM). For example:
$ diff samples.bim ../../020a/sample_level/samples.bim | head
5c5
< 1 rs3131972 0 817341 G A
---
> 1 rs3131972 0 817341 A G
We see the alleles flipped. Can you address this?
This seems a flip on the gtc-to-vcf workflow when it imports to plink rather than on bcf entry point with bcf to plink. In the bcf entry point I have protected the allele order with --keep-allele-order switch so allele order would always be preserved from bcf to plink conversion. The bcf/vcf encodes alleles in the REF/ALT space but plink bim file encodes in the major/minor space. If we want to preserve the VCF allele order and go with the typical notion that REF allele is major allele. So,
vcf/bcf record (CHROM ID REF ALT): chr1 rs3131972 A G
should translated into bim record (CHROM ID CM POS ALLELE1(minor) ALLELE2 (major)): 1 rs3131972 0 817341 G A
Another simpler and emperical way to verify this is if you re-export samples_level/samples.bim to vcf, you would see the orignal allele order as in vcf.
You can reexport plink bed/bim/fam to vcf.gz with the following: plink --bfile bcf_out/samples --recode vcf-iid bgz --out bcf_out/samples --allow-extra-chr --keep-allele-order --const-fid 0
And compare the plink rexported with the orignal input like this: diff <(bcftools query -f"%ID %REF %ALT" /scratch/marksja/samples.bcf |sort) <(bcftools query -f"%ID %REF %ALT" bcf_out/samples.vcf.gz |sort)
You should expect to see no difference except for multi-allellics not imported which is not supported by plink/1.9 and would need to be fixed upstream in preparation of bcf.
Overall, I can open a separate issue and we can adress this more holistically throughout the pipeline in a different PR. However, for the BCF entry point, I see the transformation is as expected with no flip.
Feel free to let me know if this seems acceptable.
Verify that gtc-to-vcf also includes the --keep-allele-order switch.
Also may be worth bring up in meeting with all users to ensure which allele order would they prefer to have in the output. REF/ALT manner or MAJOR/MINOR. If --keep-allele-order is not added at the import, plink will switch alleles depending on the dataset cohort.
This seems a flip on the gtc-to-vcf workflow when it imports to plink rather than on bcf entry point with bcf to plink. In the bcf entry point I have protected the allele order with
--keep-allele-order
switch so allele order would always be preserved from bcf to plink conversion. The bcf/vcf encodes alleles in the REF/ALT space but plink bim file encodes in the major/minor space. If we want to preserve the VCF allele order and go with the typical notion that REF allele is major allele. So,vcf/bcf record (CHROM ID REF ALT):
chr1 rs3131972 A G
should translated into bim record (CHROM ID CM POS ALLELE1(minor) ALLELE2 (major)):
1 rs3131972 0 817341 G A
bim format is described here: https://www.cog-genomics.org/plink/1.9/formats#bim
Another simpler and emperical way to verify this is if you re-export samples_level/samples.bim to vcf, you would see the orignal allele order as in vcf.
You can reexport plink bed/bim/fam to vcf.gz with the following:
plink --bfile bcf_out/samples --recode vcf-iid bgz --out bcf_out/samples --allow-extra-chr --keep-allele-order --const-fid 0
And compare the plink rexported with the orignal input like this:
diff <(bcftools query -f"%ID %REF %ALT" /scratch/marksja/samples.bcf |sort) <(bcftools query -f"%ID %REF %ALT" bcf_out/samples.vcf.gz |sort)
You should expect to see no difference except for multi-allellics not imported which is not supported by plink/1.9 and would need to be fixed upstream in preparation of bcf.
Overall, I can open a separate issue and we can adress this more holistically throughout the pipeline in a different PR. However, for the BCF entry point, I see the transformation is as expected with no flip.
Feel free to let me know if this seems acceptable.
Originally posted by @rajwanir in #314 (comment)
The text was updated successfully, but these errors were encountered: