
ivar_variants_to_vcf.py does not sort variants #335

Closed
Rohit-Satyam opened this issue Oct 26, 2022 · 4 comments
Labels
bug Something isn't working


Rohit-Satyam commented Oct 26, 2022

Description of the bug

When trying to index the iVar VCF with tabix, I get the following error:

[E::hts_idx_push] Unsorted positions on sequence #1: 6512 followed by 5711
tbx_index_build failed: temp.ivar.vcf.gz

In the VCF I can see:

MN908947.3	6512	.	ANNN	A	.	PASS	DP=155	GT:REF_DP:REF_RV:REF_QUAL:ALT_DP:ALT_RV:ALT_QUAL:ALT_FREQ	1:154:90:35:151:0:20:0.974194
MN908947.3	5711	.	C	A	.	ft	DP=64	GT:REF_DP:REF_RV:REF_QUAL:ALT_DP:ALT_RV:ALT_QUAL:ALT_FREQ	1:62:0:36:2:0:37:0.03125
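tabix refuses to index a file when a record's POS is smaller than the previous record's POS on the same sequence, which is exactly what the two lines above show. A minimal sketch for locating the first such record in an uncompressed VCF before compressing and indexing it (the function name `find_unsorted` is hypothetical; it only checks position order within each contig, not whether each contig's records form one contiguous block):

```python
def find_unsorted(vcf_lines):
    """Return (contig, previous_pos, current_pos) for the first record whose POS
    is smaller than the preceding record on the same contig, or None if sorted."""
    last_pos = {}  # contig -> POS of the most recent record seen on that contig
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")  # VCF columns are tab-separated
        contig, pos = fields[0], int(fields[1])
        previous = last_pos.get(contig)
        if previous is not None and pos < previous:
            return contig, previous, pos
        last_pos[contig] = pos
    return None
```

For example, `find_unsorted(open("temp.ivar.vcf"))` on a file whose records are the two lines above returns `("MN908947.3", 6512, 5711)`, the same pair of positions tabix reports.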

Command used and terminal output

I was testing the new ivar_variants_to_vcf.py from the development branch:

python ~/Documents/COVID_Project/nextCov/modules/ivar_variants_to_vcf.py temp.ivar.tsv temp.ivar.vcf -is -f ../00_index/Sars_cov_2.ASM985889v3.dna.toplevel.fa

and iVar was run as:

samtools mpileup \
    -aa \
    --count-orphans \
    --max-depth 0 \
    --redo-BAQ \
    -x \
    --min-BQ 20 \
    --min-MQ 20 \
    16_S16_L001.dedup.bam \
    | ivar variants \
    -p temp.ivar \
    -q 20 \
    -t 0.03 \
    -r ../00_index/Sars_cov_2.ASM985889v3.dna.toplevel.fa \
    -g ../../../resources/Sars_cov_2.ASM985889v3.101.gff3

Relevant files

temp.ivar.zip

Rohit-Satyam added the "bug" label on Oct 26, 2022
Rohit-Satyam (author) commented Oct 27, 2022

@mattheww95 tagging issues #326 and #321.

Okay, I thought it might be a problem with the sample, or that --reference hadn't been provided when running samtools mpileup. So I switched to a different sample and used --reference, but I still get the same error.

VCF file produced by ivar_variants_to_vcf.py:

MN908947.3      4565    .       C       A       .       ft      DP=27   GT:REF_DP:REF_RV:REF_QUAL:ALT_DP:ALT_RV:ALT_QUAL:ALT_FREQ       1:26:26:34:1:1:37:0.037037
MN908947.3      5545    .       AACCC   A       .       ft      DP=22   GT:REF_DP:REF_RV:REF_QUAL:ALT_DP:ALT_RV:ALT_QUAL:ALT_FREQ       1:22:21:35:1:0:20:0.0454545
MN908947.3      5551    .       TAAGG   T       .       ft      DP=21   GT:REF_DP:REF_RV:REF_QUAL:ALT_DP:ALT_RV:ALT_QUAL:ALT_FREQ       1:21:21:36:1:0:20:0.047619
MN908947.3      4579    .       T       A       .       ft      DP=27   GT:REF_DP:REF_RV:REF_QUAL:ALT_DP:ALT_RV:ALT_QUAL:ALT_FREQ       1:25:25:34:2:2:31:0.0740741
MN908947.3      4597    .       T       A       .       ft      DP=33   GT:REF_DP:REF_RV:REF_QUAL:ALT_DP:ALT_RV:ALT_QUAL:ALT_FREQ       1:32:32:35:1:1:25:0.030303
MN908947.3      5565    .       C       A       .       ft      DP=27   GT:REF_DP:REF_RV:REF_QUAL:ALT_DP:ALT_RV:ALT_QUAL:ALT_FREQ       1:26:24:36:1:1:37:0.037037
MN908947.3      5675    .       T       G       .       ft      DP=33   GT:REF_DP:REF_RV:REF_QUAL:ALT_DP:ALT_RV:ALT_QUAL:ALT_FREQ       1:32:0:35:1:0:37:0.030303

For some reason, the VCF file produced by your code is not sorted:

tabix 
[E::hts_idx_push] Unsorted positions on sequence #1: 5551 followed by 4579

EDIT 1
A temporary fix is to run bcftools sort on the output VCF file:

bcftools sort -O v --output-file sorted_S24_L001.raw.ivar.vcf S24_L001.raw.ivar.vcf
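For readers who want to see what the sort step has to do, here is a rough in-memory sketch (not bcftools itself, and no substitute for it on large files). It assumes contig order comes from `##contig=<ID=...>` header lines when present and otherwise from the order in which contigs first appear; `sort_vcf_lines` is a hypothetical name:

```python
import re

# Captures the ID from a header line such as: ##contig=<ID=MN908947.3>
CONTIG_ID = re.compile(r"^##contig=<ID=([^,>]+)")


def sort_vcf_lines(lines):
    """Return header lines unchanged, followed by records sorted by contig, then POS."""
    header, records = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        (header if line.startswith("#") else records).append(line)

    # Contig rank: order of ##contig header lines first, then first appearance.
    rank = {}
    for line in header:
        match = CONTIG_ID.match(line)
        if match and match.group(1) not in rank:
            rank[match.group(1)] = len(rank)
    for rec in records:
        rank.setdefault(rec.split("\t", 1)[0], len(rank))

    def key(rec):
        chrom, pos = rec.split("\t", 2)[:2]
        return rank[chrom], int(pos)

    # sorted() is stable, so records sharing a position keep their input order.
    return header + sorted(records, key=key)
```

Applied to the seven records above (positions 4565, 5545, 5551, 4579, 4597, 5565, 5675), the records come back in the order 4565, 4579, 4597, 5545, 5551, 5565, 5675.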

mattheww95 (contributor) commented

This appears to be similar to the issue I had, but slightly different.

The bug seems to arise when codons are being merged. Having looked through the code, though, I am wary of simply catching the StopIteration error directly, in case that produces an incorrect result.

I was able to run your file successfully without merging codons (i.e. by adding the "-ic" flag), which is how I typically run the pipeline. @Rohit-Satyam, can you confirm that running your sample with the "-ic" flag solves the issue?

saramonzon (contributor) commented

Hi @mattheww95 and @Rohit-Satyam!
It is true that the ivar_variants_to_vcf.py script does not sort the variants; in the viralrecon pipeline, this is done in the following process:

BCFTOOLS_SORT (
    IVAR_VARIANTS_TO_VCF.out.vcf
)
ch_versions = ch_versions.mix(BCFTOOLS_SORT.out.versions.first())

Also, this only happens when the merge-codons functionality is used, because indels are then output out of order. With --ignore-merge-codons, the variants are already ordered in the iVar TSV and that order is left unaltered.
We added the reference tag to the VCF header so that the output can easily be sorted after the script runs, without adding much complexity to the code.
Something like this should work:

bcftools \
        sort \
        --output sample_sorted.vcf \
        sample.vcf

Please tell me if this fixes your issue!
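Going by the explanation above that indels are written out of order when codons are merged, one hypothetical way a converter could avoid a separate sort is to interleave already-sorted streams instead of concatenating them. This sketch assumes the unmerged variants and the merged/indel records are each already sorted by position; that is an assumption for illustration, not a description of ivar_variants_to_vcf.py's internals, and `merge_position_sorted` and the tuple layout are hypothetical. It uses the standard library's `heapq.merge`, which only yields a sorted result when every input is sorted:

```python
import heapq


def merge_position_sorted(*streams):
    """Interleave record lists that are each already sorted by (contig_rank, pos)
    into one list sorted by the same key, without re-sorting everything."""
    return list(heapq.merge(*streams, key=lambda rec: (rec[0], rec[1])))


# Hypothetical records as (contig_rank, pos, ref, alt); values taken from the
# VCF excerpt earlier in this thread.
snvs = [(0, 4565, "C", "A"), (0, 4579, "T", "A"), (0, 4597, "T", "A"),
        (0, 5565, "C", "A"), (0, 5675, "T", "G")]
indels = [(0, 5545, "AACCC", "A"), (0, 5551, "TAAGG", "T")]
```

`merge_position_sorted(snvs, indels)` yields positions 4565, 4579, 4597, 5545, 5551, 5565, 5675. `heapq.merge` is stable, so records with equal keys keep the order of the streams they came from; running bcftools sort afterwards, as suggested above, remains the simpler option when the script's output order can't be guaranteed.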

Rohit-Satyam (author) commented

@saramonzon Thanks for confirming this and sharing your insights. I will use bcftools sort then. I just wanted to make sure whether or not it was a bug in ivar_variants_to_vcf.py! I will close the issue now.
