Problem masking the consensus sequence #138

Closed
ktrns opened this issue Nov 17, 2020 · 7 comments
Labels
bug Something isn't working
Milestone

Comments

@ktrns

ktrns commented Nov 17, 2020

Dear all,

With our latest batch of data, I was running into several problems.

I am running this:

nextflow run nf-core/viralrecon --input $base/samples.csv --protocol 'amplicon' --amplicon_bed $base/primer/Eden_primers.bed --amplicon_fasta $base/primer/Eden_primers.fa --genome 'MN908947.3' --skip_assembly  --outdir $base/results -profile singularity -c dcgc.config -r 1.1.0

I am getting this log:

Caused by:
  Process `BCFTOOLS_CONSENSUS (Lib1)` terminated with an error exit status (2)

Command executed:

  cat MN908947_nCoV_2019_Wuhan.fa | bcftools consensus Lib1.vcf.gz > Lib1.consensus.fa
  
  bedtools genomecov \
      -bga \
      -ibam Lib1.trim.mkD.sorted.bam \
      -g MN908947_nCoV_2019_Wuhan.fa \
      | awk '$4 < 10' | bedtools merge > Lib1.mask.bed
  
  bedtools maskfasta \
      -fi Lib1.consensus.fa \
      -bed Lib1.mask.bed \
      -fo Lib1.consensus.masked.fa
  sed -i 's/MN908947_nCoV_2019_Wuhan/Lib1/g' Lib1.consensus.masked.fa
  header=$(head -n1 Lib1.consensus.masked.fa | sed 's/>//g')
  sed -i "s/${header}/Lib1/g" Lib1.consensus.masked.fa
  
  plot_base_density.r --fasta_files Lib1.consensus.masked.fa --prefixes Lib1 --output_dir ./

Going into the respective work directory causing the issue, I can find this:

  1. The link Lib1.vcf.gz is broken. The directory it links to contains Lib1.vcf but not Lib1.vcf.gz.
  2. .command.err shows:
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::replace: __pos (which is 29902) > this->size() (which is 29899)
<ourpath>/nfcore/work/73/cf65520a401d66d99182767d1315e4/.command.sh: line 13:  2759 Aborted                 (core dumped) bedtools maskfasta -fi Lib1.consensus.fa -bed Lib1.mask.bed -fo Lib1.consensus.masked.fa
SIGABRT: abort

2a. cat Lib1.mask.bed
MN908947.3 0 39
MN908947.3 4442 6299
MN908947.3 15230 18918
MN908947.3 29847 29869
MN908947.3 29902 29903

2b. cat Lib1.consensus.fa.fai
MN908947.3 29899 12 60 61

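After digging a bit, I think the problem is this: the first command (bcftools consensus) can generate a consensus whose length differs from the original genome because of indels. That is what happened in our case: the consensus is 29899 bp long, while the mask file contains an interval ending at 29903. The masking step (bedtools genomecov followed by bedtools maskfasta) masks the consensus using coordinates that come from the original genome. Because the consensus is now shorter, the last interval lies beyond the end of the sequence and bedtools maskfasta aborts.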
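
The mismatch described above can be checked before masking. The following is a minimal sketch, not part of the pipeline: it recreates the two files quoted in this issue in a temporary directory (the tab-separated layout and the variable names are assumptions) and uses awk to list every BED interval that ends beyond the sequence length recorded in the .fai index.

```shell
# Recreate the two files quoted above (values copied from this issue).
workdir=$(mktemp -d)
printf 'MN908947.3\t0\t39\nMN908947.3\t4442\t6299\nMN908947.3\t15230\t18918\nMN908947.3\t29847\t29869\nMN908947.3\t29902\t29903\n' \
    > "$workdir/Lib1.mask.bed"
printf 'MN908947.3\t29899\t12\t60\t61\n' > "$workdir/Lib1.consensus.fa.fai"

# First file (NR == FNR): store each sequence length from the .fai index.
# Second file: print every BED interval whose end exceeds that length.
out_of_range=$(awk 'NR == FNR { len[$1] = $2; next }
    ($1 in len) && $3 > len[$1] { print $1, $2, $3, "len=" len[$1] }' \
    "$workdir/Lib1.consensus.fa.fai" "$workdir/Lib1.mask.bed")

echo "$out_of_range"
rm -rf "$workdir"
```

With the values from this issue, only the last interval (29902 to 29903) is reported. This check only catches intervals that run past the end of the consensus; intervals downstream of an indel that still fit inside the sequence are shifted as well, so a clean result here does not mean the mask is correct.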

Would you be able to fix this one?

Many thanks in advance
Katrin

ktrns added the bug label Nov 17, 2020
@ktrns
Author

ktrns commented Nov 17, 2020

Running viralrecon with --callers varscan2,ivar we get the following log:

Caused by:
  Process `VARSCAN2_CONSENSUS (Lib1)` terminated with an error exit status (2)

Command executed:

  cat MN908947_nCoV_2019_Wuhan.fa | bcftools consensus Lib1.AF0.75.vcf.gz > Lib1.AF0.75.consensus.fa
  
  bedtools genomecov \
      -bga \
      -ibam Lib1.trim.mkD.sorted.bam \
      -g MN908947_nCoV_2019_Wuhan.fa \
      | awk '$4 < 10' | bedtools merge > Lib1.AF0.75.mask.bed
  
  bedtools maskfasta \
      -fi Lib1.AF0.75.consensus.fa \
      -bed Lib1.AF0.75.mask.bed \
      -fo Lib1.AF0.75.consensus.masked.fa
  header=$(head -n 1 Lib1.AF0.75.consensus.masked.fa | sed 's/>//g')
  sed -i "s/${header}/Lib1/g" Lib1.AF0.75.consensus.masked.fa
  
  plot_base_density.r --fasta_files Lib1.AF0.75.consensus.masked.fa --prefixes Lib1.AF0.75 --output_dir ./

So it looks very similar.

@ktrns
Author

ktrns commented Nov 17, 2020

It looks like the pipeline needs a liftOver step in several places.

You can run bcftools consensus with --chain to write a chain file for liftOver, and then run liftOver for the *.mask.bed file.
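
The suggestion above could look roughly like the sketch below. It is an illustration under assumptions, not the fix that was merged into the pipeline: `lift_mask` is a hypothetical helper, the output file names are made up for this example, and bcftools, UCSC liftOver and bedtools are assumed to be on the PATH. The VCF is assumed to be bgzipped and indexed.

```shell
# Hypothetical helper: regenerate the consensus together with a chain file,
# then lift the low-coverage mask from reference to consensus coordinates.
# Arguments: reference FASTA, bgzipped and indexed VCF, mask BED, output prefix.
lift_mask() {
    ref=$1 vcf=$2 mask=$3 prefix=$4

    # --chain writes the reference-to-consensus coordinate mapping.
    bcftools consensus --fasta-ref "$ref" --chain "$prefix.chain" "$vcf" \
        > "$prefix.consensus.fa" || return 1

    # Usage: liftOver <old.bed> <map.chain> <new.bed> <unmapped.bed>
    liftOver "$mask" "$prefix.chain" "$prefix.mask.lifted.bed" \
        "$prefix.mask.unmapped.bed" || return 1

    # Mask the consensus with coordinates that now refer to the consensus.
    bedtools maskfasta \
        -fi "$prefix.consensus.fa" \
        -bed "$prefix.mask.lifted.bed" \
        -fo "$prefix.consensus.masked.fa"
}

# Example call with the file names from this issue:
# lift_mask MN908947_nCoV_2019_Wuhan.fa Lib1.vcf.gz Lib1.mask.bed Lib1
```

Intervals that liftOver cannot map are written to the unmapped file, so that file is worth inspecting before trusting the masked consensus.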

@drpatelh
Member

@saramonzon @svarona have you come across anything like this? What's the best fix?

@drpatelh
Member

This should have been fixed in #146. It would be great if you could test the code on dev and close this issue if that's the case @ktrns 🙂 Apologies for the delay in getting around to merging.

@drpatelh
Member

I assume this is now fixed @saramonzon?

@drpatelh
Member

Ping @saramonzon

@saramonzon
Contributor

Yes, the fix is included in dev for bcftools consensus; varscan has been dropped!

drpatelh added this to the 2.0 milestone Apr 27, 2021