IQ-tree SNP error #124

Open
buchanri opened this issue Dec 3, 2024 · 2 comments
@buchanri

buchanri commented Dec 3, 2024

Hello, I'm getting an error during variant calling, at the IQTREE2_SNP step:

ERROR ~ Error executing process > 'PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:IQTREE2_SNP (Bradyrhizobium_ottawaense_GCF_002278135_3)'

Caused by:
  Process `PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:IQTREE2_SNP (Bradyrhizobium_ottawaense_GCF_002278135_3)` terminated with an error exit status (2)


Command executed:

  # Get number of samples to decide whether or not to bootstrap
  FIRSTSAMPLE=$(ls -d1 input_data/* | head -n 1 || if [[ $? -eq 141 ]]; then true; else exit $?; fi)
  NSAMPLE=$(grep '>' $FIRSTSAMPLE | wc -l)
  if [ $NSAMPLE -gt 3 ]; then
      BOOT="-B 1000"
  else
      BOOT=""
  fi
 
  # Create phylogenetic tree
  iqtree2 \
       \
      $BOOT \
      --seqtype DNA -m GTR+ASC \
      -s input_data \
      -nt 12 \
      -ntmax 12 \
      -mem 72G \
 
  # Rename output by prefix
  mv input_data.treefile Bradyrhizobium_ottawaense_GCF_002278135_3.treefile
 
  # Save version information
  cat <<-END_VERSIONS > versions.yml
  "PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:IQTREE2_SNP":
      iqtree: $(echo $(iqtree -version 2>&1) | sed 's/^IQ-TREE multicore version //;s/ .*//')
  END_VERSIONS

Command exit status:
  2

Command output:
  IQ-TREE multicore version 2.1.4-beta COVID-edition for Linux 64-bit built Jun 24 2021
  Developed by Bui Quang Minh, James Barbetti, Nguyen Lam Tung,
  Olga Chernomor, Heiko Schmidt, Dominik Schrempf, Michael Woodhams.
 
  Host:    cerebro (AVX2, FMA3, 1006 GB RAM)
  Command: iqtree2 -B 1000 --seqtype DNA -m GTR+ASC -s input_data -nt 12 -ntmax 12 -mem 72G
  Seed:    763571 (Using SPRNG - Scalable Parallel Random Number Generator)
  Time:    Tue Dec  3 09:05:31 2024
  Kernel:  AVX+FMA - 12 threads (12 CPU cores detected)
 
  Reading 1 alignment files in directory input_data
  Reading alignment file input_data/Bradyrhizobium_ottawaense_GCF_002278135_3.fasta ... Fasta format detected

Command error:
  ERROR: Sequence GCF_002278135_3_5K3_2_S125 contains not enough characters (87753)
  ERROR: Sequence GCF_002278135_3_VJ5_1_S479 contains not enough characters (84853)
  ERROR: Sequence GCF_002278135_3_VJ5_2_S480 contains not enough characters (84853)
  ERROR: Sequence GCF_002278135_3_VJ5_3_S481 contains not enough characters (84853)
  ERROR: Sequence GCF_002278135_3_VJ5_4_S482 contains not enough characters (84853)
  ERROR: Sequence GCF_002278135_3_VJ6_1_S483 contains not enough characters (84853)
  ERROR: Sequence GCF_002278135_3_VJ6_2_S484 contains not enough characters (84853)
  ERROR: Sequence GCF_002278135_3_VJ6_3_S485 contains not enough characters (84853)
  ERROR: Sequence GCF_002278135_3_VJ6_4_S486 contains not enough characters (84853)
  ERROR: Sequence GCF_002278135_3_VJ7_1_S487 contains not enough characters (84853)
  ERROR: Sequence GCF_002278135_3_VJ7_2_S488 contains not enough characters (84853)
  ERROR: Sequence GCF_002278135_3_VJ7_3_S489 contains not enough characters (84853)
  ERROR: Sequence GCF_002278135_3_VJ7_4_S490 contains not enough characters (84853)
  ERROR: Sequence GCF_002278135_3_VJ8_1_S491 contains not enough characters (84853)
  ERROR: Sequence GCF_002278135_3_VJ8_2_S492 contains not enough characters (84853)
  ERROR: Sequence GCF_002278135_3_VJ8_3_S493 contains not enough characters (84853)
  ERROR: Sequence GCF_002278135_3_VJ8_4_S494 contains not enough characters (84853)
  ERROR: Sequence GCF_002278135_3_VJ9_1_S495 contains not enough characters (84853)
  ERROR: Sequence GCF_002278135_3_VJ9_2_S496 contains not enough characters (84853)
  ERROR: Sequence GCF_002278135_3_VJ9_3_S497 contains not enough characters (84853)
  ERROR: Sequence GCF_002278135_3_VJ9_4_S498 contains not enough characters (84853)
  ERROR:

Work dir:
  /nfs7/BPP/Chang_Lab/paradarc/paper2_bra/scripts/dev_branch/pathogensurveillance/work/d8/76d48103ec3570a70e257a183d0bba
@masudermann
Contributor

I also hit a similar error yesterday:

ERROR ~ Error executing process > 'PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:IQTREE2_SNP (_no_group_defined__GCF_021462285_1)'

Caused by:
  Process `PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:IQTREE2_SNP (_no_group_defined__GCF_021462285_1)` terminated with an error exit status (2)


Command executed:

  # Get number of samples to decide whether or not to bootstrap
  FIRSTSAMPLE=$(ls -d1 input_data/* | head -n 1 || if [[ $? -eq 141 ]]; then true; else exit $?; fi)
  NSAMPLE=$(grep '>' $FIRSTSAMPLE | wc -l)
  if [ $NSAMPLE -gt 3 ]; then
      BOOT="-B 1000"
  else
      BOOT=""
  fi
  
  # Create phylogenetic tree
  iqtree2 \
       \
      $BOOT \
      --seqtype DNA -m GTR+ASC \
      -s input_data \
      -nt 12 \
      -ntmax 12 \
      -mem 72G \
  
  # Rename output by prefix
  mv input_data.treefile _no_group_defined__GCF_021462285_1.treefile
  
  # Save version information
  cat <<-END_VERSIONS > versions.yml
  "PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:IQTREE2_SNP":
      iqtree: $(echo $(iqtree -version 2>&1) | sed 's/^IQ-TREE multicore version //;s/ .*//')
  END_VERSIONS

Command exit status:
  2

Command output:
  IQ-TREE multicore version 2.1.4-beta COVID-edition for Linux 64-bit built Jun 24 2021
  Developed by Bui Quang Minh, James Barbetti, Nguyen Lam Tung,
  Olga Chernomor, Heiko Schmidt, Dominik Schrempf, Michael Woodhams.
  
  Host:    61a9c91d0ac8 (AVX512, FMA3, 124 GB RAM)
  Command: iqtree2 -B 1000 --seqtype DNA -m GTR+ASC -s input_data -nt 12 -ntmax 12 -mem 72G
  Seed:    542201 (Using SPRNG - Scalable Parallel Random Number Generator)
  Time:    Mon Dec  2 08:22:08 2024
  Kernel:  AVX+FMA - 12 threads (32 CPU cores detected)
  
  Reading 1 alignment files in directory input_data
  Reading alignment file input_data/_no_group_defined__GCF_021462285_1.fasta ... Fasta format detected

Command error:
  IQ-TREE multicore version 2.1.4-beta COVID-edition for Linux 64-bit built Jun 24 2021
  Developed by Bui Quang Minh, James Barbetti, Nguyen Lam Tung,
  Olga Chernomor, Heiko Schmidt, Dominik Schrempf, Michael Woodhams.
  
  Host:    61a9c91d0ac8 (AVX512, FMA3, 124 GB RAM)
  Command: iqtree2 -B 1000 --seqtype DNA -m GTR+ASC -s input_data -nt 12 -ntmax 12 -mem 72G
  Seed:    542201 (Using SPRNG - Scalable Parallel Random Number Generator)
  Time:    Mon Dec  2 08:22:08 2024
  Kernel:  AVX+FMA - 12 threads (32 CPU cores detected)
  
  Reading 1 alignment files in directory input_data
  Reading alignment file input_data/_no_group_defined__GCF_021462285_1.fasta ... Fasta format detected
  ERROR: Sequence GCF_021462285_1_sample25 contains not enough characters (45000)
  ERROR: Sequence GCF_021462285_1_sample3 contains not enough characters (45153)
  ERROR: Sequence GCF_021462285_1_sample4 contains not enough characters (45153)
  ERROR: 

Work dir:
  /home/marthasudermann/pathogensurveillance/work/78/15c36762516ef714ac421d0b3a3134

Container:
  quay.io/biocontainers/iqtree:2.1.4_beta--hdcc8f71_0

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

 -- Check '.nextflow.log' file for details

@buchanri
Author

buchanri commented Dec 3, 2024

I found something notable about the sequence lengths.
In the vcftab_to_snpaln step, vcftab_to_snpaln_nodel.pl creates Bradyrhizobium_ottawaense_GCF_002278135_3_unfiltered.fasta, which has these stats:

file                                                                                                       format  type  num_seqs    sum_len  min_len  avg_len  max_len
../../../../paper2_bra/scripts/testing_iq_tree/Bradyrhizobium_ottawaense_GCF_002278135_3_unfiltered.fasta  FASTA   DNA         22  1,961,366   89,153   89,153   89,153

This shows that all sequences are the same length, which IQ-TREE requires. After that, though, the step does some filtering:

# Remove samples with all missing data since IQtree complains
grep -n -E '^-+$' -B 1 Bradyrhizobium_ottawaense_GCF_002278135_3_unfiltered.fasta | sed -n 's/^\([0-9]\{1,\}\).*/\1d/p' | sed -f - Bradyrhizobium_ottawaense_GCF_002278135_3_unfiltered.fasta > Bradyrhizobium_ottawaense_GCF_002278135_3.fasta

This changes the sequence lengths so that they are no longer all the same, which is possibly what causes the problem:

file                                                                                            format  type  num_seqs    sum_len  min_len   avg_len  max_len
../../../../paper2_bra/scripts/testing_iq_tree/Bradyrhizobium_ottawaense_GCF_002278135_3.fasta  FASTA   DNA         22  1,873,966   84,853  85,180.3   89,153
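
The filtered stats are at least consistent with the IQ-TREE errors: 20 sequences at 84,853, one at 87,753, and one left at 89,153 sum to 1,873,966. One way this could happen, assuming the unfiltered FASTA is line-wrapped (nothing in this thread confirms that): the grep/sed filter works on lines rather than records, so an all-gap line in the middle of a wrapped sequence is deleted together with the line before it. The sketch below reruns the same filter on a made-up wrapped FASTA (toy data, 4 characters per line) and tallies per-sequence lengths with a hypothetical helper, seq_lengths.

```shell
# Toy reproduction; file names and sequences are made up for illustration.
tmpdir=$(mktemp -d)

# A wrapped FASTA (4 characters per line).
# s1 has one all-gap line mid-sequence; s2 is entirely gaps.
cat > "$tmpdir/unfiltered.fasta" <<'EOF'
>ref
ACGT
ACGT
ACGT
>s1
ACGT
----
ACGT
>s2
----
----
----
EOF

# Hypothetical helper: print "name<TAB>length" per record, wrapped or not.
seq_lengths() {
  awk '
    /^>/ { if (name != "") print name "\t" len; name = substr($0, 2); len = 0; next }
         { len += length($0) }
    END  { if (name != "") print name "\t" len }
  ' "$1"
}

# The same line-based filter as in the pipeline step, applied to the toy file.
grep -n -E '^-+$' -B 1 "$tmpdir/unfiltered.fasta" \
  | sed -n 's/^\([0-9]\{1,\}\).*/\1d/p' \
  | sed -f - "$tmpdir/unfiltered.fasta" > "$tmpdir/filtered.fasta"

filtered_lengths=$(seq_lengths "$tmpdir/filtered.fasta")
printf '%s\n' "$filtered_lengths"
```

On the toy input, s2 is removed as intended, but s1 drops from 12 to 4 characters while ref stays at 12, which is the same kind of mismatch IQ-TREE reports. Running seq_lengths on the real unfiltered and filtered files would show which samples were shortened.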

/nfs7/BPP/Chang_Lab/paradarc/paper2_bra/scripts/dev_branch/pathogensurveillance/work/8b/fc4807e3e61dfef003370ba23a94be/.command.sh
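
If line wrapping is indeed the cause, a record-aware filter would avoid it. This is a minimal sketch under that assumption, not the pipeline's actual fix: the hypothetical helper drop_all_gap_records joins each record's lines, drops a record only when its whole sequence consists of '-' characters (the same notion of "all missing" as the original pattern), and writes the remaining records with one line per sequence. The input is toy data.

```shell
# Toy input; file names and sequences are made up for illustration.
tmpdir=$(mktemp -d)
cat > "$tmpdir/unfiltered.fasta" <<'EOF'
>ref
ACGT
ACGT
ACGT
>s1
ACGT
----
ACGT
>s2
----
----
----
EOF

# Hypothetical helper: remove records whose entire sequence is gaps.
# Works per record, so wrapped all-gap lines inside a sequence are kept.
drop_all_gap_records() {
  awk '
    # Print the buffered record only if it contains a non-gap character.
    function emit() { if (header != "" && seq ~ /[^-]/) { print header; print seq } }
    /^>/ { emit(); header = $0; seq = ""; next }
         { seq = seq $0 }   # join wrapped lines into one sequence string
    END  { emit() }
  ' "$1"
}

filtered=$(drop_all_gap_records "$tmpdir/unfiltered.fasta")
printf '%s\n' "$filtered"
```

Here ref and s1 both keep all 12 characters and only s2 is dropped. The output is unwrapped, so the lengths stay equal across the remaining sequences.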
