The scoring function of Unicycler for k-mer graphs may not work for SPAdes 3.13.2 or later #225

Closed
wanyuac opened this issue Jan 29, 2020 · 2 comments

wanyuac commented Jan 29, 2020

Hi Ryan,

Unicycler 0.4.8 failed to score most of the k-mer graphs when I ran a hybrid assembly. The file unicycler.log says:

SPAdes assemblies (2020-01-24 11:02:57)
---------------------------------------
    Unicycler now uses SPAdes to assemble the short reads. It scores the assembly graph for each k-mer using the number of contigs (fewer is better) and the number of dead ends (fewer is better). The score function is 1/(c*(d+2)), where c is the contig count and d is the dead end count.

K-mer   Contigs   Dead ends   Score   
   21                           failed
   35                           failed
   47                           failed
   57                           failed
   67                           failed
   73                           failed
   81                           failed
   85                           failed
   91                           failed
   95       177          39   1.38e-04 ← best

Read depth filter: removed 2 contigs totalling 459 bp
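As a sanity check on the one row that did get a score, the formula quoted in the log can be evaluated by hand. This is a minimal sketch, not Unicycler's own code: the function name kmer_graph_score is hypothetical, and only the formula 1/(c*(d+2)) comes from the log text.

```python
def kmer_graph_score(contigs: int, dead_ends: int) -> float:
    """Score as described in the log: 1 / (c * (d + 2)); higher is better.

    Hypothetical helper for illustration only, not a Unicycler function.
    """
    if contigs <= 0:
        raise ValueError("contig count must be positive")
    return 1.0 / (contigs * (dead_ends + 2))


# k=95 row from the table: 177 contigs, 39 dead ends
print(f"{kmer_graph_score(177, 39):.2e}")  # prints 1.38e-04
```

The result matches the score reported for k=95, so the scoring arithmetic itself looks fine; the other rows fail before any score can be computed.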

Despite these failures, Unicycler finished the assembly job and produced assembly graphs. Nevertheless, I am concerned about this issue, because it means Unicycler cannot determine which k-mer graph is actually the best. Here is a complete list of the dependencies I installed using bioconda:

  • SPAdes 3.14.0 (the latest release)
  • Python 3.7.3
  • Racon 1.4.10, Pilon 1.23
  • Bowtie 2.3.5, Clang 9.0.1, SAMtools 1.10, and BLAST+ 2.9.0

Unicycler was run using the command line:

unicycler --short1 illumina_1.fastq.gz --short2 illumina_2.fastq.gz --long nanopore.fastq.gz --mode normal --threads 8 --keep 3 --out output

Debugging

I looked into the K21, ..., K91, and K95 directories of the SPAdes output and found that only K95 contained an assembly_graph.fastg file. The file structure of K21, ..., K91 was:

configs/
simplified_contigs/
final.lib_data

whereas in K95:

configs/
path_extend/
assembly_graph.fastg
assembly_graph_with_scaffolds.gfa
before_rr.fasta
final_contigs.fasta
final_contigs.paths
final.lib_data
scaffolds.fasta
scaffolds.paths

Since the function spades_assembly in spades_func.py reads assembly_graph.fastg from each k-mer folder, I believe that the absence of k-mer graphs in K21, ..., K91 causes the failures shown in the log.

    if just_last:
        graph_file = os.path.join(out_dir, 'K' + str(kmers[-1]), 'assembly_graph.fastg')
        return graph_file, insert_size_mean, insert_size_deviation
    else:
        graph_files = []
        for kmer in kmers:
            graph_file = os.path.join(out_dir, 'K' + str(kmer), 'assembly_graph.fastg')
            if os.path.isfile(graph_file):
                parent_dir = os.path.dirname(out_dir)
                copied_graph_file = os.path.join(parent_dir,
                                                 ('k%03d' % kmer) + '_assembly_graph.fastg')
                shutil.copyfile(graph_file, copied_graph_file)
                graph_files.append(copied_graph_file)
            else:
                graph_files.append(None)
        return graph_files, insert_size_mean, insert_size_deviation
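The same check can be done outside Unicycler to see which k-mer folders are affected. The following is a minimal sketch under stated assumptions: the helper name find_kmer_dirs_missing_graph is hypothetical, and it assumes only that SPAdes writes one K<k> directory per k value, as in the listings above.

```python
import os
import re


def find_kmer_dirs_missing_graph(spades_dir, graph_name="assembly_graph.fastg"):
    """Return the sorted k values whose K<k> directory lacks the graph file.

    Hypothetical helper for illustration; it mirrors the os.path.isfile
    test in the excerpt above but does not copy or modify anything.
    """
    missing = []
    for entry in os.listdir(spades_dir):
        match = re.fullmatch(r"K(\d+)", entry)
        path = os.path.join(spades_dir, entry)
        # Only consider directories named like K21, K35, ...
        if match and os.path.isdir(path):
            if not os.path.isfile(os.path.join(path, graph_name)):
                missing.append(int(match.group(1)))
    return sorted(missing)
```

Calling it on the SPAdes output directory (the path depends on how SPAdes or Unicycler was run) returns the k values that the excerpt above would record as None.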

To test my hypothesis, I ran SPAdes 3.14.0, 3.13.2, and 3.8.1 on the same set of paired-end reads (publicly available on NCBI SRA) using the command:

spades.py -1 ERR134515_1.fastq.gz -2 ERR134515_2.fastq.gz -o '3_14' --careful --only-assembler --threads 4 -k 21,35,47,67,73,81,85,95

Only SPAdes 3.8.1 generated assembly_graph.fastg in every k-mer folder; the other two versions generated it only in the folder for the largest k.

In conclusion, as far as I have tested, Unicycler may not work properly when SPAdes 3.13.2 or later is used, because these SPAdes versions no longer produce assembly_graph.fastg files for every k value. Could you help me check whether this is a real issue or just a problem with my system configuration? Thanks.

@wanyuac wanyuac changed the title The scoring function of Unicycler for k-mer graphs may not work for SPAdes 3.13 or later The scoring function of Unicycler for k-mer graphs may not work for SPAdes 3.13.2 or later Jan 29, 2020

wanyuac commented Jan 30, 2020

This issue is the same as Issue #218.

rrwick commented Jan 22, 2022

As noted in #218, this is now fixed in the current version of Unicycler (v0.5.0). Thanks!

@rrwick rrwick closed this as completed Jan 22, 2022