LJASplit

Split repeat resolution for long reads

Introduction

When assembling the genome, de Bruijn graphs are used. These graphs are built on the basis of reading the genome. However, some of the information from the original reads remains unused in the graph.

Due to inaccuracies of readings and imperfections of genome assembler, unresolved repeats occur in the de Bruijn graph.

Different assemblers try to resolve repeats using additional information from the reads.

Widely used SPAdes genome assembler uses a method based on iterative expansion of paths in a graph supported by reads.

Recently, another method for resolving repeats Multiplex de bruijn graph in the LJA assembler has been proposed.

Previously, SPAdes used a different method of resolving repeats, based on the idea of splitting vertices (split).

Split was quite inconvenient to work with paired reads of illumina. However with long Hi-Fi reads this method seems even more powerful than Multiplex de bruijn graph.

Objectives and purpose

Purpose

Implement the "Split" repeat resolution method and compare it with Multiplex de bruijn graph

Objectives

Learn different different methods for repeat resolution
Analyze LJA code
Get list of paths from LJA assembler
Write a prototype of the split method in Python

Approaches

Split one-to-many

If a vertex has one incoming edge and several outgoing edges, then it can be splitted into several, — one copy for each outgoing edge

Explicit split

If the reads passing through the incoming and outgoing edges from the vertex can be unambiguously splitted, then a vertex should be splitted with a copy for each such path through the vertex.

Results and futher plans

Implemented methods Split one-to-many and Explicit split.

Example of Split one-to-many:

Further development of the work may be the implementation of more advanced methods of vertex separation. After that, it will be possible to compare the results obtained with the Multiplex de bruijn graph method. If the results, as expected, turn out to be better, then this module can be rewritten from Python to C++ and implemented in LJA assembler.

Literature

Bankevich, A., Bzikadze, A.V., Kolmogorov, M. et al. Multiplex de Bruijn graphs enable genome assembly from long, high-fidelity reads. Nat Biotechnol (2022).

Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012 May;19(5):455-77.

Launch instructions

The project is launched with main.py, where .gfa and .als files — graph and reads, respectively should be specified.

graph.py, vertex.py, edge.py contain descriptions of the main methods.

make_short.py can be used for quick starts with incomplete sets of reads

analytics.py contains several useful for getting insights about data and debagging functions that build statistics. For example, the distribution of degrees of vertices:

files short.gfa and check_short_1.gfa illusrate split one-to-many approach. They can be visualised through Bandage

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
README.md		README.md
analytics.py		analytics.py
change_gfa.py		change_gfa.py
check_overlap.txt		check_overlap.txt
check_short_1.gfa		check_short_1.gfa
edge.py		edge.py
graph.py		graph.py
main.py		main.py
make_short_paths.py		make_short_paths.py
parse_paths.py		parse_paths.py
paths_short.als		paths_short.als
short.gfa		short.gfa
short_check.gfa		short_check.gfa
vertex.py		vertex.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LJASplit

Introduction

Objectives and purpose

Approaches

Results and futher plans

Literature

Launch instructions

About

Releases

Packages

Languages

grgrbu/LJASplit

Folders and files

Latest commit

History

Repository files navigation

LJASplit

Introduction

Objectives and purpose

Approaches

Results and futher plans

Literature

Launch instructions

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages