Flye assembler

Version: 2.8.3

Flye is a de novo assembler for single molecule sequencing reads, such as those produced by PacBio and Oxford Nanopore Technologies. It is designed for a wide range of datasets, from small bacterial projects to large mammalian-scale assemblies. The package represents a complete pipeline: it takes raw PacBio / ONT reads as input and outputs polished contigs. Flye also has a special mode for metagenome assembly.

Manuals

Latest updates

Flye 2.8.3 release (10 Feb 2021)

Reduced RAM consumption for some ultra-long ONT datasets
Fixed rare artificial sequence insertions on some ONT datasets
Assemblies should be largely identical to 2.8

Flye 2.8.2 release (12 Dec 2020)

Improvements in GFA output, much faster generation of large and tangled graphs
Speed improvements for graph simplification algorithms
A few minor bugs fixed
Assemblies should be largely identical to 2.8

Flye 2.8.1 release (02 Sep 2020)

Added a new option --hifi-error to control the expected error rate of HiFi reads (no other changes)

Flye 2.8 release (04 Aug 2020)

Improvements in contiguity and speed for PacBio HiFi mode
The --meta k-mer selection strategy is now used for isolate assemblies as well. This strategy is more robust to coverage drops and contamination, and requires less memory
1.5-2x RAM footprint reduction for large assemblies (e.g. human ONT assembly now uses 400-500 Gb)
Genome size parameter is no longer required (it is still needed for downsampling through --asm-coverage)
Flye now can occasionally use overlaps shorter than "minOverlap" parameter to close disjointig gaps
Various improvements and bugfixes
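The --asm-coverage downsampling ties coverage to genome size: the number of read bases kept for the initial disjointig assembly is roughly the target coverage times the genome size, which is why the genome size is still needed in that case. A minimal sketch of that arithmetic, assuming the longest reads are kept first; the function name and the list of read lengths are hypothetical, and this is not Flye's actual code.

```python
from typing import List


def select_longest_reads(read_lengths: List[int], genome_size: int,
                         asm_coverage: float) -> List[int]:
    """Hypothetical helper: keep the longest reads until their total length
    reaches asm_coverage * genome_size bases.

    Assumption: longest-first selection. Illustrates the arithmetic only.
    """
    if genome_size <= 0:
        # Without a genome size there is no target to downsample to.
        raise ValueError("genome_size is required for downsampling")
    target_bases = asm_coverage * genome_size
    selected: List[int] = []
    total = 0
    for length in sorted(read_lengths, reverse=True):
        if total >= target_bases:
            break
        selected.append(length)
        total += length
    return selected


if __name__ == "__main__":
    # Ten reads of 1..10 kb, a 5 kb genome and a 5x target (25 kb of reads).
    reads = [k * 1000 for k in range(1, 11)]
    print(select_longest_reads(reads, genome_size=5000, asm_coverage=5))
```

With these toy numbers the three longest reads (27 kb in total) are the first set to reach the 25 kb target.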

Flye 2.7.1 release (24 Apr 2020)

Fixes very long GFA generation time for some large assemblies (no other changes)

Flye 2.7 release (03 Mar 2020)

Better assemblies of real (and complex) metagenomes
New option to retain alternative haplotypes, rather than collapsing them (--keep-haplotypes)
PacBio HiFi mode
Using BAM instead of SAM to reduce storage requirements and IO load
Improved human assemblies
Annotation of alternative contigs
Better polishing quality for the newest ONT datasets
Trestle module is disabled by default (use --trestle to enable)
Many bug fixes and improvements

Flye 2.6 release (19 Sep 2019)

This release introduces Python 3 support (no other changes)

Flye 2.5 release (25 Jul 2019)

Better ONT polishing for the latest basecallers (Guppy/flipflop)
Improved consensus quality of repetitive regions
More contiguous assemblies of real metagenomes
Improvements for human genome assemblies
Various bugfixes and performance optimizations
Also check the new FAQ section

Repeat graph

Flye uses the repeat graph as its core data structure. Unlike de Bruijn graphs (which require exact k-mer matches), repeat graphs are built using approximate sequence matches, and can therefore tolerate the higher noise of SMS reads.
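A back-of-the-envelope estimate shows why exact k-mer matching struggles with noisy reads: if errors occur independently with per-base rate e, a k-mer is error-free with probability (1 - e)^k, and the same genomic k-mer must be error-free in both reads to produce an exact match. The sketch below uses k = 17 and illustrative error rates of 15% and 0.1%; these numbers and the independence assumption are simplifications, not Flye parameters.

```python
def error_free_kmer_probability(error_rate: float, k: int) -> float:
    """Probability that k consecutive bases contain no error, assuming
    independent errors at each position (a simplification)."""
    return (1.0 - error_rate) ** k


def shared_exact_kmer_probability(error_rate: float, k: int) -> float:
    """Probability that the same genomic k-mer is error-free in two reads,
    under the same independence assumption."""
    return error_free_kmer_probability(error_rate, k) ** 2


if __name__ == "__main__":
    # Illustrative rates: ~15% for noisy raw reads, ~0.1% for accurate reads.
    for label, rate in [("noisy", 0.15), ("accurate", 0.001)]:
        print(label,
              round(error_free_kmer_probability(rate, 17), 3),
              round(shared_exact_kmer_probability(rate, 17), 4))
```

At a 15% error rate only about 6% of 17-mers survive intact in one read, and well under 1% are shared exactly between two reads, so exact-match methods see very little signal.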

The edges of the repeat graph represent genomic sequence, and nodes define the junctions. Each edge is classified as unique or repetitive. The genome traverses the graph (in an unknown way) so that each unique edge appears exactly once in this traversal. Repeat graphs reveal the repeat structure of the genome, which helps to reconstruct an optimal assembly.
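One common way to separate unique from repetitive edges is to compare each edge's read coverage with the typical coverage of unique sequence: an edge traversed n times by the genome collects roughly n times as many reads. The sketch below applies that heuristic, using the median edge coverage as the estimate for a single genomic copy. It is a simplified illustration under that assumption, not Flye's classification procedure, and the Edge type is hypothetical.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class Edge:
    """Hypothetical repeat-graph edge: id, length (bp) and mean read coverage."""
    edge_id: int
    length: int
    coverage: float


def estimate_multiplicity(edges: List[Edge]) -> Dict[int, int]:
    """Estimate how many times the genome traverses each edge.

    Assumption: most edges are unique, so the median coverage approximates
    the coverage of one genomic copy.
    """
    unique_cov = median(e.coverage for e in edges)
    return {e.edge_id: max(1, round(e.coverage / unique_cov)) for e in edges}


def classify(edges: List[Edge]) -> Dict[int, str]:
    """Label each edge 'unique' (multiplicity 1) or 'repetitive' (> 1)."""
    mult = estimate_multiplicity(edges)
    return {eid: "unique" if m == 1 else "repetitive" for eid, m in mult.items()}


if __name__ == "__main__":
    # Made-up edges: three unique edges plus repeats of multiplicity 2 and 3.
    edges = [Edge(1, 500_000, 50.0), Edge(2, 35_000, 101.0),
             Edge(3, 800_000, 49.0), Edge(4, 34_000, 152.0),
             Edge(5, 1_200_000, 51.0)]
    print(estimate_multiplicity(edges))
    print(classify(edges))
```

Coverage alone gives only a multiplicity estimate; deciding how the genome actually threads through a repeat still requires reads that span it.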

Above is an example of the repeat graph of a bacterial assembly. Each edge is labeled with its id, length and coverage. Repetitive edges are shown in color, and unique edges are black. Note that each edge is represented in two copies: forward and reverse complement (marked with +/- signs), therefore the entire genome is represented in two copies. This is necessary because the orientation of input reads is unknown.
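Because read orientation is unknown, every edge sequence has a reverse-complement twin. A minimal sketch of that pairing, using signed ids (+n for the forward copy, -n for the reverse complement) in the spirit of the +/- labels; the dictionary-based graph and the helper names are hypothetical.

```python
from typing import Dict, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def add_edge_with_twin(graph: Dict[int, str], edge_num: int,
                       seq: str) -> Tuple[int, int]:
    """Store an edge as two signed copies: +edge_num holds the forward
    sequence and -edge_num its reverse complement. Returns both ids."""
    if edge_num <= 0:
        raise ValueError("edge_num must be positive so +/- ids stay distinct")
    graph[edge_num] = seq
    graph[-edge_num] = reverse_complement(seq)
    return edge_num, -edge_num


if __name__ == "__main__":
    graph: Dict[int, str] = {}
    add_edge_with_twin(graph, 1, "ACCGTT")
    print(graph)
```

Storing both strands means a read aligning in either orientation finds a matching path, at the cost of representing the whole genome twice.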

In this example, there are two unresolved repeats: (i) a red repeat of multiplicity two and length 35k and (ii) a green repeat cluster of multiplicity three and length 34k - 36k. These repeats remain unresolved because no reads in the dataset cover them in full. The five unique edges will correspond to five contigs in the final assembly.
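A repeat can only be resolved if at least one read spans it in full, anchored in unique sequence on both sides. A sketch of that check on read alignment coordinates; the (start, end) interval representation and the min_flank threshold are assumptions for illustration, not Flye parameters.

```python
from typing import Iterable, Tuple


def is_spanned(repeat: Tuple[int, int],
               alignments: Iterable[Tuple[int, int]],
               min_flank: int = 1000) -> bool:
    """Return True if any read alignment (start, end) covers the whole repeat
    interval plus at least min_flank bases of unique sequence on each side.

    Coordinates are 0-based, half-open, on a single reference path.
    min_flank is a hypothetical threshold chosen for illustration.
    """
    rep_start, rep_end = repeat
    return any(start <= rep_start - min_flank and end >= rep_end + min_flank
               for start, end in alignments)


if __name__ == "__main__":
    repeat = (100_000, 135_000)  # a 35 kb repeat
    reads = [(90_000, 130_000), (120_000, 160_000)]  # each ends inside it
    print(is_spanned(repeat, reads))
    print(is_spanned(repeat, reads + [(98_000, 137_000)]))
```

The first call finds no spanning read, so the repeat would stay collapsed; adding a single 39 kb read that bridges it with flanks on both sides makes it resolvable.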

Repeat graphs produced by Flye can be visualized using AGB or Bandage.

Flye benchmarks

Genome	Data	Asm. size	NG50	CPU time	RAM
E.coli	PB 50x	4.6 Mb	4.6 Mb	2 h	2 Gb
C.elegans	PB 40x	106 Mb	4.3 Mb	100 h	31 Gb
A.thaliana	PB 75x	119 Mb	11.9 Mb	100 h	59 Gb
D.melanogaster	ONT 30x	136 Mb	19.9 Mb	130 h	33 Gb
D.melanogaster	PB 120x	141 Mb	18.8 Mb	150 h	70 Gb
Human NA12878	ONT 35x (rel6)	2.8 Gb	37.9 Mb	3100 h	394 Gb
Human CHM13 ONT	ONT 120x (rel5)	2.9 Gb	69.4 Mb	4000 h	450 Gb
Human CHM13 HiFi	PB HiFi 30x	3.0 Gb	39.8 Mb	780 h	141 Gb
Human HG002	PB HiFi 30x	3.0 Gb	33.5 Mb	630 h	138 Gb
Human CHM1	PB 100x	2.8 Gb	18.3 Mb	2700 h	444 Gb
HMP mock	PB meta 7 Gb	68 Mb	2.6 Mb	60 h	72 Gb
Zymo Even	ONT meta 14 Gb	65 Mb	0.7 Mb	60 h	129 Gb
Zymo Log	ONT meta 16 Gb	29 Mb	0.2 Mb	100 h	76 Gb
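NG50 is measured against a genome size rather than the assembly size: it is the length L such that contigs of length L or longer add up to at least half of the genome size. A minimal sketch; the contig lengths and genome sizes in the example are made up.

```python
from typing import List, Optional


def ng50(contig_lengths: List[int], genome_size: int) -> Optional[int]:
    """Return NG50, or None if all contigs together cover less than half
    of genome_size."""
    half = genome_size / 2
    total = 0
    for length in sorted(contig_lengths, reverse=True):
        total += length
        if total >= half:
            return length
    return None


if __name__ == "__main__":
    contigs = [5_000_000, 3_000_000, 2_000_000, 1_000_000]  # 11 Mb assembly
    print(ng50(contigs, genome_size=20_000_000))  # NG50 vs. a 20 Mb genome
    print(ng50(contigs, genome_size=sum(contigs)))  # same as plain N50
```

Passing the assembly size instead of the genome size yields the ordinary N50; NG50 penalizes assemblies that are smaller than the genome, which makes it easier to compare across assemblies of the same organism.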

The assemblies generated using Flye 2.8 can be downloaded from Zenodo. All datasets were run with default parameters for the corresponding read type, with the following exceptions: CHM13 T2T was run with --min-overlap 10000 --asm-coverage 50; CHM1 was run with --asm-coverage 50. The CHM13 HiFi and HG002 HiFi datasets were run in --pacbio-hifi mode with --hifi-error 0.003.
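The parameter exceptions for the benchmark runs can be expressed as Flye command lines. The sketch below only builds argument lists and does not run Flye; the read file names, output directories, genome sizes and thread count are placeholders, the flye_command helper is hypothetical, and --genome-size is included wherever --asm-coverage is used because downsampling still needs it.

```python
import shlex
from typing import List, Optional


def flye_command(read_type: str, reads: str, out_dir: str,
                 genome_size: Optional[str] = None, threads: int = 30,
                 extra: Optional[List[str]] = None) -> List[str]:
    """Build (but do not execute) a Flye argument list.

    read_type is a Flye read-type option without the leading dashes,
    e.g. "nano-raw", "pacbio-raw" or "pacbio-hifi".
    """
    extra = list(extra or [])
    if "--asm-coverage" in extra and genome_size is None:
        raise ValueError("--asm-coverage requires --genome-size")
    cmd = ["flye", f"--{read_type}", reads, "--out-dir", out_dir,
           "--threads", str(threads)]
    if genome_size is not None:
        cmd += ["--genome-size", genome_size]
    return cmd + extra


if __name__ == "__main__":
    # Placeholder inputs; only the extra options come from the text.
    runs = [
        flye_command("nano-raw", "chm13_ont.fastq.gz", "chm13_ont", "3g",
                     extra=["--min-overlap", "10000", "--asm-coverage", "50"]),
        flye_command("pacbio-raw", "chm1_pb.fastq.gz", "chm1", "3g",
                     extra=["--asm-coverage", "50"]),
        flye_command("pacbio-hifi", "hg002_hifi.fastq.gz", "hg002_hifi",
                     extra=["--hifi-error", "0.003"]),
    ]
    for cmd in runs:
        print(shlex.join(cmd))  # shlex.join needs Python 3.8+
```

Building an argument list instead of a shell string avoids quoting problems if the list is later passed to subprocess.run.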

Third-party

The Flye package includes some third-party software:

License

Flye is distributed under a BSD license. See the LICENSE file for details.

Credits

Flye is developed in Pavel Pevzner's lab at UCSD

Main code contributors:

metaFlye: Mikhail Kolmogorov
Repeat graph and current package maintenance: Mikhail Kolmogorov
Trestle module and original polisher code: Jeffrey Yuan
Original contig extension code: Yu Lin
Short plasmids recovery module: Evgeny Polevikov

Publications

Mikhail Kolmogorov, Derek M. Bickhart, Bahar Behsaz, Alexey Gurevich, Mikhail Rayko, Sung Bong Shin, Kristen Kuhn, Jeffrey Yuan, Evgeny Polevikov, Timothy P. L. Smith and Pavel A. Pevzner "metaFlye: scalable long-read metagenome assembly using repeat graphs", Nature Methods, 2020 doi:s41592-020-00971-x

Mikhail Kolmogorov, Jeffrey Yuan, Yu Lin and Pavel Pevzner, "Assembly of Long Error-Prone Reads Using Repeat Graphs", Nature Biotechnology, 2019 doi:10.1038/s41587-019-0072-8

Yu Lin, Jeffrey Yuan, Mikhail Kolmogorov, Max W Shen, Mark Chaisson and Pavel Pevzner, "Assembly of Long Error-Prone Reads Using de Bruijn Graphs", PNAS, 2016 doi:10.1073/pnas.1604560113

How to cite: the 2020 paper is the most relevant to metagenome assembly. For single genome assembly, use the 2019 paper as reference. The 2016 paper describes solid k-mer indexing and polishing approaches that are used as core algorithms in the current pipeline.

How to get help

The preferred way to report problems or ask questions about Flye is the issue tracker. Before posting an issue or question, consider looking through the FAQ and existing issues (open and closed); it is possible that your question has already been answered.

If you are reporting a problem, please include the flye.log file and provide details about your dataset.

In case you prefer personal communication, please contact Mikhail at fenderglass@gmail.com.
