Is the final assembly a function of the number of sequences? #666
Comments
Hi Arthur, when you realign all reads to a single contig, this may recruit reads from other species that share some homology with it (e.g. repeats or HGT events). This may result in contig breaks in non-meta mode (and possibly in meta mode too).
Dear Fenderglass, thanks for your answer. Maybe there is a way to find all the reads recruited to form a contig from the intermediate minimap files?
I think if you align reads against the entire assembly, rather than just one contig, this should help. But there is no guarantee that you'll get an identical assembly, because the algorithm is heuristic.
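To illustrate the suggestion above, here is a minimal Python sketch that picks the reads whose best alignment lands on a given contig, once the reads have been aligned against the entire assembly. It assumes the alignments are in PAF format as written by minimap2 (column 1 is the read name, column 6 the target name, column 10 the number of matching bases). The function name `reads_best_on_contig`, the contig names, and the choice of "most matching bases" as the tie-breaking criterion are illustrative assumptions, not something Flye does internally.

```python
def reads_best_on_contig(paf_lines, target_contig):
    """Return the names of reads whose best alignment hits target_contig.

    "Best" is taken here as the alignment with the most matching bases
    (PAF column 10); this criterion is an assumption made for the sketch.
    """
    best = {}  # read name -> (matching bases, contig name)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip empty or malformed lines
        read, contig, n_match = fields[0], fields[5], int(fields[9])
        if read not in best or n_match > best[read][0]:
            best[read] = (n_match, contig)
    return {read for read, (_, contig) in best.items() if contig == target_contig}


# Tiny hand-made example: r2 partially matches contig_1 but fits contig_2 better.
example_paf = [
    "r1\t1000\t0\t1000\t+\tcontig_1\t50000\t0\t1000\t950\t1000\t60",
    "r2\t1000\t0\t400\t+\tcontig_1\t50000\t100\t500\t300\t400\t10",
    "r2\t1000\t0\t1000\t-\tcontig_2\t80000\t0\t1000\t900\t1000\t60",
    "r3\t1000\t0\t1000\t+\tcontig_2\t80000\t2000\t3000\t990\t1000\t60",
]
selected = reads_best_on_contig(example_paf, "contig_1")
```

Here `r2` is left out of the selection for `contig_1`, whereas aligning against `contig_1` alone would have recruited it. Even with this filtering, the reassembly is not guaranteed to match the original one.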
Thank you very much for your answer. Correct me if I'm wrong, but there's no way to set a seed for flye to enhance reproducibility, right? Could we maybe set the system RNG seed to improve reproducibility of the heuristic step? How reliable would you consider our blind reconstructions of unknown environmental MAGs obtained this way? Would you trust them?
Determinism of flye on identical input is discussed here: #640
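For anyone who wants to check reproducibility on their own data, a rough Python sketch for comparing the contig sets of two runs could look like the following. It treats a contig and its reverse complement as the same sequence and ignores contig names and order. The helper names are hypothetical, and the comparison is deliberately strict: a circular contig reported with a different start position would count as different, and only A/C/G/T are complemented.

```python
import hashlib

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(seq):
    """Return the lexicographically smaller of seq and its reverse complement."""
    seq = seq.upper()
    reverse_complement = seq.translate(_COMPLEMENT)[::-1]
    return min(seq, reverse_complement)


def read_fasta(lines):
    """Collect the sequences of a FASTA file given as an iterable of lines."""
    sequences, chunks = [], None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks is not None:
                sequences.append("".join(chunks))
            chunks = []
        elif line and chunks is not None:
            chunks.append(line)
    if chunks is not None:
        sequences.append("".join(chunks))
    return sequences


def assembly_fingerprint(lines):
    """Sorted SHA-256 digests of the canonical contigs; names and order are ignored."""
    return sorted(
        hashlib.sha256(canonical(seq).encode()).hexdigest()
        for seq in read_fasta(lines)
    )


# Two toy "assemblies" with the same contigs, renamed, reordered and reverse-complemented.
run_a = [">contig_1", "ACGT", "TTGA", ">contig_2", "GGGA"]
run_b = [">x", "TCCC", ">y", "ACGTTTGA"]
same = assembly_fingerprint(run_a) == assembly_fingerprint(run_b)
```

In practice the line lists would come from opening the two assembly FASTA files of the runs being compared.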
OK, I understand, thank you. I saw the possible future addition of an RNG seed setting. Great idea. Thanks! Arthur
Hi there, I have a question.
Context
In the context of a master's degree practical course I'm giving, I wanted to let the students go through some basic metagenomics steps. So I am preparing a small dataset that runs fast.
Problem
I've used
flye --nano-hq seq.fastq --out-dir Assembly --threads 7 --min-overlap 1000 --iterations 4
to reconstruct my Nanopore-acquired metagenome. It is a tropical soil metagenome, composed of largely unknown genomes, so no reference is available at all.
I explore the assembly graph with Bandage and find MAGs that I like, notably one nice circular one.
I realign my whole fastq metagenome against the extracted MAG.
I take the aligned sequences and feed flye with them, aiming to reconstruct the nice-looking circular MAG I obtained at first. You would think that building the MAG with only the sequences mapping to it would give the same reconstruction, or maybe even something cleaner.
However, that's absolutely not what I obtain. I do not get a circular contig anymore, but a bunch of contigs.
I am highly puzzled by this situation.
It can mean several things:
I've discussed it with some bioinformaticians. I've been told that the proportion of overlapping sequences in the whole dataset can influence the reconstruction choices and the path taken in the assembly graph.
Please enlighten me.
Arthur