unicycler stalls indefinitely while 'creating simple long read bridges' #256
I don't fully understand what's going on here, and I'll need to get my hands on a dataset which causes the issue to really investigate. I have, however, got a workaround in place. In the new version of Unicycler I'm working on now, there is an option, So even if I don't know what the root problem is, you can now use
Ryan
Hi, good to hear that you are working on a new version of Unicycler! Meanwhile we have published these results (https://doi.org/10.3390/foods10112637), and the datasets are publicly available under study accession number PRJEB44065. Details concerning which assemblies succeeded and failed are given in the supplementary data of the manuscript. So feel free to have a go at the data, and if you find out something I would be very interested to learn about it :)
Great - I think I've figured it out. Unicycler was encountering what it thought might be a simple loop in the graph, but was actually a high-depth plasmid. This led it to try far too many loop count possibilities in the simple long-read bridging step. It would have finished eventually, but it was wasting tons of time. I've put a simple fix in place: f4afc33. It might still be slow in cases like this, but it should be much better than before. Thanks for the help with the link to the dataset!
Ryan
I've been testing Unicycler extensively for hybrid assembly on a number of bacterial isolates (Bacillus). For 3 out of 10 samples I'm working on, Unicycler delivers a fairly complete, and, as it turns out, correct and accurate assembly, which confirms that this tool is very performant (and kudos for that!).
However, for the other 7 samples it stalls indefinitely at the step where long read alignments are used to resolve repeat structures (I've pasted an example of the last part of the Unicycler report below). In that case, the process is killed (either by me or automatically after several hours) and Unicycler never finishes. This is especially peculiar since we know by now that the 10 isolates represent in essence the same strain.

I've tried all kinds of fixes, but none of them really resolves this problem. I tried changing parameter settings in the Unicycler command; of these, only enforcing a specific k-mer length for the SPAdes assembly sometimes makes Unicycler run to completion (however, the resulting assembly is bad). I also tried more extensive/severe filtering of the long read (Nanopore) data set, but this does not help. I tried more rigorous filtering of the Illumina reads, and combining short read sets of different samples; the latter sometimes solves the issue (but this is obviously not a sustainable solution).

The isolates all carry an extrachromosomal recombinant plasmid, into which a gene is inserted that is also present on the genome. If I remove all reads that match the plasmid from the short and long read data, Unicycler also runs to completion with the filtered data. A look at the SPAdes assembly graph shows that SPAdes initially assembles the plasmid sequence as part of the genome (which I know for sure to be incorrect). But this is the case for all the isolates, including the ones for which Unicycler runs to completion, so this in itself cannot explain why it fails on the other samples.
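For anyone who wants to reproduce the two things mentioned above (forcing a fixed SPAdes k-mer and killing a run that stalls), a minimal Python sketch could look like the following. The helper names, file names and the k-mer value 55 are placeholders for illustration, not values from this report; `-1`, `-2`, `-l`, `-o` and `--kmers` are the usual Unicycler command-line options.

```python
import subprocess
from typing import List, Optional, Sequence


def build_unicycler_command(short_1: str, short_2: str, long_reads: str,
                            out_dir: str,
                            kmers: Optional[Sequence[int]] = None) -> List[str]:
    """Build a hybrid-assembly command (hypothetical helper, placeholder paths)."""
    cmd = ["unicycler", "-1", short_1, "-2", short_2,
           "-l", long_reads, "-o", out_dir]
    if kmers:
        # Force SPAdes to use exactly these k-mer sizes (comma-separated).
        cmd += ["--kmers", ",".join(str(k) for k in kmers)]
    return cmd


def run_with_timeout(cmd: Sequence[str], timeout_hours: float) -> Optional[int]:
    """Run cmd; return its exit code, or None if it was killed at the timeout."""
    try:
        return subprocess.run(list(cmd), timeout=timeout_hours * 3600).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return None


if __name__ == "__main__":
    # Placeholder file names; replace with real read sets before running.
    command = build_unicycler_command("illumina_1.fastq.gz", "illumina_2.fastq.gz",
                                      "nanopore.fastq.gz", "unicycler_out",
                                      kmers=[55])
    print(" ".join(command))
```

Calling `run_with_timeout(command, timeout_hours=6)` would then return `None` for a run that stalls, which makes it easy to record which samples finish and which do not. Note that only the process started directly is killed; any helper processes it spawned may need separate cleanup.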
Despite all this digging, I am still failing to grasp what is exactly the problem, and how we could fix it. Any thoughts and suggestions are highly appreciated. Or does somebody experience a similar problem?
`Example output of Unicycler:
Creating simple long read bridges (2021-02-26 21:58:09)
Aligning long reads to graph using minimap

Junction  Option 1                       Option 2                       Op. 1 votes  Op. 2 votes  Neither votes  Final op.  Bridge quality
87        -40 → 87 → -39, -39 → 87 → 17  -40 → 87 → 17, -39 → 87 → -39  635          0            37             1          59.9
47        2 → 47 → 28, 44 → 47 → 44      2 → 47 → 44, 44 → 47 → 28      155          13           518            1          0.0
93        -5 → 93 → -12, -3 → 93 → 23    -5 → 93 → 23, -3 → 93 → -12    453          1            33             1          85.0

Start  Repeat  Middle  End  Read count  Read votes         Loop count  Bridge quality
-17    -87     39      40   100         1 loop: 96 votes   1           59.1
                                        2 loops: 4 votes `
=> here it stalls indefinitely