Error running circlator 0.14.0 to reproduce results from published article #54
Comments
Hi Sander, What version of SPAdes are you using? The missing file error was a known issue - please see
Re the Plasmodium data, we used the option --assemble_spades_k 101 because the reads were low coverage. As for options, that probably depends on your dataset. We explored varying most of the options in the paper (see the section "Evaluation of user-defined parameters"). Best,
Hi @martinghunt, Thanks for replying so fast. I've checked your Plasmodium test data and managed to reproduce the results, even with the newest version! I have one other question, if I may ask. What do you consider low coverage? 10 to 50X? Thanks, Sander
Hi Sander, Thanks for confirming that you reproduced the results. That's always good to hear :) I just had a quick look at a few of the NCTC samples compared with the Plasmodium data. I made rough histograms of the read coverage from running samtools depth -aa. A typical NCTC example is:
Bear in mind this is corrected read coverage (i.e. what Circlator takes), not uncorrected reads. Martin
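The rough coverage histogram mentioned above could be produced along these lines. This is a minimal sketch, not the script Martin used: it assumes the standard three-column, tab-separated output of samtools depth (contig, position, depth), and the function name `coverage_histogram` and the bin width are hypothetical choices for illustration.

```python
from collections import Counter


def coverage_histogram(depth_lines, bin_width=10):
    """Count positions per depth bin from `samtools depth` output lines.

    Each line is expected to be: contig <TAB> position <TAB> depth.
    Returns a dict mapping the lower edge of each bin to a position count.
    """
    hist = Counter()
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depth = int(fields[2])
        hist[(depth // bin_width) * bin_width] += 1
    return dict(sorted(hist.items()))


if __name__ == "__main__":
    # Tiny in-memory example instead of a real depth file.
    example = ["contig1\t1\t5\n", "contig1\t2\t12\n", "contig1\t3\t19\n"]
    for lower_edge, count in coverage_histogram(example).items():
        print(f"{lower_edge}-{lower_edge + 9}x: {count} positions")
```

In practice the lines would come from a file written by something like `samtools depth -aa reads.bam > depth.txt`, where `reads.bam` is a placeholder name for the corrected reads mapped to the assembly.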
Thanks Martin! Sorry, but I think I made a mistake in my comparison. I thought the two outputs (my run vs. your test data) were the same, but I missed some differences. I ran it with --assemble_spades_k 101. 06.fixstart.log from the test data:
06.fixstart.log from my run:
The only difference I could find in the other logs was in 05.clean.log. 05.clean.log (yours):
05.clean.log (mine):
What could be the cause of this? Thanks in advance! SPAdes version: 3.5.0
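A stage-by-stage log comparison like the one described above can be scripted so that no difference is missed. The following is only a sketch using Python's standard difflib module; the helper name `diff_logs` and the labels are hypothetical, and it works on lists of lines so the caller decides which files to read.

```python
import difflib


def diff_logs(lines_a, lines_b, name_a="theirs", name_b="mine"):
    """Return unified-diff lines between two logs; empty list if identical."""
    return list(
        difflib.unified_diff(
            lines_a, lines_b, fromfile=name_a, tofile=name_b, lineterm=""
        )
    )


if __name__ == "__main__":
    # Placeholder log contents, not real Circlator output.
    reference = ["step one", "step two"]
    own_run = ["step one", "step 2"]
    for line in diff_logs(reference, own_run):
        print(line)
```

Reading each log with `open(path).read().splitlines()` and calling `diff_logs` once per stage (for example on the two 05.clean.log files) gives an empty list wherever the runs agree.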
Apologies for the confusion - Circlator 0.14.1 was actually used. I will need to send an erratum to get the manuscript corrected. Using 0.14.1 will make the 05.clean.log files identical. For the 06.fixstart.log, this part of Circlator uses code from here:
No problem. I installed everything according to your description, but I still cannot reproduce the results. These are the packages I now have installed:
Here you can download the Plasmodium data I use, just to be sure that I have the same data as you distribute. https://barmsijs.lumc.nl/szeeuw/Plasmodium.tar.gz All the runs I tried are stored in there. Your 06.fixstart:
My 06.fixstart:
Again, any help will be greatly appreciated.
That pins down the remaining difference to fixstart, which is essentially bio_assembly_refinement. The contigs obviously have no match to a dnaA gene, so Prodigal gets run to pick a gene near the centre of each contig and use that as a break point. I'm wondering if that part of the code is deterministic. @nds can you comment please?
@martinghunt Which version of bio_assembly_refinement should I have installed to get this to work? Or is this related to the version of Prodigal? I hope you can still provide me with an answer on this. Thanks in advance!
Hi, Sorry for the slow reply, it took some digging to figure out what's going on. The difference actually arises earlier, at the merge stage, not in bio_assembly_refinement. Depending on code versions, unitig_97 doesn't always get circularized successfully. It should be ~6kb long, and then the Prodigal gene is at 2906. When the circularization goes wrong, it ends up around 30kb long, and then the Prodigal gene is at 16227. I just ran Circlator 0.14.1 with SPAdes 3.5.0 on my original data and on the data in your tarball, and I get the same results as the paper. The important file here is the circularize log: $ cat 04.merge.circularise.log In a failed version, unitig_97 doesn't get circularized using the circular call from SPAdes: The only difference I see in Python module versions is:
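A quick way to tell which of the two cases a run has hit is to look at the length of unitig_97 in the merged assembly. The sketch below is an illustration only: it parses FASTA lines with plain Python, the function name `contig_lengths` is hypothetical, and which output file to feed it (for example the FASTA written by the merge stage) is an assumption that should be checked against the actual output directory.

```python
def contig_lengths(fasta_lines):
    """Return {contig_name: sequence_length} for FASTA-formatted lines.

    The contig name is the first whitespace-delimited token of the header.
    """
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


if __name__ == "__main__":
    # Toy record; a real check would read the merged assembly FASTA instead.
    example = [">unitig_97 toy_record", "ACGTACGT", "ACGT"]
    for contig, length in contig_lengths(example).items():
        print(contig, length)
```

Going by the figures in the comment above, a unitig_97 of roughly 6kb would indicate successful circularization and one of roughly 30kb the failed case.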
Dear Author,
I am trying to reproduce the results published in your article. After installing Circlator and all the other dependencies, I get an error:
The cause is obvious: the tools expect a file under a different name than the one that exists. Is this a known bug in your 0.14.0 release?
I also tried one of the newest releases, 1.1.3, but with it I can't reproduce your Plasmodium results. It reports that contig 96 is not circular.
See the following diff:
When I check whether all dependencies are present, it says all tests succeeded. This is the point where I'm stuck.
Basically, my questions are:
How can I reproduce the findings of your published article: http://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0849-0
And which options are worth considering when circularizing genomes?
Thanks in advance!
Best regards,
Sander