Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error running circlator 0.14.0 to reproduce results from published article #54

Closed
sajvanderzeeuw opened this issue Apr 12, 2016 · 9 comments

Comments

@sajvanderzeeuw
Copy link

Dear Author,

I am trying to reproduce the results published in your article. After installing circlator, and all the other dependencies i get a error:

 /exports/sasc/sajvanderzeeuw/circlator-0.14.0/build/scripts-3.4/circlator all hgap_assembly.fasta corrected_reads.fasta circlator_0.14                                                                                                              Traceback (most recent call last):
  File "/home/sajvanderzeeuw/.virtualenvs/circlator/lib/python3.4/site-packages/pyfastaq-3.11.1-py3.4.egg/pyfastaq/utils.py", line 21, in open_file_read
FileNotFoundError: [Errno 2] No such file or directory: '03.assemble/contigs.fastg'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/exports/sasc/sajvanderzeeuw/circlator-0.14.0/build/scripts-3.4/circlator", line 52, in <module>
    exec('circlator.tasks.' + task + '.run()')
  File "<string>", line 1, in <module>
  File "/home/sajvanderzeeuw/.virtualenvs/circlator/lib/python3.4/site-packages/circlator-0.14.0-py3.4.egg/circlator/tasks/all.py", line 138, in run
    m.run()
  File "/home/sajvanderzeeuw/.virtualenvs/circlator/lib/python3.4/site-packages/circlator-0.14.0-py3.4.egg/circlator/merge.py", line 681, in run
    self._circularise_contigs(nucmer_hits)
  File "/home/sajvanderzeeuw/.virtualenvs/circlator/lib/python3.4/site-packages/circlator-0.14.0-py3.4.egg/circlator/merge.py", line 318, in _circularise_contigs
    called_as_circular_by_spades = self._get_spades_circular_nodes(reassembly_fastg)
  File "/home/sajvanderzeeuw/.virtualenvs/circlator/lib/python3.4/site-packages/circlator-0.14.0-py3.4.egg/circlator/merge.py", line 609, in _get_spades_circular_nodes
    names = set([x.id.rstrip(';') for x in seq_reader if ':' in x.id])
  File "/home/sajvanderzeeuw/.virtualenvs/circlator/lib/python3.4/site-packages/circlator-0.14.0-py3.4.egg/circlator/merge.py", line 609, in <listcomp>
    names = set([x.id.rstrip(';') for x in seq_reader if ':' in x.id])
  File "/home/sajvanderzeeuw/.virtualenvs/circlator/lib/python3.4/site-packages/pyfastaq-3.11.1-py3.4.egg/pyfastaq/sequences.py", line 35, in file_reader
  File "/home/sajvanderzeeuw/.virtualenvs/circlator/lib/python3.4/site-packages/pyfastaq-3.11.1-py3.4.egg/pyfastaq/utils.py", line 23, in open_file_read
pyfastaq.utils.Error: Error opening for reading file '03.assemble/contigs.fastg'

This is off course a very obvious error because the tools expect a file which contains a wrong name. This is a known bug in your 0.14.0 release?

Also i tried 1 of the newest releases 1.1.3. But here i cant reproduce your plasmodium results. It gives me that contig 96 is not circular.

See the folowing diff:

diff 04.merge.circularise.log ../test_circular_with_dnaAfile/04.merge.circularise.log
1a2
> [merge circularised]  unitig_97|quiver        0       0       0       0
3d3
< [merge circularised]  unitig_97|quiver        0       0       1       1

When i check if all dependencies are there it says all test succeeded. This is the point i'm stuck at.

Basically the question i have is:

How can i reproduce the findings of your published article: http://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0849-0

And which options are nice to take into account for circularizing genomes?

Thanks in advance!

Best regards,

Sander

@martinghunt
Copy link
Collaborator

Hi Sander,

What version of spades are you using? The missing file error was a known issue - please see
#39
If you want to use Circlator version 0.14.0, then you'll need SPAdes version 3.6.0 or lower. For the paper we used 3.5.0.

Re the plasmodium data, we used the option --assemble_spades_k 101 becuase the reads were low coverage.

As for options, that probably depends on your dataset. We explored varying most of the options in the paper (see the section "Evaluation of user-defined parameters").

Best,
Martin

@sajvanderzeeuw
Copy link
Author

Hi @martinghunt,

Thanks for replying so fast. I've checked your test data from Plasmodium and managed to reproduce even with the newest version!

I have one other question if i may ask. What is in your terms low coverage? 10 till 50 X?

Thanks,
Best,

Sander

@martinghunt
Copy link
Collaborator

Hi Sander,

Thanks for confirming that you reproduced the results. That's always good to hear :)

I just had a quick look at a few of the NCTC samples compared with the plasmodium. I just made rough histograms of the read coverage from running samtools depth -aa. Typical NCTC example is:

screen shot 2016-04-15 at 09 38 12

Bear in mind this is corrected read coverage (ie what Circlator takes), not uncorrected reads.

Martin

@sajvanderzeeuw
Copy link
Author

Thanks Martin!

Sorry but i think i made a mistake in my comparison, i thought the 2 outputs of my run vs you test data was the same but i missed some. I run it with the --assemble_spades_k 101 .
Circlator all hgap_assembly.fasta corrected_reads.fasta circlator_k_101

06.fixstart.log from testDATA:

[contig break finder] id        break_point     gene_name       gene_reversed   new_name        skipped
[contig break finder] unitig_96|quiver  16443   prodigal        no      -       -
[contig break finder] unitig_97|quiver  2906    prodigal        yes     -       -

06.fixstart.log from My run:

[contig break finder]   id      break_point     gene_name       gene_reversed   new_name        skipped
[contig break finder]   unitig_96|quiver        16443   prodigal        no      -       -
[contig break finder]   unitig_97|quiver        2906    prodigal        no      -       -

I could only find one difference in the other logs and that was the clean.log

05.clean.log YOURS

[clean] unitig_96|quiver        user_kept
[clean] unitig_97|quiver        user_kept

05.clean.log MINE

[contig cleanup] contig_id      clean_log
[contig cleanup] unitig_96|quiver skipped
[contig cleanup] unitig_97|quiver skipped

What can be the cause of this? Thanks in advance!

SPAdes version : 3.5.0
Mummer: 3.23
Circlator: 0.14.0
Prodigal: 2.60
BWA: 0.7.12
Samtools: 0.1.9

@martinghunt
Copy link
Collaborator

Apologies for the confusion - in actual fact Circlator 0.14.1 was used. I will need to send an erratum to get the manuscript corrected.

Using 0.14.1 will mean that the 05.clean.logs will be identical.

For the 06.fixstart.log, this part of circlator uses code from here:
https://github.com/sanger-pathogens/bio_assembly_refinement
In the paper we used 0.3.2. You've probably got 0.5.0 installed.

@sajvanderzeeuw
Copy link
Author

No problem, i installed everything according to your story, but i still do not manage to reproduce the results.

These are the packages i now have installed.

pip freeze
bio-assembly-refinement==0.3.2
circlator==0.14.1
Cython==0.24
et-xmlfile==1.0.1
jdcal==1.2
nose==1.3.7
numpy==1.11.0
openpyxl==2.4.0a1
pbr==1.9.1
pyfastaq==3.5.0
pymummer==0.4.0
pysam==0.8.1
six==1.10.0
stevedore==1.12.0
virtualenv==15.0.1
virtualenv-clone==0.2.6
virtualenvwrapper==4.7.1

Here you can download the Plasmodium data i use, just to be sure that i have the same as what you distribute.

https://barmsijs.lumc.nl/szeeuw/Plasmodium.tar.gz

In here all runs i tried are stored.

The circlator dir is your run
circlator_0.14.1.fixed is the latest run i did.

Yours 06.fixstart

[contig break finder] id    break_point gene_name   gene_reversed   new_name    skipped
[contig break finder] unitig_96|quiver  16443   prodigal    no  -   -
[contig break finder] unitig_97|quiver  2906    prodigal    yes -   -

My 06.fixstart

[contig break finder] id    break_point gene_name   gene_reversed   new_name    skipped
[contig break finder] unitig_96|quiver  16443   prodigal    no  -   -
[contig break finder] unitig_97|quiver  16227   prodigal    no  -   -

Again any help will be strongly appreciated.

@martinghunt
Copy link
Collaborator

That pins down the remaining difference to fixstart, which is essentially bio_assembly_improvement. The contigs obviously have no match to a dnaa gene, so prodigal gets run to pick a gene near the centre of each contig and use that as a break point.

I'm wondering if that part of the code is deterministic. @nds can you comment please?

@sajvanderzeeuw
Copy link
Author

sajvanderzeeuw commented Apr 28, 2016

@martinghunt Which version of bio_assembly_improvement i should have installed, to get this to work? Or is this related to the version of Prodigal. Hope you can still provide me an answer on this. Thanks in advance!

@martinghunt
Copy link
Collaborator

Hi,

Sorry for the slow reply, took some digging to figure out what's going on. It's actually earlier on at the merge stage where things are different, not bio_assembly_refinement. Depending on code versions, unitig_97 doesn't always get circularized succesfully. It should be ~6kb long and then the prodigal gene is at 2906. When the circularization goes wrong, it ends up around 30kb long and then the prodigal gene is at 16227.

I just ran circlator 0.14.1 with spades 3.5.0 on my original data, and on the data in your tarball and I get the same results as the paper. The important file here is the circularize log:

$ cat 04.merge.circularise.log
[merge circularised] #Contig repetitive_deleted circl_using_nucmer circl_using_spades circularised
[merge circularised] unitig_96|quiver 0 1 0 1
[merge circularised] unitig_97|quiver 0 0 1 1

In a failed version, unitig_97 doesn't get circularized using the circular call from spades:
[merge circularised] #Contig repetitive_deleted circl_using_nucmer circl_using_spades circularised
[merge circularised] unitig_96|quiver 0 1 0 1
[merge circularised] unitig_97|quiver 0 1 0 1

The only differences I see python module in versions is:
pysam 0.8.3
pymummer 0.7.1
pyfastaq 3.12.1
which shouldn't make any difference. The only idea I have is that your circlator script is picking up a different version of the circlator module.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants