mikdo pick throws this error and results in very few transcripts #415

Closed
aseetharam opened this issue Jul 31, 2021 · 12 comments

@aseetharam

aseetharam commented Jul 31, 2021

Hello,

I've been running Mikado on a number of plant genomes and recently upgraded to the latest version. I'm using the Singularity image (v2.3.0, CentOS), and all the steps before pick (configure, prepare, serialise) completed successfully. mikado pick was run as follows:

mikado pick \
    --configuration configuration.yaml \
    --subloci-out mikado.subloci.gff3

and pick.log contains:

2021-07-31 00:13:52,006 - main_logger - picker.py:295 - INFO - setup_logger - MainProcess - Mikado version: 2.3.0
2021-07-31 00:13:52,006 - main_logger - picker.py:297 - INFO - setup_logger - MainProcess - Command line: /usr/local/bin/mikado pick --configuration configuration.yaml --subloci-out mikado.subloci.gff3
2021-07-31 00:13:52,007 - main_logger - picker.py:302 - INFO - setup_logger - MainProcess - Begun analysis of mikado_prepared.gtf
2021-07-31 00:13:52,007 - listener - picker.py:312 - WARNING - setup_logger - MainProcess - Current level for queue: INFO
2021-07-31 00:13:52,967 - main_logger - picker.py:238 - INFO - setup_shm_db - MainProcess - DB copied into /tmp/tmpr94cd5oh.db
2021-07-31 00:13:52,969 - listener - picker.py:102 - INFO - __init__ - MainProcess - Starting to analyse input file mikado_prepared.gtf
2021-07-31 00:13:52,969 - listener - picker.py:103 - INFO - __init__ - MainProcess - Random seed: 0
2021-07-31 00:13:53,475 - listener - picker.py:1033 - ERROR - __check_transcript - MainProcess - Superlocus superlocus:scaf_1mixed:66160-68372 failed with exception: None cannot be a node
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/Mikado/picking/picker.py", line 1025, in __check_transcript
    for stranded_locus in submit_locus(current_locus, counter):
  File "/usr/local/lib/python3.9/site-packages/Mikado/picking/picker.py", line 552, in _submit_locus
    return analyse_locus(slocus=slocus,
  File "/usr/local/lib/python3.9/site-packages/Mikado/picking/loci_processer.py", line 247, in analyse_locus
    slocus.load_all_transcript_data(engine=engine, session=session)
  File "/usr/local/lib/python3.9/site-packages/Mikado/loci/superlocus.py", line 706, in load_all_transcript_data
    remove_flag, new_transcripts = self.load_transcript_data(tid, data_dict)
  File "/usr/local/lib/python3.9/site-packages/Mikado/loci/superlocus.py", line 492, in load_transcript_data
    self.transcripts[tid].load_information_from_db(self.configuration,
  File "/usr/local/lib/python3.9/site-packages/Mikado/transcripts/transcript.py", line 103, in load_information_from_db
    retrieval.load_information_from_db(self,
  File "/usr/local/lib/python3.9/site-packages/Mikado/transcripts/transcript_methods/retrieval.py", line 309, in load_information_from_db
    retrieve_from_dict(transcript, data_dict)
  File "/usr/local/lib/python3.9/site-packages/Mikado/transcripts/transcript_methods/retrieval.py", line 362, in retrieve_from_dict
    load_orfs(transcript, candidate_orfs)
  File "/usr/local/lib/python3.9/site-packages/Mikado/transcripts/transcript_methods/retrieval.py", line 62, in load_orfs
    candidate_orfs = find_overlapping_cds(transcript, candidate_orfs)
  File "/usr/local/lib/python3.9/site-packages/Mikado/transcripts/transcript_methods/retrieval.py", line 436, in find_overlapping_cds
    graph = define_graph(orf_dictionary, inters=transcript.is_overlapping_cds)
  File "/usr/local/lib/python3.9/site-packages/Mikado/_transcripts/clique_methods.py", line 65, in define_graph
    graph.add_nodes_from(objects.keys())
  File "/usr/local/lib/python3.9/site-packages/networkx/classes/graph.py", line 581, in add_nodes_from
    raise ValueError("None cannot be a node")

This error is repeated many times, and the log ends with:

2021-07-31 00:44:34,887 - listener - picker.py:992 - INFO - __submit_single_threaded - MainProcess - Finished chromosome scaf_97
2021-07-31 00:44:34,945 - queue_listener - picker.py:1006 - INFO - __submit_single_threaded - MainProcess - Final number of superloci: 41706
2021-07-31 00:44:35,002 - main_logger - picker.py:1090 - INFO - __call__ - MainProcess - Finished analysis of mikado_prepared.gtf
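To gauge how many superloci hit this error, the ERROR lines in pick.log can be tallied. A minimal sketch in Python, assuming every failure is logged in the `Superlocus <name> failed with exception: <message>` form shown above; the script and function names are hypothetical, not part of Mikado:

```python
import re
import sys
from collections import Counter

# Matches ERROR lines in the form shown in the pick.log excerpt above.
FAILED_RE = re.compile(
    r"Superlocus (?P<name>superlocus:\S+) failed with exception: (?P<error>.+)$"
)


def summarise_failures(log_lines):
    """Count failures per exception message and collect the failing superlocus names."""
    errors = Counter()
    names = []
    for line in log_lines:
        match = FAILED_RE.search(line)
        if match:
            errors[match.group("error").strip()] += 1
            names.append(match.group("name"))
    return errors, names


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python summarise_pick_log.py pick.log
    with open(sys.argv[1]) as handle:
        errors, names = summarise_failures(handle)
    for message, count in errors.most_common():
        print(f"{count}\t{message}")
    print(f"{len(names)} superloci failed")
```

The collected names can then be used to pick a single failing locus for a minimal reproduction.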

I'm stumped. Can you please help me fix this error? I think I can rule out an installation problem. Since this is happening on all 11 genomes I've been running, you can probably also rule out problematic data. Any help will be greatly appreciated!

Thanks,

Editing to add more information:

grep -c ">" mikado_prepared.fasta
564895

and the final results:

awk '$3=="mRNA" {count++} END{print count}' mikado.loci.gff3
291
awk '$3=="gene" {count++} END{print count}' mikado.loci.gff3
210

That is only 291 transcripts in 210 genes out of 564,895 prepared transcripts, which doesn't seem right.
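For reference, the awk one-liners above can be reproduced in Python by tallying column 3 of the GFF3 file. A minimal sketch; `count_feature_types` is a hypothetical helper, and it splits on tabs (as GFF3 requires) whereas awk splits on any whitespace by default:

```python
import sys
from collections import Counter


def count_feature_types(gff3_lines):
    """Tally column 3 (feature type) of a GFF3 file, skipping comments and blank lines."""
    counts = Counter()
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python count_features.py mikado.loci.gff3
    with open(sys.argv[1]) as handle:
        counts = count_feature_types(handle)
    print("gene:", counts["gene"], "mRNA:", counts["mRNA"])
```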

@lucventurini
Collaborator

Dear @aseetharam ,

Many thanks for your bug report. This does look like an incompatibility between Mikado and the current version of networkx. The best way to confirm this would be to downgrade the library to the minimum required version (networkx==2.3).
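When rebuilding a container to test this, it helps to confirm which networkx version the environment actually resolves. A minimal sketch using the standard library's `importlib.metadata`; the helper names are hypothetical, and `version_tuple` only compares the leading numeric parts of a version, so it is not a full PEP 440 parser:

```python
from importlib import metadata


def version_tuple(version):
    """Turn '2.3' or '2.6.2' into a comparable tuple of ints (leading digits of each part)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("networkx")
    print("networkx:", found)
    if found is not None:
        # True only if the pinned 2.3.x release is the one actually installed.
        print("is 2.3.x:", version_tuple(found)[:2] == (2, 3))
```

Running this inside the container (rather than on the host) shows whether the pin in the rebuilt image took effect.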

@ljyanesm might be able to have a look at it.

Kind regards

@aseetharam
Author

aseetharam commented Jul 31, 2021

Thanks, @lucventurini, for the quick response! I rebuilt the container with the recommended networkx version (by editing the environment.yaml file), but it did not help. I'm still getting the same error:

2021-07-31 16:21:08,770 - main_logger - picker.py:295 - INFO - setup_logger - MainProcess - Mikado version: 2.3.0
2021-07-31 16:21:08,770 - main_logger - picker.py:297 - INFO - setup_logger - MainProcess - Command line: /usr/local/bin/mikado pick --configuration configuration.yaml --subloci-out mikado.subloci.gff3
2021-07-31 16:21:08,770 - main_logger - picker.py:302 - INFO - setup_logger - MainProcess - Begun analysis of mikado_prepared.gtf
2021-07-31 16:21:08,770 - listener - picker.py:312 - WARNING - setup_logger - MainProcess - Current level for queue: INFO
2021-07-31 16:21:14,549 - main_logger - picker.py:238 - INFO - setup_shm_db - MainProcess - DB copied into /tmp/tmp8f383myb.db
2021-07-31 16:21:14,667 - listener - picker.py:102 - INFO - __init__ - MainProcess - Starting to analyse input file mikado_prepared.gtf
2021-07-31 16:21:14,667 - listener - picker.py:103 - INFO - __init__ - MainProcess - Random seed: 0
2021-07-31 16:21:15,250 - listener - picker.py:1033 - ERROR - __check_transcript - MainProcess - Superlocus superlocus:scaf_1mixed:66160-68372 failed with exception: None cannot be a node
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/Mikado/picking/picker.py", line 1025, in __check_transcript
    for stranded_locus in submit_locus(current_locus, counter):
  File "/usr/local/lib/python3.9/site-packages/Mikado/picking/picker.py", line 552, in _submit_locus
    return analyse_locus(slocus=slocus,
  File "/usr/local/lib/python3.9/site-packages/Mikado/picking/loci_processer.py", line 247, in analyse_locus
    slocus.load_all_transcript_data(engine=engine, session=session)
  File "/usr/local/lib/python3.9/site-packages/Mikado/loci/superlocus.py", line 706, in load_all_transcript_data
    remove_flag, new_transcripts = self.load_transcript_data(tid, data_dict)
  File "/usr/local/lib/python3.9/site-packages/Mikado/loci/superlocus.py", line 492, in load_transcript_data
    self.transcripts[tid].load_information_from_db(self.configuration,
  File "/usr/local/lib/python3.9/site-packages/Mikado/transcripts/transcript.py", line 103, in load_information_from_db
    retrieval.load_information_from_db(self,
  File "/usr/local/lib/python3.9/site-packages/Mikado/transcripts/transcript_methods/retrieval.py", line 309, in load_information_from_db
    retrieve_from_dict(transcript, data_dict)
  File "/usr/local/lib/python3.9/site-packages/Mikado/transcripts/transcript_methods/retrieval.py", line 362, in retrieve_from_dict
    load_orfs(transcript, candidate_orfs)
  File "/usr/local/lib/python3.9/site-packages/Mikado/transcripts/transcript_methods/retrieval.py", line 62, in load_orfs
    candidate_orfs = find_overlapping_cds(transcript, candidate_orfs)
  File "/usr/local/lib/python3.9/site-packages/Mikado/transcripts/transcript_methods/retrieval.py", line 436, in find_overlapping_cds
    graph = define_graph(orf_dictionary, inters=transcript.is_overlapping_cds)
  File "/usr/local/lib/python3.9/site-packages/Mikado/_transcripts/clique_methods.py", line 65, in define_graph
    graph.add_nodes_from(objects.keys())
  File "/usr/local/lib/python3.9/site-packages/networkx/classes/graph.py", line 581, in add_nodes_from
    raise ValueError("None cannot be a node")
ValueError: None cannot be a node

Thanks!

@lucventurini
Collaborator

Dear @aseetharam

Thank you for checking. In that case, my other hypothesis is that your dataset might contain ORFs without an ID.
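The traceback shows `define_graph` passing `objects.keys()` to networkx's `add_nodes_from`, which raises `ValueError: None cannot be a node`. Assuming, as this hypothesis implies, that the ORF dictionary is keyed by ORF name, a single unnamed ORF is enough to trigger the error. The sketch below is a simplified stand-in that mimics only that networkx check with a plain dict; it is not Mikado's or networkx's actual code:

```python
def add_nodes_from(graph, nodes):
    """Stand-in for the None check networkx performs in Graph.add_nodes_from."""
    for node in nodes:
        if node is None:
            raise ValueError("None cannot be a node")
        graph.setdefault(node, set())  # node -> set of neighbours


def define_orf_graph(orf_dictionary):
    """Hypothetical simplification: one graph node per ORF name (dictionary key)."""
    graph = {}
    add_nodes_from(graph, orf_dictionary.keys())
    return graph


# Two ORFs on the same transcript; the second was loaded without a name.
orfs = {"transcript_1.p1": (10, 400), None: (520, 900)}
try:
    define_orf_graph(orfs)
except ValueError as err:
    print(err)  # prints: None cannot be a node
```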

Could you tell me how you produced the ORFs (including the program version) and share a snippet of the ORF file you used, please?

@aseetharam
Author

aseetharam commented Jul 31, 2021

Hi @lucventurini:
I used TransDecoder (v5.5.0) to predict the ORFs, and a snippet of the BED file is attached: td-snippet.bed.txt. The fourth field does look odd with the ~ characters. I also see that you now recommend using a GFF3 file rather than BED for ORFs; could this be what caused the issue? Either way, I'll try again with GFF3 and/or cleaned-up names and get back to you. Thanks again for helping me with this!
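A quick way to look for such records is to scan the name column (column 4) of the BED file for empty, `.` or duplicated values. A minimal sketch; `check_bed_names` is hypothetical, and it only flags names that are obviously unusable, not names Mikado might reject for other reasons:

```python
import sys


def check_bed_names(bed_lines):
    """Return (line_number, reason) pairs for BED records with an unusable name (column 4)."""
    problems = []
    seen = set()
    for number, line in enumerate(bed_lines, start=1):
        # Skip blank lines, comments and UCSC track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        name = fields[3].strip() if len(fields) > 3 else ""
        if name in ("", "."):
            problems.append((number, "missing name"))
        elif name in seen:
            problems.append((number, f"duplicate name {name}"))
        else:
            seen.add(name)
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_bed_names.py td-snippet.bed.txt
    with open(sys.argv[1]) as handle:
        for number, reason in check_bed_names(handle):
            print(number, reason)
```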

@aseetharam
Author

aseetharam commented Aug 1, 2021

Okay, just a quick update: I tried the GFF3 file of ORFs from TransDecoder, as well as new ORFs generated with Prodigal, but neither worked (same error). If you'd like me to try anything else, please don't hesitate to ask!
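The equivalent check for GFF3 input is to look for CDS features whose attribute column has no `ID`. A minimal sketch, assuming standard tab-separated GFF3 with `key=value;` attributes; the helper names are hypothetical, and the feature type to check (`CDS` by default) may need adjusting for a given ORF caller:

```python
import sys


def parse_attributes(column9):
    """Parse a GFF3 attribute column ('key=value;key=value') into a dict."""
    attributes = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attributes[key.strip()] = value.strip()
    return attributes


def cds_without_id(gff3_lines, feature_type="CDS"):
    """Yield (line_number, seqid, start, end) for features of `feature_type` lacking an ID."""
    for number, line in enumerate(gff3_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature_type:
            continue
        if not parse_attributes(fields[8]).get("ID"):
            yield number, fields[0], int(fields[3]), int(fields[4])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python cds_without_id.py orfs.gff3
    with open(sys.argv[1]) as handle:
        for hit in cds_without_id(handle):
            print(*hit, sep="\t")
```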

Thanks,

@ljyanesm
Collaborator

ljyanesm commented Aug 2, 2021

Dear @aseetharam,

Would it be possible for you to make a minimal example (maybe just extracting one of the failing loci) and share that with us?
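One way to start building such an example is to pull every transcript that overlaps a failing superlocus out of mikado_prepared.gtf. A minimal sketch, assuming the superlocus name encodes the chromosome, a strand label (`+`, `-` or `mixed`) and the coordinates, as in `superlocus:scaf_1mixed:66160-68372` from the log above; the script and helper names are hypothetical, and the matching ORF records would still need to be extracted separately:

```python
import re
import sys

# Extracts the transcript_id value from a GTF attribute column.
TID_RE = re.compile(r'transcript_id "([^"]+)"')


def parse_superlocus(name):
    """Split e.g. 'superlocus:scaf_1mixed:66160-68372' into ('scaf_1', 66160, 68372)."""
    _, rest = name.split(":", 1)
    chrom_strand, span = rest.rsplit(":", 1)
    # Assumption: the strand label is appended directly to the chromosome name.
    chrom = re.sub(r"(\+|-|mixed)$", "", chrom_strand)
    start, end = (int(value) for value in span.split("-"))
    return chrom, start, end


def extract_locus(gtf_lines, chrom, start, end):
    """Return all GTF lines of the transcripts with a feature overlapping chrom:start-end."""
    lines = list(gtf_lines)
    wanted = set()
    for line in lines:
        fields = line.split("\t")
        if len(fields) < 9 or fields[0] != chrom:
            continue
        # GTF coordinates are 1-based and closed, so this is a plain interval overlap test.
        if int(fields[3]) <= end and int(fields[4]) >= start:
            match = TID_RE.search(fields[8])
            if match:
                wanted.add(match.group(1))
    # Second pass: keep every line of the selected transcripts, even outside the window.
    selected = []
    for line in lines:
        match = TID_RE.search(line)
        if match and match.group(1) in wanted:
            selected.append(line)
    return selected


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical script name):
    #   python extract_locus.py mikado_prepared.gtf superlocus:scaf_1mixed:66160-68372 > locus.gtf
    chrom, start, end = parse_superlocus(sys.argv[2])
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(extract_locus(handle, chrom, start, end))
```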

@lucventurini
Collaborator

No need, @ljyanesm: this just happened on my machine using the sample data :-/

Something has broken somewhere.

@lucventurini
Collaborator

@aseetharam FYI, this means we have confirmed the bug; we are fixing it ASAP.

@lucventurini
Collaborator

@ljyanesm the problem is in serialise. For some reason, it reads in the ORFs without populating the name field.

This breaks Mikado downstream, as that field is used for ordering the ORFs.

Marking it as a bug.
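To illustrate why an unpopulated name field breaks ordering: in Python 3, comparing `None` with a string raises `TypeError`, so any sort keyed on ORF names fails as soon as one name is missing. The sketch below uses a hypothetical `Orf` record and a hypothetical `ensure_orf_names` fallback that assigns synthetic names; it shows one defensive option, not the fix that was actually committed to Mikado:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Orf:
    """Hypothetical minimal ORF record; not Mikado's actual class."""
    transcript: str
    start: int
    end: int
    name: Optional[str] = None


def ensure_orf_names(orfs: List[Orf]) -> List[Orf]:
    """Give every unnamed ORF a synthetic, unique name ('<transcript>.orf<N>')."""
    used = {orf.name for orf in orfs if orf.name}
    for index, orf in enumerate(orfs, start=1):
        if not orf.name:
            candidate = f"{orf.transcript}.orf{index}"
            while candidate in used:  # avoid clashing with an existing name
                candidate += "_"
            orf.name = candidate
            used.add(candidate)
    return orfs


orfs = [Orf("t1", 10, 400, "t1.p1"), Orf("t1", 520, 900)]
# Sorting by name fails while one name is None: '<' is not supported between None and str.
try:
    sorted(orfs, key=lambda orf: orf.name)
except TypeError as err:
    print("ordering failed:", err)
print([orf.name for orf in sorted(ensure_orf_names(orfs), key=lambda orf: orf.name)])
# prints: ['t1.orf2', 't1.p1']
```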

@aseetharam
Author

@ljyanesm @lucventurini, thanks! Looking forward to the fix!

@ljyanesm
Collaborator

ljyanesm commented Aug 5, 2021

Dear @aseetharam,

The original issue you posted was fixed by the commit referenced here. There are a couple of other issues we are currently working on; as soon as these are resolved, I will merge and tag a new release.

Thank you for reporting the issue, and your patience.

Luis

@lucventurini
Collaborator

Dear @aseetharam ,

The new version of Mikado (2.3.1), which we just released, contains the fix for this issue. It also contains many other bug fixes and general improvements that should make Mikado much faster on dense loci. The new version is already available on PyPI and should be uploaded to Conda soon.

We are closing the issue for now, but please feel free to reopen it if you encounter any further problems.

Kind regards,

3 participants