Error in loading external metrics #243

Closed
lucventurini opened this issue Oct 18, 2019 · 28 comments

@lucventurini
Collaborator

Error found on the latest release:

2019-10-18 10:15:27,974 - main_logger - picker.py:352 - INFO - setup_logger - MainProcess - Begun analysis of mikado_prepared.gtf
2019-10-18 10:15:27,974 - main_logger - picker.py:354 - INFO - setup_logger - MainProcess - Command line: /usr/local/bin/mikado pick --mode nosplit --seed 10 --only-reference-update --procs 32 --json-conf mikado.configuration.run3.yaml --subloci-out mikado.subloci.gff3 -od mikado-2.0rc6_CBG_pick_run3
2019-10-18 10:15:27,975 - listener - picker.py:371 - WARNING - setup_logger - MainProcess - Current level for queue: INFO
2019-10-18 10:15:28,031 - listener - picker.py:106 - INFO - __init__ - MainProcess - Random seed: 10
2019-10-18 10:15:28,523 - listener - picker.py:737 - INFO - __submit_multi_threading - MainProcess - Starting Mikado with multiple processes, temporary directory:
       /ei/workarea/group-ga/Projects/CB-GENANNO-444_Myzus_persicae_clone_O_v2_annotation/Analysis/mikado-2.0rc4/annotation_run2/mikado-2.0rc6_CBG_pick_run3/mikado_pick_tmpggu750yk
2019-10-18 10:15:40,789 - scaffold_1:104047-104464 - loci_processer.py:318 - ERROR - analyse_locus - LociProcesser-8 - 'external.mikado_all_aF1'
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/Mikado/loci/abstractlocus.py", line 1338, in _calculate_score
    metric = self._metrics[tid][param]
KeyError: 'external.mikado_all_aF1'

I checked the database, and the data is actually there:

sqlite> select * from external_sources;
1|mikado_all_nF1|float|1
2|mikado_all_jF1|float|1
3|mikado_all_eF1|float|1
4|mikado_all_aF1|float|1
[...]

and

sqlite> select count(*) from external where source_id in (select source_id from external_sources where source like "%mikado_all_aF1%");
222384
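
The same check can be scripted. Below is a minimal sketch using Python's built-in `sqlite3` module against an in-memory database. Only the `source_id` and `source` column names appear in the queries above; the other column names, the table layout and the inserted rows are assumptions made up for illustration, not the real Mikado schema or data.

```python
import sqlite3

# Hypothetical, simplified schema: only `source_id` and `source` are taken
# from the queries in this thread; every other column name is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE external_sources (source_id INTEGER PRIMARY KEY, source TEXT, rtype TEXT, valid_raw INTEGER);
CREATE TABLE external (query_id INTEGER, source_id INTEGER, score REAL);
""")
conn.executemany(
    "INSERT INTO external_sources VALUES (?, ?, ?, ?)",
    [(1, "mikado_all_nF1", "float", 1), (2, "mikado_all_jF1", "float", 1),
     (3, "mikado_all_eF1", "float", 1), (4, "mikado_all_aF1", "float", 1)],
)
# Invented values, just enough to exercise the query.
conn.executemany(
    "INSERT INTO external VALUES (?, ?, ?)",
    [(10, 4, 0.9), (11, 4, 0.5), (10, 1, 1.0)],
)


def count_values(connection, source_name):
    """Count the rows stored in `external` for one named source."""
    query = """
        SELECT COUNT(*) FROM external
        WHERE source_id IN (SELECT source_id FROM external_sources WHERE source = ?)
    """
    return connection.execute(query, (source_name,)).fetchone()[0]


count_aF1 = count_values(conn, "mikado_all_aF1")
count_missing = count_values(conn, "not_a_source")
print(count_aF1, count_missing)
```

A non-zero count for the source only shows that the values were loaded into the database; it says nothing about whether they reach the per-transcript metrics at scoring time, which is where the `KeyError` is raised.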
@lucventurini lucventurini added this to the 2.0 milestone Oct 18, 2019
lucventurini added a commit to lucventurini/mikado that referenced this issue Oct 18, 2019
@swarbred
Collaborator

So I'm getting an error with mikado-2.0rc6_19f3b2f5_CBG

Same data completed ok with mikado-2.0_rc1

The only change is that `check_references: true` was added to the run_options.

out_mikado.pick_run4.24247156.log

@lucventurini
Collaborator Author

Ouch. This is #241 again. I will merge the bug fix from that branch here, so you can test again. I have not heard from @Xiaofei-git regarding the issue yet, so I don't know whether my fix worked, but I have high hopes that it did.

I will merge that branch into this tomorrow, so that you can test.

@lucventurini
Collaborator Author

Hi @swarbred , @gemygk , the code solving the bug has been merged into issue-240 by 89eca18. This version of the code should complete successfully.

@gemygk
Collaborator

gemygk commented Oct 21, 2019

Thanks @lucventurini

@swarbred , I have installed it as mikado-2.0rc6_89eca18_CBG on our cluster

@Xiaofei-git

@gemygk @lucventurini, how do I install the new version with the fix?

I tried to update from inside the mikado folder, but I got the error below. I installed mikado by downloading the source and installing it with pip3.

$ git pull
fatal: not a git repository (or any parent up to mount point /mnt)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).

@gemygk
Collaborator

gemygk commented Oct 21, 2019

Hi @Xiaofei-git ,

Below is what I do to get the latest commit:

git clone https://github.com/lucventurini/mikado.git
cd mikado
git checkout 89eca18
...
...

Then carry out the installation as normal

@lucventurini
Collaborator Author

lucventurini commented Oct 21, 2019

Dear @Xiaofei-git, this should work:

git clone https://github.com/EI-CoreBioinformatics/mikado.git;
cd mikado;
git remote add lucventurini https://github.com/lucventurini/mikado.git;
git fetch lucventurini;  # fetch rather than pull: pulling from a non-default remote requires a branch name
git checkout 89eca18;  # This branch has both the fix for your issue and the issue signalled by @swarbred 
conda env create -f environment.yml -n mikado2;
conda activate mikado2;
python setup.py bdist_wheel;
pip install dist/*whl

I hope this helps.

Kind regards

@Xiaofei-git

I followed the steps, but I got the error below:

$ git checkout 89eca18
error: pathspec '89eca18' did not match any file(s) known to git

@Xiaofei-git

Thanks a lot to both of you! @gemygk @lucventurini

@lucventurini
Collaborator Author

Dear @Xiaofei-git , you are absolutely right. Please follow the amended commands here (I also edited the comment before):

git clone https://github.com/EI-CoreBioinformatics/mikado.git;
cd mikado;
git remote add lucventurini https://github.com/lucventurini/mikado.git;
git fetch lucventurini;  # fetch rather than pull: pulling from a non-default remote requires a branch name
git checkout 89eca18;  # This branch has both the fix for your issue and the issue signalled by @swarbred 
conda env create -f environment.yml -n mikado2;
conda activate mikado2;
python setup.py bdist_wheel;
pip install dist/*whl

@Xiaofei-git

Installed, thanks a lot!

@swarbred
Collaborator

swarbred commented Oct 23, 2019

Will look closer later but the latest run did not complete @lucventurini @gemygk

mikado/2.0rc6_89eca18_CBG is sourced from /ei/software/cb location
Usage:
mikado --help
2019-10-21 16:45:23,042 - main - __init__.py:120 - ERROR - main - MainProcess - Mikado crashed, cause:
2019-10-21 16:45:23,042 - main - __init__.py:121 - ERROR - main - MainProcess - 'mikado.scaffold_1G83.2'
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/usr/local/lib/python3.7/site-packages/Mikado/picking/_merge_loci_utils.py", line 54, in manage_index
    gene_counter)
  File "/usr/local/lib/python3.7/site-packages/Mikado/picking/_locus_line_creator.py", line 29, in _create_locus_lines
    locus_metrics_rows = [x for x in stranded_locus.print_loci_metrics()]
  File "/usr/local/lib/python3.7/site-packages/Mikado/picking/_locus_line_creator.py", line 29, in <listcomp>
    locus_metrics_rows = [x for x in stranded_locus.print_loci_metrics()]
  File "/usr/local/lib/python3.7/site-packages/Mikado/loci/superlocus.py", line 1135, in print_loci_metrics
    for row in self.loci[locus].print_metrics():
  File "/usr/local/lib/python3.7/site-packages/Mikado/loci/abstractlocus.py", line 941, in print_metrics
    metrics = self._metrics[tid]
KeyError: 'mikado.scaffold_1G83.2'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/Mikado/__init__.py", line 106, in main
    args.func(args)
  File "/usr/local/lib/python3.7/site-packages/Mikado/subprograms/pick.py", line 205, in pick
    creator()
  File "/usr/local/lib/python3.7/site-packages/Mikado/picking/picker.py", line 1124, in __call__
    self._parse_and_submit_input()
  File "/usr/local/lib/python3.7/site-packages/Mikado/picking/picker.py", line 1090, in _parse_and_submit_input
    self.__submit_multi_threading()
  File "/usr/local/lib/python3.7/site-packages/Mikado/picking/picker.py", line 792, in __submit_multi_threading
    tempdir=tempdir)
  File "/usr/local/lib/python3.7/site-packages/Mikado/picking/loci_processer.py", line 99, in merge_loci
    500)):
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 325, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
KeyError: 'mikado.scaffold_1G83.2'
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/weakref.py", line 648, in _exitfunc
    f()
  File "/usr/local/lib/python3.7/weakref.py", line 572, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
  File "/usr/local/lib/python3.7/tempfile.py", line 795, in _cleanup
    _shutil.rmtree(name)
  File "/usr/local/lib/python3.7/shutil.py", line 498, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/usr/local/lib/python3.7/shutil.py", line 496, in rmtree
    os.rmdir(path)
OSError: [Errno 39] Directory not empty: '/ei/workarea/group-ga/Projects/CB-GENANNO-444_Myzus_persicae_clone_O_v2_annotation/Analysis/mikado-2.0rc4/annotation_run2/mikado-2.0rc6_19f3b2f5_CBG_pick_run4/mikado_pick_tmp3s_ece3d'
Command exited with non-zero status 1
Command being timed: "mikado pick --mode nosplit --seed 10 --only-reference-update --procs 32 --json-conf mikado.configuration.run4.yaml --subloci-out mikado.subloci.gff3 -od mikado-2.0rc6_19f3b2f5_CBG_pick_run4"
User time (seconds): 9472.36
System time (seconds): 4168.11
Percent of CPU this job got: 839%
Elapsed (wall clock) time (h:mm:ss or m:ss): 27:04.33
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 293800
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 300
Minor (reclaiming a frame) page faults: 94612867
Voluntary context switches: 30345059
Involuntary context switches: 4142550
Swaps: 0
File system inputs: 228422390
File system outputs: 12289832
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 1

see
/tgac/workarea/group-ga/Projects/CB-GENANNO-444_Myzus_persicae_clone_O_v2_annotation/Analysis/mikado-2.0rc4/annotation_run2/out_mikado.pick_run4.24274011.log

and run directory
mikado-2.0rc6_19f3b2f5_CBG_pick_run4

@lucventurini
Collaborator Author

OK, I will have a look and relaunch when I have a fix.
Thank you

@lucventurini
Collaborator Author

Hi @swarbred , @gemygk , I found the problem. The transcript:

scaffold_1 Mikado_All_gold mRNA 485967 488473 . - . gene_id "Mikado_All_gold_mikado.scaffold_1G76.1.gene"; transcript_id "Mikado_All_gold_mikado.scaffold_1G76.1"; Name "Mikado_All_gold_mikado.scaffold_1G76.1"; alias "sex_morph_N.stringtie_sex_morph_N_str.11.1"; primary "True"; canonical_number "2"; canonical_proportion "1.0"; canonical_junctions "1,2"; has_start_codon "True"; has_stop_codon "True"; is_reference "True";

has as ID: Mikado_All_gold_mikado.scaffold_1G76.1
and as alias from the previous mikado run: sex_morph_N.stringtie_sex_morph_N_str.11.1

This confuses Mikado at the end as it loses track of the transcript. I will have to have a look at how to avoid this bug.

@lucventurini
Collaborator Author

The following patch solves part of the problem:

--- /ei/software/testing/mikado/20191023_19f3b2f5/src/mikado/Mikado/loci/locus.py	2019-10-23 14:05:48.110423830 +0100
+++ /ei/software/testing/mikado/20191023_19f3b2f5/x86_64/lib/python3.7/site-packages/Mikado/loci/locus.py	2019-10-23 15:21:05.901354444 +0100
@@ -1088,6 +1088,7 @@
         if self.scores_calculated is True:
             for tid in mapper:
                 self.scores[mapper[tid]] = self.scores.pop(tid)
+                self._metrics[mapper[tid]] = self._metrics.pop(tid)
         if self.metrics_calculated is True:
             for index in range(len(self.metric_lines_store)):
                 self.metric_lines_store[index]["tid"] = mapper[self.metric_lines_store[index]["tid"]]
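
To illustrate why the added line matters, here is a minimal sketch with plain dictionaries. It is not Mikado's code: `rename_ids`, the old ID `old.1` and the metric values are hypothetical, and only the target ID is borrowed from the traceback earlier in this thread.

```python
def rename_ids(per_transcript, mapper):
    """Return a copy of a per-transcript dictionary keyed by the new IDs."""
    return {mapper.get(tid, tid): value for tid, value in per_transcript.items()}


# Hypothetical data: one transcript that receives a new ID at the end of the run.
mapper = {"old.1": "mikado.scaffold_1G83.2"}
scores = {"old.1": 12.5}
metrics = {"old.1": {"cdna_length": 900}}

# Failure mode: only the scores are renamed, so a metrics lookup by new ID misses.
scores_only = rename_ids(scores, mapper)
missing_before = [tid for tid in scores_only if tid not in metrics]

# Renaming both dictionaries keeps them in step.
metrics_renamed = rename_ids(metrics, mapper)
missing_after = [tid for tid in scores_only if tid not in metrics_renamed]

print(missing_before, missing_after)
```

Any ID listed in `missing_before` is one for which a lookup such as `self._metrics[tid]` would raise a `KeyError` like the one reported above.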

The remaining issue is that some transcripts can share the same alias, for example when they are the same PacBio read mapped to the same location but coming from two different sets (mikado all vs mikado pacbio). Solving this now.

@lucventurini
Collaborator Author

PS: to be clear, the issue is due to situations like:

scaffold_1	Mikado_All_gold	mRNA	993942	999178	.	+	.	gene_id "Mikado_All_gold_mikado.scaffold_1G147.1.gene"; transcript_id "Mikado_All_gold_mikado.scaffold_1G147.1"; Name "Mikado_All_gold_mikado.scaffold_1G147.1"; alias "polished_LQ_sampleWzA5U6ZI|cb7942_c17/f1p3/5238.mrna1"; primary "True"; split "True"; has_start_codon "True"; has_stop_codon "True"; is_reference "True";
scaffold_1	Mikado_PacBio_gold	mRNA	993942	999178	.	+	.	gene_id "Mikado_PacBio_gold_mikado.scaffold_1G4.1.gene"; transcript_id "Mikado_PacBio_gold_mikado.scaffold_1G4.1"; Name "Mikado_PacBio_gold_mikado.scaffold_1G4.1"; alias "polished_LQ_sampleWzA5U6ZI|cb7942_c17/f1p3/5238.mrna1"; primary "True"; split "True"; has_start_codon "True"; has_stop_codon "True"; is_reference "True";

Same alias but different ID.
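
As a rough way to spot such cases before a run, the sketch below groups transcript IDs by alias and reports any alias shared by more than one ID. The helper `find_alias_clashes` is hypothetical and is not part of Mikado; the `(transcript_id, alias)` pairs are copied from the GTF lines quoted in this thread, and parsing them out of a GTF file is left out.

```python
from collections import defaultdict


def find_alias_clashes(records):
    """Map each alias shared by more than one transcript ID to those IDs."""
    by_alias = defaultdict(list)
    for transcript_id, alias in records:
        by_alias[alias].append(transcript_id)
    return {alias: tids for alias, tids in by_alias.items() if len(tids) > 1}


# (transcript_id, alias) pairs taken from the GTF attributes shown above.
shared_alias = "polished_LQ_sampleWzA5U6ZI|cb7942_c17/f1p3/5238.mrna1"
records = [
    ("Mikado_All_gold_mikado.scaffold_1G147.1", shared_alias),
    ("Mikado_PacBio_gold_mikado.scaffold_1G4.1", shared_alias),
    ("Mikado_All_gold_mikado.scaffold_1G76.1", "sex_morph_N.stringtie_sex_morph_N_str.11.1"),
]

clashes = find_alias_clashes(records)
for alias, tids in clashes.items():
    print(alias, "->", ", ".join(tids))
```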

@swarbred
Collaborator

Hi @lucventurini

Question: why is having the same alias an issue now when it wasn't previously? Presumably there has been a code change that relates to this. We would have had this issue in other integration runs, and this same dataset is fine on the earlier mikado version.

@lucventurini
Collaborator Author

Hi @swarbred, I will have to track it down. Currently I'm puzzled myself. Hopefully I will be able to retrace the origin of the bug quickly.

lucventurini added a commit to lucventurini/mikado that referenced this issue Oct 24, 2019
lucventurini added a commit to lucventurini/mikado that referenced this issue Oct 24, 2019
@lucventurini
Collaborator Author

Hi @swarbred , @gemygk , I think I figured it out ... in some cases Mikado did not provide the correct scores when passing the data back to the original process (only in multiprocessing mode). That caused an error.

It should be fixed in da0265b.

@swarbred
Collaborator

Hi @lucventurini
So this wasn't to do with the alias clash? I just started a run with the alias removed using 19f3b2f (the version that gave the above error) but if the alias wasn't the cause then I assume this will also fail.

@lucventurini
Collaborator Author

Hi @swarbred, for some reason the bug seemed to be triggered only when there was an alias clash. So I think your run might still complete without errors. I cannot guarantee it though.

To be more technical: I am not completely sure about it, but I think it might have been triggered when the padding ended up removing a transcript from the locus as redundant ... that left some data around in the locus object that caused trouble (specifically in the dictionaries with the data related to metrics and scores). I should have fixed that, and forced the Locus objects to ensure they give back the scoring information to the main thread.
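
A minimal sketch of the suspected failure mode, under the assumption that per-transcript scores and metrics live in dictionaries keyed by transcript ID. `TinyLocus` is a hypothetical stand-in and not Mikado's Locus class; it only shows how leftover entries can be detected and avoided when removal goes through a single method.

```python
class TinyLocus:
    """Hypothetical stand-in for a locus; this is not Mikado's Locus class."""

    def __init__(self):
        self.transcripts = set()
        self.scores = {}
        self.metrics = {}

    def add(self, tid, score, metrics):
        self.transcripts.add(tid)
        self.scores[tid] = score
        self.metrics[tid] = metrics

    def remove(self, tid):
        # Drop the transcript and every piece of per-transcript data together.
        self.transcripts.discard(tid)
        self.scores.pop(tid, None)
        self.metrics.pop(tid, None)

    def is_consistent(self):
        # All three containers must describe exactly the same transcripts.
        return self.transcripts == set(self.scores) == set(self.metrics)


locus = TinyLocus()
locus.add("t1", 10.0, {"tid": "t1"})
locus.add("t2", 8.0, {"tid": "t2"})

# Simulate the suspected bug: the transcript goes, its data stays behind.
locus.transcripts.discard("t2")
stale = not locus.is_consistent()

# Removing through the single entry point restores the invariant.
locus.remove("t2")
consistent = locus.is_consistent()

print(stale, consistent)
```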

@lucventurini
Collaborator Author

PS: Hi @swarbred , if the job you were waiting for was 24321575, it seems to have completed successfully. As I was writing above, for some reason the problem was triggered by the alias. I have an idea of the culprit (ie a capital instead of lowercase letter, "Alias" instead of "alias", in a key function).

But your data should be ready :-)

lucventurini added a commit to lucventurini/mikado that referenced this issue Oct 25, 2019
…that caused boolean values to be converted into integers.
@lucventurini lucventurini mentioned this issue Oct 25, 2019
lucventurini added a commit that referenced this issue Oct 25, 2019
* Fix #240, #243
* Solved a bug that caused boolean values to be converted into integers for `pick`.
@Xiaofei-git

Xiaofei-git commented Nov 7, 2019 via email

@lucventurini
Collaborator Author

Dear @Xiaofei-git,
my apologies, I have not had the time to look at the issue yet. I am currently very busy with a grant proposal to ensure the continuous development of Portcullis and Mikado (deadline next week) and I do not have much time to look at it in detail before then.

My suggestion for the time being is to put Portcullis in a path without soft links that might disrupt its launch. I will be reviewing the issue most probably after next Wednesday, after submitting the grant request.

Kind regards

Luca Venturini

@Xiaofei-git

Got it, thank you so much!

@Xiaofei-git

Xiaofei-git commented Nov 7, 2019

Another issue is with mikado. Our admin tried to install it on our cluster, but got an error when checking out the commit:

$ git remote add lucventurini https://github.com/lucventurini/mikado.git;
$ git pull lucventurini;
remote: Enumerating objects: 198, done.
remote: Counting objects: 100% (198/198), done.
remote: Compressing objects: 100% (26/26), done.
remote: Total 225 (delta 181), reused 185 (delta 172), pack-reused 27
Receiving objects: 100% (225/225), 74.80 KiB | 18.70 MiB/s, done.
Resolving deltas: 100% (181/181), completed with 71 local objects.
From https://github.com/lucventurini/mikado
 * [new branch] issue-136 -> lucventurini/issue-136
 * [new branch] issue-237 -> lucventurini/issue-237
 * [new branch] issue-239 -> lucventurini/issue-239
 * [new branch] master -> lucventurini/master
You asked to pull from the remote 'lucventurini', but did not specify
a branch. Because this is not the default configured remote
for your current branch, you must specify a branch on the command line.
$ git checkout 89eca18;
error: pathspec '89eca18' did not match any file(s) known to git

Thank you so much!

@lucventurini
Collaborator Author

Dear @Xiaofei-git , I merged the branch that contained 89eca18 into the master branch a couple of weeks ago. When I did so, I squashed all the changes from that branch into a single commit, so 89eca18 itself no longer exists there.

As such, I would recommend using the master branch as it is.

I hope this helps.

Thank you

@Xiaofei-git

Got it, thank you so much!

lucventurini added a commit to lucventurini/mikado that referenced this issue Feb 11, 2021
lucventurini added a commit to lucventurini/mikado that referenced this issue Feb 11, 2021
lucventurini added a commit to lucventurini/mikado that referenced this issue Feb 11, 2021
* Fix EI-CoreBioinformatics#240, EI-CoreBioinformatics#243
* Solved a bug that caused boolean values to be converted into integers for `pick`.

4 participants