IndexError: list index out of range and AssertionError #186
Comments
Hi @gemygk, would you be able to repeat the run with the latest container I uploaded yesterday? I may have already solved this bug, but if not, I will try to fix it today.
Hi @lucventurini, sure, I will test it on just one scaffold and update you. I cannot test it on the full run, though, as the full run has already taken more than 5 days for the pick stage alone (with 32 threads).
Hi @gemygk,
That is really strange and worrying. I will have a look at it.
I think the long runtimes are caused by the depth of transcripts we have at a locus.
Hi @lucventurini, I am getting another error now.
The error that I am getting is below:
Hi @gemygk, this is due to the fact that you are launching Mikado from a different folder, so the soft link to the scoring file ("../plant.yaml") is not valid.
@lucventurini, ah, I see your point. My mistake: I had not changed the scoring file location in the Mikado configuration file.
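For reference, one way to avoid this class of problem is to point the configuration at the scoring file by absolute path rather than a relative soft link. A minimal sketch is below; the key layout (scoring file referenced under the pick section) and the path are assumptions for illustration.

    # Sketch only: reference the scoring file by absolute path so the
    # configuration keeps working regardless of the directory Mikado is
    # launched from. Exact key layout is an assumption.
    pick:
      scoring_file: /full/path/to/plant.yaml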
Out of curiosity, by the way, how did we end up with a region (scaffold_2:828029..24085450) that has a whopping 167,901 transcripts? I am not surprised at Mikado having difficulties in managing such a huge amount of data! I would have to investigate where the choke point is, but this is some orders of magnitude bigger than what I wrote the program for ...
"Out of curiosity, by the way, how did we end up with a region (scaffold_2:=828029..24085450) that has a whopping 167,901 transcripts?" @lucventurini had a look with @gemygk and this is due to our pacbio gmap alignments having some very large introns 1.8Mb. This is even though in gmap we set intron sizes that are max 50kb for middle introns. Looking at the gmap parmeters there is a --split_large_intron option that indicates that gmap will generate alignments with introns over the max middle intron settings unless this is also set. While we have max intron settings in the requirements section of pick these are applied after the superloci construction I assume. It might be useful to add to the prepare a max intron size (we have a min cdna size already) so that we would remove these at the prepare stage. The default for this would be large 1Mb (suitable for mamalian genomes) but that would at least filter out the most problematic alignments and avoid users having similar issues. |
@swarbred, agreed. This should help avoid other issues as well.
Hi @swarbred, this should have been implemented in the latest commit (a131b94). I have put in a generous default value of 1 million bps. The relevant configuration section is:

prepare:
  max_intron_length: 50000

I would recommend redoing the whole M. persicae analysis with this parameter in place, to be honest, given the spurious alignments.
"I would recommend redoing the whole M. persicae analysis with this parameter in place, to be honest, given the spurious alignments." Thanks @lucventurini yes we were going to just filter the prepare output manually that way we dont need to redo the blast / orf prediction. I assume it shouldn't be an issue having say orfs loaded for the seralise step that are not in the prepare output that is then passed to pick |
No, absolutely, it should not pose any problem at all. Please let me know how it goes.
Hi @gemygk, @swarbred, regarding issue no. 1 (Mikado taking forever and crashing), it seems that the filtering did the trick. Regarding issue no. 2, there are still some transcripts for which the splitting mechanism seems to fail with the reported error.
This particular error seems to be triggered by a specific case: ORFs assigned by the caller to the negative strand that are incorrectly given a phase of zero. E.g.:
Here the ORF found by Prodigal has a GTG start, which is discarded by Mikado; the ORF is consequently enlarged until the end of the transcript; and instead of being assigned a phase of 2, it was assigned a phase of 0.
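To make the phase reasoning concrete, here is a minimal sketch (my own illustration, not Mikado's implementation) of how the phase of a 5'-truncated ORF can be derived by anchoring the reading frame to the stop codon at its 3' end: the phase is the number of leading bases to trim so that the remaining length is a multiple of three.

    # Sketch only, not Mikado code: phase of a CDS that is open (truncated)
    # at its 5' end but ends at a known stop codon. GFF-style phase = number
    # of bases to remove from the 5' end to reach the first complete codon.
    def truncated_orf_phase(orf_length: int) -> int:
        return orf_length % 3

    # An ORF enlarged to the transcript end with length 3k + 2 should get
    # phase 2 (as in the case described above), not phase 0.
    assert truncated_orf_phase(3 * 100 + 2) == 2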
Hi @gemygk, I understand that Mikado serialise took an extremely long time for your data on M. persicae; looking at the logs, this is due to the poor performance of the XML parsing. I will see what can be done about that at a later date.
Hi @gemygk, the running time was about 6 hours in total. This is not a free lunch, though, as the total memory requirement increased to ~40GB (indeed, I had to relaunch multiple times while fine-tuning the parameters, as memory usage grew too high).
I am now analysing the same run using the new database.
Hi @lucventurini, thanks for the update. Yes, I will keep monitoring.
No error was found in either log.
* Fix #189
* Fix #186
* #183: added static seed from CLI for pick.
* #186: introduced a maximum intron length parameter for mikado prepare (prepare/max_intron_length), with a default value of 1M bps and a minimum value of 20.
* #186: there was a very serious bug in the evaluation of negative truncated ORFs, which potentially led to a lot of them being called incorrectly at the serialisation stage. Refactored the function responsible for the mishap and added a unit-test which confirmed fixing of the bug.
Hi @lucventurini,
The Mikado pick stage is giving me some errors with version mikado-20190610_94160dd.
Please see the error and logs below.
CMD:
Mikado pick command:
Pick Log:
WD:
ERROR:
Mikado has not generated any files since Jun 18 20:26, so is Mikado hanging at the moment?
In addition, there is one more error that I can see:
Can you please look into this?
Thanks,
Gemy