Add option to allow requirements and not_fragmentary checks to be applied to models marked as reference #240
Comments
Hi @swarbred, yes, I think it is feasible.
…tch for both `pick` and `configure`.
Thanks @lucventurini @swarbred, I have now installed
I have just tested this with version 89eca18 and there appears to be an issue (or I am not understanding how it should work).

My test data set is at /tgac/workarea/group-ga/Projects/CB-GENANNO-444_Myzus_persicae_clone_O_v2_annotation/Analysis/mikado-2.0rc4/annotation_run2/mikado-2.0rc6_89eca18_CBG_TEST_SET

I ran two picks, the only difference being check_references set to either true or false:

mikado pick -v --mode nosplit --seed 10 --only-reference-update --procs 1 --json-conf mikado.configuration.run5-test_check_references_true.yaml --subloci-out mikado.subloci.gff3 -od mikado_pick_check_references_true

mikado pick -v --mode nosplit --seed 10 --only-reference-update --procs 1 --json-conf mikado.configuration.run5-test_check_references_false.yaml --subloci-out mikado.subloci.gff3 -od mikado_pick_check_references_false

The false run outputs models in the mikado.loci.gff3 file; the true run outputs no models. I was expecting the true run to apply the requirements to models marked as reference, so some filtering should occur, but not all models should be removed.

The pick log suggests that models were removed because the gene had no reference transcripts, but the model below is marked as a reference (and was output in the "false" run). So perhaps check_references is not working as expected when set to true.
Hi @swarbred, I was planning to investigate that. In the meantime, da0265b contains the fix for the reference check problem and should also fix the other issues with the alias. I am really sorry about all of this: I made some changes that should have been transparent to the backbone, but apparently are not.
…that caused boolean values to be converted into integers.
@lucventurini I'm losing track of versions. If we are going to test that everything works, should we pull 5d9794a? I had a go at installing using singularity but am getting an error; @gemygk can educate me how to do this on Monday :-)
Hi @swarbred, sorry for all these commits. Basically, when I was at about rc4, various people noticed that Mikado was slow on certain datasets, and that pushed me to make many changes under the hood so that it works with big datasets of millions of transcripts. It was worth it but I fear it pushed the software back into a
Using the test region at /ei/workarea/group-ga/Projects/CB-GENANNO-444_Myzus_persicae_clone_O_v2_annotation/Analysis/mikado-2.0rc4/annotation_run2/mikado-2.0rc6_89eca18_CBG_TEST_SET and running version mikado-2.0rc6_89eca18_CBG, we see models in the output that are identical to other output models.

I have just tested this with mikado-2.0_rc1 and the result is correct. Running with mikado-2.0rc6_a9ef516, mikado errors (with an unrelated issue).
Hi @swarbred, I thought I had solved that specific, unrelated bug early last week. I will have to double check. Please give me ~10 minutes.
Hi @swarbred, the bug is in the GTF file. If you inspect it you will see:
That's wrong, those "has_start_codon", "has_stop_codon" and "is_reference" should be In 5744533 I made the code resilient to that. In the meantime, I did correct the input GTF and reran with the same mikado version (a9ef516) and the job finished successfully. See
PS: the secondary problem (duplicate models being retained after padding) is still there, though.
f89b0f2 should have solved the problem. There were three separate issues: 1- Reference models were never marked as redundant - even when ending up with two identical reference models. f89b0f2 fixes all of this, so that now in
there are no duplicated models. |
Hi @lucventurini, yes, we will take a look at this next week as, like you, we want to tie up the release. On a separate note, while I remember: for ease of testing, it would be handy to be able to give a comma-separated list of scaffold:region..region via the command line to restrict pick to the specified regions (easier than cutting the GTF down).
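To make the request above concrete, here is a minimal sketch of how such a region list could be parsed. It assumes that "scaffold:region..region" means `scaffold:start..end` with integer coordinates; the function name `parse_regions` is hypothetical, and this is not the code that Mikado itself uses.

```python
from typing import List, Tuple


def parse_regions(spec: str) -> List[Tuple[str, int, int]]:
    """Parse 'scaffold:start..end,scaffold:start..end' into tuples.

    Hypothetical helper: the exact syntax accepted by Mikado is an
    assumption here, based only on the format suggested in this thread.
    """
    regions = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        # Split on the LAST colon so scaffold names containing ':' still work.
        scaffold, colon, coords = item.rpartition(":")
        if not colon or not scaffold:
            raise ValueError(f"Missing scaffold name in region: {item!r}")
        start_text, dots, end_text = coords.partition("..")
        if not dots:
            raise ValueError(f"Missing '..' in region: {item!r}")
        start, end = int(start_text), int(end_text)  # ValueError if not numeric
        if start > end:
            raise ValueError(f"Start is greater than end in region: {item!r}")
        regions.append((scaffold, start, end))
    return regions


if __name__ == "__main__":
    print(parse_regions("scaffold_1:100..2000, scaffold_2:5..50"))
```

Splitting on the last colon is a deliberate choice in this sketch, so that scaffold identifiers which themselves contain a colon are not truncated.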
Great thanks
I will have a look at this, it should not be too difficult to do.
…ovide target regions for analysis. This should speed up testing.
Hi @swarbred , @gemygk , commit be2a884 introduces the option you were asking for. Specifically, using the The regions should be in the form of Please let me know how you get along with this new function. |
Thanks @lucventurini, I will install this and test. If I'm going to download and install to test this and other changes, should I use be2a884 or something else?
The commit from yesterday (0723988) tidied up the code for the configuration files a bit and added a new, simplified configuration format, TOML (https://github.com/toml-lang/toml; see #239). I would suggest using the latest release, but be2a884 should be functionally equivalent. Best
Thanks @lucventurini. @gemygk installed it as mikado-2.0rc6_0723988_CBG
Great. Please do not hate me, but this morning I also finished working on #237, with ab11a75: the code in that branch now splits models with too-long introns by default. I would recommend testing that separately, as the only thing in that branch that has really changed from the code you just installed is the corner case in
A quick notice to please check and update on this issue as well when testing the |
Closing for now, as there have been no further reports of problems since the changes in November 2019.
* Fix EI-CoreBioinformatics#240, EI-CoreBioinformatics#243
* Solved a bug that caused boolean values to be converted into integers for `pick`.
Hi @lucventurini
In GMC we currently make use of the --only-reference-update flag, as we have labels (sets of models) marked as reference (e.g. multiple sets of augustus models plus a subset of mikado models) but also other sets, e.g. the full mikado output, that we include only for the purpose of adding splice variants (so they are marked as not reference). To avoid these models being selected as primary models, we give them a low base score and run with the --only-reference-update flag, so loci based on these transcripts alone are excluded.
This works, but it means that the requirements and not_fragmentary checks are not applied to the models marked as reference. Most of the time this is the desired behaviour, but in our case I would like these checks to be applied to the reference models.
I wondered if we could add an option to override the default behaviour and allow requirements and not_fragmentary checks to be applied to reference models?
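As a rough illustration of the behaviour being requested, the sketch below models a single locus. Everything in it is an assumption made for illustration: the `Transcript` class, the 200 bp requirement and the `select_models` function are hypothetical and are not Mikado's internals. Only the names check_references and only-reference-update come from this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transcript:
    # Hypothetical stand-in for a transcript model, not Mikado's own class.
    tid: str
    is_reference: bool
    cdna_length: int


def passes_requirements(transcript: Transcript) -> bool:
    # Hypothetical requirement: at least 200 bp of cDNA.
    return transcript.cdna_length >= 200


def select_models(
    transcripts: List[Transcript],
    check_references: bool = False,
    only_reference_update: bool = False,
) -> List[Transcript]:
    """Filter the models of one locus.

    Reference models skip the requirements unless check_references is True.
    With only_reference_update, a locus left without any reference model is
    dropped entirely (assumed behaviour, as described in this issue).
    """
    kept = []
    for transcript in transcripts:
        exempt = transcript.is_reference and not check_references
        if exempt or passes_requirements(transcript):
            kept.append(transcript)
    if only_reference_update and not any(t.is_reference for t in kept):
        return []
    return kept


if __name__ == "__main__":
    locus = [
        Transcript("ref_short", True, 100),
        Transcript("ref_long", True, 500),
        Transcript("alt_long", False, 300),
    ]
    print([t.tid for t in select_models(locus)])
    print([t.tid for t in select_models(locus, check_references=True)])
```

With the default, the short reference model survives because it is exempt from the checks; with `check_references=True` it is filtered like any other model, while the reference model that meets the requirement is kept.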