Commit

Improvement on EI-CoreBioinformatics#131: now we are using a graph-based function, rather than a for cycle, to find the missing loci. This also ensures coherence in terms of the overlapping parameters.
lucventurini committed Oct 5, 2018
1 parent 2aa7c6d commit 81b550e
Showing 4 changed files with 40 additions and 15 deletions.
CHANGELOG.md: 4 changes (3 additions, 1 deletion)
@@ -1,4 +1,4 @@
-# Version 1.2.5
+# Version 1.3
 
 One of the major highlights of this release is the completion of the "padding" functionality.
 Briefly, if instructed to do so, now Mikado will be able to uniform the ends of transcripts within a single locus (similar to what was done for the last _Arabidopsis thaliana_ annotation release).
@@ -12,6 +12,8 @@ Bugfixes and improvements:
 - Fixed [#127](https://github.com/lucventurini/mikado/issues/127): previously, Mikado _prepare_ only considered cDNA coordinates when determining the redundancy of two models. In some edge cases, two models could be identical but have a different ORF called. Now Mikado will also consider the CDS before deciding whether to discard a model as redundant.
 - [#129](https://github.com/lucventurini/mikado/issues/129): Mikado is now capable of correctly padding the transcripts so to uniform their ends in a single locus. This will also have the effect of trying to enlarge the ORF of a transcript if it is truncated to begin with.
 - [#130](https://github.com/lucventurini/mikado/issues/130): it is now possible to specify a different metric inside the "filter" section of scoring.
+- [#131](https://github.com/lucventurini/mikado/issues/131): in rare instances, Mikado could have missed loci if they were lost between the sublocus and monosublocus stages. Now Mikado implements a basic backtracking recursive algorithm that should ensure no locus is missed.
+- [#132](https://github.com/lucventurini/mikado/issues/132)
 
 # Version 1.2.4
 
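Entry #127 above changes what counts as a redundant model in Mikado _prepare_. The following is a minimal Python sketch of the idea, not Mikado's code: two models are treated as redundant only if they share exon coordinates *and* CDS coordinates. The `Model` tuple, `redundancy_key` and `remove_redundant` are hypothetical names; comparing chromosome and strand, and keeping the first model seen, are assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple

Interval = Tuple[int, int]


class Model(NamedTuple):
    """Hypothetical minimal transcript model (1-based, closed intervals)."""
    tid: str
    chrom: str
    strand: str
    exons: Tuple[Interval, ...]
    cds: Tuple[Interval, ...] = ()  # empty tuple for non-coding models


def redundancy_key(model: Model, consider_cds: bool = True) -> tuple:
    """Key under which two models are considered identical (hypothetical)."""
    key = (model.chrom, model.strand, tuple(sorted(model.exons)))
    if consider_cds:
        # Same cDNA but a different ORF must NOT collapse into one model.
        key = key + (tuple(sorted(model.cds)),)
    return key


def remove_redundant(models: List[Model], consider_cds: bool = True) -> List[Model]:
    """Keep the first model seen for each key (an assumption of this sketch)."""
    seen = set()
    kept = []
    for model in models:
        key = redundancy_key(model, consider_cds)
        if key not in seen:
            seen.add(key)
            kept.append(model)
    return kept


if __name__ == "__main__":
    a = Model("a", "chr1", "+", ((100, 200), (300, 400)), ((150, 200), (300, 350)))
    b = Model("b", "chr1", "+", ((100, 200), (300, 400)), ((120, 200), (300, 380)))
    c = Model("c", "chr1", "+", ((100, 200), (300, 400)), ((150, 200), (300, 350)))
    print([m.tid for m in remove_redundant([a, b, c])])
    print([m.tid for m in remove_redundant([a, b, c], consider_cds=False)])
```

Here `a` and `b` share their exons but not their ORF, so both survive when the CDS is considered; with `consider_cds=False` (the pre-#127 behaviour, as described in the entry) only `a` is kept.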
Mikado/__init__.py: 4 changes (2 additions, 2 deletions)
@@ -9,8 +9,8 @@
 __title__ = "Mikado"
 __author__ = 'Luca Venturini'
 __license__ = 'GPL3'
-__copyright__ = 'Copyright 2015-2019 Luca Venturini'
-__version__ = "1.2.5"
+__copyright__ = 'Copyright 2015-2020 Luca Venturini'
+__version__ = "1.3"
 
 __all__ = ["configuration",
            "exceptions",
Mikado/loci/superlocus.py: 35 changes (23 additions, 12 deletions)
@@ -1142,20 +1142,31 @@ def define_loci(self):
 
     def __find_lost_transcripts(self):
 
         if self.loci_defined is True:
             return
+        cds_only = self.json_conf["pick"]["clustering"]["cds_only"]
+        # simple_overlap = self.json_conf["pick"]["run_options"]["monoloci_from_simple_overlap"]
+        cdna_overlap = self.json_conf["pick"]["clustering"]["min_cdna_overlap"]
+        cds_overlap = self.json_conf["pick"]["clustering"]["min_cds_overlap"]
+
+        t_graph = self.define_graph(self.transcripts,
+                                    inters=MonosublocusHolder.is_intersecting,
+                                    cds_only=cds_only,
+                                    logger=self.logger,
+                                    min_cdna_overlap=cdna_overlap,
+                                    min_cds_overlap=cds_overlap,
+                                    simple_overlap_for_monoexonic=False)
 
-        loci_transcripts = itertools.chain(*[{self.loci[_].transcripts.keys()} for _ in self.loci])
+        loci_transcripts = set()
+        for locus in self.loci.values():
+            loci_transcripts.update(set([_ for _ in locus.transcripts.keys()]))
 
-        for tid in set.difference({self.transcripts.keys()}, loci_transcripts):
-            found = False
-            for lid in self.loci:
-                if MonosublocusHolder.in_locus(self.loci[lid], self.transcripts[tid]):
-                    found = True
-                    break
-                else:
-                    continue
-            if found is True:
+        not_loci_transcripts = set.difference({_ for _ in self.transcripts.keys()}, loci_transcripts)
+
+        if not not_loci_transcripts:
+            return
+
+        for tid in not_loci_transcripts:
+            neighbours = set(t_graph.neighbors(tid))
+            if set.intersection(neighbours, loci_transcripts):
                 continue
             else:
                 self.__lost.update({tid: self.transcripts[tid]})
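The rewritten `__find_lost_transcripts` builds a transcript graph once, using the clustering parameters from the configuration, and flags as lost only those transcripts that sit outside every locus and have no graph neighbour inside one. Below is a minimal, self-contained Python sketch of that logic under stated assumptions: each transcript is reduced to a single (start, end) span, the graph is a plain adjacency dictionary rather than the object returned by `define_graph`, and the hypothetical `overlap_fraction` test stands in for `MonosublocusHolder.is_intersecting`, which in Mikado receives `cds_only`, `min_cdna_overlap` and `min_cds_overlap`.

```python
from itertools import combinations
from typing import Callable, Dict, Set, Tuple

Span = Tuple[int, int]  # hypothetical: one (start, end) span per transcript


def overlap_fraction(a: Span, b: Span) -> float:
    """Overlap as a fraction of the shorter span (illustrative criterion only)."""
    overlap = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if overlap <= 0:
        return 0.0
    shorter = min(a[1] - a[0] + 1, b[1] - b[0] + 1)
    return overlap / shorter


def define_graph(transcripts: Dict[str, Span],
                 is_intersecting: Callable[[Span, Span], bool]) -> Dict[str, Set[str]]:
    """Undirected graph: an edge joins every pair of intersecting transcripts."""
    graph = {tid: set() for tid in transcripts}
    for t1, t2 in combinations(transcripts, 2):
        if is_intersecting(transcripts[t1], transcripts[t2]):
            graph[t1].add(t2)
            graph[t2].add(t1)
    return graph


def find_lost_transcripts(transcripts: Dict[str, Span],
                          loci: Dict[str, Set[str]],
                          is_intersecting: Callable[[Span, Span], bool]) -> Set[str]:
    """Transcripts outside every locus with no neighbour inside any locus."""
    loci_transcripts = set().union(*loci.values())
    not_loci_transcripts = set(transcripts) - loci_transcripts
    if not not_loci_transcripts:
        return set()
    graph = define_graph(transcripts, is_intersecting)
    # A transcript intersecting a locus member is accounted for; the rest are lost.
    return {tid for tid in not_loci_transcripts
            if not graph[tid] & loci_transcripts}


if __name__ == "__main__":
    spans = {"t1": (100, 500), "t2": (150, 480), "t3": (1000, 1200), "t4": (1050, 1150)}
    intersects = lambda a, b: overlap_fraction(a, b) >= 0.5
    print(sorted(find_lost_transcripts(spans, {"locus_1": {"t1"}}, intersects)))
```

In the usage example `t2` overlaps the locus member `t1` and is therefore not lost, while `t3` and `t4` only overlap each other and are both reported. Because every pairwise decision goes through one predicate, the overlap thresholds stay the same as those used for clustering, which is presumably the "coherence in terms of the overlapping parameters" the commit message mentions.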
docs/Algorithms.rst: 12 changes (12 additions, 0 deletions)
@@ -219,6 +219,13 @@ For example, this is a snippet of a scoring section:
         end_distance_from_junction:
             filter: {operator: lt, value: 55}
             rescaling: min
+        non_verified_introns_num:
+            rescaling: max
+            multiplier: -10
+            filter:
+                operator: gt
+                value: 1
+                metric: exons_num
 
 Using this snippet as a guide, Mikado will score transcripts in each locus as follows:
@@ -228,6 +235,11 @@ Using this snippet as a guide, Mikado will score transcripts in each locus as fo
 * Assign a full score (**two points**, as a multiplier of 2 is specified) to transcripts that have a total amount of CDS bps approximating 80% of the transcript cDNA length (*combined_cds_fraction*)
 * Assign a full score (one point, as no multiplier is specified) to transcripts that have a 5' UTR whose length is nearest to 100 bps (*five_utr_length*); if the 5' UTR is longer than 2,500 bps, this score will be 0 (see the filter section)
 * Assign a full score (one point, as no multiplier is specified) to transcripts which have the lowest distance between the CDS end and the most downstream exon-exon junction (*end_distance_from_junction*). If such a distance is greater than 55 bps, assign a score of 0, as it is a probable target for NMD (see the filter section).
+* Assign a maximum penalty (**minus 10 points**, as a **negative** multiplier is specified) to the transcript with the highest number of non-verified introns in the locus.
+* Again, we are using a "filter" section to define which transcripts will be exempted from this scoring (in this case, a penalty)
+* However, please note that we are using the keyword **metric** in this section. This tells Mikado to check a *different* metric for evaluating the filter. Nominally, in this case we are excluding from the penalty any *monoexonic* transcript. This makes sense as a monoexonic transcript will never have an intron to be confirmed to start with.
+
+.. note:: The possibility of using different metrics for the "filter" section is present from Mikado 1.3 onwards.
 
 .. _Metrics:
 
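The scoring rules described in the documentation above, including a filter that checks a different metric, can be approximated with a short Python sketch. It is not Mikado's implementation and rests on several assumptions: transcripts failing the filter receive 0 for that metric; `max`/`min` rescaling maps each eligible value linearly onto [0, 1] relative to the best and worst eligible values in the locus and then applies the multiplier; and, as an arbitrary choice, all eligible transcripts receive the full multiplier when their values are identical. `score_metric` and the dictionary layout are hypothetical, and `target` rescaling is omitted.

```python
import operator
from typing import Dict, Optional

# Comparison keywords for the "filter" sections (subset; assumption).
OPERATORS = {
    "lt": operator.lt, "le": operator.le,
    "gt": operator.gt, "ge": operator.ge,
    "eq": operator.eq, "ne": operator.ne,
}


def score_metric(transcripts: Dict[str, Dict[str, float]],
                 metric: str,
                 rescaling: str,
                 multiplier: float = 1.0,
                 filter_: Optional[dict] = None) -> Dict[str, float]:
    """Score one metric for every transcript of a locus (illustrative sketch)."""
    if rescaling not in ("max", "min"):
        raise ValueError(f"unsupported rescaling in this sketch: {rescaling}")
    scores = {tid: 0.0 for tid in transcripts}
    eligible = list(transcripts)
    if filter_ is not None:
        # The optional "metric" key makes the filter test a different metric.
        checked = filter_.get("metric", metric)
        compare = OPERATORS[filter_["operator"]]
        eligible = [tid for tid in eligible
                    if compare(transcripts[tid][checked], filter_["value"])]
    if not eligible:
        return scores
    values = [transcripts[tid][metric] for tid in eligible]
    low, high = min(values), max(values)
    for tid in eligible:
        value = transcripts[tid][metric]
        if high == low:
            fraction = 1.0
        elif rescaling == "max":
            fraction = (value - low) / (high - low)
        else:  # "min": the lowest value gets the full score
            fraction = (high - value) / (high - low)
        scores[tid] = fraction * multiplier
    return scores


if __name__ == "__main__":
    locus = {
        "t1": {"non_verified_introns_num": 3, "exons_num": 4},
        "t2": {"non_verified_introns_num": 1, "exons_num": 2},
        "t3": {"non_verified_introns_num": 0, "exons_num": 1},  # monoexonic
    }
    penalty = score_metric(locus, "non_verified_introns_num", "max", multiplier=-10,
                           filter_={"operator": "gt", "value": 1, "metric": "exons_num"})
    print(penalty)
```

With the `non_verified_introns_num` settings from the snippet, `t1` receives the full -10 penalty, `t2` receives none, and the monoexonic `t3` is exempted because its `exons_num` does not pass the `gt 1` filter.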
