[Improvement] Check that coding ASEs share the frame with the primary transcript #134

lucventurini · 2018-10-12T15:33:05Z

Currently, Mikado only performs a CDS overlap check to make sure that two transcripts are compatible as ASEs. However, we do not check whether they actually encode a compatible protein. Doing so requires calculating the CDS codons of both transcripts and verifying that at least some of them are shared.

As this is probably an expensive operation, we should not call it until the last possible moment (i.e. during the ASE validation step).

lucventurini · 2018-10-15T19:20:23Z

We still have to decide whether to calculate the CDS overlap for ASEs relative to the shorter of the two CDSs, or always relative to the primary transcript. This is currently implemented as a switch, with the old behaviour (the shorter CDS as the denominator) kept as the default; the switch cannot be set from outside the code.
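To make the two options concrete, here is a minimal sketch of how the choice of denominator changes the overlap fraction. The function and parameter names (`cds_overlap_fraction`, `use_primary_length`) are hypothetical and are not Mikado's actual API; CDS segments are assumed to be inclusive `(start, end)` tuples.

```python
def cds_positions(segments):
    """Return the set of genomic positions covered by inclusive (start, end) CDS segments."""
    positions = set()
    for start, end in segments:
        positions.update(range(start, end + 1))
    return positions


def cds_overlap_fraction(primary_cds, candidate_cds, use_primary_length=False):
    """Fraction of CDS shared between a primary transcript and a candidate ASE.

    use_primary_length is a hypothetical switch:
      False -> divide by the shorter of the two CDSs (the old default in this thread)
      True  -> always divide by the primary transcript CDS length
    """
    primary = cds_positions(primary_cds)
    candidate = cds_positions(candidate_cds)
    shared = len(primary & candidate)
    if use_primary_length:
        denominator = len(primary)
    else:
        denominator = min(len(primary), len(candidate))
    return shared / denominator if denominator else 0.0


# A short candidate CDS fully contained in a longer primary CDS.
primary = [(1, 100)]
candidate = [(51, 70)]
fraction_shortest = cds_overlap_fraction(primary, candidate)
fraction_primary = cds_overlap_fraction(primary, candidate, use_primary_length=True)
```

With the shorter CDS as the denominator, a short candidate nested inside the primary CDS counts as fully overlapping; with the primary CDS as the denominator, the same candidate covers only a fifth of it.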

lucventurini · 2018-10-16T09:06:50Z

The current method for calculating the frames is too expensive. A better way would be to do the following with two exons:

* check whether they overlap;
* if they do, sort them, then check whether ((downstream + phase) - (upstream + phase)) % 3 == 0.

Of course, downstream/upstream have to be defined according to the strand.
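The check described above can be sketched as follows. This is only an illustration under stated assumptions, not the code that was committed: `CdsExon`, `codon_anchor` and `share_frame` are hypothetical names, coordinates are assumed to be 1-based and inclusive, and `phase` is assumed to follow the GFF3 convention (number of bases to skip from the 5' end of the segment to reach the first complete codon).

```python
from typing import NamedTuple


class CdsExon(NamedTuple):
    """Hypothetical container for one CDS segment."""
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    phase: int  # 0, 1 or 2 (GFF3 convention, assumed)


def overlaps(a, b):
    """True if the two segments share at least one base."""
    return a.start <= b.end and b.start <= a.end


def codon_anchor(exon, strand):
    """Genomic position of the first base of the first complete codon.

    On the minus strand the 5' end of the segment is its `end` coordinate,
    so the phase is subtracted instead of added.
    """
    if strand == "-":
        return exon.end - exon.phase
    return exon.start + exon.phase


def share_frame(a, b, strand):
    """True if two overlapping CDS segments are read in the same frame."""
    if not overlaps(a, b):
        return False
    # Upstream comes first: lowest anchor on "+", highest anchor on "-".
    upstream, downstream = sorted(
        (a, b), key=lambda exon: codon_anchor(exon, strand), reverse=(strand == "-")
    )
    distance = codon_anchor(downstream, strand) - codon_anchor(upstream, strand)
    # Python's % never returns a negative remainder for a positive modulus,
    # so the sign of the distance does not affect the test.
    return distance % 3 == 0


same = share_frame(CdsExon(100, 200, 0), CdsExon(101, 250, 2), "+")
shifted = share_frame(CdsExon(100, 200, 0), CdsExon(101, 250, 0), "+")
```

Because only two integer anchors per exon pair are compared, this avoids enumerating codons, which is the expensive step the comment refers to.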

lucventurini · 2018-10-16T15:49:10Z

Feature implemented and tested. Closing.

…ed 'in-frame' if at least one of their exons is in-frame. Previously, the only contribution to cds_overlap was given by in-frame CDS segments, which is probably too restrictive.

* Now Mikado pick will use lightweight SQLite databases for interprocess data exchange (#218). It could still be improved by allowing more fragments to be removed.
* Small corrections for the `daijin` pipelines.
* Fix #215
* Fixing the recovery for lost loci.
* Amend for #134. Now min_cds_overlap has been reduced by default to 50% (75% was probably too restrictive)
* Solved a bug in `mikado compare` that led to incorrect statistics when using multiprocessing.
* Needed bug fixes for Mikado serialise.
* Mikado configure was embedding the scoring file within the configuration - now amended.
* Fix #217

lucventurini assigned lucventurini and gemygk Oct 12, 2018

lucventurini added the enhancement label Oct 12, 2018

lucventurini added this to the 1.3 milestone Oct 12, 2018

lucventurini pushed a commit that referenced this issue Oct 12, 2018

BROKEN. Starting to work on #134

e999def

lucventurini added a commit that referenced this issue Oct 12, 2018

Implemented #134, unit tests passed

74952e2

lucventurini added a commit that referenced this issue Oct 15, 2018

Re-fixed #134. Now Mikado during the ASE check also always uses the primary transcript CDS length as the denominator (not the minimum between the two compared transcripts).

4f23d41

lucventurini added a commit that referenced this issue Oct 15, 2018

Reversed to the old default (calculating the CDS percentage on the basis of the shortest CDS in the pair) for #134

5cf8a17

lucventurini added a commit that referenced this issue Oct 16, 2018

#134 should be definitely fixed. I also improved a bit the unittest coverage and documentation.

b079750

lucventurini closed this as completed Oct 16, 2018

lucventurini added a commit to lucventurini/mikado that referenced this issue Sep 20, 2019

Amend for EI-CoreBioinformatics#134. Now min_cds_overlap has been reduced by default to 50% (again, 75% was probably too restrictive)

b88f0d2

lucventurini pushed a commit to lucventurini/mikado that referenced this issue Feb 11, 2021

BROKEN. Starting to work on EI-CoreBioinformatics#134

988b6c8

lucventurini added a commit to lucventurini/mikado that referenced this issue Feb 11, 2021

Implemented EI-CoreBioinformatics#134, unit tests passed

5c459b3

lucventurini added a commit to lucventurini/mikado that referenced this issue Feb 11, 2021

Reversed to the old default (calculating the CDS percentage on the basis of the shortest CDS in the pair) for EI-CoreBioinformatics#134

72f0bcb

lucventurini added a commit to lucventurini/mikado that referenced this issue Feb 11, 2021

EI-CoreBioinformatics#134 should be definitely fixed. I also improved a bit the unittest coverage and documentation.

3ba4c6b
