
Commit

Merge pull request #103 from lucventurini/development
Development
Luca Venturini authored Mar 7, 2017
2 parents 43eaf55 + 4d7442b commit d039f12
Showing 32 changed files with 1,375 additions and 588 deletions.
28 changes: 22 additions & 6 deletions CHANGELOG.md
@@ -1,14 +1,30 @@
+#Version 1.0
+
+Changes in this release:
+
+- **MAJOR**: solved a bug which, on rare occasions, caused clustering into loci to fail. Thanks to graph clustering, Mikado is now guaranteed to group monoloci correctly.
+- **MAJOR**: When looking for fragments, Mikado now considers transcripts without a strand as lying on the **opposite** strand to neighbouring transcripts. This prevents many monoexonic, non-coding fragments from being retained in the final output.
+- **MAJOR**: Mikado serialise now also stores the ***frame*** information of transcripts. Hits on the opposite strand will be **ignored**. This requires users to **regenerate all Mikado databases**.
+- **MAJOR**: Added the final configuration files used for the article.
+- Added three new metrics: "blast_target_coverage", "blast_query_coverage" and "blast_identity".
+- Changed the *default* repertoire of valid AS events to J, j, G, h (removed C and g).
+- **Bug fix**: Mikado now considers the cDNA/CDS overlap also for monoexonic transcripts, even when the "simple_overlap_for_monoexonic_loci" flag is set to true.
+- Solved some issues with the Daijin schemas which prevented correct referencing.
+- Bug fix for finding retained introns: Mikado was not accounting for cases where an exon started within an intron and crossed multiple subsequent junctions.
+- Bug fix: loci will never purge transcripts.
+- After creating the final loci, Mikado now checks for, and removes, any AS event transcript which would cross into the AS event.
+
 #Version 1.0.0beta10
 
 Changes in this release:
 
 - **MAJOR**: rewrote the clustering algorithm for the MonosublocusHolder stage. Now a holder will accept another monosublocus if:
-- the cDNA and CDS overlap is over a user-specified threshold *OR*
-OR
+- the cDNA and CDS overlap is over a user-specified threshold
+*OR*
 - there is some intronic overlap
-OR
+*OR*
 - one intron of either transcript is completely contained within an exon of the other.
-OR
+*OR*
 - at least one of the transcripts is monoexonic and there is some overlap of any kind. This behaviour (which was the default until this release) can be switched off through pick/clustering/simple_overlap_for_monoexonic (default true).
 - **MAJOR**: slightly changed the anatomy of the configuration files. Now "pick" has two new subsections, "clustering" and "fragments".
 - Clustering: dedicated to how to cluster the transcripts in the different steps. Currently it contains the keys:
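The four acceptance criteria for MonosublocusHolder clustering listed in the 1.0.0beta10 notes can be sketched as a single predicate. This is a minimal illustration under stated assumptions, not Mikado's implementation: the `Transcript` class, measuring overlap as shared bases over the length of the shorter transcript (or CDS), the parameter names `min_cdna_overlap` and `min_cds_overlap` with their 0.2 defaults, and skipping the CDS test unless both transcripts are coding are all hypothetical choices.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Interval = Tuple[int, int]  # closed, 1-based genomic coordinates


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by two closed intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def shared_bases(xs: List[Interval], ys: List[Interval]) -> int:
    """Bases shared by two sets of intervals (each set assumed non-overlapping)."""
    return sum(overlap(x, y) for x in xs for y in ys)


def total_length(ivs: List[Interval]) -> int:
    return sum(end - start + 1 for start, end in ivs)


@dataclass
class Transcript:  # hypothetical stand-in for a Mikado transcript
    exons: List[Interval]
    cds: List[Interval] = field(default_factory=list)  # empty list = non-coding

    @property
    def introns(self) -> List[Interval]:
        ex = sorted(self.exons)
        return [(left[1] + 1, right[0] - 1) for left, right in zip(ex, ex[1:])]

    @property
    def monoexonic(self) -> bool:
        return len(self.exons) == 1


def is_intersecting(t1: Transcript, t2: Transcript,
                    min_cdna_overlap: float = 0.2,
                    min_cds_overlap: float = 0.2,
                    simple_overlap_for_monoexonic: bool = True) -> bool:
    cdna_shared = shared_bases(t1.exons, t2.exons)
    # Monoexonic shortcut: any exonic overlap is enough (can be switched off).
    if simple_overlap_for_monoexonic and (t1.monoexonic or t2.monoexonic) and cdna_shared > 0:
        return True
    # Criterion 1: cDNA and CDS overlap both above their thresholds.
    cdna_frac = cdna_shared / min(total_length(t1.exons), total_length(t2.exons))
    if t1.cds and t2.cds:
        cds_shared = shared_bases(t1.cds, t2.cds)
        cds_ok = cds_shared / min(total_length(t1.cds), total_length(t2.cds)) >= min_cds_overlap
    else:
        cds_ok = True  # assumption: CDS test skipped unless both are coding
    if cdna_frac >= min_cdna_overlap and cds_ok:
        return True
    # Criterion 2: some intronic overlap.
    if shared_bases(t1.introns, t2.introns) > 0:
        return True
    # Criterion 3: an intron of either transcript lies entirely within an exon of the other.
    for a, b in ((t1, t2), (t2, t1)):
        for i_start, i_end in a.introns:
            if any(e_start <= i_start and i_end <= e_end for e_start, e_end in b.exons):
                return True
    return False
```

With the monoexonic shortcut disabled, a single-exon transcript that only grazes a multi-exon one is no longer grouped with it unless one of the other criteria holds.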
@@ -20,8 +36,8 @@ Changes in this release:
 - Fragments: dedicated to how to identify and treat putative fragments. Currently it contains the keys:
 - "remove": whether to exclude fragments, previously under "run_options"
 - "valid_class_codes": which class codes constitute a fragment match. Only class codes in the "Intronic", "Overlap" (inclusive of _) and "Fragment" categories are allowed.
-- max_distance: for non-overlapping fragments (ie p and P), maximum distance from the gene.
-- Solved a long-standing bug which caused Mikado compare to consider as fusion also hits.
+- max_distance: for non-overlapping fragments (ie p and P), maximum distance from the gene.
+- Solved a long-standing bug which caused Mikado compare to also consider hits lying only on the opposite strand as fusions.
 - Mikado compare now also provides the location of the matches in TMAP and REFMAP files.
 - Introduced a new utility, "class_codes", to print out the information of the class codes. The definition of class codes is now contained in a subpackage of "scales".
 - The "metrics" utility now allows for interactive querying based on category or metric name.
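Two fragment rules from these notes can be combined in a small sketch: transcripts without a strand are treated as lying on the strand opposite their neighbour (the 1.0 change), and non-overlapping neighbours only count when they sit within `max_distance` of the gene. The function names, the use of `None` for a missing strand and the way the gap is measured are illustrative assumptions; how Mikado maps the result onto the p and P class codes is not reproduced here.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # closed, 1-based coordinates


def effective_strand(strand: Optional[str], neighbour_strand: str) -> str:
    """A strandless transcript (None here) is placed on the strand opposite its neighbour."""
    if strand is None:
        return "-" if neighbour_strand == "+" else "+"
    return strand


def gap(a: Interval, b: Interval) -> int:
    """Bases separating two closed intervals; 0 if they touch or overlap."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]) - 1)


def classify_neighbour(candidate: Interval, candidate_strand: Optional[str],
                       gene: Interval, gene_strand: str,
                       max_distance: int) -> Tuple[bool, bool]:
    """Return (is_non_overlapping_neighbour_within_distance, on_same_strand)."""
    overlapping = min(candidate[1], gene[1]) >= max(candidate[0], gene[0])
    strand = effective_strand(candidate_strand, gene_strand)
    within = not overlapping and gap(candidate, gene) <= max_distance
    return within, strand == gene_strand
```

For example, `classify_neighbour((1, 100), None, (151, 300), "+", 2000)` returns `(True, False)`: the strandless transcript is 50 bases away and, under the 1.0 rule, counted as antisense to the gene.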
2 changes: 1 addition & 1 deletion Mikado/__init__.py
@@ -10,7 +10,7 @@
 __author__ = 'Luca Venturini'
 __license__ = 'GPL3'
 __copyright__ = 'Copyright 2015-2016 Luca Venturini'
-__version__ = "1.0.0b10"
+__version__ = "1.0"
 
 __all__ = ["configuration",
 "exceptions",
6 changes: 3 additions & 3 deletions Mikado/configuration/configuration_blueprint.json
@@ -352,9 +352,7 @@
 "default": [
 "j",
 "J",
-"C",
 "G",
-"g",
 "h"
 ]
 },
@@ -731,7 +729,9 @@
 "X",
 "i",
 "m",
-"_"
+"_",
+"e",
+"o"
 ]
 }
 }
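After this change the default list of valid AS class codes is j, J, G, h, matching the changelog entry; a user who wants the previous behaviour back would have to add C and g explicitly. The helpers below are hypothetical and only illustrate that list handling; they are not part of Mikado's API.

```python
from typing import Iterable, List

# Post-commit default from the first hunk above; the commit removed "C" and "g".
DEFAULT_AS_CCODES: List[str] = ["j", "J", "G", "h"]


def merge_ccodes(defaults: Iterable[str], extra: Iterable[str]) -> List[str]:
    """Append user-supplied class codes to the defaults, skipping duplicates."""
    merged = list(defaults)  # copy, so the defaults are never mutated
    for code in extra:
        if code not in merged:
            merged.append(code)
    return merged


def is_valid_as_event(class_code: str, valid_ccodes: Iterable[str] = DEFAULT_AS_CCODES) -> bool:
    """True if the class code is accepted as an alternative splicing event."""
    return class_code in set(valid_ccodes)


if __name__ == "__main__":
    legacy = merge_ccodes(DEFAULT_AS_CCODES, ["C", "g"])
    print(legacy)                          # ['j', 'J', 'G', 'h', 'C', 'g']
    print(is_valid_as_event("C"))          # False
    print(is_valid_as_event("C", legacy))  # True
```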
2 changes: 2 additions & 0 deletions Mikado/configuration/daijin_configurator.py
@@ -184,6 +184,8 @@ def create_daijin_config(args, level="ERROR"):
 
     if args.flank is not None:
         config["mikado"]["pick"]["clustering"]["flank"] = args.flank
+    if args.intron_range is not None:
+        config["mikado"]["pick"]["run_options"]["intron_range"] = args.intron_range
 
     config["blastx"]["prot_db"] = args.prot_db
     assert "prot_db" in config["blastx"]
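The two added lines follow the same pattern as `flank`: an optional command-line value overrides one key of the nested configuration. A minimal, self-contained sketch of that pattern, assuming an `argparse` front end; the `--flank` and `--intron-range` spellings and the `build_parser`/`apply_overrides` helpers are hypothetical, and only the attribute names and configuration paths come from the diff.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical daijin-style configurator")
    # Flag spellings are assumptions; args.flank and args.intron_range come from the diff.
    parser.add_argument("--flank", type=int, default=None)
    parser.add_argument("--intron-range", dest="intron_range", type=int, nargs=2,
                        metavar=("MIN", "MAX"), default=None)
    return parser


def apply_overrides(args: argparse.Namespace, config: dict) -> dict:
    """Copy command-line overrides into the nested configuration, as in the diff."""
    if args.flank is not None:
        config["mikado"]["pick"]["clustering"]["flank"] = args.flank
    if args.intron_range is not None:
        config["mikado"]["pick"]["run_options"]["intron_range"] = args.intron_range
    return config


if __name__ == "__main__":
    config = {"mikado": {"pick": {"clustering": {}, "run_options": {}}}}
    args = build_parser().parse_args(["--intron-range", "20", "10000"])
    apply_overrides(args, config)
    print(config["mikado"]["pick"]["run_options"])  # {'intron_range': [20, 10000]}
```

Leaving both options at `None` by default means the values from the configuration file are only replaced when the user passes the flag explicitly.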
26 changes: 23 additions & 3 deletions Mikado/configuration/daijin_schema.json
@@ -277,11 +277,17 @@
 "scoring_file": {"$ref": "configuration_blueprint.json#properties/pick/properties/scoring_file"},
 "alternative_splicing": {
 "$ref": "configuration_blueprint.json#properties/pick/properties/alternative_splicing"},
+"clustering": {
+"$ref": "configuration_blueprint.json#properties/pick/properties/clustering"},
+"fragments": {
+"$ref": "configuration_blueprint.json#properties/pick/properties/fragments"
+},
 "run_options": {
 "type": "object",
 "properties": {
-"flank": {
-"$ref": "configuration_blueprint.json#properties/pick/properties/clustering/properties/flank"}
+"intron_range": {
+"$ref": "configuration_blueprint.json#properties/pick/properties/run_options/properties/intron_range"
+}
 }
 }
 }
@@ -310,6 +316,20 @@
 }
 }
 },
-"tgg_max_mem": {"type": "integer", "default": 6000, "minimum": 1000, "required": true}
+"tgg": {
+"type": "object",
+"SimpleComment": ["Options related to genome-guided Trinity."],
+"Comment": ["Options related to genome-guided Trinity.",
+"- max_mem: Maximum memory to be used for the assembly. Default: 6000Mb",
+"- npaths: number of alignments per sequence, using GMAP. Default: 0 (one alignment per sequence, exclude chimeric).",
+"- identity: minimum identity for any alignment. Default: 95%",
+"- coverage: minimum coverage for any alignment. Default: 70%"],
+"properties": {
+"max_mem": {"type": "integer", "default": 6000, "minimum": 1000, "required": true},
+"npaths": {"type": "integer", "default": 0},
+"identity": {"type": "number", "default": 0.95, "minimum": 0, "maximum": 1},
+"coverage": {"type": "number", "default": 0.70, "minimum": 0, "maximum": 1}
+}
+}
 }
 }
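The new `tgg` object replaces the flat `tgg_max_mem` key and adds `npaths`, `identity` and `coverage`. Because the schema uses the draft-3 style `"required": true` on an individual property, checking it with a library would need a validator for that draft; the dependency-free sketch below only applies the defaults and bounds copied from the diff. `fill_tgg_section` is a hypothetical helper, not part of Daijin.

```python
from typing import Any, Dict

# Property definitions copied from the new "tgg" object in the schema above.
TGG_PROPERTIES: Dict[str, Dict[str, Any]] = {
    "max_mem": {"type": "integer", "default": 6000, "minimum": 1000},
    "npaths": {"type": "integer", "default": 0},
    "identity": {"type": "number", "default": 0.95, "minimum": 0, "maximum": 1},
    "coverage": {"type": "number", "default": 0.70, "minimum": 0, "maximum": 1},
}

_PY_TYPES = {"integer": (int,), "number": (int, float)}


def fill_tgg_section(section: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `section` with defaults filled in and bounds checked."""
    result = dict(section)
    for key, spec in TGG_PROPERTIES.items():
        value = result.setdefault(key, spec["default"])
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, _PY_TYPES[spec["type"]]):
            raise TypeError(f"tgg.{key} must be of type {spec['type']}, got {value!r}")
        if "minimum" in spec and value < spec["minimum"]:
            raise ValueError(f"tgg.{key}={value} is below the minimum {spec['minimum']}")
        if "maximum" in spec and value > spec["maximum"]:
            raise ValueError(f"tgg.{key}={value} is above the maximum {spec['maximum']}")
    return result
```

An empty section therefore resolves to 6000 Mb of memory, `npaths` 0, 95% identity and 70% coverage, as stated in the schema comment.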
83 changes: 83 additions & 0 deletions Mikado/configuration/scoring_files/athaliana_scoring.yaml
@@ -0,0 +1,83 @@
requirements:
  expression:
    - ((exon_num.multi and (cdna_length.multi or selected_cds_length.multi)
    - and
    - max_intron_length and min_intron_length and proportion_verified_introns_inlocus)
    - or
    - (exon_num.mono and ((snowy_blast_score and selected_cds_length.zero) or selected_cds_length.mono)))
  parameters:
    snowy_blast_score: {operator: gt, value: 0} # 0.2
    selected_cds_length.mono: {operator: gt, value: 300} # 600
    selected_cds_length.zero: {operator: gt, value: 0}
    cdna_length.multi: {operator: ge, value: 400}
    selected_cds_length.multi: {operator: gt, value: 200}
    exon_num.mono: {operator: eq, value: 1}
    exon_num.multi: {operator: gt, value: 1}
    max_intron_length: {operator: le, value: 150000}
    min_intron_length: {operator: ge, value: 20}
    proportion_verified_introns_inlocus: {operator: gt, value: 0}
as_requirements:
  expression: [cdna_length and three_utr_length and five_utr_length and utr_length and suspicious_splicing]
  parameters:
    cdna_length: {operator: ge, value: 200}
    utr_length: {operator: le, value: 2500}
    five_utr_length: {operator: le, value: 2500}
    three_utr_length: {operator: le, value: 2500}
    suspicious_splicing: {operator: ne, value: true}
not_fragmentary:
  expression: [((exon_num.multi and (cdna_length.multi or selected_cds_length.multi)), or, (exon_num.mono and ((snowy_blast_score and selected_cds_length.zero) or selected_cds_length.mono)))]
  parameters:
    selected_cds_length.zero: {operator: gt, value: 300} # 600
    exon_num.multi: {operator: gt, value: 2}
    cdna_length.multi: {operator: ge, value: 300}
    selected_cds_length.multi: {operator: gt, value: 250}
    exon_num.mono: {operator: eq, value: 1}
    snowy_blast_score: {operator: gt, value: 0} # 0.3
    selected_cds_length.mono: {operator: gt, value: 600} # 900
    exon_num.mono: {operator: le, value: 2}
scoring:
  # blast_score: {rescaling: max}
  snowy_blast_score: {rescaling: max}
  cdna_length: {rescaling: max}
  cds_not_maximal: {rescaling: min}
  cds_not_maximal_fraction: {rescaling: min}
  # exon_fraction: {rescaling: max}
  exon_num: {
    rescaling: max,
    filter: {
      operator: ge,
      value: 3}
  }
  five_utr_length:
    filter: {operator: le, value: 2500}
    rescaling: target
    value: 100
  five_utr_num:
    filter: {operator: lt, value: 4}
    rescaling: target
    value: 2
  end_distance_from_junction:
    filter: {operator: lt, value: 55}
    rescaling: min
  highest_cds_exon_number: {rescaling: max}
  intron_fraction: {rescaling: max}
  is_complete: {rescaling: target, value: true}
  number_internal_orfs: {rescaling: target, value: 1}
  # proportion_verified_introns: {rescaling: max}
  non_verified_introns_num: {rescaling: min}
  proportion_verified_introns_inlocus: {rescaling: max}
  retained_fraction: {rescaling: min}
  retained_intron_num: {rescaling: min}
  selected_cds_fraction: {rescaling: target, value: 0.8}
  selected_cds_intron_fraction: {rescaling: max}
  selected_cds_length: {rescaling: max}
  selected_cds_num: {rescaling: max}
  three_utr_length:
    filter: {operator: le, value: 2500}
    rescaling: target
    value: 200
  three_utr_num:
    filter: {operator: lt, value: 3}
    rescaling: target
    value: 1
  combined_cds_locus_fraction: {rescaling: max}
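The `requirements` block pairs a boolean `expression` with named `parameters`, each comparing a metric against a value. Assuming that the metric is the part of the name before the dot (so `exon_num.multi` and `exon_num.mono` both test `exon_num`), a minimal evaluator can be sketched as below. It is not Mikado's own parser: each parameter name is replaced by `True` or `False`, and only booleans, `and`/`or`/`not` and parentheses reach `eval`. The parameters are copied from the A. thaliana file above.

```python
import operator
import re

OPERATORS = {"gt": operator.gt, "ge": operator.ge, "lt": operator.lt,
             "le": operator.le, "eq": operator.eq, "ne": operator.ne}

# Copied from the "requirements" section of athaliana_scoring.yaml.
EXPRESSION = [
    "((exon_num.multi and (cdna_length.multi or selected_cds_length.multi)",
    "and",
    "max_intron_length and min_intron_length and proportion_verified_introns_inlocus)",
    "or",
    "(exon_num.mono and ((snowy_blast_score and selected_cds_length.zero) or selected_cds_length.mono)))",
]
PARAMETERS = {
    "snowy_blast_score": {"operator": "gt", "value": 0},
    "selected_cds_length.mono": {"operator": "gt", "value": 300},
    "selected_cds_length.zero": {"operator": "gt", "value": 0},
    "cdna_length.multi": {"operator": "ge", "value": 400},
    "selected_cds_length.multi": {"operator": "gt", "value": 200},
    "exon_num.mono": {"operator": "eq", "value": 1},
    "exon_num.multi": {"operator": "gt", "value": 1},
    "max_intron_length": {"operator": "le", "value": 150000},
    "min_intron_length": {"operator": "ge", "value": 20},
    "proportion_verified_introns_inlocus": {"operator": "gt", "value": 0},
}

_TOKEN = re.compile(r"\(|\)|[A-Za-z_][\w.]*")
_KEYWORDS = {"and", "or", "not"}


def passes(metrics, expression=EXPRESSION, parameters=PARAMETERS) -> bool:
    """Return True if a transcript's metrics satisfy the requirements expression."""
    results = {}
    for name, spec in parameters.items():
        metric = name.split(".")[0]  # assumption: "exon_num.multi" tests "exon_num"
        results[name] = OPERATORS[spec["operator"]](metrics[metric], spec["value"])
    tokens = []
    for tok in _TOKEN.findall(" ".join(expression)):
        if tok in ("(", ")") or tok in _KEYWORDS:
            tokens.append(tok)
        else:
            tokens.append(str(results[tok]))  # unknown names raise KeyError
    # Only True/False, boolean keywords and parentheses remain at this point.
    return bool(eval(" ".join(tokens), {"__builtins__": {}}, {}))
```

Under these parameters a monoexonic transcript with a 150 bp ORF fails unless it also has a positive `snowy_blast_score`.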
83 changes: 83 additions & 0 deletions Mikado/configuration/scoring_files/celegans_scoring.yaml
@@ -0,0 +1,83 @@
requirements:
  expression:
    - ((exon_num.multi and (cdna_length.multi or selected_cds_length.multi)
    - and
    - max_intron_length and min_intron_length and proportion_verified_introns_inlocus)
    - or
    - (exon_num.mono and ((snowy_blast_score and selected_cds_length.zero) or selected_cds_length.mono)))
  parameters:
    snowy_blast_score: {operator: gt, value: 0} # 0.2
    selected_cds_length.mono: {operator: gt, value: 300} # 600
    selected_cds_length.zero: {operator: gt, value: 0}
    cdna_length.multi: {operator: ge, value: 400}
    selected_cds_length.multi: {operator: gt, value: 200}
    exon_num.mono: {operator: eq, value: 1}
    exon_num.multi: {operator: gt, value: 1}
    max_intron_length: {operator: le, value: 150000}
    min_intron_length: {operator: ge, value: 20}
    proportion_verified_introns_inlocus: {operator: gt, value: 0}
as_requirements:
  expression: [cdna_length and three_utr_length and five_utr_length and utr_length and suspicious_splicing]
  parameters:
    cdna_length: {operator: ge, value: 200}
    utr_length: {operator: le, value: 2500}
    five_utr_length: {operator: le, value: 2500}
    three_utr_length: {operator: le, value: 2500}
    suspicious_splicing: {operator: ne, value: true}
not_fragmentary:
  expression: [((exon_num.multi and (cdna_length.multi or selected_cds_length.multi)), or, (exon_num.mono and ((snowy_blast_score and selected_cds_length.zero) or selected_cds_length.mono)))]
  parameters:
    selected_cds_length.zero: {operator: gt, value: 300} # 600
    exon_num.multi: {operator: gt, value: 2}
    cdna_length.multi: {operator: ge, value: 300}
    selected_cds_length.multi: {operator: gt, value: 250}
    exon_num.mono: {operator: eq, value: 1}
    snowy_blast_score: {operator: gt, value: 0} # 0.3
    selected_cds_length.mono: {operator: gt, value: 600} # 900
    exon_num.mono: {operator: le, value: 2}
scoring:
  snowy_blast_score: {rescaling: max}
  cdna_length: {rescaling: max}
  cds_not_maximal: {rescaling: min}
  cds_not_maximal_fraction: {rescaling: min}
  # exon_fraction: {rescaling: max}
  exon_num: {
    rescaling: max,
    filter: {
      operator: ge,
      value: 3}
  }
  five_utr_length:
    filter: {operator: le, value: 2500}
    rescaling: target
    value: 100
  five_utr_num:
    filter: {operator: lt, value: 4}
    rescaling: target
    value: 2
  end_distance_from_junction:
    filter: {operator: lt, value: 55}
    rescaling: min
  highest_cds_exon_number: {rescaling: max}
  intron_fraction: {rescaling: max}
  is_complete: {rescaling: target, value: true}
  # num_introns_smaller_than_min: {rescaling: target, value: 0}
  # num_introns_greater_than_max: {rescaling: target, value: 0}
  number_internal_orfs: {rescaling: target, value: 1}
  # proportion_verified_introns: {rescaling: max}
  proportion_verified_introns_inlocus: {rescaling: max}
  retained_fraction: {rescaling: min}
  retained_intron_num: {rescaling: min}
  selected_cds_fraction: {rescaling: target, value: 0.85}
  selected_cds_intron_fraction: {rescaling: max}
  selected_cds_length: {rescaling: max}
  selected_cds_num: {rescaling: max}
  three_utr_length:
    filter: {operator: le, value: 2500}
    rescaling: target
    value: 200
  three_utr_num:
    filter: {operator: lt, value: 3}
    rescaling: target
    value: 1
  combined_cds_locus_fraction: {rescaling: max}
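Each `scoring` entry names a `rescaling` mode. Reading `max`, `min` and `target` as min-max rescaling across the transcripts being compared gives the sketch below, in which the best value scores 1 and the worst 0. Treat the formulas, and especially the tie handling, as assumptions rather than Mikado's exact code; `rescale` is a hypothetical helper.

```python
from typing import Dict, Optional


def rescale(values: Dict[str, float], rescaling: str,
            target: Optional[float] = None) -> Dict[str, float]:
    """Rescale one metric across a group of transcripts to the [0, 1] range."""
    if rescaling == "target":
        if target is None:
            raise ValueError("'target' rescaling needs a target value")
        # Distance from the target; bools work too (True - True == 0).
        distances = {tid: abs(v - target) for tid, v in values.items()}
        worst = max(distances.values())
        # Assumption: if every transcript hits the target, all score 1.
        return {tid: 1.0 if worst == 0 else 1 - d / worst for tid, d in distances.items()}
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {tid: 1.0 for tid in values}  # assumption for ties
    if rescaling == "max":
        return {tid: (v - lo) / (hi - lo) for tid, v in values.items()}
    if rescaling == "min":
        return {tid: (hi - v) / (hi - lo) for tid, v in values.items()}
    raise ValueError(f"unknown rescaling: {rescaling!r}")
```

For example, with `selected_cds_fraction: {rescaling: target, value: 0.85}` as in the C. elegans file, a transcript at exactly 0.85 scores 1 and the one furthest from 0.85 scores 0.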
82 changes: 82 additions & 0 deletions Mikado/configuration/scoring_files/dmelanogaster_scoring.yaml
@@ -0,0 +1,82 @@
requirements:
  expression:
    - ((exon_num.multi and (cdna_length.multi or selected_cds_length.multi)
    - and
    - max_intron_length and min_intron_length and proportion_verified_introns_inlocus)
    - or
    - (exon_num.mono and ((snowy_blast_score and selected_cds_length.zero) or selected_cds_length.mono)))
  parameters:
    snowy_blast_score: {operator: gt, value: 0} # 0.2
    selected_cds_length.mono: {operator: gt, value: 300} # 600
    selected_cds_length.zero: {operator: gt, value: 0}
    cdna_length.multi: {operator: ge, value: 400}
    selected_cds_length.multi: {operator: gt, value: 200}
    exon_num.mono: {operator: eq, value: 1}
    exon_num.multi: {operator: gt, value: 1}
    max_intron_length: {operator: le, value: 150000}
    min_intron_length: {operator: ge, value: 20}
    proportion_verified_introns_inlocus: {operator: gt, value: 0}
as_requirements:
  expression: [cdna_length and three_utr_length and five_utr_length and utr_length and suspicious_splicing]
  parameters:
    cdna_length: {operator: ge, value: 200}
    utr_length: {operator: le, value: 2500}
    five_utr_length: {operator: le, value: 2500}
    three_utr_length: {operator: le, value: 2500}
    suspicious_splicing: {operator: ne, value: true}
not_fragmentary:
  expression: [((exon_num.multi and (cdna_length.multi or selected_cds_length.multi)), or, (exon_num.mono and ((snowy_blast_score and selected_cds_length.zero) or selected_cds_length.mono)))]
  parameters:
    selected_cds_length.zero: {operator: gt, value: 300} # 600
    exon_num.multi: {operator: gt, value: 2}
    cdna_length.multi: {operator: ge, value: 300}
    selected_cds_length.multi: {operator: gt, value: 250}
    exon_num.mono: {operator: eq, value: 1}
    snowy_blast_score: {operator: gt, value: 0} # 0.3
    selected_cds_length.mono: {operator: gt, value: 600} # 900
    exon_num.mono: {operator: le, value: 2}
scoring:
  snowy_blast_score: {rescaling: max}
  cdna_length: {rescaling: max}
  cds_not_maximal: {rescaling: min}
  cds_not_maximal_fraction: {rescaling: min}
  # exon_fraction: {rescaling: max}
  exon_num: {
    rescaling: max,
    filter: {
      operator: ge,
      value: 3}
  }
  five_utr_length:
    filter: {operator: le, value: 2500}
    rescaling: target
    value: 100
  five_utr_num:
    filter: {operator: lt, value: 4}
    rescaling: target
    value: 2
  end_distance_from_junction:
    filter: {operator: lt, value: 55}
    rescaling: min
  highest_cds_exon_number: {rescaling: max}
  intron_fraction: {rescaling: max}
  is_complete: {rescaling: target, value: true}
  number_internal_orfs: {rescaling: target, value: 1}
  # proportion_verified_introns: {rescaling: max}
  non_verified_introns_num: {rescaling: min}
  proportion_verified_introns_inlocus: {rescaling: max}
  retained_fraction: {rescaling: min}
  retained_intron_num: {rescaling: min}
  selected_cds_fraction: {rescaling: target, value: 0.8}
  selected_cds_intron_fraction: {rescaling: max}
  selected_cds_length: {rescaling: max}
  selected_cds_num: {rescaling: max}
  three_utr_length:
    filter: {operator: le, value: 2500}
    rescaling: target
    value: 200
  three_utr_num:
    filter: {operator: lt, value: 3}
    rescaling: target
    value: 1
  combined_cds_locus_fraction: {rescaling: max}
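In all three scoring files the `not_fragmentary` parameters define `exon_num.mono` twice (`eq 1` and `le 2`). YAML requires the keys of a mapping to be unique, and a last-wins loader such as PyYAML silently keeps only the second definition, so the `eq 1` constraint would be dropped without any warning. A small, dependency-free check for repeated keys in one mapping's `key: value` lines could look like this; `duplicate_keys` is a hypothetical helper, not part of Mikado.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_keys(entries: Iterable[str]) -> List[str]:
    """Return keys that occur more than once among 'key: value' lines of one mapping."""
    keys = [
        line.split(":", 1)[0].strip()
        for line in entries
        if ":" in line and not line.lstrip().startswith("#")  # skip comment lines
    ]
    return [key for key, count in Counter(keys).items() if count > 1]


# The "parameters" entries of the not_fragmentary section, as they appear above.
NOT_FRAGMENTARY_PARAMETERS = [
    "selected_cds_length.zero: {operator: gt, value: 300} # 600",
    "exon_num.multi: {operator: gt, value: 2}",
    "cdna_length.multi: {operator: ge, value: 300}",
    "selected_cds_length.multi: {operator: gt, value: 250}",
    "exon_num.mono: {operator: eq, value: 1}",
    "snowy_blast_score: {operator: gt, value: 0} # 0.3",
    "selected_cds_length.mono: {operator: gt, value: 600} # 900",
    "exon_num.mono: {operator: le, value: 2}",
]

if __name__ == "__main__":
    print(duplicate_keys(NOT_FRAGMENTARY_PARAMETERS))  # ['exon_num.mono']
    # A dict built from key/value pairs behaves the same way: the last value wins.
    pairs = [("exon_num.mono", "eq 1"), ("exon_num.mono", "le 2")]
    print(dict(pairs)["exon_num.mono"])  # le 2
```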