Roadmap

Overview:

An approach to looking for “low-hanging fruit” in sub-groups of genes that are segregated by “easy” variables, i.e., simple, readily retrieved gene features such as transcript length or exon count. We assume that the gene groups have been identified by various molecular approaches (most typically differential expression in one or more tests), and that functional enrichment analysis has been run concurrently.

Inputs:

Organism: An accepted identifier of a model organism (this approach is unlikely to succeed for non-model organisms)

Gene sets: One or more sub-groups of genes, with the control/comparison group being either the entire genome/transcriptome or a group passed in and identified as such
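As a minimal sketch of how the gene-set input could be handled, the function below pairs each gene set with its control group: either a named set passed in, or the whole genome when no control is named. The names `build_comparisons`, `gene_sets`, `genome_ids`, and `control_name` are hypothetical and are not taken from the repository's scripts.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def build_comparisons(
    gene_sets: Dict[str, Iterable[str]],
    genome_ids: Iterable[str],
    control_name: Optional[str] = None,
) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Pair every gene set with the control group it will be compared against.

    If control_name is given, that set is used as the control and is removed
    from the list of sets to test. Otherwise the entire genome is the control.
    """
    sets = {name: set(ids) for name, ids in gene_sets.items()}
    if control_name is not None:
        if control_name not in sets:
            raise KeyError(f"control set {control_name!r} was not passed in")
        control = sets.pop(control_name)
    else:
        control = set(genome_ids)
    return {name: (ids, control) for name, ids in sets.items()}


# Example with made-up gene identifiers: "ctrl" is identified as the control.
pairs = build_comparisons(
    {"up": ["g1", "g2"], "ctrl": ["g3"]},
    genome_ids=["g1", "g2", "g3", "g4"],
    control_name="ctrl",
)
```

Whether members of a gene set should be excluded from a whole-genome control is a design decision this sketch leaves open; here the genome is used as is.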

Possible resources:

Ensembl, UCSC browser (specifically the table browser), ModMine (in the form of YeastMine, MouseMine, WormMine, etc.), STRING, modENCODE
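Gene-prediction tables exported from the UCSC table browser typically carry the columns `strand`, `cdsStart`, `cdsEnd`, `exonStarts`, and `exonEnds` (0-based, half-open coordinates, with exon lists comma-separated). Assuming a table in that layout, the sketch below derives the length features used later in this roadmap. The function name `transcript_features` and the output keys are hypothetical.

```python
def transcript_features(row):
    """Derive length features from one genePred-style record.

    Coordinates are assumed to be 0-based and half-open, so an exon
    spanning [start, end) has length end - start.
    """
    starts = [int(x) for x in row["exonStarts"].split(",") if x]
    ends = [int(x) for x in row["exonEnds"].split(",") if x]
    cds_start, cds_end = int(row["cdsStart"]), int(row["cdsEnd"])

    transcript_len = cds_len = left_utr = right_utr = 0
    for s, e in zip(starts, ends):
        transcript_len += e - s
        # Portion of this exon inside the coding region.
        cds_len += max(0, min(e, cds_end) - max(s, cds_start))
        # Portions of this exon to the left and right of the coding region.
        left_utr += max(0, min(e, cds_start) - s)
        right_utr += max(0, e - max(s, cds_end))

    # On the minus strand the genomic right-hand side is the 5' end.
    if row["strand"] == "+":
        utr5, utr3 = left_utr, right_utr
    else:
        utr5, utr3 = right_utr, left_utr
    return {
        "transcript_length": transcript_len,
        "cds_length": cds_len,
        "n_exons": len(starts),
        "utr5_length": utr5,
        "utr3_length": utr3,
    }


# Made-up record: three 100 bp exons, coding region from 120 to 550.
example = transcript_features({
    "strand": "+", "cdsStart": "120", "cdsEnd": "550",
    "exonStarts": "100,300,500,", "exonEnds": "200,400,600,",
})
```

For non-coding transcripts, where `cdsStart` equals `cdsEnd`, the UTR values from this sketch are not meaningful and would need separate handling.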

Note 1:

To start with, I expect most of this to happen manually, but ideally we will be able to identify patterns and procedures that can be captured in automated workflows. Since we are going to be looking at statistical distributions of various features, some programmatic or statistical environment will be necessary (JMP, Python/NumPy/SciPy/pandas, etc.).
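As an illustration of the kind of distribution summary such an environment would produce, here is a small sketch using only the Python standard library. In practice pandas would likely be more convenient. The column names (`group`, `transcript_length`, `n_exons`) and the inline data are invented for the example and do not reflect the CSV files in the repository.

```python
import csv
import io
import statistics
from collections import defaultdict


def summarize(rows, group_col, feature_cols):
    """Return {group: {feature: {"n", "median", "q1", "q3"}}} for numeric features."""
    values = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for feature in feature_cols:
            raw = (row.get(feature) or "").strip()
            if raw and raw != "NA":  # skip missing values
                values[row[group_col]][feature].append(float(raw))

    summary = {}
    for group, features in values.items():
        summary[group] = {}
        for feature, vals in features.items():
            if len(vals) > 1:
                q1, _, q3 = statistics.quantiles(vals, n=4)
            else:  # quantiles() needs at least two data points
                q1 = q3 = vals[0]
            summary[group][feature] = {
                "n": len(vals),
                "median": statistics.median(vals),
                "q1": q1,
                "q3": q3,
            }
    return summary


# Invented demo table; a real run would read a file with csv.DictReader.
demo = io.StringIO(
    "gene_id,group,transcript_length,n_exons\n"
    "g1,up,1200,4\n"
    "g2,up,1800,6\n"
    "g3,up,1500,NA\n"
    "g4,background,900,3\n"
    "g5,background,1100,5\n"
)
summary = summarize(csv.DictReader(demo), "group", ["transcript_length", "n_exons"])
```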

Note 2:

This will be assumed to be in addition to and independent of pathway/ontology analysis, which should, of course, also be carried out.

Steps to be carried out in the main approach:

1. For each gene set, retrieve one or more tables of statistical characterizations, including at least (but not necessarily limited to):

a. Transcript length

b. CDS length

c. Number of exons

d. 5’ UTR length

e. 3’ UTR length

f. Codon Adaptation Index (if available)

g. Known interacting partners (perhaps from STRING)

h. Known regulatory interactions (e.g., validated or putative TF binding sites)

2. Visualize the various sets in an appropriate manner (box plots, violin plots, etc.).

3. Apply appropriate statistical tests of the equality/consistency of the values within each variable across the gene sets.

4. Generate a visual/text report that shows all tests, highlighting any significant differences.
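The testing and reporting steps could be sketched roughly as follows. The roadmap does not name a specific test, so this example assumes a two-sided permutation test on the difference of medians, followed by Benjamini-Hochberg adjustment because several features are tested at once. A rank-based test from SciPy would be an equally reasonable choice. All function names and the demo values are hypothetical.

```python
import random
import statistics


def permutation_pvalue(sample, control, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of medians."""
    rng = random.Random(seed)
    observed = abs(statistics.median(sample) - statistics.median(control))
    pooled = list(sample) + list(control)
    k = len(sample)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.median(pooled[:k]) - statistics.median(pooled[k:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def text_report(features, sample, control, alpha=0.05):
    """Build one report line per feature; '*' marks adjusted p < alpha."""
    pvals = [permutation_pvalue(sample[f], control[f]) for f in features]
    qvals = benjamini_hochberg(pvals)
    lines = []
    for f, p, q in zip(features, pvals, qvals):
        flag = "*" if q < alpha else " "
        lines.append(f"{flag} {f:<20} p={p:.4f} q={q:.4f}")
    return "\n".join(lines)


# Invented values: transcript length differs strongly, exon count does not.
sample = {"transcript_length": [100, 101, 102, 103, 104, 105, 106, 107],
          "n_exons": [3, 4, 5, 6]}
control = {"transcript_length": [1, 2, 3, 4, 5, 6, 7, 8],
           "n_exons": [3, 4, 5, 6]}
report = text_report(["transcript_length", "n_exons"], sample, control)
print(report)
```

A permutation test makes no distributional assumption, which suits skewed features such as transcript length, at the cost of more computation than a closed-form test.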

mdibl/UGGS

Folders and files

Latest commit

History

Repository files navigation

Roadmap

Overview:

Inputs:

Possible resources:

Note 1:

Note 2:

Steps to be carried out in the main approach:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages