General enhancements #88

Closed
9 of 18 tasks
eboileau opened this issue Feb 5, 2018 · 0 comments
eboileau commented Feb 5, 2018

A list of minor/low-priority issues labeled as "enhancement". We need to re-visit this list.

Documentation

  • We need to brush up the documentation. Maybe it would be a good idea in the long term to switch to RST.

I'm working on this now.

Installation

  • Create a separate model folder for each version of Python: there seem to be issues unpickling Stan models across different Python versions, so it would make sense to make the model path something like:
    $BASE/rpbp_models/python-<version>/...
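A minimal sketch of how such a per-version directory could be derived. The function name `versioned_model_dir` is hypothetical, and using only the major and minor version numbers is an assumption; the rest of the path below that directory is left open, as in the proposal above.

```python
import sys
from pathlib import Path


def versioned_model_dir(base: Path) -> Path:
    """Return <base>/rpbp_models/python-<major>.<minor> for the running interpreter.

    Assumption: models only need to be separated by major.minor version.
    """
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return base / "rpbp_models" / f"python-{version}"


if __name__ == "__main__":
    # Example base directory; in practice this would be $BASE.
    print(versioned_model_dir(Path("/tmp/base")))
```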

Moved to CmdStanPy: there is no pickling anymore, and models are installed/compiled under the conda environment by default.

  • Add a setup option to force recompilation of Stan models: by default, if the pickled Stan models already exist, they are not recompiled. This can cause problems when the PyStan version changes and the existing models are no longer backwards compatible.

This is not entirely resolved; it is tracked in #133.
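The decision behind such an option could look roughly like the sketch below. Both `needs_compilation` and the `force` parameter are hypothetical names, not part of the existing setup code; the sketch only shows the "skip if present unless forced" logic described above.

```python
from pathlib import Path


def needs_compilation(compiled_model: Path, force: bool = False) -> bool:
    """Return True if the model should be (re)compiled.

    A model is compiled when it does not exist yet, or when the caller
    explicitly asks for recompilation (hypothetical `force` option).
    """
    return force or not compiled_model.exists()
```

With `force=False` this reproduces the current default (existing models are left alone); with `force=True` every model is rebuilt regardless of what is on disk.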

Visualisation

Reporting/downstream analyses done via Dash.

  • ORF visualization: add additional genome browser tracks such as:

    1. Coverage profile of RNA-seq data (bedgraph https://genome.ucsc.edu/goldenpath/help/bedgraph.html -> bigWig)
    2. Coverage profile of RiboSeq data (all & periodic)
    3. P-site profile

    Adding the bam files to IGV is not so helpful because they include the entire reads and are not shifted to account for P-site offsets. Brief online searching suggests the best approach is probably to first convert the P-site bed object to wiggle, then the wiggle to bigWig.
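As a rough illustration of the shifting step, the sketch below reduces reads to single-nucleotide P-site positions and emits bedGraph lines. It assumes unspliced reads given as BED-style `(chrom, start, end, strand)` tuples and a per-read-length offset table; all function names are hypothetical, and both strands are merged into one track for brevity (separate tracks per strand may be preferable).

```python
from collections import Counter


def psite_position(start: int, end: int, strand: str, offset: int) -> int:
    """0-based P-site position for a read with half-open [start, end) coordinates."""
    # On the minus strand the 5' end of the read is at end - 1.
    return start + offset if strand == "+" else end - 1 - offset


def psite_bedgraph_lines(reads, offsets):
    """Count P-sites per position and format them as sorted bedGraph lines.

    reads:   iterable of (chrom, start, end, strand)
    offsets: dict mapping read length -> P-site offset (assumed to be known)
    """
    counts = Counter()
    for chrom, start, end, strand in reads:
        offset = offsets.get(end - start)
        if offset is None:
            continue  # skip read lengths without a known offset
        counts[(chrom, psite_position(start, end, strand, offset))] += 1
    return [
        f"{chrom}\t{pos}\t{pos + 1}\t{n}"
        for (chrom, pos), n in sorted(counts.items())
    ]
```

The resulting sorted bedGraph file can then be converted with the UCSC `bedGraphToBigWig` utility, which takes the bedGraph file, a chromosome sizes file, and the output path.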

  • Replicate correlation plots: add correlation plots of RPMs (or some other normalised value) between replicates after corrected assignment on codon and maybe on nucleotide level (see replicate ORF profiles).
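The quantity behind such a plot could be computed as in this sketch, which normalises raw counts to RPM and then takes the Pearson correlation between two replicates. The function names are hypothetical, the inputs are assumed to be per-ORF (or per-codon) counts in the same order for both replicates, and Pearson is only one possible choice of correlation measure.

```python
import math


def rpm(counts):
    """Scale raw counts to reads per million of the replicate total."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)
```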

  • Handle all levels of "sample" specification in get-all-orf-peptide-matches:

    The script is hard-coded to work with "cell-types" from the config file. It would be nice if it also handled samples ("riboseq_samples" key) and conditions ("riboseq_biological_replicates" key).

    1. Add a command line option to the script to specify the level
    2. Add a function to ribo_utils.py which returns a list of the appropriate names
    3. Use this function rather than the call to ribo_utils.get_riboseq_cell_type_samples
    4. Add a function to ribo_utils.py which returns the appropriate "peptide__analysis" dictionary
    5. Use that in the loop

    This will also entail finding the correct filename based on the level (e.g., "sample" filenames include lengths and offsets, while the others do not; the locations are different).
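Steps 1 and 2 might be sketched as follows. The `--level` option, the level names, and `get_names_for_level` are hypothetical; the config key used for cell types (`riboseq_cell_type_samples`) is an assumption, and each config entry is assumed to be a mapping keyed by name. Filename resolution per level is not covered here.

```python
import argparse

# Map each level to the config key that lists its names.
LEVEL_KEYS = {
    "sample": "riboseq_samples",
    "condition": "riboseq_biological_replicates",
    "cell-type": "riboseq_cell_type_samples",  # assumed key name
}


def get_names_for_level(config: dict, level: str) -> list:
    """Return the sorted names defined in the config for the given level."""
    return sorted(config.get(LEVEL_KEYS[level], {}))


def build_parser() -> argparse.ArgumentParser:
    """Command line parser with a hypothetical --level option."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--level",
        choices=sorted(LEVEL_KEYS),
        default="cell-type",  # keeps the current hard-coded behaviour
        help="level at which peptide matches are collected",
    )
    return parser
```

Defaulting to the cell-type level would leave existing invocations of the script unchanged.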

  • Create proteomics results plots: add notebooks and plots to the peptide report which show the proteomics results.

    1. Venn diagram of detected peptide sequences with given PEP threshold
    2. Add detected peptides overlap to proteomics-report
    3. Venn diagram of in silico digested proteins
    4. Add possible peptides to proteomics-report
    5. Match (identified) peptides to transcripts
    6. Add matched transcripts to proteomics-report
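The numbers behind the first Venn diagram could be obtained along these lines. This is a sketch under assumptions: peptides are given as `(sequence, PEP)` pairs, a peptide counts as detected when its PEP is at or below the threshold, and the function names are hypothetical.

```python
def detected_peptides(rows, pep_threshold):
    """Return the set of peptide sequences whose PEP is at or below the threshold."""
    return {seq for seq, pep in rows if pep <= pep_threshold}


def venn2_regions(a: set, b: set) -> dict:
    """Split two sets into the three regions of a two-set Venn diagram."""
    return {"only_a": a - b, "both": a & b, "only_b": b - a}
```

The sizes of the three regions are what a plotting library needs to draw the diagram, and the sets themselves could be listed in the proteomics report.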