This repository contains the code for training and running SPLIRENT, a deep neural network that can predict alternative 5' splice donor usage given the DNA sequence as input, with cell type-specific predictions for HEK, HELA, MCF7 and CHO.
SPLIRENT was trained on the randomized splicing MPRA described in Rosenberg et al, Cell 2015. The MPRA was replicated in cell lines HELA, MCF7 and CHO and merged with the original HEK-measured dataset.
This readme contain links to IPython Notebooks containing analyses and model evaluations. There is also a link to the repository containing all of the processed data used in the notebooks.
Contact jlinder2 (at) cs.washington.edu for any questions about the model or data.
SPLIRENT can be installed by cloning or forking the github repository:
git clone https://github.com/johli/splirent.git
cd splirent
python setup.py install
- Python >= 3.6
- Tensorflow >= 1.13.1
- Keras >= 2.2.4
- Scipy >= 1.2.1
- Numpy >= 1.16.2
- Isolearn >= 0.2.0 (github)
- [Optional] Pandas >= 0.24.2
- [Optional] Matplotlib >= 3.1.1
SPLIRENT is built as a Keras Model, and as such can be easily executed using simple Keras function calls. See the example usage notebooks below for a tutorial on how to use the model.
The processed 5' Alternative Splicing MPRA data is stored in the repository below. The dataset is stored as a .tar.gz archive at the following path:
alt_5ss/unprocessed_data/alt_5ss_multi_cell_line_data.tar.gz
The base version of the dataset, containing a .csv file of splice variant sequences and a .mat file of corresponding cell type-specific RNA-Seq read counts supporting 5' splicing at each nucleotide position.
The following collection of IPython Notebooks contains model analyses (cell type-specific prediction accuracies, differential splicing, etc.).
Evaluation of the Linear Logistic Hexamer Regression trained on the Random MPRA library.
Notebook 1a: Logistic Regression Training
Notebook 1b: Logistic Regression Evaluation
Notebook 2a: Logistic Regression Training (both random regions)
Notebook 2b: Logistic Regression Evaluation (both random regions)
Evaluation of the SPLIRENT CNN trained on the Random MPRA library.
Notebook 3a: Neural Network Evaluation (isoforms)
Notebook 3b: Neural Network Evaluation (nt-resolution)