MTLRank is a multi-task learning framework for inferring regulatory interactions from single-cell data. It ranks transcription factors (TFs) for each gene by learning models that predict the RNA velocity of target genes. The models are trained in a multi-task manner with soft parameter sharing to improve performance. MTLRank takes a TF expression matrix and a TF activity matrix as inputs and ranks the TFs with Deep SHAP applied to the trained models. This repository contains the Jupyter notebooks and other scripts used to test the different models in the study.
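To make "soft parameter sharing" concrete, here is a minimal sketch of one common form of the idea: each task (here, a target gene) keeps its own weights, and a penalty pulls the per-task weights toward their mean. This is a generic illustration, not MTLRank's implementation; the function names, the `strength` coefficient, and the toy numbers are all assumptions made for the example.

```python
# Illustrative only: a generic soft-parameter-sharing penalty, not MTLRank's code.


def mean_vector(vectors):
    """Element-wise mean of equally sized weight vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def soft_sharing_penalty(task_weights, strength):
    """Sum of squared distances between each task's weights and their mean.

    task_weights: dict mapping a task name (e.g. a target gene) to its weights.
    strength: hypothetical coefficient controlling how strongly tasks are tied.
    """
    center = mean_vector(list(task_weights.values()))
    total = 0.0
    for weights in task_weights.values():
        total += sum((w - c) ** 2 for w, c in zip(weights, center))
    return strength * total


def multitask_objective(task_losses, task_weights, strength):
    """Total objective: per-task prediction losses plus the sharing penalty."""
    return sum(task_losses.values()) + soft_sharing_penalty(task_weights, strength)


# Two hypothetical target-gene tasks, each with its own weights over three TFs.
weights = {"geneA": [0.5, 0.0, 1.0], "geneB": [0.5, 0.5, 0.0]}
losses = {"geneA": 0.25, "geneB": 0.75}
print(multitask_objective(losses, weights, strength=0.5))
```

With `strength` set to zero the tasks are trained independently; larger values tie the per-task models more closely together.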
git clone https://github.com/alexQiSong/MTLRank.git
- Install Singularity on your system. We use a Singularity image to run ChIPseeker, which searches for potential target genes. See the Singularity documentation for installation instructions.
- All required dependencies are specified in the file MTLRank_env.yml in this repository. Download this file and use conda to recreate the environment:
conda env create -f MTLRank_env.yml
- Activate the MTLRank environment by
conda activate MTLRank
- Download sample data. We provide a script that downloads sample data sets for spleen and liver tissue from the HuBMAP data portal. Due to the data sharing policy, we cannot directly share data from the HuBMAP consortium, and some of the data sets used in the paper are not yet published, so the sample data sets may produce results that differ from those presented in the paper. To download the sample data sets, run download.py in this repository (make sure the MTLRank environment is active):
python download.py
This will automatically download most of the required data.
- Download data from the Cistrome database. Due to the data sharing policy, we are not allowed to directly share ChIP-seq data from CistromeDB. Please go to http://cistrome.org/db/#/bdown and select Human_Factor to download all bed files and a QC file for the ChIP-seq data. The bed files are downloaded as a single archive named "human_factor.tar.gz"; decompress it and move all bed files to data/chipseq_bed/. The QC file is named "human_factor_full_QC.txt"; move it to data/chipseq_qc/.
- Run preprocessing steps. Run preprocess.py to perform all preprocessing steps, including generating RPKM values and computing TF activity scores. You may use --n_jobs to run the computation in parallel. This may take some time (about 1 hour with --n_jobs 30):
python preprocess.py --n_jobs 30
- Run the MTLRank pipeline using the Jupyter notebooks MTLRank_step1_train.ipynb, MTLRank_step2_rankTF.ipynb, and MTLRank_step3_GRN.ipynb, in the order indicated by their names. Each notebook provides more detailed instructions, including suggested hyperparameter settings.
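The manual Cistrome step above (decompress the archive, move the bed files, move the QC file) can also be scripted. The following is a minimal sketch, not part of the repository: the function name `stage_cistrome_files` is hypothetical, and it assumes the bed files inside the archive end in `.bed` and that the two downloaded files sit wherever you point the arguments.

```python
import shutil
import tarfile
from pathlib import Path


def stage_cistrome_files(archive_path, qc_path, data_dir="data"):
    """Copy .bed files from the archive and the QC file into the expected folders.

    archive_path and qc_path are wherever you saved the CistromeDB downloads
    (assumptions; adjust to your machine). Returns the number of bed files staged.
    """
    bed_dir = Path(data_dir) / "chipseq_bed"
    qc_dir = Path(data_dir) / "chipseq_qc"
    bed_dir.mkdir(parents=True, exist_ok=True)
    qc_dir.mkdir(parents=True, exist_ok=True)

    staged = 0
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            # Keep regular .bed files only; flatten any folders inside the archive.
            if not (member.isfile() and member.name.endswith(".bed")):
                continue
            source = archive.extractfile(member)
            target = bed_dir / Path(member.name).name
            with source, open(target, "wb") as out:
                shutil.copyfileobj(source, out)
            staged += 1

    # Place the QC file next to the bed files, under data/chipseq_qc/.
    shutil.copy(qc_path, qc_dir / Path(qc_path).name)
    return staged
```

For example, if both downloads are in the repository root, calling `stage_cistrome_files("human_factor.tar.gz", "human_factor_full_QC.txt")` would populate data/chipseq_bed/ and data/chipseq_qc/ before you run preprocess.py.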
Contact us if you have any questions:
Qi (Alex) Song: qisong@andrew.cmu.edu; sqsq3178@gmail.com
Ziv Bar-Joseph: zivbj@andrew.cmu.edu
Coming soon!
©2022 Qi Song, Ziv Bar-Joseph. Systems Biology Group at Carnegie Mellon University. All rights reserved.