MTLRank

Introduction

MTLRank is a multi-task learning based framework for inferring regulatory interactions from single cell data. MTLRank ranks TFs for each gene by learning models that predict RNA velocity values of target genes. Models are learned in a multi-task, soft-parameter-sharing based manner to improve the performance. MTLRank uses TF expression matrix and TF activity matrix as inputs and ranks the TFs with deep SHAP based on the trained models. This repository shows the Jupyter notebooks, and other script files used to test different models used in the study.

How to use

Clone the Repository

git clone https://github.com/alexQiSong/MTLRank.git

Environment Set up

Install singularity in your OS. We use singularity image to run the search of potential target genes by ChIPseeker. Install singularity here.
All required dependencies are specified in the file MTLRANK_env.yml in this repository. Download this file and use conda to recreate this environment by

conda env create -f MTLRank_env.yml

Run pipeline

Activate the MTLRank environment by

conda activate MTLRank

Download sample data. We provide a script for downloading sample data sets of spleen and liver tissue from HuBMAP data portal. Due to data sharing policy, we are unable to directly share the data from HuBMAP consortium. Some of the data sets we used in the paper are not yet published. So the sample data sets might produce results different than what have been presented in the paper. To download the sample data sets, simply run download.py in this repository by (make sure MTLRank environment is active)

python download.py

This will automatically download most of the required data.

Download data from Cistrome database. Due to data sharing policy, we are not allowed to directly share ChIP-seq data from CistromeDB. Please go to http://cistrome.org/db/#/bdown and select Human_Factor to download all bed files and a QC file for ChIP-seq data. All bed files are downloaded as a gz file named "human_factor.tar.gz". Download and decompress this file and move all bed files to data/chipseq_bed/. QC file is named as "human_factor_full_QC.txt". Download this file and move it to data/chipseq_qc/
Run preprocessing steps. Simply run preprocess.py to perform all preprocessing steps including generating RPKM values and TF activity score computation. You may use n_jobs to perform parallel computation. This may take some time (~ 1hr when running with --n_jobs 30)

python preprocess.py --n_jobs 30

Run MTLRank pipeline using the jupyter notebookes MTLRank_step1_train.ipynb, MTLRank_step2_rankTF.ipynb, and MTLRank_step3_GRN.ipynb by following the order specified by the names. We provide more detailed instructions in each notebook, including the the suggested settings of hyperparameters.

Contact

Contact us if you have any questions:
Qi (Alex) Song: qisong@andrew.cmu.edu; sqsq3178@gmail.com
Ziv Bar-Joseph: zivbj@andrew.cmu.edu

Cite

Comming soon!

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
data		data
img		img
.gitignore		.gitignore
MTLRANK_env.yml		MTLRANK_env.yml
MTLRank_step1_train.ipynb		MTLRank_step1_train.ipynb
MTLRank_step2_rankTF.ipynb		MTLRank_step2_rankTF.ipynb
MTLRank_step3_GRN.ipynb		MTLRank_step3_GRN.ipynb
README.md		README.md
annotate_peaks.R		annotate_peaks.R
download.py		download.py
preprocess.py		preprocess.py
preprocess_expressions.py		preprocess_expressions.py
preprocess_get_tf_activities.py		preprocess_get_tf_activities.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MTLRank

Introduction

How to use

Clone the Repository

Environment Set up

Run pipeline

Contact

Cite

Copyright

About

Releases 1

Packages

Languages

alexQiSong/MTLRank

Folders and files

Latest commit

History

Repository files navigation

MTLRank

Introduction

How to use

Clone the Repository

Environment Set up

Run pipeline

Contact

Cite

Copyright

About

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages