scCompReg (Single-Cell Comparative Regulatory analysis) is an R package that provides coupled clustering and joint embedding of scRNA-seq and scATAC-seq on one sample, and performs comparative gene regulatory analysis between two conditions.
Please check the man page via `?function` (for example, `?sc_compreg`) for a detailed description of the types of inputs and outputs.
- Operating System: Linux or MacOS
- R (>= 3.6.0)
- Bedtools (Linux)
- Homer (Linux)
- scCompReg first release.
Use the following commands in R to install the scCompReg R package from source code:
require(devtools)
devtools::install_github("SUwonglab/sc-compReg", ref="master", subdir="R_package")
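
Bedtools and Homer are listed as requirements, so it may be worth confirming that they are reachable from R before running the preprocessing scripts. The sketch below is not part of scCompReg: `check_tools` is a hypothetical helper, and the executable names `bedtools` and `homer2` are assumptions about how the tools are installed on your system.

```r
# Hypothetical helper (not part of scCompReg): check whether command-line
# tools are on the PATH. The default executable names are assumptions.
check_tools <- function(tools = c("bedtools", "homer2")) {
  found <- nzchar(Sys.which(tools))  # Sys.which() returns "" when a tool is not found
  names(found) <- tools
  found
}

status <- check_tools()
if (!all(status)) {
  message("Not found on PATH: ",
          paste(names(status)[!status], collapse = ", "))
}
```

If a tool is reported as missing, adjust your `PATH` or pass the executable names used on your machine to `check_tools`.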
For a full example of using the scCompReg method, please refer to `example.R`. The necessary data have been uploaded to the `data` folder in this repository.
To download the data, make sure you have `git lfs` installed. Installation instructions can be found here: https://github.com/git-lfs/git-lfs/wiki/Installation
Next, run the following line in shell:
git lfs clone https://github.com/SUwonglab/sc-compReg.git
The downloaded data directory will be in `sc-compReg/data/`. Then set the following paths in R:
path = './example_data/'
prior_data_path = './prior_data/'
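
Before running the example, it can help to confirm that the files the example reads are present under these two paths. The following is a minimal sketch: `missing_inputs` is a hypothetical helper (not part of scCompReg), and the file names are the ones read in the example in this README.

```r
# Hypothetical helper (not part of scCompReg): return the expected input
# files that do not exist under the two data directories.
missing_inputs <- function(path, prior_data_path) {
  expected <- c(
    file.path(path, c("sample1.rds",
                      "sample2.rds",
                      "PeakName_intersect.txt",
                      "peak_gene_prior_intersect.bed",
                      "motif_file.rds")),
    file.path(prior_data_path, "motif_human.rds")
  )
  expected[!file.exists(expected)]
}

path <- "./example_data/"
prior_data_path <- "./prior_data/"

missing <- missing_inputs(path, prior_data_path)
if (length(missing) > 0) {
  message("Missing input files:\n", paste(missing, collapse = "\n"))
}
```

An empty result means every file was found; otherwise check that the working directory is the downloaded data directory.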
To run scCompReg, run the following lines in R:
library(scCompReg)
sample1 = readRDS(paste(path, 'sample1.rds', sep = ''))
sample2 = readRDS(paste(path, 'sample2.rds', sep = ''))
peak.name.intersect.dir = paste(path, 'PeakName_intersect.txt', sep='')
peak.gene.prior.dir = paste(path, 'peak_gene_prior_intersect.bed', sep='')
motif = readRDS(paste(prior_data_path, 'motif_human.rds', sep=''))
motif.file = readRDS(paste(path, 'motif_file.rds', sep=''))
compreg.output = sc_compreg(sample1$O1,
                            sample1$E1,
                            sample1$O1.idx,
                            sample1$E1.idx,
                            sample1$symbol1,
                            sample1$peak.name1,
                            sample2$O2,
                            sample2$E2,
                            sample2$O2.idx,
                            sample2$E2.idx,
                            sample2$symbol2,
                            sample2$peak.name2,
                            motif$motif.name,
                            motif$motif.weight,
                            motif$match2,
                            motif.file,
                            peak.name.intersect.dir,
                            peak.gene.prior.dir,
                            sep.char=' ')
To save the obtained output, run the lines below in R:
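# seq_len() avoids iterating over c(1, 0) if n.pops is 0
for (i in seq_len(compreg.output$n.pops)) {
    # hub transcription factors for population i
    write.table(compreg.output$hub.tf[[i]],
                paste(path, 'tf_', i, '.txt', sep=''),
                row.names = FALSE,
                quote = FALSE,
                sep = '\t')
    # differential network for population i
    write.table(compreg.output$diff.net[[i]],
                paste(path, 'diff_net_', i, '.txt', sep=''),
                row.names = FALSE,
                quote = FALSE,
                sep = '\t')
}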
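
To get a quick overview of the result before or after saving it, the per-population tables can be summarized. The sketch below only uses the fields that appear in the loop above (`n.pops`, `hub.tf`, `diff.net`); `summarise_compreg` is a hypothetical helper, and the toy list is made-up data standing in for a real `compreg.output`.

```r
# Hypothetical helper (not part of scCompReg): one row per population with
# the number of rows in its hub.tf and diff.net tables.
summarise_compreg <- function(out) {
  idx <- seq_len(out$n.pops)
  data.frame(
    population    = idx,
    hub.tf.rows   = vapply(idx, function(i) NROW(out$hub.tf[[i]]), integer(1)),
    diff.net.rows = vapply(idx, function(i) NROW(out$diff.net[[i]]), integer(1))
  )
}

# Toy stand-in for compreg.output (made-up values, for illustration only)
toy <- list(
  n.pops   = 2,
  hub.tf   = list(data.frame(tf = c("A", "B")), data.frame(tf = "C")),
  diff.net = list(data.frame(x = 1:3), data.frame(x = integer(0)))
)
print(summarise_compreg(toy))
```

With a real result, call `summarise_compreg(compreg.output)` instead of using the toy list.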
The entire scCompReg workflow consists of three mandatory steps and one optional step.
1. Download the `prior_data` directory from GitHub via `git clone git@github.com:SUwonglab/sc-compReg.git`.
2. Optional: obtain cluster assignments from coupled nonnegative matrix factorization.
   - Preprocess data for `cnmf`:
     - Obtain the `peak.bed` file.
     - In `sc-compReg/preprocess_data/`, run the following script: `bash cnmf_process_data.sh path/to/peak.bed genome_version path/to/prior_data`, where `genome_version` is one of {`hg19`, `hg38`, `mm9`, `mm10`}, and `prior_data` is the folder downloaded in step 1.
     - Output:
       - peak_gene_coupling_matrix.txt
     - After loading `peak.name` and `symbol`, convert `peak_gene_coupling_matrix.txt` to `D`, the coupling matrix, using the following code in R: `D <- cnmf_load_coupling_matrix('peak_gene_coupling_matrix.txt', peak.name, symbol)`
   - Run `cnmf` to get the cluster labels for sample 1 and sample 2. The cluster labels should be passed to `sc_compreg` as `O1.idx`, `E1.idx`, `O2.idx`, and `E2.idx`. For an example of how to run `cnmf`, please refer to `cnmf_example.R`.
   - Note: It is not required to obtain cluster assignments using the coupled nonnegative matrix factorization workflow. The only requirement of `scCompReg` is that the cluster assignments are consistent between scRNA-seq and scATAC-seq.
3. Process data for `scCompReg`:
   - Obtain the `peak_name1.txt` and `peak_name2.txt` files containing the peak names of sample 1 and sample 2, respectively, in bed format (three tab-separated columns: chr, start, end).
   - In `sc-compReg/preprocess_data/`, run the following script: `bash sc_compreg_process_data_.sh path/to/peak_name1.txt path/to/peak_name2.txt genome_version path/to/prior_data`, where `genome_version` is one of {`hg19`, `hg38`, `mm9`, `mm10`}, and `prior_data` is the folder downloaded in step 1.
   - Output:
     - PeakName_intersect.txt
     - peak_gene_prior_intersect.bed
     - MotifTarget.txt
4. Follow the tutorial on the `sc_compreg` function.
   - The necessary inputs to `sc_compreg` are
     - consistent cluster assignments in scRNA-seq and scATAC-seq (can be obtained from coupled nonnegative matrix factorization or obtained elsewhere)
     - log2-transformed gene expression matrices of samples 1 and 2
     - log2-transformed chromatin accessibility matrices of samples 1 and 2
     - symbol names of samples 1 and 2
   - Input `peak.name.intersect.dir` is the path to the `PeakName_intersect.txt` file generated in step 3.
   - Input `peak.gene.prior.dir` is the path to the `peak_gene_prior_intersect.bed` file generated in step 3.
   - Load the corresponding motif file in R via `motif = readRDS('prior_data/motif_human.rds')` for human or `motif = readRDS('prior_data/motif_mouse.rds')` for mouse.
   - Load `motif.file` in R via `motif.file = mfbs_load(motif.target.dir)`, where `motif.target.dir` is the path to the `MotifTarget.txt` file generated in step 3.
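
Two of the preparation steps above lend themselves to a short sketch: writing peak names in bed format, and checking that the cluster labels used for the two modalities are consistent. Both helpers below are hypothetical and not part of scCompReg. `write_peak_bed` assumes peak names of the form `chr1_1000_2000` (change `split` if yours differ), and `check_cluster_labels` assumes that "consistent" means the same set of cluster labels appears in both label vectors.

```r
# Hypothetical helper: write peak names as a three-column, tab-separated
# bed file (chr, start, end) with no header.
# Assumes each peak name splits into exactly three parts on `split`.
write_peak_bed <- function(peak.name, file, split = "_") {
  stopifnot(length(peak.name) > 0)
  parts <- strsplit(peak.name, split, fixed = TRUE)
  stopifnot(all(lengths(parts) == 3))
  bed <- do.call(rbind, parts)
  write.table(bed, file, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}

# Hypothetical helper: TRUE if both label vectors use the same set of
# cluster labels (an assumed reading of "consistent cluster assignments").
check_cluster_labels <- function(idx.a, idx.b) {
  setequal(unique(idx.a), unique(idx.b))
}

# Example with made-up values
bed.file <- tempfile(fileext = ".txt")
write_peak_bed(c("chr1_100_200", "chr2_300_400"), bed.file)
check_cluster_labels(c(1, 2, 2, 3), c(3, 1, 2))
```

Peak names that contain the separator inside the chromosome name will fail the three-part check and need to be handled separately.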
scCompReg provides access to the following functions:
Command | Description |
---|---|
sc_compreg | Performs single-cell comparative regulatory analysis based on scRNA-seq and scATAC-seq data from two different conditions. |
mfbs_load | Efficiently loads the motif_target file and returns an R list of the loaded objects. |
[1] Sc-compReg enables the comparison of gene regulatory networks between conditions using single-cell data. Zhana Duren, Wenhui Sophia Lu, Joseph G. Arthur, Preyas Shah, Jingxue Xin, Francesca Meschi, Miranda Lin Li, Corey M. Nemec, Yifeng Yin, and Wing Hung Wong.