This MetaTiMEpretrain repo, when run on a large set of scRNA-seq samples, can generate gene programs corresponding to cell types, cell states, or signaling pathways in the provided data. Meta-components called from scRNA-seq represent independent transcriptional variations that are reproducibly seen across datasets, expressed as weighted gene contribution vectors. Currently, the dimensionality-reduction method is independent component analysis (ICA). See the MetaTiME paper or the MetaTiME annotator repo for the tumor microenvironment (TME) application. [in progress :)]
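Each meta-component is a weighted gene contribution vector, so one simple way to read it is to list its highest-weighted genes. The sketch below only illustrates that idea. The function name `top_genes` and the toy gene names and weights are hypothetical and are not part of this repo.

```python
import numpy as np


def top_genes(weights, genes, n=5):
    """Return the n genes with the largest weights in one component.

    weights: 1-D array of gene weights for a single (meta-)component.
    genes: gene names in the same order as weights.
    """
    order = np.argsort(weights)[::-1][:n]  # indices sorted by descending weight
    return [genes[i] for i in order]
```

For example, `top_genes(np.array([0.1, 2.0, -1.0, 0.5]), ["GENE_A", "GENE_B", "GENE_C", "GENE_D"], n=2)` returns `["GENE_B", "GENE_D"]`.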
- Dependencies: scanpy, pandas, multiprocessing, sklearn (scikit-learn), seaborn, scikit-network >= 0.28.2
git clone https://github.com/yi-zhang/MetaTiMEpretrain.git
cd MetaTiMEpretrain
- Collect your scRNA-seq datasets in h5ad format. Alternatively, a few tumor scRNA-seq datasets are provided at this sample input link. Download the folder and copy it to `MetaTiMEpretrain/test/`, or point to it as `inputdatadir` in `scpp.py`.
sh test.sh
This script runs a few sequential steps, explained below.
- An output folder for this sample input data is available from the result dir for testrun.
- Each step can also be run separately.
python metatimetrain/scpp.py --datadir ./test/testdata/ --listfile ./test/testdata/datalst1.tsv --outdir ./test/analysis/pp
- `--datadir`: Input directory with scRNA-seq files.
- `--outdir`: Output directory for preprocessed scRNA-seq files. Counts are depth-normalized and log-transformed.
- Runs in parallel. Can take a long time (on the order of an hour) and a lot of memory, depending on data size.
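Preprocessing depth-normalizes and log-transforms the counts. A minimal NumPy sketch of that transform follows, assuming each cell is scaled to a fixed total; `target_sum=1e4` is an assumed value, since the README does not state the target used by `scpp.py`, and the function name is hypothetical. In scanpy the same two steps are `sc.pp.normalize_total` and `sc.pp.log1p`.

```python
import numpy as np


def depth_normalize_log(counts, target_sum=1e4):
    """Depth-normalize and log-transform a dense cells x genes count matrix.

    Each cell (row) is scaled so its counts sum to target_sum (an assumed
    value), then log1p is applied elementwise.
    """
    counts = np.asarray(counts, dtype=float)
    depth = counts.sum(axis=1, keepdims=True)  # total counts per cell
    depth[depth == 0] = 1.0  # leave empty cells as zeros instead of dividing by 0
    return np.log1p(counts / depth * target_sum)
```

Real h5ad files usually hold sparse matrices, which scanpy handles directly; this dense version only shows the arithmetic.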
python metatimetrain/decompose.py -d ./test/analysis/pp/ -t 4 -o ./test/analysis/decompose/ -k 100
- `-d`: Input directory with preprocessed scRNA-seq files.
- `-t`: Number of threads.
- `-o`: Output directory for the per-dataset decomposition tables.
- `-k`: Number of low-dimensional components.
- Runs in parallel. Can take a long time (on the order of an hour) and a lot of memory, depending on data size.
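The decomposition step runs ICA with `-k` components on each dataset. The sketch below is a minimal symmetric FastICA in NumPy, included only to illustrate how a gene-weight table can arise; it is not the repo's code. It assumes genes are treated as samples, so each returned column is a gene-weight vector, and all names are hypothetical. In practice scikit-learn's `FastICA` (in `sklearn.decomposition`) would be the usual choice, since sklearn is already a dependency.

```python
import numpy as np


def ica_gene_programs(expr, k, max_iter=500, tol=1e-6, seed=0):
    """Minimal symmetric FastICA (tanh nonlinearity), for illustration only.

    expr: array of shape (n_genes, n_cells). Genes are treated as samples
    (an assumption), so each returned column is a gene-weight vector.
    Returns S with shape (n_genes, k).
    """
    rng = np.random.default_rng(seed)
    n = expr.shape[0]
    # Center each cell column across genes.
    X = expr - expr.mean(axis=0, keepdims=True)
    # Whiten: top-k left singular vectors, scaled so that Z.T @ Z / n = I.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :k] * np.sqrt(n)
    # Random orthogonal starting unmixing matrix.
    W, _ = np.linalg.qr(rng.standard_normal((k, k)))
    for _ in range(max_iter):
        G = np.tanh(Z @ W.T)  # g(current source estimates), shape (n, k)
        # Fixed-point update: E[z g(w.z)] - E[g'(w.z)] w for every row w.
        W_new = (G.T @ Z) / n - np.diag((1 - G**2).mean(axis=0)) @ W
        # Symmetric decorrelation, (W W^T)^(-1/2) W, computed via the SVD.
        u, _, vt = np.linalg.svd(W_new)
        W_new = u @ vt
        # Converged when each row keeps (up to sign) its previous direction.
        done = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1)) < tol
        W = W_new
        if done:
            break
    return Z @ W.T
```

Called as `ica_gene_programs(expr, k=100)` on a genes × cells matrix, it returns a genes × 100 table, matching `-k 100` above. ICA recovers components only up to sign and scale.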
python metatimetrain/aligncomp.py -c ./test/analysis/decompose -o ./test/analysis/decompose_align/ -k 100
- `-c`: Input directory with the per-dataset decomposition tables.
- `-o`: Output directory for the aligned decomposition tables.
- `-k`: Number of low-dimensional components. Must match the value used to produce the input decomposition tables.
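The README does not describe what alignment involves. One common choice in ICA gene-program workflows, assumed here, is to (1) restrict every per-dataset table to a shared gene list and (2) flip each component's sign so its long tail of high-weight genes is positive, because ICA signs are arbitrary. The function below is a hypothetical pandas sketch of that idea, not the logic of `aligncomp.py`.

```python
import numpy as np
import pandas as pd


def align_components(tables, genes=None):
    """Put per-dataset ICA tables on a common footing (hypothetical sketch).

    tables: dict mapping a dataset name to a DataFrame of shape
    (genes, components). Returns a dict of DataFrames restricted to `genes`
    (by default, the sorted intersection of all gene indices), with each
    component's sign flipped if its skewness is negative.
    """
    if genes is None:
        # Assumed choice: keep only genes measured in every dataset.
        genes = sorted(set.intersection(*(set(t.index) for t in tables.values())))
    aligned = {}
    for name, t in tables.items():
        t = t.loc[genes]
        centered = t - t.mean()
        # Sample skewness of each column (component).
        skew = (centered**3).mean() / (centered**2).mean() ** 1.5
        sign = np.where(skew < 0, -1.0, 1.0)
        aligned[name] = t * sign  # broadcasts one sign per column
    return aligned
```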
python metatimetrain/pullcomp.py -c ./test/analysis/decompose_align/ -o ./test/analysis/decompose_pull/
- `-c`: Input directory with the aligned, per-dataset decomposition tables.
- `-o`: Output directory for a single table gathering all low-dimensional components.
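Pulling gathers the aligned per-dataset tables into one gene × component table. A minimal pandas sketch follows, assuming the tables are indexed by gene and that columns are renamed `<dataset>_<component>`; the naming scheme and the function name are hypothetical.

```python
import pandas as pd


def pull_components(aligned):
    """Concatenate per-dataset gene x component tables side by side.

    aligned: dict mapping a dataset name to a DataFrame (genes x components).
    Columns are renamed '<dataset>_<component>' (hypothetical scheme) and only
    genes present in every table are kept.
    """
    frames = [t.add_prefix(f"{name}_") for name, t in aligned.items()]
    return pd.concat(frames, axis=1, join="inner")
```

Prefixing with the dataset name keeps components from different datasets distinguishable after they are pooled.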
python metatimetrain/callmec.py -c ./test/analysis/decompose_pull/ -o ./test/analysis/MeC/ -u True -s 2
- `-c`: Input directory with the pulled components.
- `-o`: Output directory.
- `-u`: Whether this is a unit test; use `True` for the test datasets.
- `-s`: Minimum number of components per meta-component cluster.
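Calling meta-components groups similar components from different datasets and keeps groups with at least `-s` members. The repo lists scikit-network as a dependency, which suggests graph-based clustering; the sketch below substitutes a simpler, dependency-free approach. It links components whose Pearson correlation exceeds an assumed cutoff (`threshold=0.3`, hypothetical), takes connected components of that graph with a union-find, drops groups smaller than `min_size`, and averages each remaining group into one meta-component vector. The function name is hypothetical.

```python
import numpy as np


def call_meta_components(comp, min_size=2, threshold=0.3):
    """Group pulled components into meta-components (illustrative sketch).

    comp: array of shape (n_genes, n_components); each column is one
    sign-aligned component. threshold is a hypothetical correlation cutoff.
    Returns a list of (member_column_indices, mean_gene_weight_vector).
    """
    n = comp.shape[1]
    corr = np.corrcoef(comp.T)  # component-by-component Pearson correlation
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of components whose correlation exceeds the cutoff.
    for i in range(n):
        for j in range(i + 1, n):
            if corr[i, j] > threshold:
                parent[find(i)] = find(j)

    # Collect connected components (clusters) of the correlation graph.
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)

    # Keep clusters with at least min_size members (analogous to -s)
    # and average their member vectors into one meta-component.
    return [
        (members, comp[:, members].mean(axis=1))
        for members in clusters.values()
        if len(members) >= min_size
    ]
```

Connected components are used here only to keep the example self-contained. A community-detection method such as Louvain (available in scikit-network) would also split loosely linked chains of components that connected components would merge into one cluster.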
python mergecohort.py --datadir ../mmmetatime202306/data --metafile ../mmmetatime202307cohort/breast_final_downloaded_metatable_gsm.csv --outdir ../mmmetatime202307cohort/gsedata -t 8