MIST: an interpretable and flexible deep learning framework for single-T cell transcriptome and receptor analysis

Install the latest development version from GitHub:
pip install git+https://github.com/aapupu/MIST.git
Or clone the repository and install it in editable mode:
git clone https://github.com/aapupu/MIST.git
cd MIST
pip install -e .
Or install the released package from PyPI:
pip install mist-vae
Note: Python 3.8 and scirpy 0.12.0 are recommended. MIST is implemented in the PyTorch framework. If CUDA is available, MIST runs in GPU mode automatically.
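The automatic GPU fallback, combined with the --gpu index option listed further down, usually comes down to a device-selection rule like the one below. This is a minimal sketch of the common PyTorch idiom, not MIST's own code; select_device is a hypothetical helper, and PyTorch is imported lazily so the function still works where it is not installed.

```python
from typing import Optional


def select_device(gpu: int = 0, cuda_available: Optional[bool] = None) -> str:
    """Return 'cuda:<gpu>' if CUDA is usable, otherwise 'cpu' (hypothetical helper).

    cuda_available can be passed explicitly (e.g. in tests); otherwise it is
    detected with torch.cuda.is_available() when PyTorch is installed.
    """
    if cuda_available is None:
        try:
            import torch  # imported lazily so the sketch runs without PyTorch

            cuda_available = torch.cuda.is_available()
        except ImportError:
            cuda_available = False
    # Fall back to the CPU whenever no CUDA device is available.
    return f"cuda:{gpu}" if cuda_available else "cpu"
```

The returned string can be handed to torch.device(...) or to a module's .to(...) call.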
from mist import MIST
adata, model = MIST(rna_path, tcr_path, batch, rna_data_type, tcr_data_type, type)
The parameters of the API function are similar to the command-line options.
The output includes a trained model and an AnnData object; the AnnData object can be further analyzed with Scanpy and scirpy.
rna_path
List of paths to scRNA-seq data files.
tcr_path
List of paths to scTCR-seq data files.
batch
List of batch labels.
rna_data_type
Type of scRNA-seq data file (e.g., 'h5ad').
tcr_data_type
Type of scTCR-seq data file (e.g., '10X').
type
Type of model to train ('joint', 'rna', or 'tcr').
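The three list parameters are presumably aligned by position, so that the i-th scRNA-seq file, the i-th scTCR-seq file and the i-th batch label describe the same sample; this is an assumption, not something stated above. The sketch below checks that alignment for a joint run and shows which standard Scanpy and scirpy readers would typically handle each documented file type. plan_inputs and the reader tables are hypothetical illustrations, not MIST's loading code; reader names are returned as strings so nothing is imported, and the exact type strings MIST accepts may differ.

```python
from typing import Dict, List, NamedTuple

# Hypothetical mapping from documented file types to standard reader functions.
RNA_READERS: Dict[str, str] = {
    "10X mtx": "scanpy.read_10x_mtx",
    "h5": "scanpy.read_10x_h5",
    "h5ad": "scanpy.read_h5ad",
}
TCR_READERS: Dict[str, str] = {
    "10X": "scirpy.io.read_10x_vdj",
    "tracer": "scirpy.io.read_tracer",
    "BD": "scirpy.io.read_bd_rhapsody",
    "h5ad": "scanpy.read_h5ad",
}


class SampleInput(NamedTuple):
    batch: str
    rna_file: str
    rna_reader: str
    tcr_file: str
    tcr_reader: str


def plan_inputs(rna_path: List[str], tcr_path: List[str], batch: List[str],
                rna_data_type: str = "h5ad", tcr_data_type: str = "10X") -> List[SampleInput]:
    """Pair the i-th RNA file, i-th TCR file and i-th batch label (assumed alignment)."""
    if not (len(rna_path) == len(tcr_path) == len(batch)):
        raise ValueError(
            f"expected one batch label per RNA/TCR pair, got {len(rna_path)} RNA, "
            f"{len(tcr_path)} TCR and {len(batch)} batch entries"
        )
    if rna_data_type not in RNA_READERS:
        raise ValueError(f"unknown rna_data_type: {rna_data_type!r}")
    if tcr_data_type not in TCR_READERS:
        raise ValueError(f"unknown tcr_data_type: {tcr_data_type!r}")
    return [
        SampleInput(b, r, RNA_READERS[rna_data_type], t, TCR_READERS[tcr_data_type])
        for r, t, b in zip(rna_path, tcr_path, batch)
    ]
```

For example, plan_inputs(["XXX1.h5ad", "XXX2.h5ad"], ["XXX1.csv", "XXX2.csv"], ["batch1", "batch2"]) yields two entries, each pairing an h5ad file with a 10X TCR file under its own batch label.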
MIST --rna_path rna_path1 rna_path2 --tcr_path tcr_path1 tcr_path2 --batch batch1 batch2 --rna_data_type h5ad --tcr_data_type 10X --type joint
- adata.h5ad: preprocessed data and results
- model.pt: saved model
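To drive the command-line tool from a Python script (for example, over several datasets), the example command can be assembled as an argument list and passed to subprocess.run. build_mist_command and run_mist are hypothetical helpers; they use only the options shown in the example command plus --outdir, and they assume the MIST executable is on PATH.

```python
import shlex
import subprocess
from typing import List, Optional


def build_mist_command(rna_paths: List[str], tcr_paths: List[str], batches: List[str],
                       rna_data_type: str = "h5ad", tcr_data_type: str = "10X",
                       model_type: str = "joint", outdir: Optional[str] = None) -> List[str]:
    """Assemble the argv list for the MIST command-line call (hypothetical helper)."""
    cmd = ["MIST",
           "--rna_path", *rna_paths,
           "--tcr_path", *tcr_paths,
           "--batch", *batches,
           "--rna_data_type", rna_data_type,
           "--tcr_data_type", tcr_data_type,
           "--type", model_type]
    if outdir is not None:
        cmd += ["--outdir", outdir]
    return cmd


def run_mist(cmd: List[str]) -> None:
    """Echo and run the command; raises CalledProcessError if MIST exits with an error."""
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Passing argv as a list rather than a single shell string avoids quoting problems when file paths contain spaces; shlex.join is used only to print a copy-pasteable version of the command.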
- --rna_path: Paths to scRNA-seq data files (example: XXX1.h5ad XXX2.h5ad).
- --tcr_path: Paths to scTCR-seq data files (example: XXX1.csv XXX2.csv).
- --batch: Batch labels.
- --rna_data_type: Type of scRNA-seq data file (e.g., 10X mtx, h5, or h5ad). Default: h5ad
- --tcr_data_type: Type of scTCR-seq data file (e.g., 10X, tracer, BD, or h5ad). Default: 10X
- --protein_path: Path to the merged protein (ADT) data file.
- --type: Type of model to train (joint, rna, or tcr). Default: joint
- --min_genes: Filter out cells in which fewer than min_genes genes are detected. Default: 600
- --min_cells: Filter out genes detected in fewer than min_cells cells. Default: 3
- --pct_mt: Filter out cells whose mitochondrial gene percentage exceeds pct_mt. If None, mitochondrial genes are filtered out instead. Default: None
- --n_top_genes: Number of highly variable genes to keep. Default: 2000
- --batch_size: Batch size for training. Default: 128
- --pooling_dims: Dimensionality of the pooling layer. Default: 16
- --z_dims: Dimensionality of the latent space. If type='rna', z_dims=pooling_dims. Default: 128
- --drop_prob: Dropout probability. Default: 0.1
- --lr: Learning rate for the optimizer. Default: 1e-4
- --weight_decay: L2 regularization strength. Default: 1e-3
- --max_epoch: Maximum number of training epochs. Default: 300
- --patience: Patience for early stopping. Default: 30
- --warmup: Number of warmup epochs. Default: 30
- --gpu: Index of the GPU to use if a GPU is available. Default: 0
- --seed: Random seed. Default: 42
- --outdir: Output directory.
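The option list above can be summarized as an argparse parser that records each documented default. This is a hypothetical reconstruction for reference, not MIST's actual parser: the value types (for example, float for --pct_mt), the nargs settings and the restriction of --type to three choices are assumptions, and --outdir has no documented default.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented MIST options and defaults."""
    p = argparse.ArgumentParser(prog="MIST")
    # Inputs (nargs="+" for the multi-value options is an assumption).
    p.add_argument("--rna_path", nargs="+")
    p.add_argument("--tcr_path", nargs="+")
    p.add_argument("--batch", nargs="+")
    p.add_argument("--rna_data_type", default="h5ad")
    p.add_argument("--tcr_data_type", default="10X")
    p.add_argument("--protein_path")
    p.add_argument("--type", default="joint", choices=["joint", "rna", "tcr"])
    # Quality control and feature selection.
    p.add_argument("--min_genes", type=int, default=600)
    p.add_argument("--min_cells", type=int, default=3)
    p.add_argument("--pct_mt", type=float, default=None)
    p.add_argument("--n_top_genes", type=int, default=2000)
    # Model architecture and training.
    p.add_argument("--batch_size", type=int, default=128)
    p.add_argument("--pooling_dims", type=int, default=16)
    p.add_argument("--z_dims", type=int, default=128)
    p.add_argument("--drop_prob", type=float, default=0.1)
    p.add_argument("--lr", type=float, default=1e-4)
    p.add_argument("--weight_decay", type=float, default=1e-3)
    p.add_argument("--max_epoch", type=int, default=300)
    p.add_argument("--patience", type=int, default=30)
    p.add_argument("--warmup", type=int, default=30)
    p.add_argument("--gpu", type=int, default=0)
    p.add_argument("--seed", type=int, default=42)
    p.add_argument("--outdir")
    return p


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Documented rule: with type='rna', z_dims equals pooling_dims.
    if args.type == "rna":
        args.z_dims = args.pooling_dims
    return args
```

Calling parse with only the input options fills in every other setting from the documented defaults.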
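The three quality-control options can be read as the usual Scanpy-style filters: a cell is kept if at least min_genes genes are detected in it, a gene is kept if it is detected in at least min_cells cells, and, when pct_mt is set, cells whose mitochondrial share of counts exceeds pct_mt percent are dropped; when pct_mt is None, the mitochondrial genes themselves are removed. The pure-Python sketch below applies these rules to a tiny dense count matrix. qc_filter is a hypothetical helper, and the order of the steps and the "MT-" prefix for mitochondrial genes (the human naming convention) are assumptions, not necessarily what MIST does internally.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]  # rows = cells, columns = genes (raw counts)


def qc_filter(counts: Matrix, genes: List[str], min_genes: int = 600, min_cells: int = 3,
              pct_mt: Optional[float] = None, mt_prefix: str = "MT-"
              ) -> Tuple[List[int], List[int]]:
    """Return indices of kept cells and kept genes under the documented QC rules.

    Hypothetical sketch: step order and the (uppercase) mitochondrial prefix are assumptions.
    """
    is_mt = [g.upper().startswith(mt_prefix) for g in genes]

    # Step 1: cell filtering on detected genes and, if pct_mt is set, mitochondrial share.
    kept_cells = []
    for i, row in enumerate(counts):
        n_genes = sum(1 for x in row if x > 0)  # genes detected in this cell
        total = sum(row)
        mt_pct = 100 * sum(x for x, m in zip(row, is_mt) if m) / total if total else 0.0
        if n_genes < min_genes:
            continue  # too few detected genes
        if pct_mt is not None and mt_pct > pct_mt:
            continue  # mitochondrial share too high
        kept_cells.append(i)

    # Step 2: gene filtering on the remaining cells.
    kept_genes = []
    for j in range(len(genes)):
        n_cells = sum(1 for i in kept_cells if counts[i][j] > 0)  # cells detecting gene j
        if n_cells < min_cells:
            continue
        if pct_mt is None and is_mt[j]:
            continue  # no mitochondrial threshold: drop mitochondrial genes instead
        kept_genes.append(j)
    return kept_cells, kept_genes
```

In practice the same rules are normally applied to an AnnData object with Scanpy's filtering functions; the list-based version only makes the thresholds explicit.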
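The --max_epoch and --patience options describe standard early stopping: training runs for at most max_epoch epochs and stops once the monitored loss has not improved for patience consecutive epochs. The sketch below simulates that rule on a precomputed list of per-epoch validation losses. Which loss MIST monitors, and how the --warmup epochs interact with early stopping, are not documented here, so warmup is deliberately left out; early_stopping is a hypothetical helper.

```python
from typing import List, Tuple


def early_stopping(losses: List[float], max_epoch: int = 300, patience: int = 30
                   ) -> Tuple[int, int]:
    """Return (best_epoch, stop_epoch), both 0-based, for a sequence of validation losses.

    Training stops after `patience` consecutive epochs without improvement,
    or when max_epoch epochs (or the supplied losses) run out.
    """
    best_loss = float("inf")
    best_epoch = 0
    bad_epochs = 0
    for epoch, loss in enumerate(losses[:max_epoch]):
        if loss < best_loss:
            # Improvement: remember this epoch and reset the patience counter.
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch  # patience exhausted
    return best_epoch, min(len(losses), max_epoch) - 1
```

A training loop following this rule would typically restore the model weights saved at best_epoch before writing the final model.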
Explore further applications of MIST
MIST.py --help
Examples of running MIST can be found in the jupyter folder.
MIST: an interpretable and flexible deep learning framework for single-T cell transcriptome and receptor analysis
Wenpu Lai, Yangqiu Li, Oscar Junhong Luo
bioRxiv 2024.07.05.602192; doi: https://doi.org/10.1101/2024.07.05.602192