Single-cell perturbation effects prediction benchmark

bm2-lab/scPerturBench

Single-cell perturbation benchmark (scPerturBench)

Introduction

Various computational methods have been developed to predict perturbation effects, but despite claims of promising performance, concerns about the true efficacy of these models continue to be raised, particularly when the models are evaluated across diverse unseen cellular contexts and unseen perturbations. To address this, a comprehensive benchmark was conducted for 21 single-cell perturbation response prediction methods, including methods concerning genetic and chemical perturbations; 29 datasets were used, and various evaluation metrics were applied to assess the generalizability of the methods to unseen cellular contexts and perturbations. Recommendations regarding the method limitations, method generalization and method selection were obtained. Finally, an applicable solution that leverages prior knowledge through cellular context embedding to improve the generalizability of models to new cellular contexts is presented.

Workflow

[Figure: Workflow]

Cellular context generalization scenario

In the cellular context generalization scenario, we evaluate the prediction of known perturbations in previously unobserved cellular contexts. Specifically, we assessed the accuracy of 10 published methods and the trainMean baseline model across 12 datasets using four evaluation metrics: MSE, PCC-delta, E-distance, and common DEGs. This scenario is further divided into two test settings, based on how the training and test datasets are partitioned: the i.i.d (independent and identically distributed, or in-distribution) setting and the o.o.d (out-of-distribution) setting. i.i.d contains the scripts used in the i.i.d setting, and o.o.d contains the scripts used in the o.o.d setting. calPerformance and Utils contain the scripts for performance calculation and generic functions.
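As a rough illustration of three of the metrics named above, the following sketch computes MSE, PCC-delta, and E-distance on synthetic data with NumPy. The definitions here are common textbook forms and are assumptions, not the benchmark's implementation (see calPerformance for that): MSE and PCC-delta are computed on mean expression profiles, and E-distance uses plain Euclidean distance. The "trainMean-style" baseline, which predicts the mean of the training perturbed cells, is likewise an assumed reading of the baseline's name. All function and variable names are hypothetical.

```python
import numpy as np


def mse(pred_cells, true_cells):
    """MSE between the mean predicted and mean observed expression profiles."""
    return float(np.mean((pred_cells.mean(axis=0) - true_cells.mean(axis=0)) ** 2))


def pcc_delta(pred_cells, true_cells, control_cells):
    """Pearson correlation between predicted and observed shifts from control."""
    ctrl_mean = control_cells.mean(axis=0)
    pred_delta = pred_cells.mean(axis=0) - ctrl_mean
    true_delta = true_cells.mean(axis=0) - ctrl_mean
    return float(np.corrcoef(pred_delta, true_delta)[0, 1])


def _mean_pairwise_dist(a, b):
    """Mean Euclidean distance over all pairs (one cell from a, one from b)."""
    diff = a[:, None, :] - b[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).mean())


def e_distance(pred_cells, true_cells):
    """Energy distance between two sets of cells (0 when the sets coincide)."""
    return (2 * _mean_pairwise_dist(pred_cells, true_cells)
            - _mean_pairwise_dist(pred_cells, pred_cells)
            - _mean_pairwise_dist(true_cells, true_cells))


# Synthetic data (assumption): 200 control cells, 50 genes, one shared effect.
rng = np.random.default_rng(0)
n_genes = 50
control = rng.normal(0.0, 1.0, size=(200, n_genes))
effect = rng.normal(0.0, 1.0, size=n_genes)
train_perturbed = control[:100] + effect + rng.normal(0.0, 0.1, size=(100, n_genes))
test_perturbed = control[100:] + effect + rng.normal(0.0, 0.1, size=(100, n_genes))

# trainMean-style baseline (assumed definition): every predicted cell is the
# mean expression profile of the perturbed cells seen during training.
baseline_pred = np.tile(train_perturbed.mean(axis=0), (len(test_perturbed), 1))

print("MSE:", mse(baseline_pred, test_perturbed))
print("PCC-delta:", pcc_delta(baseline_pred, test_perturbed, control))
print("E-distance:", e_distance(baseline_pred, test_perturbed))
```

Because the baseline collapses every prediction onto a single profile, it can score well on mean-based metrics such as MSE and PCC-delta while still showing a clearly positive E-distance, which compares whole cell distributions.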

Perturbation generalization scenario

In the perturbation generalization scenario, we assess the ability of models to predict the effects of previously unobserved perturbations within a specific cellular context. Depending on the type of perturbation, this scenario is further divided into two categories: (1) genetic perturbation effect prediction and (2) chemical perturbation effect prediction. Genetic contains the scripts used in the genetic setting, and Chemical contains the scripts used in the chemical setting. calPerformance and Utils contain the scripts for performance calculation and generic functions.
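The idea of evaluating on previously unobserved perturbations can be sketched as a split that holds out whole perturbations, so that no test perturbation appears in training. This is a minimal sketch under assumptions, not the partitioning used in the benchmark scripts: the function name, the `control` label, and the toy gene labels are all hypothetical.

```python
import random


def split_by_perturbation(cell_perturbations, test_fraction=0.25, seed=0,
                          control_label="control"):
    """Split cell indices so that held-out perturbations never occur in training.

    Control cells always stay in the training set.
    """
    perturbations = sorted(set(cell_perturbations) - {control_label})
    rng = random.Random(seed)
    rng.shuffle(perturbations)
    n_test = max(1, round(len(perturbations) * test_fraction))
    held_out = set(perturbations[:n_test])
    train_idx = [i for i, p in enumerate(cell_perturbations) if p not in held_out]
    test_idx = [i for i, p in enumerate(cell_perturbations) if p in held_out]
    return train_idx, test_idx, held_out


# Toy per-cell perturbation labels (hypothetical example data).
labels = ["control", "KLF1", "KLF1", "CEBPA", "control", "TP53", "MYC", "CEBPA"]
train_idx, test_idx, held_out = split_by_perturbation(labels)
print("held-out perturbations:", sorted(held_out))
print("train cells:", train_idx, "test cells:", test_idx)
```

Splitting by perturbation label rather than by cell is what makes the test perturbations genuinely unseen; a random split over cells would leak every perturbation into training.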

bioLord-emCell

We posit that improving generalization in the cellular context generalization scenario requires models to effectively capture the heterogeneity in perturbation responses across cellular contexts. This can be achieved in one of two main ways: (1) training on large-scale, diverse datasets to learn the heterogeneity directly, or (2) leveraging existing prior knowledge. Given the scarcity of large-scale cellular perturbation datasets, the second approach is more feasible. Therefore, we propose a generalizable and applicable framework that improves model generalizability across different cellular contexts via cell line embedding and disentangled representation. bioLord-emCell contains the scripts we used to implement our framework.
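To give some intuition for why a cell line embedding can help with unseen cellular contexts, the toy sketch below relates an unseen cell line to the training cell lines through embedding similarity and uses the resulting weights to combine their observed effects. This is not the bioLord-emCell framework, which is implemented in the scripts above; it is only an assumed, simplified illustration. The embedding vectors, effect vectors, the `temperature` parameter, and all function names are made up for the example.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def similarity_weights(target_emb, train_embs, temperature=0.1):
    """Softmax over cosine similarities between a target and each training cell line."""
    names = list(train_embs)
    sims = np.array([cosine_similarity(target_emb, train_embs[n]) for n in names])
    logits = sims / temperature
    weights = np.exp(logits - logits.max())  # subtract max for numerical stability
    weights /= weights.sum()
    return dict(zip(names, weights))


# Toy embeddings (made up): the unseen cell line lies close to A549.
train_embs = {
    "A549": np.array([1.0, 0.0, 0.0]),
    "K562": np.array([0.0, 1.0, 0.0]),
}
unseen_emb = np.array([0.9, 0.1, 0.0])

# Toy perturbation effects (made up) observed in each training cell line.
train_effects = {
    "A549": np.array([1.0, 0.0]),
    "K562": np.array([0.0, 1.0]),
}

weights = similarity_weights(unseen_emb, train_embs)
estimated_effect = sum(weights[n] * train_effects[n] for n in train_effects)
print("weights:", weights)
print("estimated effect in unseen cell line:", estimated_effect)
```

The point of the sketch is only that an embedding built from prior knowledge places a new cell line near related, already observed ones, which gives a model something to condition on even when no perturbation data exist for that cell line.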

We use the sciplex3 dataset as a demo case for running biolord-emCell. We recommend using Anaconda or Miniconda to manage the environment. You can create the conda environment and run the demo with the following commands:

conda env create -f environment.yml
python biolord-emCell.py

sciplex3_cell_embs.pkl was generated by Get_embedding.py.
For more details, please refer to our manuscript and the scGPT tutorial.
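Before running the demo, it can be useful to check what sciplex3_cell_embs.pkl contains. The sketch below loads a pickle file and summarizes the object without assuming much about its layout; the internal structure of the file is not documented here, so the helper names and the checks they perform are hypothetical. Only unpickle files from a source you trust, and note that loading may require the libraries used to create the object (for example NumPy or pandas) to be installed.

```python
import pickle
from pathlib import Path


def load_cell_embeddings(path):
    """Load a pickled embedding object and return it unchanged."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def describe(obj):
    """Summarize a loaded object without assuming a specific layout."""
    summary = {"type": type(obj).__name__}
    if hasattr(obj, "shape"):
        # Array-like objects (for example NumPy arrays or DataFrames).
        summary["shape"] = tuple(obj.shape)
    elif hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    if isinstance(obj, dict):
        summary["example_keys"] = list(obj)[:3]
    return summary


# Only runs when the file is present in the working directory.
path = Path("sciplex3_cell_embs.pkl")
if path.exists():
    print(describe(load_cell_embeddings(path)))
```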

Benchmark datasets summary

All datasets analyzed in our study are listed in the Workflow. All benchmark datasets have been uploaded to Figshare and Zenodo and are available from Figshare-Cellular, Figshare-Perturbation, Zenodo-Cellular and Zenodo-perturbation.

Benchmark methods

All benchmark methods analyzed in our study are listed below. Details of the settings are available in our manuscript.

| Method | Venue | Year | Title | Version |
| --- | --- | --- | --- | --- |
| biolord | Nature Biotechnology | 2024 | Disentanglement of single-cell data with biolord | 0.0.3 |
| CellOT | Nature Methods | 2023 | Learning single-cell perturbation responses using neural optimal transport | 0.0.1 |
| inVAE | Bioengineering | 2023 | Homogeneous Space Construction and Projection for Single-Cell Expression Prediction Based on Deep Learning | 0.0.1 |
| scDisInFact | Nature Communications | 2024 | scDisInFact: disentangled learning for integration and prediction of multi-batch multi-condition single-cell RNA-sequencing data | 0.1.0 |
| scGen | Nature Methods | 2019 | scGen predicts single-cell perturbation responses | 2.1.0 |
| scPRAM | Bioinformatics | 2024 | scPRAM accurately predicts single-cell gene expression perturbation response based on attention mechanism | 0.0.1 |
| scPreGAN | Bioinformatics | 2022 | scPreGAN, a deep generative model for predicting the response of single-cell expression to perturbation | 0.0.1 |
| SCREEN | Frontiers of Computer Science | 2024 | SCREEN: predicting single-cell gene expression perturbation responses via optimal transport | 0.0.1 |
| scVIDR | Patterns | 2023 | Generative modeling of single-cell gene expression for dose-dependent chemical perturbations | 0.0.3 |
| trVAE | Bioinformatics | 2020 | Conditional out-of-distribution generation for unpaired data using transfer VAE | 1.1.2 |
| AttentionPert | Bioinformatics | 2021 | AttentionPert: Accurately Modeling Multiplexed Genetic Perturbations with Multi-scale Effects | 0.0.1 |
| CPA | Molecular Systems Biology | 2023 | Predicting cellular responses to complex perturbations in high-throughput screens | 0.8.5 |
| GEARS | Nature Biotechnology | 2022 | Predicting transcriptional outcomes of novel multigene perturbations with GEARS | 0.1.0 |
| GenePert | bioRxiv | 2024 | GenePert: Leveraging GenePT Embeddings for Gene Perturbation Prediction | 0.0.1 |
| linearModel | bioRxiv | 2024 | Deep learning-based predictions of gene perturbation effects do not yet outperform simple linear methods | 0.0.1 |
| scGPT | Nature Methods | 2024 | scGPT: toward building a foundation model for single-cell multi-omics using generative AI | 0.2.1 |
| scFoundation | Nature Methods | 2024 | Large-scale foundation model on single-cell transcriptomics | 0.0.1 |
| chemCPA | arXiv | 2022 | Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution | 2.0.0 |
| scouter | bioRxiv | 2024 | Scouter: Predicting Transcriptional Responses to Genetic Perturbations with LLM embeddings | 0.0.1 |

Citation

Zhiting Wei, Yiheng Wang, Yicheng Gao, Qi Liu et al. Recommendations and solutions for generalizable single-cell perturbation response prediction obtained from a systematic benchmark, submitted, 2025.

Contacts

bm2-lab@tongji.edu.cn, 1810546@tongji.edu.cn
