Molecular representation (MR) refers to the process of converting molecules into mathematical or computational formats that can be processed by algorithms to model, analyze and predict molecular behavior. Effective molecular representation is critical for many tasks in drug discovery, such as virtual screening, activity prediction, and scaffold hopping, which aim to navigate the chemical space effectively and efficiently.
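As a rough illustration of what "converting molecules into a computational format" can mean, the sketch below turns SMILES strings into sets of hashed bit indices and ranks a tiny library by Tanimoto similarity, which is the basic idea behind fingerprint-based virtual screening. It is a toy under stated assumptions: the helper names `ngram_fingerprint` and `tanimoto` are hypothetical, and the fingerprint hashes character n-grams of the SMILES text rather than substructures of the molecular graph, so it does not reproduce any method listed in this repository.

```python
import zlib


def ngram_fingerprint(smiles: str, n: int = 3, n_bits: int = 256) -> set:
    """Toy fingerprint: hash every character n-gram of a SMILES string to a bit index.

    Assumption: real fingerprints (for example circular fingerprints) work on the
    molecular graph; this text-based version only illustrates the encoding idea.
    """
    bits = set()
    # Strings shorter than n still contribute one (shorter) fragment.
    for i in range(max(len(smiles) - n + 1, 1)):
        gram = smiles[i:i + n]
        bits.add(zlib.crc32(gram.encode("utf-8")) % n_bits)
    return bits


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# A tiny illustrative library of SMILES strings.
library = {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "salicylic acid": "O=C(O)c1ccccc1O",
    "ethanol": "CCO",
}

# Rank the library against a query molecule (here aspirin itself).
query_fp = ngram_fingerprint("CC(=O)Oc1ccccc1C(=O)O")
ranked = sorted(
    library,
    key=lambda name: tanimoto(query_fp, ngram_fingerprint(library[name])),
    reverse=True,
)

for name in ranked:
    score = tanimoto(query_fp, ngram_fingerprint(library[name]))
    print(f"{name}: {score:.2f}")
```

In practice a cheminformatics toolkit would parse the SMILES and compute graph-based fingerprints; the ranking step stays the same.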
Because the number of papers in this field is growing quickly, we summarize a selection of articles and group them into the following categories:
- Molecular Fingerprints & Descriptors-based MR
- Language Model-based MR
- Graph-based MR
- Multimodal-based MR
- Contrastive Learning-based MR
- [2024] Molecular representations in bio-cheminformatics (Memetic Computing) [paper]
- [2024] From intuition to AI: evolution of small molecule representations in drug discovery (Briefings in Bioinformatics) [paper]
- [2024] Image-based molecular representation learning for drug development: a survey (Briefings in Bioinformatics) [paper] [Chinese-language commentary]
- [2022] Deep learning methods for molecular representation and property prediction (Drug Discovery Today) [paper]
- [2021] A review of molecular representation in the age of machine learning (WIREs Computational Molecular Science) [paper]
- [2021] Geometric deep learning on molecular representations (Nature Machine Intelligence) [paper]
- [2020] Molecular representations in AI-driven drug discovery: a review and practical guide (Journal of Cheminformatics) [paper]
To be continued...
- [CrossFuse-XGBoost] CrossFuse-XGBoost: accurate prediction of the maximum recommended daily dose through multi-feature fusion, cross-validation screening and extreme gradient boosting (2024) (Briefings in Bioinformatics) [paper] [code]
- [MapLight] ADMET property prediction through combinations of molecular fingerprints (2023) (arXiv) [paper] [code]
- [BoostSweet] BoostSweet: Learning molecular perceptual representations of sweeteners (2022) (Food Chemistry) [paper]
- [FP-BERT] A fingerprints based molecular property prediction method using the BERT model (2022) (Journal of Cheminformatics) [paper] [code]
- [MolMapNet] Out-of-the-box deep learning prediction of pharmaceutical properties by broadly learned knowledge-based molecular representations (2021) (Nature Machine Intelligence) [paper] [code]
- [FP-ADMET] FP-ADMET: a compendium of fingerprint-based ADMET prediction models (2021) (Journal of Cheminformatics) [paper] [code]
- [t-SMILES] t-SMILES: a fragment-based molecular representation framework for de novo ligand design (2024) (Nature Communications) [paper] [code]
- [INTransformer] INTransformer: Data augmentation-based contrastive learning by injecting noise into transformer for molecular property prediction (2024) (Journal of Molecular Graphics and Modelling) [paper] [code]
- [DeepSA] DeepSA: a deep-learning driven predictor of compound synthesis accessibility (2023) (Journal of Cheminformatics) [paper] [code]
- [MolRoPE-BERT] MolRoPE-BERT: An enhanced molecular representation with Rotary Position Embedding for molecular property prediction (2023) (Journal of Molecular Graphics and Modelling) [paper]
- [MOLFORMER] Molecular set representation learning (2022) (Nature Machine Intelligence) [paper] [code]
- [MTL-BERT] Pushing the Boundaries of Molecular Property Prediction for Drug Discovery with Multitask Learning BERT Enhanced by SMILES Enumeration (2022) (Research) [paper] [code]
- [Mol-BERT] Mol-BERT: An Effective Molecular Representation with BERT for Molecular Property Prediction (2021) (Wireless Communications and Mobile Computing) [paper] [code]
- [Mol2vec] Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition (2018) (Journal of Chemical Information and Modeling) [paper] [code]
- [MMGX] Enhancing property and activity prediction and interpretation using multiple molecular graph representations with MMGX (2024) (Communications Chemistry) [paper] [code]
- [R-MAT] Relative Molecule Self-Attention Transformer (2024) (Journal of Cheminformatics) [paper] [code]
- [SMPT] Pre-training molecular representation model with spatial geometry for property prediction (2024) (Computational Biology and Chemistry) [paper] [code]
- [TOML-BERT] Enhancing Molecular Property Prediction through Task-Oriented Transfer Learning (2024) (Journal of Medicinal Chemistry) [paper] [code]
- [Gram matrix] Gram matrix: an efficient representation of molecular conformation and learning objective for molecular pretraining (2024) (Briefings in Bioinformatics) [paper] [code]
- [GSL-MPP] Molecular property prediction based on graph structure learning (2024) (Bioinformatics) [paper] [code]
- [MolFormer] Large-scale chemical language representations capture molecular structure and properties (2024) (Nature Machine Intelligence) [paper] [code]
- [FunQG] FunQG: Molecular Representation Learning via Quotient Graphs (2023) (Journal of Chemical Information and Modeling) [paper] [code]
- [MolCAP] MolCAP: Molecular Chemical reActivity Pretraining and prompted-finetuning enhanced molecular representation learning (2023) (Computers in Biology and Medicine) [paper] [code]
- [SME] Chemistry-intuitive explanation of graph neural networks for molecular property prediction with substructure masking (2023) (Nature Communications) [paper] [code]
- [HiMol] Hierarchical Molecular Graph Self-Supervised Learning for Property Prediction (2023) (Communications Chemistry) [paper] [code]
- [PharmHGT] Pharmacophoric-constrained heterogeneous graph transformer model for molecular property prediction (2023) (Journal of Chemical Information and Modeling) [paper] [code]
- [IFGN] Predicting molecular properties based on the interpretable graph neural network with multistep focus mechanism (2023) (Briefings in Bioinformatics) [paper]
- [KANO] Knowledge graph-enhanced molecular contrastive learning with functional prompt (2023) (Nature Machine Intelligence) [paper] [code]
- [KPGT] A knowledge-guided pre-training framework for improving molecular representation learning (2023) (Nature Communications) [paper] [code]
- [ReLMole] ReLMole: Molecular Representation Learning Based on Two-Level Graph Similarities (2022) (Journal of Chemical Information and Modeling) [paper] [code]
- [GEM] Geometry-enhanced molecular representation learning for property prediction (2022) (Nature Machine Intelligence) [paper] [code]
- [GraphMVP] Pre-Training Molecular Graph Representation with 3D Geometry (2022) (ICLR 2022) [paper] [code]
- [MPG] An effective self-supervised framework for learning expressive molecular global representations to drug discovery (2021) (Briefings in Bioinformatics) [paper] [code]
- [GROVER] Self-Supervised Graph Transformer on Large-Scale Molecular Data (2020) (NeurIPS 2020) [paper] [code]
- [Attentive FP] Pushing the Boundaries of Molecular Representation for Drug Discovery with the Graph Attention Mechanism (2020) (Journal of Medicinal Chemistry) [paper] [code]
- [MoleSG] Complementary multi-modality molecular self-supervised learning via non-overlapping masking for property prediction (2024) (Briefings in Bioinformatics) [paper] [code]
- [MMFDL] Multimodal fused deep learning for drug property prediction: Integrating chemical language and molecular graph (2024) (Computational and Structural Biotechnology Journal) [paper] [code]
- [COATI] COATI: Multimodal Contrastive Pretraining for Representing and Traversing Chemical Space (2024) (Journal of Chemical Information and Modeling) [paper] [code]
- [DLF-MFF] A deep learning framework for predicting molecular property based on multi-type features fusion (2024) (Computers in Biology and Medicine) [paper] [code]
- [VideoMol] A Molecular Video-derived Foundation Model for Scientific Drug Discovery (2024) (Nature Communications) [paper] [code]
- [MvMRL] MvMRL: a multi-view molecular representation learning method for molecular property prediction (2024) (Briefings in Bioinformatics) [paper] [code]
- [PremuNet] A pre-trained multi-representation fusion network for molecular property prediction (2024) (Information Fusion) [paper] [code]
- [ISMol] Dual-View Learning Based on Images and Sequences for Molecular Property Prediction (2024) (IEEE Journal of Biomedical and Health Informatics) [paper] [code]
- [CLAMP] Enhancing Activity Prediction Models in Drug Discovery with the Ability to Understand Human Language (2023) (arXiv) [paper] [code]
- [CGIP] Chemical structure-aware molecular image representation learning (2023) (Briefings in Bioinformatics) [paper] [code]
- [UniMAP] UniMAP: Universal SMILES-Graph Representation Learning (2023) (arXiv) [paper]
- [FP-GNN] FP-GNN: a versatile deep learning architecture for enhanced molecular property prediction (2022) (Briefings in Bioinformatics) [paper] [code]
- [ImageMol] Accurate prediction of molecular properties and drug targets using a self-supervised image representation learning framework (2022) (Nature Machine Intelligence) [paper] [code]
- [PhenoScreen] PhenoScreen: A Dual-Space Contrastive Learning Framework-based Phenotypic Screening Method by Linking Chemical Perturbations to Cellular Morphology (2024) (bioRxiv) [paper] [code]
- [MOCO] Molecular Contrastive Pretraining with Collaborative Featurizations (2024) (Journal of Chemical Information and Modeling) [paper]
- [MolFeSCue] MolFeSCue: enhancing molecular property prediction in data-limited and imbalanced contexts using few-shot and contrastive learning (2024) (Bioinformatics) [paper] [code]
- [3D-MOL] 3D-Mol: A Novel Contrastive Learning Framework for Molecular Property Prediction with 3D Information (2024) (Pattern Analysis and Applications) [paper] [code]
- [UniCorn] UniCorn: A Unified Contrastive Learning Approach for Multi-view Molecular Representation Learning (2024) (arXiv) [paper]
- [3DGCL] 3D graph contrastive learning for molecular property prediction (2023) (Bioinformatics) [paper] [code]
- [CasANGCL] CasANGCL: pre-training and fine-tuning model based on cascaded attention network and graph contrastive learning for molecular property prediction (2023) (Briefings in Bioinformatics) [paper] [code]
- [FraSICL] Molecular property prediction by semantic-invariant contrastive learning (2023) (Bioinformatics) [paper] [code]
- [iMolCLR] Improving Molecular Contrastive Learning via Faulty Negative Mitigation and Decomposed Fragment Contrast (2022) (Journal of Chemical Information and Modeling) [paper] [code]
- [MolCLR] Molecular contrastive learning of representations via graph neural networks (2022) (Nature Machine Intelligence) [paper] [code]
- [ATMOL] Attention-wise masked graph contrastive learning for predicting molecular property (2022) (Briefings in Bioinformatics) [paper] [code]
- [SMICLR] Contrastive Learning on Multiple Molecular Representations for Semisupervised and Unsupervised Representation Learning (2022) (Journal of Chemical Information and Modeling) [paper] [code]
- [MoCL] MoCL: Data-driven Molecular Fingerprint via Knowledge-aware Contrastive Learning from Molecular Graph (2021) (KDD 2021) [paper] [code]
- [GraphCL] Graph Contrastive Learning with Augmentations (2020) (arXiv) [paper] [code]
We would like to thank all the developers who have contributed to the field of molecular representation. Shihang Wang thanks Lin Wang, Jianmin Wang, and Bo Li for their inspiration and help.
If you have any questions, please feel free to contact Shihang Wang (Email: wangshh12022@shanghaitech.edu.cn).
Pull requests are very welcome!