Merge branch 'main' into develop
pauldoucet committed Nov 5, 2024
2 parents 2705e74 + 239b506 commit a5aea93
Showing 7 changed files with 277 additions and 73 deletions.
63 changes: 34 additions & 29 deletions README.md
@@ -1,26 +1,28 @@
# HEST-Library: Bringing Spatial Transcriptomics and Histopathology together
## Designed for querying and assembling the HEST-1k dataset

\[ [arXiv](https://arxiv.org/abs/2406.16192) | [HEST-1k](https://huggingface.co/datasets/MahmoodLab/hest) \]
\[ [arXiv](https://arxiv.org/abs/2406.16192) | [Data](https://huggingface.co/datasets/MahmoodLab/hest) | [Documentation](https://hest.readthedocs.io/en/latest/) | [Tutorials](https://github.com/mahmoodlab/HEST/tree/main/tutorials) | [Cite](https://github.com/mahmoodlab/hest?tab=readme-ov-file#citation) \]
<!-- [ArXiv (stay tuned)]() | [Interactive Demo](http://clam.mahmoodlab.org) | [Cite](#reference) -->

<img src="figures/fig1a.jpg" width="450px" align="right" />
Welcome to the official GitHub repository of the HEST-Library introduced in *"HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis", NeurIPS Spotlight, 2024*. This project was developed by the [Mahmood Lab](https://faisal.ai/) at Harvard Medical School and Brigham and Women's Hospital.

Welcome to the official GitHub repository of the HEST-Library introduced in *"HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis"*. This project was developed by the [Mahmood Lab](https://faisal.ai/) at Harvard Medical School and Brigham and Women's Hospital.

HEST-1k, HEST-Library, and HEST-Benchmark are released under the Attribution-NonCommercial-ShareAlike 4.0 International license.
<img src="figures/fig1.jpeg" />

<br/>

### What does this repository provide?
- **HEST-1k:** Free access to <b>HEST-1K</b>, a dataset of 1,108 paired Spatial Transcriptomics samples with H&E-stained whole-slide images
- **HEST-1k:** Free access to <b>HEST-1K</b>, a dataset of 1,229 paired Spatial Transcriptomics samples with H&E-stained whole-slide images
- **HEST-Library:** A series of helpers to assemble new ST samples (ST, Visium, Visium HD, Xenium) and work with HEST-1k (ST analysis, batch effect viz and correction, etc.)
- **HEST-Benchmark:** A new benchmark to assess the predictive performance of foundation models for histology in predicting gene expression from morphology

HEST-1k, HEST-Library, and HEST-Benchmark are released under the Attribution-NonCommercial-ShareAlike 4.0 International license.

<br/>

## Updates

- **21.10.24**: HEST has been accepted to NeurIPS 2024 as a Spotlight! We will be in Vancouver from Dec 10th to 15th. Send us a message if you want to learn more about HEST (gjaume@bwh.harvard.edu).

- **23.09.24**: 121 new samples released, including 27 Xenium and 7 Visium HD! We have also made the aligned Xenium transcripts and the aligned DAPI-segmented cells/nuclei public.

- **30.08.24**: HEST-Benchmark results updated. Includes H-Optimus-0, Virchow 2, Virchow, and GigaPath. New COAD task based on 4 Xenium samples. HuggingFace bench data have been updated.
@@ -83,27 +85,28 @@ In addition, we provide complete [documentation](https://hest.readthedocs.io/en/

## HEST-Benchmark

The HEST-Benchmark was designed to assess foundation models for pathology under a new, diverse, and challenging benchmark. HEST-Benchmark includes 10 tasks for gene expression prediction (50 highly variable genes) from morphology (112 x 112 µm regions at 0.5 µm/px) in 10 different organs and 9 cancer types. We provide a step-by-step tutorial to run HEST-Benchmark and reproduce our results in [4-Running-HEST-Benchmark.ipynb](https://github.com/mahmoodlab/HEST/tree/main/tutorials/4-Running-HEST-Benchmark.ipynb).
The HEST-Benchmark was designed to assess 11 foundation models for pathology under a new, diverse, and challenging benchmark. HEST-Benchmark includes nine tasks for gene expression prediction (50 highly variable genes) from morphology (112 x 112 µm regions at 0.5 µm/px) in nine different organs and eight cancer types. We provide a step-by-step tutorial to run HEST-Benchmark and reproduce our results in [4-Running-HEST-Benchmark.ipynb](https://github.com/mahmoodlab/HEST/tree/main/tutorials/4-Running-HEST-Benchmark.ipynb).

### HEST-Benchmark results (30.08.24)

HEST-Benchmark was used to assess 10 publicly available models.
HEST-Benchmark was used to assess 11 publicly available models.
Reported results are based on ridge regression applied to PCA-reduced embeddings (256 factors). Because ridge regression unfairly penalizes models with larger embedding dimensions, we opted for PCA reduction to ensure a fair and objective comparison between models.
Model performance is measured with Pearson correlation. Best is **bold**, second best
is _underlined_. Additional results based on Random Forest and XGBoost regression are provided in the paper.

| **Dataset** | **[Hoptimus0](https://github.com/bioptimus/releases/blob/main/models/h-optimus/v0/LICENSE.md)** | **[Virchow2](https://huggingface.co/paige-ai/Virchow2)** | **[Virchow](https://huggingface.co/paige-ai/Virchow)** | **[UNI](https://huggingface.co/MahmoodLab/UNI)** | **[Gigapath](https://huggingface.co/prov-gigapath/prov-gigapath)** | **[CONCH](https://huggingface.co/MahmoodLab/CONCH)** | **[Phikon](https://huggingface.co/owkin/phikon)** | **[Remedis](https://arxiv.org/abs/2205.09723)** | **[CTransPath](https://www.sciencedirect.com/science/article/abs/pii/S1361841522002043)** | **[Resnet50](https://arxiv.org/abs/1512.03385)** | **[Plip](https://www.nature.com/articles/s41591-023-02504-3)** |
|:--------------|----------------:|---------------:|--------------:|-------------:|---------------:|---------------:|-------------:|--------------:|-----------------:|---------------:|-----------:|
| **IDC** | **0.5988** | 0.5903 | 0.5725 | 0.5718 | 0.5505 | 0.5363 | 0.5327 | 0.5304 | 0.511 | 0.4732 | 0.4717 |
| **PRAD** | 0.3768 | 0.3478 | 0.3341 | 0.3095 | **0.3776** | 0.3548 | 0.342 | 0.3531 | 0.3427 | 0.306 | 0.2819 |
| **PAAD** | **0.4936** | 0.4716 | 0.4926 | 0.478 | 0.476 | 0.4475 | 0.4441 | 0.4647 | 0.4378 | 0.386 | 0.4099 |
| **SKCM** | **0.6521** | 0.613 | 0.6056 | 0.6344 | 0.5607 | 0.5784 | 0.5334 | 0.5816 | 0.5103 | 0.4825 | 0.5117 |
| **COAD** | 0.3054 | 0.252 | **0.3115** | 0.2876 | 0.2595 | 0.2579 | 0.2573 | 0.2528 | 0.249 | 0.231 | 0.0518 |
| **READ** | **0.2209** | 0.2109 | 0.1999 | 0.1822 | 0.1888 | 0.1617 | 0.1631 | 0.1216 | 0.1131 | 0.0842 | 0.0927 |
| **CCRCC** | 0.2717 | **0.275** | 0.2638 | 0.2402 | 0.2436 | 0.2179 | 0.2423 | 0.2643 | 0.2279 | 0.218 | 0.1902 |
| **LUNG** | **0.5605** | 0.5554 | 0.5433 | 0.5499 | 0.5412 | 0.5317 | 0.5522 | 0.538 | 0.5049 | 0.4919 | 0.4838 |
| **LYMPH_IDC** | 0.2578 | **0.2598** | 0.2582 | 0.2537 | 0.2491 | 0.2507 | 0.2373 | 0.2465 | 0.2354 | 0.2284 | 0.2382 |
| **AVG** | **0.4153** | 0.3973 | 0.3979 | 0.3897 | 0.383 | 0.3708 | 0.3672 | 0.3726 | 0.348 | 0.3224 | 0.3035 |
| Model | IDC | PRAD | PAAD | SKCM | COAD | READ | ccRCC | LUAD | LYMPH IDC | Average |
|------------------------|--------|--------|--------|--------|--------|--------|--------|--------|-----------|---------|
| **[Resnet50](https://arxiv.org/abs/1512.03385)** | 0.4741 | 0.3075 | 0.3889 | 0.4822 | 0.2528 | 0.0812 | 0.2231 | 0.4917 | 0.2322 | 0.326 |
| **[CTransPath](https://www.sciencedirect.com/science/article/abs/pii/S1361841522002043)** | 0.511 | 0.3427 | 0.4378 | 0.5106 | 0.2285 | 0.11 | 0.2279 | 0.4985 | 0.2353 | 0.3447 |
| **[Phikon](https://huggingface.co/owkin/phikon)** | 0.5327 | 0.342 | 0.4432 | 0.5355 | 0.2585 | 0.1517 | 0.2423 | 0.5468 | 0.2373 | 0.3656 |
| **[CONCH](https://huggingface.co/MahmoodLab/CONCH)** | 0.5363 | 0.3548 | 0.4475 | 0.5791 | 0.2533 | 0.1674 | 0.2179 | 0.5312 | 0.2507 | 0.3709 |
| **[Remedis](https://arxiv.org/abs/2205.09723)** | 0.529 | 0.3471 | 0.4644 | 0.5818 | 0.2856 | 0.1145 | 0.2647 | 0.5336 | 0.2473 | 0.3742 |
| **[Gigapath](https://huggingface.co/prov-gigapath/prov-gigapath)** | 0.5508 | _0.3708_ | 0.4768 | 0.5538 | _0.301_ | 0.186 | 0.2391 | 0.5399 | 0.2493 | 0.3853 |
| **[UNI](https://huggingface.co/MahmoodLab/UNI)** | 0.5702 | 0.314 | 0.4764 | 0.6254 | 0.263 | 0.1762 | 0.2427 | 0.5511 | 0.2565 | 0.3862 |
| **[Virchow](https://huggingface.co/paige-ai/Virchow)** | 0.5702 | 0.3309 | 0.4875 | 0.6088 | **0.311** | 0.2019 | 0.2637 | 0.5459 | 0.2594 | 0.3977 |
| **[Virchow2](https://huggingface.co/paige-ai/Virchow2)** | 0.5922 | 0.3465 | 0.4661 | 0.6174 | 0.2578 | 0.2084 | **0.2788** | **0.5605** | 0.2582 | 0.3984 |
| **UNIv1.5** | **0.5989** | 0.3645 | _0.4902_ | _0.6401_ | 0.2925 | _0.2240_ | 0.2522 | _0.5586_ | **0.2597** | _0.4090_ |
| **[Hoptimus0](https://github.com/bioptimus/releases/blob/main/models/h-optimus/v0/LICENSE.md)** | _0.5982_ | **0.385** | **0.4932** | **0.6432** | 0.2991 | **0.2292** | _0.2654_ | 0.5582 | _0.2595_ | **0.4146** |
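
The evaluation protocol described above (PCA reduction to 256 factors, ridge regression, Pearson correlation) can be illustrated with a small NumPy sketch. This is not the HEST-Library implementation: the function names, the regularization strength `alpha`, averaging the correlation over genes, and the synthetic data are all assumptions made for illustration. To reproduce the reported numbers, use the tutorial notebook instead.

```python
import numpy as np


def pca_fit_transform(x_train, x_test, n_components):
    """Project both splits onto the top principal components of the training split."""
    mean = x_train.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(x_train - mean, full_matrices=False)
    components = vt[:n_components]
    return (x_train - mean) @ components.T, (x_test - mean) @ components.T


def ridge_fit_predict(z_train, y_train, z_test, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    z_mean, y_mean = z_train.mean(axis=0), y_train.mean(axis=0)
    zc, yc = z_train - z_mean, y_train - y_mean
    gram = zc.T @ zc + alpha * np.eye(zc.shape[1])
    weights = np.linalg.solve(gram, zc.T @ yc)
    return (z_test - z_mean) @ weights + y_mean


def mean_pearson(y_true, y_pred):
    """Pearson correlation per gene (column), averaged over genes."""
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return float(np.mean((yt * yp).sum(axis=0) / denom))


def evaluate_embeddings(x_train, y_train, x_test, y_test, n_components=256, alpha=1.0):
    """Score patch embeddings: PCA -> ridge -> mean Pearson correlation.

    alpha=1.0 is an assumed value, not one stated in the README.
    """
    # PCA cannot return more components than samples or embedding dimensions.
    n_components = min(n_components, x_train.shape[0], x_train.shape[1])
    z_train, z_test = pca_fit_transform(x_train, x_test, n_components)
    y_pred = ridge_fit_predict(z_train, y_train, z_test, alpha)
    return mean_pearson(y_test, y_pred)


if __name__ == "__main__":
    # Synthetic stand-ins: 1024-d "embeddings" and 50 "genes" driven by shared latent factors.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(600, 32))
    embeddings = latent @ rng.normal(size=(32, 1024)) + 0.1 * rng.normal(size=(600, 1024))
    expression = latent @ rng.normal(size=(32, 50)) + 0.1 * rng.normal(size=(600, 50))
    score = evaluate_embeddings(embeddings[:400], expression[:400],
                                embeddings[400:], expression[400:])
    print(f"mean Pearson correlation: {score:.4f}")
```

Fitting the PCA on the training split only, then applying the same projection to the test split, keeps test data from leaking into the reduced representation. Every model is also reduced to the same number of regression inputs regardless of its native embedding size.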


### Benchmarking your own model
@@ -114,22 +117,24 @@ Our tutorial in [4-Running-HEST-Benchmark.ipynb](https://github.com/mahmoodlab/H

## Issues
- The preferred mode of communication is via GitHub issues.
- If GitHub issues are inappropriate, email `gjaume@bwh.harvard.edu` (and cc `pdoucet@bwh.harvard.edu`).
- If GitHub issues are inappropriate, email `gjaume@bwh.harvard.edu` (and cc `homedoucetpaul@gmail.com`).
- Immediate response to minor issues may not be available.

## Citation

If you find our work useful in your research, please consider citing:

Jaume, G., Doucet, P., Song, A. H., Lu, M. Y., Almagro-Perez, C., Wagner, S. J., Vaidya, A. J., Chen, R. J., Williamson, D. F. K., Kim, A., & Mahmood, F. HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis. _Advances in Neural Information Processing Systems_, December 2024.

```
@article{jaume2024hest,
author = {Jaume, Guillaume and Doucet, Paul and Song, Andrew H. and Lu, Ming Y. and Almagro-Perez, Cristina and Wagner, Sophia J. and Vaidya, Anurag J. and Chen, Richard J. and Williamson, Drew F. K. and Kim, Ahrong and Mahmood, Faisal},
title = {{HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis}},
journal = {arXiv},
year = {2024},
month = jun,
eprint = {2406.16192},
url = {https://arxiv.org/abs/2406.16192v1}
@inproceedings{jaume2024hest,
author = {Guillaume Jaume and Paul Doucet and Andrew H. Song and Ming Y. Lu and Cristina Almagro-Perez and Sophia J. Wagner and Anurag J. Vaidya and Richard J. Chen and Drew F. K. Williamson and Ahrong Kim and Faisal Mahmood},
title = {HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis},
booktitle = {Advances in Neural Information Processing Systems},
year = {2024},
month = dec,
}
```

<img src=docs/joint_logo.png>
Binary file added figures/fig1.jpeg
Binary file removed figures/fig1.png
Binary file not shown.
Binary file removed figures/fig1a.jpg
Binary file not shown.