From 787fd6023ba4d5426b620dcfc7548804dba95411 Mon Sep 17 00:00:00 2001
From: boomb0om
Date: Sat, 9 Sep 2023 02:51:06 +0300
Subject: [PATCH 1/3] Update README.md

---
 README.md | 63 ++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 51 insertions(+), 12 deletions(-)

diff --git a/README.md b/README.md
index 37b0687..aad1b8d 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,7 @@
 This project aims to unify the evaluation of generative text-to-image models and provide the ability to quickly and easily calculate most popular metrics.
 
 Goals of this benchmark:
-- **Unified** metrics and datasets for all models
+- **Unified** metrics and datasets for all text-to-image models
 - **Reproducible** results
 - **User-friendly** interface for most popular metrics: FID and CLIP-score
@@ -17,6 +17,7 @@
 - [Examples](#examples)
 - [Documentation](#documentation)
 - [Contribution](#contribution)
+- [TO-DO](#to-do)
 - [Contacts](#contacts)
 - [Citing](#citing)
 - [Acknowledgments](#acknowledgments)
@@ -25,8 +26,8 @@
 
 Generative text-to-image models have become a popular and widely used tool for users.
 There are many articles on the topic of image generation from text that present new, more advanced models.
-However, there is still no uniform way to measure the quality of such models.
-To address this issue, we provide an implementation of metrics to compare the quality of generative models.
+**However, there is still no uniform way to measure the quality of such models**.
+To address this issue, we provide an implementation of metrics and a dataset to compare the quality of generative models.
 We propose to use the metric MS-COCO FID-30K with OpenAI's CLIP score, which has already become a standard for measuring the quality of text2image models.
 We provide the MS-COCO validation subset and precalculated metrics for it.
@@ -38,6 +39,7 @@ You can easily contribute your model into benchmark and make FID results reprodu
 - Standardized FID calculation: fixed image preprocessing and InceptionV3 model.
 - FID-30k on MS-COCO validation set: we provide dataset on [huggingface🤗](https://huggingface.co/datasets/stasstaf/MS-COCO-validation), [precomputed FID stats](https://github.com/boomb0om/text2image-benchmark/releases/download/v0.0.1/MS-COCO_val2014_fid_stats.npz), fixed [30000 captions from MS-COCO](https://github.com/boomb0om/text2image-benchmark/releases/download/v0.0.1/MS-COCO_val2014_30k_captions.csv) that should be used to generate images
+- Implementations of popular text-to-image models to make metrics **reproducible**
 - CLIP-score calculation
 - User-friendly metrics calculation (checkout [Getting started](#getting-started))
 
@@ -49,7 +51,6 @@
 pip install git+https://github.com/boomb0om/text2image-benchmark
 ```
 
 ## Getting started
-
 ### Metrics: FID
 Calculate FID for two sets of images:
@@ -80,15 +81,28 @@
 pip install -r T2IBenchmark/models/kandinsky21/requirements.txt
 ```
 
 ```python
-from T2IBenchmark import calculate_fid
-from T2IBenchmark.datasets import get_coco_fid_stats
+from T2IBenchmark import calculate_coco_fid
+from T2IBenchmark.models.kandinsky21 import Kandinsky21Wrapper
 
-fid, _ = calculate_fid(
-    'path/to/your/generations/',
-    get_coco_fid_stats()
+fid, fid_data = calculate_coco_fid(
+    Kandinsky21Wrapper,
+    device='cuda:0',
+    save_generations_dir='coco_generations/'
 )
 ```
 
+### Metrics: CLIP-score
+
+Example of calculating CLIP-score for a set of images and a fixed prompt:
+
+```python
+from T2IBenchmark import calculate_clip_score
+from glob import glob
+
+cat_paths = glob('../assets/images/cats/*.jpg')
+captions_mapping = {path: "a cat" for path in cat_paths}
+clip_score = calculate_clip_score(cat_paths, captions_mapping=captions_mapping)
+```
 
 ## Project Structure
@@ -98,25 +112,50 @@ fid, _ = calculate_fid(
 - `feature_extractors/` - Implementation of different neural nets used to extract features from images
 - `metrics/` - Implementation of metrics
 - `utils/` - Some utils
+- `tests/` - Tests
 - `docs/` - Documentation
-- `examples/` - Usage examples
-- `experiments/` - Experiments
+- `examples/` - Benchmark usage examples
+- `experiments/` - Experiments with metrics
 - `assets/` - Assets
 
 ## Examples
 
+Usage examples are listed below in the recommended order of study:
+- [Basic FID usage](examples/FID_basic.ipynb)
+- [Advanced FID usage](examples/FID_advanced.ipynb)
+- [CLIP score](examples/CLIP_score_usage.ipynb)
+- [FID calculation on MS-COCO](examples/FID-30k_on_MS-COCO.ipynb)
+- [Using ModelWrapper to measure MS-COCO FID-30k](examples/ModelWrapper_FID-30k.ipynb)
 
 ## Documentation
 
-
+- [FID.md](docs/FID.md) - Explanation of different parameters that affect FID calculation
 
 ## Contribution
 
+If you want to contribute your model to this benchmark and publish metrics, follow these steps:
+1) Create a fork of this repository
+2) Create a wrapper for your model that inherits the `T2IModelWrapper` class
+3) Generate images and calculate metrics using `calculate_coco_fid`. For more information, see [this example](examples/ModelWrapper_FID-30k.ipynb)
+4) Create a pull request with your model
+5) Congrats!
+
+## TO-DO
+
+- [ ] Implementation of Inception Score (IS) and Kernel Inception Distance (KID)
+- [ ] FID-CLIPscore metric and plots
+- [ ] Implementation and FIDs for [Kandinsky 2.X](https://github.com/ai-forever/Kandinsky-2) models with the help of Sber AI
+- [ ] Implementation and FIDs for popular models from [diffusers](https://github.com/huggingface/diffusers): Stable Diffusion, IF
 
 ## Contacts
 
+Authors:
+- Pavlov Igor, [github](https://github.com/boomb0om)
+- Artyom Ivanov, [github](https://github.com/UsefulTornado)
+- Stanislav Stafievskiy, [github](https://github.com/stasstaf)
+
 If you have any question, please email `jeartgle@gmail.com`.
 ## Citing

From 9073741b6ac17c61441e95ae878cc06788161642 Mon Sep 17 00:00:00 2001
From: boomb0om
Date: Sat, 9 Sep 2023 02:56:19 +0300
Subject: [PATCH 2/3] Fix installation reqs

---
 requirements.txt | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/requirements.txt b/requirements.txt
index f5655f9..d3ab7fc 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -8,5 +8,4 @@ pillow
 datasets
 opencv-python
 ftfy
-regex
-git+https://github.com/openai/CLIP.git
\ No newline at end of file
+regex
\ No newline at end of file

From dfa53e06edfa4b3aa863897e7011823224450108 Mon Sep 17 00:00:00 2001
From: boomb0om
Date: Sat, 9 Sep 2023 02:58:07 +0300
Subject: [PATCH 3/3] Update installation guide

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index aad1b8d..7565bed 100644
--- a/README.md
+++ b/README.md
@@ -46,6 +46,7 @@ You can easily contribute your model into benchmark and make FID results reprodu
 
 ## Installation
 
 ```bash
+pip install git+https://github.com/openai/CLIP.git
 pip install git+https://github.com/boomb0om/text2image-benchmark
 ```
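As background for the FID material in [PATCH 1/3]: the precomputed `.npz` stats reduce FID to the Fréchet distance between two Gaussians fitted to Inception features. A minimal sketch of that formula — not the benchmark's own implementation, assuming `numpy` and `scipy` are available:

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * (sigma1 @ sigma2)^(1/2))."""
    diff = mu1 - mu2
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # numerical noise can add tiny imaginary parts
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Identical statistics give a distance of (numerically) zero
mu, sigma = np.zeros(4), np.eye(4)
print(abs(frechet_distance(mu, sigma, mu, sigma)) < 1e-8)  # → True
```

In the benchmark this distance is computed between stats of generated images and the fixed MS-COCO stats, which is what makes the published FID numbers comparable.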
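The CLIP-score added in [PATCH 1/3] is, up to implementation-specific scaling, the mean cosine similarity between matched CLIP image and text embeddings. A sketch on stand-in embedding matrices (in the real metric these come from OpenAI's CLIP, installed in [PATCH 3/3]):

```python
import numpy as np

def clip_score(image_embs: np.ndarray, text_embs: np.ndarray) -> float:
    """Mean cosine similarity between matched rows of two embedding matrices."""
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return float(np.mean(np.sum(img * txt, axis=1)))

# Identical embeddings give a perfect score of 1.0
e = np.random.rand(4, 512)
print(round(clip_score(e, e), 6))  # → 1.0
```

This is why a single fixed prompt (like `"a cat"` in the README example) can be mapped to every image path: each image is scored against the embedding of its own caption and the results are averaged.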
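The contribution steps in [PATCH 1/3] revolve around subclassing `T2IModelWrapper`. A hypothetical sketch of step 2, with the base class stubbed so the snippet runs standalone — the actual class lives in the `T2IBenchmark` package, and its method names and signatures may differ:

```python
class T2IModelWrapper:
    """Stub standing in for the benchmark's base class (illustration only)."""
    def generate(self, caption: str):
        raise NotImplementedError

class MyModelWrapper(T2IModelWrapper):
    """Hypothetical wrapper around a toy 'model'."""
    def __init__(self):
        # A real wrapper would load model weights onto the requested device here
        self.model = None

    def generate(self, caption: str):
        # A real wrapper would run its text-to-image model and return an image;
        # a placeholder string keeps this sketch self-contained
        return f"<image for: {caption}>"

wrapper = MyModelWrapper()
print(wrapper.generate("a cat"))  # → <image for: a cat>
```

With such a wrapper in place, `calculate_coco_fid` can drive generation over the fixed 30k captions itself, which is what keeps contributed FID numbers reproducible.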