This project aims to unify the evaluation of generative text-to-image models and make it quick and easy to calculate the most popular metrics.

Goals of this benchmark:

Unified metrics and datasets for all text-to-image models
Reproducible results
User-friendly interface for the most popular metrics: FID and CLIP-score

Introduction

Generative text-to-image models have become popular and widely used. Many papers on text-to-image generation present new, more advanced models. However, there is still no uniform way to measure the quality of such models. To address this issue, we provide metric implementations and a dataset for comparing the quality of generative models.

We propose using MS-COCO FID-30K together with OpenAI's CLIP score, a combination that has already become a standard for measuring the quality of text2image models. We provide the MS-COCO validation subset and precalculated metrics for it. We also fixed the 30,000 captions that must be used to generate images for MS-COCO FID-30K.

You can easily contribute your model to the benchmark and make its FID results reproducible! See the Contribution section for details.

Main features

Standardized FID calculation: fixed image preprocessing and InceptionV3 model.
FID-30k on the MS-COCO validation set: we provide the dataset on huggingface🤗, precomputed FID stats, and a fixed set of 30,000 captions from MS-COCO that should be used to generate images
Implementations of different popular text-to-image models to make metrics reproducible
CLIP-score calculation
User-friendly metrics calculation (check out Getting started)
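
As background for the standardized FID calculation listed above: FID is the Fréchet distance between two Gaussians fitted to Inception features of the two image sets, that is ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)). The sketch below illustrates that formula with NumPy on random stand-in features. It is not this package's implementation: the names frechet_distance and feature_stats are hypothetical, and a real pipeline would first extract InceptionV3 features from preprocessed images.

```python
import numpy as np


def feature_stats(features):
    """Mean vector and covariance matrix of an (n_samples, n_dims) array."""
    return features.mean(axis=0), np.cov(features, rowvar=False)


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # For PSD covariances the eigenvalues of sigma1 @ sigma2 are real and
    # non-negative, so Tr((sigma1 sigma2)^(1/2)) is the sum of their square roots.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2).real
    tr_covmean = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


# Stand-in features (assumption): random vectors instead of Inception activations.
rng = np.random.default_rng(0)
feats_a = rng.normal(0.0, 1.0, size=(500, 8))
feats_b = rng.normal(0.5, 1.0, size=(500, 8))

d_same = frechet_distance(*feature_stats(feats_a), *feature_stats(feats_a))
d_diff = frechet_distance(*feature_stats(feats_a), *feature_stats(feats_b))
print(d_same)  # numerically close to zero
print(d_diff)  # clearly larger, since the means differ
```

Because only the mean and covariance enter the formula, the statistics of a reference set can be computed once and reused, which is what makes precomputed FID stats practical.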

Installation

pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/boomb0om/text2image-benchmark

Getting started

Metrics: FID

Calculate FID for two sets of images:

from T2IBenchmark import calculate_fid

fid, _ = calculate_fid('assets/images/cats/', 'assets/images/dogs/')
print(fid)

Calculate FID between model generations and MS-COCO validation subset:

from T2IBenchmark import calculate_fid
from T2IBenchmark.datasets import get_coco_fid_stats

# Pass the precomputed MS-COCO statistics in place of a second image folder
fid, _ = calculate_fid(
    'path/to/your/generations/',
    get_coco_fid_stats()
)
print(fid)

Calculate MS-COCO FID-30k for a T2IModelWrapper. This example uses the Kandinsky 2.1 model:

pip install -r T2IBenchmark/models/kandinsky21/requirements.txt

from T2IBenchmark import calculate_coco_fid
from T2IBenchmark.models.kandinsky21 import Kandinsky21Wrapper

fid, fid_data = calculate_coco_fid(
    Kandinsky21Wrapper,
    device='cuda:0',
    save_generations_dir='coco_generations/'
)

Metrics: CLIP-score

Example of calculating CLIP-score for a set of images and a fixed prompt:

from T2IBenchmark import calculate_clip_score
from glob import glob

cat_paths = glob('assets/images/cats/*.jpg')
captions_mapping = {path: "a cat" for path in cat_paths}
clip_score = calculate_clip_score(cat_paths, captions_mapping=captions_mapping)
print(clip_score)
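
For intuition about what such a score measures, the sketch below computes a mean scaled cosine similarity between paired image and text embeddings. It is only an illustration under stated assumptions: the function name mean_clip_score is hypothetical, the scale of 100 and the clipping at zero follow one common convention and may differ from what calculate_clip_score does internally, and the toy vectors stand in for real CLIP embeddings.

```python
import numpy as np


def mean_clip_score(image_embs, text_embs, scale=100.0):
    """Mean of scale * max(cosine similarity, 0) over paired embeddings."""
    image_embs = np.asarray(image_embs, dtype=float)
    text_embs = np.asarray(text_embs, dtype=float)
    # L2-normalise each row so the row-wise dot product is the cosine similarity.
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    cos = np.sum(img * txt, axis=1)
    # Assumption: negative similarities are clipped to zero before averaging.
    return float(scale * np.clip(cos, 0.0, None).mean())


# Toy 2-D embeddings (assumption): real ones come from CLIP's image and text encoders.
images = [[1.0, 0.0], [0.0, 1.0]]
matching_texts = [[2.0, 0.0], [0.0, 3.0]]    # same directions as the images
unrelated_texts = [[0.0, 1.0], [1.0, 0.0]]   # orthogonal to the images

score_matching = mean_clip_score(images, matching_texts)
score_unrelated = mean_clip_score(images, unrelated_texts)
print(score_matching, score_unrelated)
```

A higher value means the image embeddings point in directions closer to those of their captions.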

Project Structure

T2IBenchmark/
- datasets/ - Datasets that can be used for evaluation
  - coco2014/ - MS-COCO 2014 validation subset
- feature_extractors/ - Implementation of different neural nets used to extract features from images
- metrics/ - Implementation of metrics
- utils/ - Some utils
tests/ - Tests
docs/ - Documentation
examples/ - Benchmark usage examples
experiments/ - Experiments with metrics
assets/ - Assets

Examples

Examples of use are listed below in recommended order for study:

Basic FID usage
Advanced FID usage
CLIP score
FID calculation on MS-COCO
Using ModelWrapper to measure MS-COCO FID-30k

Documentation

FID.md - Explanation of the parameters that affect FID calculation

Contribution

If you want to contribute your model to this benchmark and publish its metrics, follow these steps:

Create a fork of this repository
Create a wrapper for your model that inherits from the T2IModelWrapper class
Generate images and calculate metrics using calculate_coco_fid. For more information see this example
Create a pull request with your model
Congrats!

TO-DO

Implementation of Inception Score (IS) and Kernel Inception Distance (KID)
FID-CLIPscore metric and plots
Implementation and FIDs for Kandinsky 2.X models with the help of Sber AI
Implementation and FIDs for popular models from diffusers: Stable Diffusion, IF

Contacts

Authors:

Pavlov Igor, github
Artyom Ivanov, github
Stanislav Stafievskiy, github

If you have any questions, please email jeartgle@gmail.com.

Citing

If you use this repository in your research, consider citing it using the following BibTeX entry:

@misc{boomb0omT2IBenchmark,
  author={Pavlov, I. and Ivanov, A. and Stafievskiy, S.},
  title={{Text-to-Image Benchmark: A benchmark for generative models}},
  howpublished={\url{https://github.com/boomb0om/text2image-benchmark}},
  month={September},
  year={2023},
  note={Version 0.1.0},
}

Acknowledgments

Thanks to:

clean-fid - Explanation of how various parameters influence FID calculation.
pytorch-fid - Port of the official implementation of Frechet Inception Distance to PyTorch.
