
T2I-Eval: Open-source toolkit for automatic evaluation of the text-to-image generation task


Implementation of the paper *Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark*.

*Illustration of the evaluation framework*

🔥 News

  • 2025 Jan 22: Fixed an issue in the training dataset construction logic.
  • 2024 Dec 10: Added support for offline inference with vLLM.
  • 2024 Nov 28: Released code for evaluation and fine-tuning; code for training set construction is coming soon.
  • 2024 Nov 25: Paper available on arXiv; code coming soon.

Resources

| Item | Repository |
| --- | --- |
| Distilled Model | 🤗 [DataHammer/T2I_Eval-MiniCPM-V-2_6](https://huggingface.co/DataHammer/T2I_Eval-MiniCPM-V-2_6) |
| Benchmark Dataset | 🤗 [DataHammer/T2I-Eval-Bench](https://huggingface.co/datasets/DataHammer/T2I-Eval-Bench) |
| Training Dataset | 🤗 [DataHammer/T2I-Eval-Distill-GPT-4o](https://huggingface.co/datasets/DataHammer/T2I-Eval-Distill-GPT-4o) |

Performance

Spearman's $\rho$ and Kendall's $\tau$ rank correlations between each method's scores and the judgments of three human annotators (Manual-1/2/3) and their average (Manual-Avg.); higher is better.

| Methods | Manual-1 ($\rho$) | Manual-1 ($\tau$) | Manual-2 ($\rho$) | Manual-2 ($\tau$) | Manual-3 ($\rho$) | Manual-3 ($\tau$) | Manual-Avg. ($\rho$) | Manual-Avg. ($\tau$) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **Average Score of Human Annotators** | | | | | | | | |
| $\text{Manual-Avg.}$ | 0.9511 | 0.8807 | 0.9452 | 0.8686 | 0.9513 | 0.8793 | - | - |
| **Traditional Methods** | | | | | | | | |
| $\text{FID}$ | -0.1183 | -0.0871 | -0.1000 | -0.0724 | -0.0897 | -0.0685 | -0.1231 | -0.0862 |
| $\text{LPIPS}$ | -0.1206 | -0.0898 | -0.0882 | -0.0644 | -0.1025 | -0.0732 | -0.1244 | -0.0856 |
| $\text{DreamSim}$ | -0.1284 | -0.0953 | -0.1230 | -0.0897 | -0.1308 | -0.0973 | -0.1382 | -0.0968 |
| $\text{CLIPScore}$ | 0.1532 | 0.1078 | 0.1725 | 0.1210 | 0.1227 | 0.0855 | 0.1505 | 0.1016 |
| $\text{BLIPv2Score}$ | 0.2278 | 0.1588 | 0.2280 | 0.1617 | 0.2134 | 0.1477 | 0.2152 | 0.1423 |
| $\text{ImageReward}$ | 0.4171 | 0.3065 | 0.3712 | 0.2690 | 0.4134 | 0.3030 | 0.4046 | 0.2839 |
| **LLM/MLLM-Based Methods** | | | | | | | | |
| $\text{LLMScore}_\text{GPT-4}$ | 0.3009 | 0.2212 | 0.2697 | 0.2012 | 0.3299 | 0.2497 | 0.3096 | 0.2228 |
| $\text{DSG}_\text{Dependent}$ | 0.4742 | 0.3790 | 0.4204 | 0.3339 | 0.4562 | 0.3652 | 0.4582 | 0.3512 |
| $\text{DSG}_\text{Independent}$ | 0.4815 | 0.3891 | 0.4382 | 0.3502 | 0.4721 | 0.3827 | 0.4704 | 0.3655 |
| $\text{VQAScore}_\text{CLIP-FlanT5}$ | 0.4984 | 0.3768 | 0.4864 | 0.3619 | 0.5118 | 0.3854 | 0.5116 | 0.3712 |
| $\text{VIEScore}_\text{MiniCPM-V-2.6}$ | 0.2834 | 0.2251 | 0.2814 | 0.2231 | 0.3016 | 0.2422 | 0.2941 | 0.2250 |
| $\text{VIEScore}_{\text{MiniCPM-V-2.6}^{*}}$ | 0.4906 | 0.3878 | 0.4869 | 0.3836 | 0.4889 | 0.3899 | 0.5101 | 0.3897 |
| $\text{VIEScore}_\text{GPT-4o}$ | 0.5522 | 0.4283 | 0.5306 | 0.4101 | 0.5170 | 0.4024 | 0.5545 | 0.4170 |
| **Closed-Source MLLM with Our Pipeline** | | | | | | | | |
| $\text{Ours}_\text{GPT-4o}$ | 0.5437 | 0.4302 | 0.5355 | 0.4214 | 0.5138 | 0.4061 | 0.5566 | 0.4285 |
| **Our Fine-tuned Open-Source MLLM** | | | | | | | | |
| $\text{Ours}_\text{MiniCPM-V-2.6}$ | 0.5334 | 0.4192 | 0.5946 | 0.4644 | 0.5537 | 0.4348 | 0.5802 | 0.4409 |
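
For reference, these rank correlations are computed in the standard way, e.g. with `scipy.stats`. The arrays below are made-up illustrations, not data from the paper:

```python
# Illustration of the table's metrics: Spearman's rho and Kendall's tau
# between automatic scores and human scores. All numbers are invented.
from scipy.stats import kendalltau, spearmanr

human_scores = [3.0, 1.5, 4.0, 2.0, 5.0]   # hypothetical annotator ratings
method_scores = [2.8, 1.0, 3.5, 2.2, 4.9]  # hypothetical automatic scores

rho, _ = spearmanr(human_scores, method_scores)
tau, _ = kendalltau(human_scores, method_scores)
print(f"Spearman rho = {rho:.4f}, Kendall tau = {tau:.4f}")
```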

🚀 Getting Started

🛠️ Installation

conda create -n t2i_eval python=3.10 -y
conda activate t2i_eval
git clone https://github.com/maziao/T2I-Eval.git
cd T2I-Eval
pip install -r requirements.txt

Install vLLM for faster inference:

pip install vllm

[Optional] If you want to fine-tune your own evaluation MLLM, set up SWIFT:

pip install 'ms-swift[all]==2.6.1' -U

Model and Data Preparation

  • [Optional] Enable hf_transfer for faster downloads

    pip install hf_transfer
    export HF_HUB_ENABLE_HF_TRANSFER=1
  • Download MiniCPM-V-2.6 checkpoint (NOTE: You need to agree to share your contact information to access this model)

    # if MiniCPM-V-2.6 checkpoint has not been downloaded
    mkdir -p models/minicpm-v-2_6/original
    huggingface-cli download openbmb/MiniCPM-V-2_6 \
      --local-dir models/minicpm-v-2_6/original \
      --local-dir-use-symlinks False
    
    # if MiniCPM-V-2.6 checkpoint has already been downloaded
    mkdir -p models/minicpm-v-2_6
    ln -s PATH_TO_CHECKPOINT models/minicpm-v-2_6/original
  • Download LoRA adapter for MiniCPM-V-2.6

    mkdir -p models/minicpm-v-2_6/LoRA
    huggingface-cli download DataHammer/T2I_Eval-MiniCPM-V-2_6 \
      --local-dir models/minicpm-v-2_6/LoRA \
      --local-dir-use-symlinks False
    
    # merge LoRA adapter with the original checkpoint
    bash scripts/merge_adapter.sh GPU_ID # replace GPU_ID
  • Download benchmark dataset

    mkdir -p data/test
    huggingface-cli download DataHammer/T2I-Eval-Bench \
      --repo-type dataset \
      --local-dir data/test \
      --local-dir-use-symlinks False
    
    tar -zxvf data/test/images.tar.gz --directory=data/test
  • [Optional] Download training dataset

    # download the training dataset for fine-tuning (optional)
    mkdir -p data/train
    huggingface-cli download DataHammer/T2I-Eval-Distill-GPT-4o \
      --repo-type dataset \
      --local-dir data/train \
      --local-dir-use-symlinks False
    
    tar -zxvf data/train/t2i-eval-distill-gpt-4o.json.tar.gz --directory=data/train
    tar -zxvf data/train/images/COCO.tar.gz --directory=data/train/images
    tar -zxvf data/train/images/LLM.tar.gz --directory=data/train/images
    tar -zxvf data/train/images/reference.tar.gz --directory=data/train/images

Evaluation

Deploy with OpenAI-Compatible Server (Recommended)

  • Deployment

    bash scripts/deploy.sh GPU_ID HOST_ADDR PORT # e.g. 0 localhost 65535
  • Inference

    bash scripts/inference.sh HOST_ADDR PORT # e.g. localhost 65535
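
After deployment, the endpoint can also be queried directly with the standard `openai` Python client for a quick sanity check. The following is a minimal sketch, not the project's evaluation pipeline: the served model name, image path, and prompt are illustrative assumptions, and `scripts/inference.sh` remains the supported entry point.

```python
# Minimal sanity check against the OpenAI-compatible server started by
# scripts/deploy.sh. Requires `pip install openai`.
import base64

from openai import OpenAI

# HOST_ADDR/PORT must match the deployment; the API key is unused locally.
client = OpenAI(base_url="http://localhost:65535/v1", api_key="EMPTY")

# Vision endpoints accept images as base64 data URLs. The path below is a
# placeholder -- point it at any image from the benchmark.
with open("data/test/images/example.jpg", "rb") as f:
    image_url = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="minicpm-v-2_6-merged",  # placeholder: use the name the server reports
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": "Briefly describe this image."},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

The name the server actually serves can be listed with `client.models.list()`.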

Offline Inference

bash scripts/inference_offline.sh GPU_ID # replace GPU_ID

Fine-tuning

[Optional] Customize Sample Format for Your Model

For example, when using MiniCPM-V-2.6, you need to create a new class that inherits from the base dataset constructor class, `T2VEvalSFTDataConstructor`, and implement two key methods (a sketch of such a subclass follows the corpus format below):

  • `replace_image_placeholder`: replace the original image placeholders with the placeholder expected by the model you are about to fine-tune.

  • `conv_template`: construct a single training sample (a conversation) from a corpus entry. Each entry follows this format:

    {
      "gt_image": str,
      "query": str,
      "response": str,
      "history": [
        (query_1, response_1),
        ...
      ],
      "ref_image": str | None,
      "id": str | None
    }
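
A minimal sketch of such a subclass follows. The method names come from this README, but the signatures, the placeholder tokens, and the output sample layout are illustrative assumptions, not the repository's actual interface:

```python
# Hypothetical model-specific constructor for MiniCPM-V-2.6.
# `T2VEvalSFTDataConstructor` is assumed importable from this repository;
# all other details below are illustrative guesses.

class MiniCPMV26SFTDataConstructor(T2VEvalSFTDataConstructor):
    def replace_image_placeholder(self, text: str) -> str:
        # Swap the corpus's generic placeholder (assumed "{IMAGE_PLACEHOLDER}")
        # for the token MiniCPM-V-2.6 expects (assumed "<image>").
        return text.replace("{IMAGE_PLACEHOLDER}", "<image>")

    def conv_template(self, corpus: dict) -> dict:
        # Build one conversation from a corpus entry in the format above:
        # gt_image / query / response / history / ref_image / id.
        images = [corpus["gt_image"]]
        if corpus.get("ref_image"):
            images.append(corpus["ref_image"])
        return {
            "id": corpus.get("id"),
            "images": images,
            "query": self.replace_image_placeholder(corpus["query"]),
            "response": corpus["response"],
            "history": [
                (self.replace_image_placeholder(q), r)
                for q, r in corpus.get("history", [])
            ],
        }
```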
    

Construct Training Dataset

  • Convert training data into the target format

    mkdir -p data/train/minicpm-v-2_6/raw
    
    python build_dataset.py \
      --data-files data/train/t2i-eval-distill-gpt-4o.json \
      --image-dir $(pwd)/data/train \
      --output-dir data/train/minicpm-v-2_6/raw
  • Make an index file data/train/minicpm-v-2_6/index.txt for the sub-tasks; each line pairs a sub-task file with an integer (see the reading sketch after this list):

    extract.json 8
    appearance-multi-stage_1.json 1
    appearance-multi-stage_2.json 1
    intrinsic-multi-stage_1.json 2
    intrinsic-multi-stage_2.json 2
    relationship-multi-stage_1.json 3
    relationship-multi-stage_2.json 3
    summarize-multi-stage_1.json 2
    summarize-multi-stage_2.json 2
  • Merge training samples of all sub-tasks

    python merge_dataset.py \
      --data-dir data/train/minicpm-v-2_6/raw/t2i-eval-distill-gpt-4o \
      --index-file data/train/minicpm-v-2_6/index.txt \
      --output-file data/train/minicpm-v-2_6/train.json \
      --rebalance
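
As an aside on the index file above, here is a small sketch of how such an index might be read. It assumes the integer in each line is a per-sub-task weight consumed during rebalancing, which is unverified against `merge_dataset.py`:

```python
# Hypothetical reader for data/train/minicpm-v-2_6/index.txt. The assumed
# semantics -- "filename, then an integer weight" -- match the file's shape
# but are NOT verified against merge_dataset.py's internals.
from pathlib import Path

def read_index(path: str) -> dict[str, int]:
    index: dict[str, int] = {}
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, weight = line.rsplit(maxsplit=1)
        index[name] = int(weight)
    return index

print(read_index("data/train/minicpm-v-2_6/index.txt"))
```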

Start Training (with SWIFT)

# data parallel on 4 NVIDIA A100-SXM4-80GB GPUs
bash scripts/finetune_lora.sh 0,1,2,3 4

📎 Citation

If you find this repository useful for your research, please cite our paper:

@misc{tu2024automaticevaluationtexttoimagegeneration,
    title={Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark},
    author={Rong-Cheng Tu and Zi-Ao Ma and Tian Lan and Yuehao Zhao and Heyan Huang and Xian-Ling Mao},
    year={2024},
    eprint={2411.15488},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2411.15488},
}
