by Surbhi Mittal, Arnav Sudan, Mayank Vatsa, Richa Singh, Tamar Glaser and Tal Hassner
This research explores bias in text-to-image (TTI) models for the Indic languages widely spoken across India. It examines and compares the generative performance and cultural aspects of leading TTI models in these languages, contrasting them with the models' English-language capabilities. Using the proposed IndicTTI benchmark, the study comprehensively evaluates generation across 30 Indic languages with two open-source diffusion models and two commercial generation APIs. The primary objective of the benchmark is to measure how well these models support Indic languages and to identify areas in need of improvement. Given that these 30 languages are spoken by over 1.4 billion people, the benchmark aims to provide a detailed and insightful analysis of TTI models' effectiveness within Indic linguistic landscapes.
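The evaluation workflow described above (generating images from the same prompt set across languages and comparing against English outputs) can be pictured with a short sketch. This is a minimal illustration under stated assumptions, not the IndicTTI pipeline itself: the model ID, the prompts dictionary, and the output layout are placeholders for demonstration, using the Hugging Face diffusers API.

```python
# Minimal sketch: generate images for the same prompt across several
# languages with an open-source diffusion model, saving outputs for
# later scoring. Model choice and prompts are illustrative assumptions.
import os

import torch
from diffusers import StableDiffusionPipeline

# Assumed model; any open-source TTI model under test could be swapped in.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
).to("cuda")

# Hypothetical prompt set: language code -> prompts translated into that
# language, with English kept as the reference for comparison.
prompts = {
    "hi": ["एक बिल्ली किताब पढ़ रही है"],  # Hindi
    "bn": ["একটি বিড়াল বই পড়ছে"],        # Bengali
    "en": ["A cat is reading a book"],      # English reference
}

os.makedirs("outputs", exist_ok=True)
for lang, lang_prompts in prompts.items():
    for i, prompt in enumerate(lang_prompts):
        image = pipe(prompt, num_inference_steps=30).images[0]
        # Saved images would then be scored for generative correctness
        # and cultural representation relative to the English outputs.
        image.save(f"outputs/{lang}_{i}.png")
```

In practice, a benchmark run like this surfaces exactly the gap the paper studies: text encoders trained mostly on English often degrade sharply on non-Latin scripts, and the downstream scoring quantifies that degradation per language.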
All Python source code (including .py and .ipynb files) is made available under the MIT license. You can freely use and modify the code, without warranty, so long as you provide attribution to the authors. See LICENSE-MIT.txt for the full license text.
The manuscript text (including all LaTeX files), figures, and data/models produced as part of this research are available under the Creative Commons Attribution 4.0 License (CC-BY). See LICENSE-CC-BY.txt for the full license text.
@inproceedings{mittal2024indicTTI,
  title={IndicTTI: Navigating Text-to-Image Generative Bias across Indic Languages},
  author={Mittal, Surbhi and Sudan, Arnav and Vatsa, Mayank and Singh, Richa and Glaser, Tamar and Hassner, Tal},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2024}
}