PyTorch implementation for reproducing the Text-to-Face (T2F) using AttnGAN results from our paper "Development and Deployment of a Generative Model-Based Framework for Text to Photorealistic Image Generation."
In addition, please add the project folder to PYTHONPATH and pip install
the following packages:
python-dateutil
easydict
pandas
torchfile
nltk
scikit-image
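For convenience, everything above can be installed in one command (a sketch; pin versions as needed for your environment):
pip install python-dateutil easydict pandas torchfile nltk scikit-image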
- Download our preprocessed metadata for birds & CelebA and save it to data/
- Download the birds image data and extract it to data/birds/
- For faces, download the CelebA dataset and extract the images to data/face/
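After these steps, the data directory should look roughly like this (a sketch; the exact metadata file names depend on the preprocessed archives):

```
data/
├── birds/
│   ├── CUB_200_2011/   # extracted bird images
│   └── ...             # preprocessed metadata (captions, filenames, splits)
└── face/
    ├── images/         # extracted CelebA images
    └── ...             # preprocessed metadata
```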
Pre-train DAMSM models:
- For the bird dataset:
python pretrain_DAMSM.py --cfg cfg/DAMSM/bird.yml --gpu 0
- For the face dataset:
python pretrain_DAMSM.py --cfg cfg/DAMSM/face.yml --gpu 1
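The DAMSM configs control text-encoder pre-training. A minimal sketch of the kind of keys cfg/DAMSM/face.yml carries, following the upstream AttnGAN config layout (values are illustrative, not the exact ones shipped here):

```yaml
CONFIG_NAME: 'DAMSM'
DATASET_NAME: 'face'
DATA_DIR: 'data/face'
GPU_ID: 0
WORKERS: 4

TRAIN:
  FLAG: True
  BATCH_SIZE: 48
  MAX_EPOCH: 200

TEXT:
  EMBEDDING_DIM: 256     # dimension of the learned word/sentence embeddings
  CAPTIONS_PER_IMAGE: 10
  WORDS_NUM: 18          # captions are truncated/padded to this many words
```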
Train AttnGAN models:
- For the birds dataset:
python main.py --cfg cfg/bird_attn2.yml --gpu 2
- For the CelebA dataset:
python main.py --cfg cfg/face_attn2.yml --gpu 3
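Training expects the pre-trained DAMSM text encoder from the previous step. In the upstream AttnGAN configs this is wired up through a NET_E key; a sketch of the relevant fragment of cfg/face_attn2.yml (the checkpoint path and the values are assumptions):

```yaml
TRAIN:
  FLAG: True
  NET_E: 'DAMSMencoders/face/text_encoder200.pth'  # assumed path to the pre-trained DAMSM text encoder
  BATCH_SIZE: 20
  MAX_EPOCH: 600
```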
*.yml files are example configuration files for training/evaluating our models.
Pretrained Models
- DAMSM for bird. Download and save it to
DAMSMencoders/
- DAMSM for CelebA. Download and save it to
DAMSMencoders/
- AttnGAN for bird. Download and save it to
models/
- AttnGAN for CelebA. Download and save it to
models/
- AttnDCGAN for bird. Download and save it to
models/
- This is a variant of AttnGAN that applies the proposed attention mechanisms to the DCGAN framework.
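Once downloaded, the encoders are ordinary PyTorch checkpoints. A minimal loading sketch (the checkpoint name, the RNN_ENCODER import path, and the constructor values follow the upstream AttnGAN codebase and are assumptions):

```python
import torch
from model import RNN_ENCODER  # text-encoder class from the AttnGAN codebase

n_words = 5450  # vocabulary size; illustrative, use the value from your dataset's captions pickle
state_dict = torch.load('DAMSMencoders/bird/text_encoder200.pth',  # assumed checkpoint name
                        map_location='cpu')
text_encoder = RNN_ENCODER(n_words, nhidden=256)
text_encoder.load_state_dict(state_dict)
text_encoder.eval()  # inference only
```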
Run (Sampling)
- Run
python main.py --cfg cfg/eval_bird.yml --gpu 1
to generate examples from the captions in the files listed in "./data/birds/example_filenames.txt". Results are saved to DAMSMencoders/.
- Change the eval_*.yml files to generate images from other pre-trained models.
- Input your own sentences in "./data/birds/example_captions.txt" if you want to generate images from customized sentences (an example follows this list).
- To generate images for all captions in the validation dataset, change B_VALIDATION to True in the eval_*.yml file (see the sketch after this list) and then run
python main.py --cfg cfg/eval_bird.yml --gpu 1
- We compute the FID score for models trained on CelebA using #. (One common FID tool is sketched after this list.)
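A few sketches for the steps above. Custom captions in "./data/birds/example_captions.txt" are plain text, one sentence per line, for example:

```
this bird has a bright red crown and a short pointed beak
the bird has white underparts and long dark wings
```

Switching to whole-validation-set generation is a one-key change in the eval config (key name from the upstream AttnGAN configs):

```yaml
B_VALIDATION: True  # generate images for every caption in the validation split
```

The exact FID tooling is not linked above; one common option is the pytorch-fid package, which compares a directory of real images against a directory of generated ones (paths here are illustrative):
python -m pytorch_fid data/face/images output/generated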
Examples generated by AttnGAN [Blog]
CelebA example (image)
Creating an API
Evaluation code, embedded into a callable containerized API, is included in the eval/ folder.
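The route and payload below are hypothetical, shown only to illustrate how such a containerized service is typically called once the container is running; check the code in eval/ for the actual endpoint and schema:

```python
import requests

# Hypothetical endpoint and JSON schema; the real route/fields are defined in eval/.
response = requests.post(
    "http://localhost:8080/generate",
    json={"caption": "the woman has wavy brown hair and high cheekbones"},
)
with open("generated_face.png", "wb") as f:
    f.write(response.content)  # assumes the service returns raw image bytes
```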
If you find Text to Face (T2F) using AttnGAN useful in your research, please consider citing:
@article{PANDE2021,
  title = {Development and Deployment of a Generative Model-Based Framework for Text to Photorealistic Image Generation},
  journal = {Neurocomputing},
  year = {2021},
  issn = {0925-2312},
  doi = {10.1016/j.neucom.2021.08.055},
  url = {https://www.sciencedirect.com/science/article/pii/S092523122101239X},
  author = {Sharad Pande and Srishti Chouhan and Ritesh Sonavane and Rahee Walambe and George Ghinea and Ketan Kotecha},
  keywords = {text-to-image, text-to-face, face synthesis, GAN, AttnGAN},
  abstract = {The task of generating photorealistic images from their textual descriptions is quite challenging. Most existing tasks in this domain are focused on the generation of images such as flowers or birds from their textual description, especially for validating the generative models based on Generative Adversarial Network (GAN) variants and for recreational purposes. However, such work is limited in the domain of photorealistic face image generation and the results obtained have not been satisfactory. This is partly due to the absence of concrete data in this domain and a large number of highly specific features/attributes involved in face generation compared to birds or flowers. In this paper, we propose an Attention Generative Adversarial Network (AttnGAN) for fine-grained text-to-face generation that enables attention-driven multi-stage refinement by employing a Deep Attentional Multimodal Similarity Model (DAMSM). Through extensive experimentation on the CelebA dataset, we evaluated our approach using the Frechet Inception Distance (FID) score. The output files for the Face2Text dataset are also compared with those of the T2F GitHub project. According to the visual comparison, AttnGAN generated higher-quality images than T2F. Additionally, we compare our methodology with existing approaches with a specific focus on the CelebA dataset and demonstrate that our approach generates a better FID score, facilitating more realistic image generation. The application of such an approach can be found in criminal identification, where faces are generated from the textual description from an eyewitness. Such a method can bring consistency and eliminate the individual biases of an artist drawing the faces from the description given by the eyewitness. Finally, we discuss the deployment of the models on a Raspberry Pi to test how effective the models would be on a standalone device to facilitate portability and timely task completion.}
}
@inproceedings{Tao18attngan,
  author = {Tao Xu and Pengchuan Zhang and Qiuyuan Huang and Han Zhang and Zhe Gan and Xiaolei Huang and Xiaodong He},
  title = {AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks},
  year = {2018},
  booktitle = {{CVPR}}
}
- AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks by Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He. (This work was performed when Tao was an intern with Microsoft Research.)
- StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks [code]
- Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks [code]