Final project for the "Deep Learning" course at Skoltech, 2020.
Authors: Ilya Borovik, Bulat Khabibullin, Vladislav Kniazev, Oluwafemi Olaleke and Zakhar Pichugin
The repository presents multiple meme generation models (see illustrations below):
- Captioning LSTM with Image-only Encoder
- Captioning LSTM with Image-label Encoder
- Base Captioning Transformer with Global image embedding
- Captioning Transformer with Spatial image features
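For reference, the first model family can be sketched as below: a CNN encoder produces a global image embedding that conditions an LSTM caption decoder. This is a minimal illustration, not the repository's actual implementation; all class and parameter names here are hypothetical.

```python
import torch
import torch.nn as nn
from torchvision import models

class CaptioningLSTM(nn.Module):
    """Minimal captioning LSTM with an image-only encoder: the global
    image embedding is fed as the first "token" of the decoded sequence.
    Illustrative sketch; names and sizes are assumptions."""

    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512):
        super().__init__()
        # Pretrained CNN backbone as the image-only encoder.
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])
        self.img_proj = nn.Linear(backbone.fc.in_features, emb_dim)
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.encoder(images).flatten(1)        # (B, 2048)
        img_emb = self.img_proj(feats).unsqueeze(1)    # (B, 1, E)
        tok_emb = self.embed(captions)                 # (B, T, E)
        inputs = torch.cat([img_emb, tok_emb], dim=1)  # (B, T+1, E)
        out, _ = self.lstm(inputs)
        return self.head(out)                          # logits over vocab
```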
Observe the models in action in the demo notebook:
All pretrained models are downloaded and built automatically in the Colab runtime.
In addition to the models, we collect and release a large-scale dataset of 900,000 meme captions for 300 meme templates crawled from the MemeGenerator website. The dataset is uploaded to Google Drive and described in the corresponding section below.
Note: the repository state at the end of the "Deep Learning" course project is recorded in the branch skoltech-dl-project.
Example code for training the models is provided in a Colab notebook. It contains the training progress and TensorBoard logs for all experiments described in the project report.
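As a rough outline, a single teacher-forced training step with TensorBoard logging might look like the following. The loader interface, padding id, and logging layout are assumptions for illustration, not the notebook's exact code.

```python
import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter

def train_epoch(model, loader, optimizer, writer, epoch, pad_id=0):
    """One epoch of teacher-forced training with TensorBoard logging.
    Assumes the loader yields (images, token-id captions) batches and
    that captions are padded with pad_id."""
    criterion = nn.CrossEntropyLoss(ignore_index=pad_id)
    model.train()
    for step, (images, captions) in enumerate(loader):
        logits = model(images, captions[:, :-1])  # teacher forcing
        loss = criterion(logits.reshape(-1, logits.size(-1)),
                         captions.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        writer.add_scalar("train/loss", loss.item(),
                          epoch * len(loader) + step)
```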
We crawl and preprocess a large-scale meme dataset consisting of 900,000 meme captions for 300 meme template images collected from the MemeGenerator website. During data collection, we clean the data of evident duplicates, long-caption outliers, non-ASCII symbols, and non-English templates.
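The cleaning steps can be approximated as below. This is a simplified illustration, not the logic of crawl_data.py itself; the thresholds mirror the default flags shown in the crawling command further down, and language detection is omitted.

```python
def clean_captions(captions, min_len=10, max_len=96, max_tokens=31):
    """Filter raw captions: drop length outliers, non-ASCII text,
    and evident duplicates. Simplified sketch of the cleaning steps."""
    seen, cleaned = set(), []
    for caption in captions:
        text = caption.strip()
        if not (min_len <= len(text) <= max_len):
            continue  # too short or a long-caption outlier
        if len(text.split()) > max_tokens:
            continue  # too many tokens
        if not text.isascii():
            continue  # contains non-ASCII symbols
        key = text.lower()
        if key in seen:
            continue  # evident duplicate
        seen.add(key)
        cleaned.append(text)
    return cleaned
```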
The crawled dataset of 300 meme templates with 3000 captions per template can be downloaded using the load_data.sh script or directly from Google Drive. The data is split into train/val/test with 2500/250/250 captions per split for each template. We provide the data splits to make comparison of new models with our work possible.
The dataset archive has the following structure:
├── memes900k
|   ├── images                -- template images
|   |   ├── cool-dog.jpg
|   |   ├── dogeee.jpg
|   |   ├── ...
|   ├── templates.txt         -- template labels and image URLs
|   ├── captions.txt          -- all captions
|   ├── captions_train.txt    -- training split
|   ├── captions_val.txt      -- validation split
|   ├── captions_test.txt     -- test split
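A split file can be loaded with a few lines of Python. The tab-separated `<label>\t<caption>` line format assumed here is hypothetical; check the actual files and adjust the delimiter accordingly.

```python
from collections import defaultdict
from pathlib import Path

def load_split(path):
    """Read a captions_{train,val,test}.txt file into a dict mapping
    template label -> list of captions. Assumes one tab-separated
    '<label>\t<caption>' pair per line."""
    captions = defaultdict(list)
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        label, _, caption = line.partition("\t")
        if caption:
            captions[label].append(caption)
    return captions

train = load_split("memes900k/captions_train.txt")
```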
To crawl your own dataset, run the following script:
python crawl_data.py --source memegenerator.net --save-dir ../memes \
--poolsize 25 --num-templates 300 --num-captions 3000 \
--detect-english --detect-duplicates \
--min-len 10 --max-len 96 --max-tokens 31
Then, split the data into train/val/test using:
python split_data.py --data-dir ../memes --splits 2500 250 250
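Assuming the script mirrors the released archive, this produces captions_train.txt, captions_val.txt, and captions_test.txt in the data directory, with 2500/250/250 captions per template in the respective splits.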