Show, Attend and Read

This is the code for the paper "Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" by Hui Li*, Peng Wang*, Chunhua Shen, and Guyu Zhang (* indicates equal contribution), accepted to AAAI-19.

Installation

The model is implemented in Torch and has been tested on Ubuntu 14.04 with CUDA 8.0 and cuDNN 7.0. It depends on the following packages: torch/torch7, torch/nn, torch/nngraph, torch/image, and lua-cjson, each of which can be installed with "luarocks install <package>". A CUDA-enabled GPU is required. LMDB is also required; on Ubuntu it can be installed with "apt-get install liblmdb-dev" and "pip install lmdb".
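Before running anything, it can help to confirm that the command-line tools and the Python LMDB binding are visible. The following is a minimal sketch, not part of the repository: the script and the function name check_environment are hypothetical. It only checks that the th and luarocks executables are on PATH and that the Python lmdb module is importable; it does not verify individual Lua rocks, CUDA, or cuDNN.

```python
import importlib.util
import shutil

# Executables assumed by this README: Torch's `th` launcher and `luarocks`.
REQUIRED_EXECUTABLES = ["th", "luarocks"]
# Python module installed with "pip install lmdb" (used for the LMDB data files).
REQUIRED_PY_MODULES = ["lmdb"]


def check_environment():
    """Return a mapping from each dependency name to True if it was found."""
    status = {}
    for exe in REQUIRED_EXECUTABLES:
        # shutil.which returns None when the executable is not on PATH.
        status[exe] = shutil.which(exe) is not None
    for mod in REQUIRED_PY_MODULES:
        # find_spec returns None when the module cannot be located for import.
        status[mod] = importlib.util.find_spec(mod) is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:10s} {'found' if found else 'MISSING'}")
```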

Pretrained Model

Because of space limitations, the pretrained model is hosted at https://pan.baidu.com/s/1Z4a0l6UNhuWY3BDy8Z4Ctg. Download it and put it in the "saved_model" folder.

Run the model

To run the model on a new image or image directory, use the script "run_model.lua".

To run the pretrained model on a single image, use the '-input_image' flag, for example: th run_model.lua -input_image data/beach.jpg

To test the model on an entire directory of images, use the '-input_dir' flag instead: th run_model.lua -input_dir /path/to/my/image/folder

The results will be written to the folder vis/data.
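If you want to drive the two invocations above from a Python script, a small wrapper can build the argument list and run it from the repository root. This is a hypothetical helper (build_run_command and run are illustrative names, not part of the repository); it uses only the '-input_image' and '-input_dir' flags documented above and assumes th is on PATH.

```python
import shlex
import subprocess


def build_run_command(input_image=None, input_dir=None):
    """Build the `th run_model.lua` argument list for one image or one directory."""
    # Exactly one of the two documented flags must be given.
    if (input_image is None) == (input_dir is None):
        raise ValueError("pass exactly one of input_image or input_dir")
    cmd = ["th", "run_model.lua"]
    if input_image is not None:
        cmd += ["-input_image", str(input_image)]
    else:
        cmd += ["-input_dir", str(input_dir)]
    return cmd


def run(cmd, repo_dir="."):
    """Run the command from the repository root (results go to vis/data)."""
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, cwd=repo_dir, check=True)
```

For example, run(build_run_command(input_image="data/beach.jpg")) reproduces the single-image command shown above. Passing the arguments as a list rather than a shell string avoids quoting problems with paths that contain spaces.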

Model training

To train the model, follow these steps:

Prepare the training data, including the publicly available synthetic datasets:

Syn90k (http://www.robots.ox.ac.uk/~vgg/data/text/)
SynthText (http://www.robots.ox.ac.uk/~vgg/data/scenetext/)
SynthAdd (https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg, extraction code: 627x)

and the publicly available real-image datasets:

IIIT5K (http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html)
SVT (http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset)
ICDAR2013, ICDAR2015, and COCO-Text (http://rrc.cvc.uab.es/?com=introduction)

Use the script "create_dataset.py" to generate a group of "data.mdb" files containing both the synthetic and the real data. The generated "data.mdb" files are saved in the "DataDB" folder. To use create_dataset.py, place the training images in imagePathDir and list their labels in a separate .txt label file.
Run "th main_train.lua" to train the model. The model is saved periodically in the "saved_model" folder.
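To illustrate the data-preparation step, here is a minimal sketch of turning an image folder plus a .txt label file into LMDB records. Several things here are assumptions, not confirmed against create_dataset.py: the label-file format (one "image_name label" pair per line), the key layout ('image-%09d', 'label-%09d', 'num-samples', common in text-recognition LMDB datasets), and the function names read_label_file, to_lmdb_records and write_lmdb. Check create_dataset.py for the format it actually expects before relying on this.

```python
from pathlib import Path


def read_label_file(label_file, image_dir):
    """Parse a label file ASSUMED to hold one 'image_name label' pair per line."""
    pairs = []
    text = Path(label_file).read_text(encoding="utf-8")
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split only on the first whitespace so labels containing spaces survive.
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"{label_file}:{lineno}: expected 'image label', got {raw!r}")
        name, label = parts
        pairs.append((Path(image_dir) / name, label))
    return pairs


def to_lmdb_records(pairs):
    """Map (image_path, label) pairs to byte keys/values (ASSUMED key layout)."""
    records = {}
    for idx, (image_path, label) in enumerate(pairs, 1):
        records[b"image-%09d" % idx] = Path(image_path).read_bytes()  # raw encoded image
        records[b"label-%09d" % idx] = label.encode("utf-8")
    records[b"num-samples"] = str(len(pairs)).encode("ascii")
    return records


def write_lmdb(records, out_dir, map_size=1 << 30):
    """Write the records into an LMDB environment (data.mdb) under out_dir."""
    import lmdb  # installed with "pip install lmdb"

    env = lmdb.open(str(out_dir), map_size=map_size)
    with env.begin(write=True) as txn:
        for key, value in records.items():
            txn.put(key, value)
    env.close()
```

A call such as write_lmdb(to_lmdb_records(read_label_file("labels.txt", "images")), "DataDB/my_set") would produce one LMDB folder. The 1 GiB map_size is only a placeholder; large corpora such as Syn90k need a much larger value, and building the whole dict in memory is only practical for small sets.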

Citation

Please cite the following paper if you use the code or model in your research.

@InProceedings{SAR_aaai19,
  author    = {Hui Li and Peng Wang and Chunhua Shen and Guyu Zhang},
  title     = {Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2019}
}

License

This code is for academic use only. For commercial use, please contact us (peng.wang@nwpu.edu.cn).
