Official PyTorch implementation of DiG

This is the official PyTorch implementation of `Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition` (ACM MM 2022). This repository is built upon MoCo v3 and MAE; many thanks to the authors of those projects!

Data preparation

All data used for pre-training and fine-tuning is processed from public datasets:

- Unlabeled Real Data: CC-OCR
- Synthetic Text Data: SynthText, Synth90k (Baiduyun, password: wi05)
- Annotated Real Data: TextOCR, OpenImageTextV5
- Scene Text Recognition Benchmarks: IIIT5k, SVT, IC13, IC15, SVTP, CUTE, COCOText, CTW, Total-Text, HOST, WOST
- Handwritten Text Training Data: CVL, IAM
- Handwritten Text Recognition Benchmarks: CVL, IAM
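
Fine-tuning and evaluation read these datasets through --data_set image_lmdb. As a rough sketch of what such an LMDB looks like, assuming the key layout common to scene-text recognition pipelines (a num-samples entry plus image-%09d / label-%09d pairs; this layout is an assumption, not confirmed by this repo):

import io
import lmdb
from PIL import Image

def iter_lmdb_samples(lmdb_path, limit=5):
    # Open the LMDB environment read-only; one transaction for all reads.
    env = lmdb.open(lmdb_path, readonly=True, lock=False, readahead=False)
    with env.begin(write=False) as txn:
        # Assumed keys: b'num-samples', b'image-%09d', b'label-%09d' (1-indexed).
        num_samples = int(txn.get(b'num-samples'))
        for i in range(1, min(limit, num_samples) + 1):
            img = Image.open(io.BytesIO(txn.get(b'image-%09d' % i))).convert('RGB')
            label = txn.get(b'label-%09d' % i).decode('utf-8')
            yield img, label

for img, label in iter_lmdb_samples('/path/to/finetune_data'):
    print(img.size, label)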

Setup

conda env create -f environment.yml

Run

  1. Pre-training
# Set the path to save checkpoints
OUTPUT_DIR='output/pretrain_dig'
# path to the pre-training data
DATA_PATH='/path/to/pretrain_data/'


# batch_size can be adjusted according to GPU memory
OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 run_mae_pretraining_moco.py \
        --image_alone_path ${DATA_PATH} \
        --mask_ratio 0.7 \
        --batch_size 128 \
        --opt adamw \
        --output_dir ${OUTPUT_DIR} \
        --epochs 10 \
        --warmup_steps 5000 \
        --max_len 25 \
        --num_view 2 \
        --moco_dim 256 \
        --moco_mlp_dim 4096 \
        --moco_m 0.99 \
        --moco_m_cos \
        --moco_t 0.2 \
        --num_windows 4 \
        --contrast_warmup_steps 0 \
        --contrast_start_epoch 0 \
        --loss_weight_pixel 1. \
        --loss_weight_contrast 0.1 \
        --only_mim_on_ori_img \
        --weight_decay 0.1 \
        --opt_betas 0.9 0.999 \
        --model pretrain_simmim_moco_ori_vit_small_patch4_32x128 \
        --patchnet_name no_patchtrans \
        --encoder_type vit
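
The command above balances the two pre-training objectives through --loss_weight_pixel 1. and --loss_weight_contrast 0.1: a generative masked-pixel reconstruction loss and a discriminative MoCo-style contrastive loss between the two views (--num_view 2). Below is a minimal sketch of how such a combined objective can be computed; the tensor names, shapes, and in-batch InfoNCE form are illustrative assumptions, not the repo's actual code:

import torch
import torch.nn.functional as F

def combined_loss(pred_pixels, target_pixels, mask, q, k,
                  w_pixel=1.0, w_contrast=0.1, temperature=0.2):
    # pred_pixels/target_pixels: (B, N, P) per-patch pixel values.
    # mask: (B, N), 1 on masked patches (70% of patches under --mask_ratio 0.7).
    # q, k: (B, D) projected embeddings of the two augmented views.

    # Generative branch: L2 reconstruction computed only on masked patches.
    per_patch = (pred_pixels - target_pixels).pow(2).mean(dim=-1)  # (B, N)
    pixel_loss = (per_patch * mask).sum() / mask.sum().clamp(min=1)

    # Discriminative branch: InfoNCE with in-batch negatives (--moco_t 0.2).
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    logits = q @ k.t() / temperature                  # (B, B) similarity matrix
    labels = torch.arange(q.size(0), device=q.device) # positives on the diagonal
    contrast_loss = F.cross_entropy(logits, labels)

    return w_pixel * pixel_loss + w_contrast * contrast_loss

Judging by its name, --only_mim_on_ori_img would restrict the reconstruction term to the original (un-augmented) view while the contrastive term still uses both views.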
  2. Fine-tuning
# Set the path to save checkpoints
OUTPUT_DIR='output/'
# path to the fine-tuning data
DATA_PATH='/path/to/finetune_data'
# path to pretrain model
MODEL_PATH='/path/to/pretrain/checkpoint.pth'

# batch_size can be adjusted according to GPU memory
OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 --master_port 10041 run_class_finetuning.py \
    --model simmim_vit_small_patch4_32x128 \
    --data_path ${DATA_PATH} \
    --eval_data_path ${DATA_PATH} \
    --finetune ${MODEL_PATH} \
    --output_dir ${OUTPUT_DIR} \
    --batch_size 256 \
    --opt adamw \
    --opt_betas 0.9 0.999 \
    --weight_decay 0.05 \
    --data_set image_lmdb \
    --nb_classes 97 \
    --smoothing 0. \
    --max_len 25 \
    --epochs 10 \
    --warmup_epochs 1 \
    --drop 0.1 \
    --attn_drop_rate 0.1 \
    --drop_path 0.1 \
    --dist_eval \
    --lr 1e-4 \
    --num_samples 1 \
    --fixed_encoder_layers 0 \
    --decoder_name tf_decoder \
    --use_abi_aug \
    --num_view 2
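
Here --decoder_name tf_decoder attaches a transformer decoder head that autoregressively predicts up to --max_len 25 characters over --nb_classes 97 symbols. A rough sketch of this kind of recognition head follows; the layer sizes and structure are illustrative assumptions, not the repo's actual tf_decoder:

import torch
import torch.nn as nn

class SketchTextDecoder(nn.Module):
    # Illustrative stand-in for a tf_decoder-style head: cross-attends over
    # ViT patch features and emits one character distribution per step.
    def __init__(self, d_model=384, nhead=6, num_layers=3,
                 num_classes=97, max_len=25):
        super().__init__()
        self.embed = nn.Embedding(num_classes, d_model)
        self.pos = nn.Parameter(torch.zeros(max_len, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, memory, tgt_tokens):
        # memory: (B, N, D) encoder features; tgt_tokens: (B, T) character ids.
        T = tgt_tokens.size(1)
        tgt = self.embed(tgt_tokens) + self.pos[:T]
        # Causal mask so each step only attends to previous characters.
        causal = torch.triu(torch.full((T, T), float('-inf'),
                                       device=memory.device), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.classifier(out)  # (B, T, num_classes) logits

# Example: a batch of 2 images encoded into 256 patch tokens of width 384.
decoder = SketchTextDecoder()
logits = decoder(torch.randn(2, 256, 384), torch.zeros(2, 25, dtype=torch.long))
print(logits.shape)  # torch.Size([2, 25, 97])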
  3. Evaluation
# Set the path to save checkpoints
OUTPUT_DIR='output/'
# path to the test data
DATA_PATH='/path/to/test_data'
# path to finetune model
MODEL_PATH='/path/to/finetune/checkpoint.pth'

# batch_size can be adjusted according to GPU memory
OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 --master_port 10040 run_class_finetuning.py \
    --model simmim_vit_small_patch4_32x128 \
    --data_path ${DATA_PATH} \
    --eval_data_path ${DATA_PATH} \
    --output_dir ${OUTPUT_DIR} \
    --batch_size 512 \
    --opt adamw \
    --opt_betas 0.9 0.999 \
    --weight_decay 0.05 \
    --data_set image_lmdb \
    --nb_classes 97 \
    --smoothing 0. \
    --max_len 25 \
    --resume ${MODEL_PATH} \
    --eval \
    --epochs 20 \
    --warmup_epochs 2 \
    --drop 0.1 \
    --attn_drop_rate 0.1 \
    --dist_eval \
    --num_samples 1000000 \
    --fixed_encoder_layers 0 \
    --decoder_name tf_decoder \
    --beam_width 0
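
The reported numbers are word-level accuracies averaged over the benchmarks. A small sketch of the usual scene-text evaluation protocol (case-insensitive comparison over alphanumeric characters; whether this repo filters exactly this way is an assumption):

import re

def word_accuracy(preds, gts):
    # Standard STR protocol: lowercase and keep alphanumeric characters only.
    norm = lambda s: re.sub(r'[^0-9a-z]', '', s.lower())
    correct = sum(norm(p) == norm(g) for p, g in zip(preds, gts))
    return correct / max(len(gts), 1)

print(word_accuracy(['Hello!', 'W0rld'], ['hello', 'world']))  # 0.5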

Result

| model | pretrain | finetune | average accuracy | weights |
|-------|----------|----------|------------------|---------|
| vit-small | 10 epochs | 10 epochs | 85.21% | pretrain / finetune |

Citation

If you find this project helpful for your research, please cite the following paper:

@inproceedings{DiG,
  author    = {Mingkun Yang and
               Minghui Liao and
               Pu Lu and
               Jing Wang and
               Shenggao Zhu and
               Hualin Luo and
               Qi Tian and
               Xiang Bai},
  editor    = {Jo{\~{a}}o Magalh{\~{a}}es and
               Alberto Del Bimbo and
               Shin'ichi Satoh and
               Nicu Sebe and
               Xavier Alameda{-}Pineda and
               Qin Jin and
               Vincent Oria and
               Laura Toni},
  title     = {Reading and Writing: Discriminative and Generative Modeling for Self-Supervised
               Text Recognition},
  booktitle = {{MM} '22: The 30th {ACM} International Conference on Multimedia, Lisboa,
               Portugal, October 10 - 14, 2022},
  pages     = {4214--4223},
  publisher = {{ACM}},
  year      = {2022},
  url       = {https://doi.org/10.1145/3503161.3547784},
  doi       = {10.1145/3503161.3547784},
  timestamp = {Fri, 14 Oct 2022 14:25:06 +0200},
  biburl    = {https://dblp.org/rec/conf/mm/YangLLWZLTB22.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

License

This project is released under the CC-BY-NC 4.0 license. See LICENSE for details. If you plan to use it in a product, we suggest contacting us regarding possible patent issues.
