It started as code for the paper:
MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition (Accepted by ICCV 2023)
This project is a toolkit for the novel scenario of Incremental Multilingual Text Recognition (IMLTR). It supports many incremental learning methods and proposes a method better suited to IMLTR, the Multiplexed Routing Network (MRN), together with the corresponding dataset. The project provides an efficient framework for developing new methods and analyzing existing ones under the IMLTR task, and we hope it will advance the IMLTR community.
- Base: Baseline method that simply updates parameters on new tasks.
- Joint: Upper-bound method in which the data of all tasks is trained at once (Joint_mix mixes the data of all tasks within a batch; Joint_loader keeps a consistent proportion of data from each task in a batch).
- EWC [PNAS2017]: Overcoming catastrophic forgetting in neural networks
- LwF [ECCV2016]: Learning without Forgetting
- WA [CVPR2020]: Maintaining Discrimination and Fairness in Class Incremental Learning
- DER [CVPR2021]: DER: Dynamically Expandable Representation for Class Incremental Learning
- MRN [ICCV2023]: MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition
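The difference between the two Joint variants can be illustrated with a minimal sketch. The function names `joint_mix_batches` and `joint_loader_batch` are hypothetical and not part of this project, and giving every task an equal share of the batch is an assumption: the description above only says the proportion is consistent.

```python
import random


def joint_mix_batches(task_data, batch_size, seed=0):
    """Pool all tasks, shuffle, and cut into batches (task mix varies per batch)."""
    rng = random.Random(seed)
    pool = [(task, item) for task, items in task_data.items() for item in items]
    rng.shuffle(pool)
    return [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]


def joint_loader_batch(task_data, batch_size, rng):
    """Draw one batch with the same number of samples from every task.

    Assumption: equal shares per task; the project may use other proportions.
    """
    per_task = batch_size // len(task_data)
    batch = []
    for task, items in task_data.items():
        # Sample with replacement so small tasks can still fill their share.
        batch.extend((task, rng.choice(items)) for _ in range(per_task))
    return batch


# Toy data: a small task and a large task.
toy = {"Chinese": list(range(10)), "Latin": list(range(100))}
mixed = joint_mix_batches(toy, batch_size=8)
balanced = joint_loader_batch(toy, batch_size=8, rng=random.Random(0))
```

With pooled shuffling, a large task such as Latin tends to dominate each batch, whereas the loader-style batch gives every task a fixed share regardless of its size.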
You can edit the config file config/crnn_mrn.py to select a different incremental learning (IL) method or setting.
common=dict(
il="mrn", # joint_mix | joint_loader | base | lwf | wa | ewc | der | mrn
memory="random", # None | random
memory_num=2000,
start_task = 0 # checkpoint start
)
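As a rough illustration of what `memory="random"` with `memory_num=2000` could mean, the sketch below keeps a uniform random subset of old-task samples for rehearsal. The helper `build_random_memory` is hypothetical, and treating `memory_num` as a single global budget (rather than a per-task one) is an assumption not confirmed by the config.

```python
import random


def build_random_memory(old_samples, memory_num=2000, seed=111):
    """Keep a uniform random subset of at most `memory_num` old-task samples."""
    if memory_num <= 0 or not old_samples:
        return []
    if len(old_samples) <= memory_num:
        # Everything fits in the budget, so nothing has to be dropped.
        return list(old_samples)
    rng = random.Random(seed)
    return rng.sample(old_samples, memory_num)  # sampling without replacement


# Toy stand-ins for samples of two earlier tasks.
old = [("Chinese", i) for i in range(2687)] + [("Latin", i) for i in range(47411)]
memory = build_random_memory(old, memory_num=2000)
```

Uniform sampling over the pooled data keeps the memory roughly proportional to task size, so small scripts receive few slots; a per-task budget would be the usual alternative.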
- CRNN [TPAMI2017]: An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition
- TRBA [ICCV2019]: What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis
- SVTR [IJCAI2022]: SVTR: Scene Text Recognition with a Single Visual Model
You can edit the config file config/crnn_mrn.py to select a different text recognition module or setting.
""" Model Architecture """
common=dict(
batch_max_length = 25,
imgH = 32,
imgW = 256,
)
model=dict(
model_name="TRBA",
Transformation = "TPS", #None TPS
FeatureExtraction = "ResNet", #VGG ResNet SVTR
SequenceModeling = "BiLSTM", #None BiLSTM
Prediction = "Attn", #CTC Attn
num_fiducial=20,
input_channel=4,
output_channel=512,
hidden_size=256,
)
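Because the four stage options are plain strings, a typo only surfaces when the model is built. A small pre-flight check along the following lines may help; `check_model_config` and `ALLOWED_STAGES` are hypothetical names, and the allowed values are simply copied from the comments in the listing above.

```python
# Allowed values taken from the inline comments of the model dict.
ALLOWED_STAGES = {
    "Transformation": {"None", "TPS"},
    "FeatureExtraction": {"VGG", "ResNet", "SVTR"},
    "SequenceModeling": {"None", "BiLSTM"},
    "Prediction": {"CTC", "Attn"},
}


def check_model_config(model):
    """Return a list of problems; an empty list means every stage value is known."""
    problems = []
    for stage, allowed in ALLOWED_STAGES.items():
        value = model.get(stage)
        if value not in allowed:
            problems.append(f"{stage}={value!r} not in {sorted(allowed)}")
    return problems


trba = dict(
    Transformation="TPS",
    FeatureExtraction="ResNet",
    SequenceModeling="BiLSTM",
    Prediction="Attn",
)
problems = check_model_config(trba)  # no problems expected for this dict
```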
The dataset can be downloaded from BaiduNetdisk (passwd: c07h).
dataset
├── MLT17_IL
│ ├── test_2017
│ ├── train_2017
├── MLT19_IL
│ ├── test_2019
│ ├── train_2019
Incremental MLT17: MLT17 has 68,613 training instances and 16,255 validation instances, which come from 6 scripts and 9 languages: Chinese, Japanese, Korean, Bangla, Arabic, Italian, English, French, and German. The last four use the Latin script. Incremental MLT17 uses the validation set for testing because the test data is unavailable. Tasks are split by script and modeled sequentially. Special symbols are discarded at the preprocessing step because they carry no linguistic meaning.
Incremental MLT19: MLT19 has 89,177 text instances from 7 scripts. Because the test set is inaccessible, Incremental MLT19 randomly splits the training instances 9:1, script by script, for model training and testing. To be consistent with the Incremental MLT17 dataset, we discard the Hindi script and also special symbols. Statistics of the two datasets are shown in the following.
Dataset | Categories | Task1 (Chinese) | Task2 (Latin) | Task3 (Japanese) | Task4 (Korean) | Task5 (Arabic) | Task6 (Bangla)
---|---|---|---|---|---|---|---
MLT17 | Train Instance | 2687 | 47411 | 4609 | 5631 | 3711 | 3237
MLT17 | Test Instance | 529 | 11073 | 1350 | 1230 | 983 | 713
MLT17 | Train Class | 1895 | 325 | 1620 | 1124 | 73 | 112
MLT19 | Train Instance | 2897 | 52921 | 5324 | 6107 | 4230 | 3542
MLT19 | Test Instance | 322 | 5882 | 590 | 679 | 470 | 393
MLT19 | Train Class | 2086 | 220 | 1728 | 1160 | 73 | 102
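The script-by-script 9:1 split described for Incremental MLT19 can be sketched as follows. This is an illustration only: `split_by_script`, the `(script, sample)` input format, and the seed are assumptions, not the preprocessing code used to build the released dataset.

```python
import random
from collections import defaultdict


def split_by_script(instances, train_ratio=0.9, seed=0):
    """Randomly split (script, sample) pairs into train/test within each script."""
    by_script = defaultdict(list)
    for script, sample in instances:
        by_script[script].append(sample)

    rng = random.Random(seed)
    train, test = {}, {}
    for script, samples in by_script.items():
        shuffled = samples[:]          # copy so the caller's data is not reordered
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_ratio)
        train[script], test[script] = shuffled[:cut], shuffled[cut:]
    return train, test


# Toy example sized like the Arabic task of MLT19 (4230 + 470 instances).
toy = [("Arabic", i) for i in range(4700)]
train, test = split_by_script(toy)
```

Splitting inside each script, rather than over the pooled data, keeps the 9:1 ratio for small scripts that a global random split could leave under-represented in the test set.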
- This work was tested with PyTorch 1.6.0, CUDA 10.1, and Python 3.6. The commands below set up an environment with Python 3.7 and PyTorch 1.9.1.
conda create -n mrn python=3.7 -y
conda activate mrn
conda install pytorch==1.9.1 torchvision==0.10.1 torchaudio==0.9.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
- Requirements:
pip3 install lmdb pillow torchvision nltk natsort fire tensorboard tqdm opencv-python einops timm mmcv shapely scipy
pip3 install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.1/index.html
python3 tiny_train.py --config=config/crnn_mrn.py --exp_name CRNN_real
tiny_train.py (by default, the trained model is evaluated on the IMLTR datasets at the end of training) accepts the following options:
- --select_data: folder paths to the training lmdb datasets, for example ["../dataset/MLT17_IL/train_2017", "../dataset/MLT19_IL/train_2019"].
- --valid_datas: folder paths to the testing lmdb datasets, for example ["../dataset/MLT17_IL/test_2017", "../dataset/MLT19_IL/test_2019"].
- --batch_ratio: ratio assigned to each selected dataset in a batch. The default is '1 / number of datasets'.
- --Aug: whether to use augmentation |None|Blur|Crop|Rot|
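To make the batch ratio concrete, the sketch below turns a ratio string such as the config value `batch_ratio="0.5-0.5"` into per-dataset sample counts. The helper `per_dataset_batch_sizes` is hypothetical; the rounding rule and the minimum of one sample per dataset are assumptions about how such a ratio is typically applied, not a description of this project's loader.

```python
def per_dataset_batch_sizes(batch_size, num_datasets, batch_ratio=None):
    """Turn a ratio string such as "0.5-0.5" into per-dataset sample counts."""
    if batch_ratio is None:
        # Documented default: 1 / number of datasets.
        ratios = [1.0 / num_datasets] * num_datasets
    else:
        ratios = [float(r) for r in batch_ratio.split("-")]
    if len(ratios) != num_datasets:
        raise ValueError("need exactly one ratio per selected dataset")
    # Assumption: round to the nearest integer, at least one sample each.
    return [max(round(batch_size * r), 1) for r in ratios]


sizes = per_dataset_batch_sizes(256, 2, "0.5-0.5")
```

With `batch_size=256` and two datasets at equal ratio, each dataset contributes 128 samples to a batch.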
For more detailed configuration changes, edit the config file config/crnn_mrn.py.
common=dict(
exp_name="TRBA_MRN", # Where to store logs and models
il="mrn", # joint_mix | joint_loader | base | lwf | wa | ewc | der | mrn
memory="random", # None | random
memory_num=2000,
batch_max_length = 25,
imgH = 32,
imgW = 256,
manual_seed=111,
start_task = 0
)
""" Model Architecture """
model=dict(
model_name="TRBA",
Transformation = "TPS", #None TPS
FeatureExtraction = "ResNet", #VGG ResNet
SequenceModeling = "BiLSTM", #None BiLSTM
Prediction = "Attn", #CTC Attn
num_fiducial=20,
input_channel=4,
output_channel=512,
hidden_size=256,
)
""" Optimizer """
optimizer=dict(
schedule="super", #default is super for super convergence, 1 for None, [0.6, 0.8] for the same setting with ASTER
optimizer="adam",
lr=0.0005,
sgd_momentum=0.9,
sgd_weight_decay=0.000001,
milestones=[2000,4000],
lrate_decay=0.1,
rho=0.95,
eps=1e-8,
lr_drop_rate=0.1
)
""" Data processing """
train = dict(
saved_model="", # "path to model to continue training"
Aug="None", # |None|Blur|Crop|Rot|ABINet
workers=4,
lan_list=["Chinese","Latin","Japanese", "Korean", "Arabic", "Bangla"],
valid_datas=[
"../dataset/MLT17_IL/test_2017",
"../dataset/MLT19_IL/test_2019"
],
select_data=[
"../dataset/MLT17_IL/train_2017",
"../dataset/MLT19_IL/train_2019"
],
batch_ratio="0.5-0.5",
total_data_usage_ratio="1.0",
NED=True,
batch_size=256,
num_iter=10000,
val_interval=5000,
log_multiple_test=None,
grad_clip=5,
)
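The `NED=True` entry in the `train` dict presumably enables a normalized edit distance metric alongside accuracy. The config does not define it, so the sketch below is an assumption: it uses the common convention of one minus the Levenshtein distance divided by the longer string length, and both function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from `a`
                curr[j - 1] + 1,     # insert a character into `a`
                prev[j - 1] + cost,  # substitute, or keep a matching character
            ))
        prev = curr
    return prev[-1]


def normalized_edit_similarity(pred, gt):
    """Return 1 - distance / max(len); 1.0 means the strings are identical."""
    longest = max(len(pred), len(gt))
    if longest == 0:
        return 1.0  # two empty strings match exactly
    return 1.0 - edit_distance(pred, gt) / longest


score = normalized_edit_similarity("kitten", "sitting")
```

Unlike exact-match accuracy, this score gives partial credit to predictions that are only a few characters off, which is useful for long words or large character sets.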
The experimental results of each task are recorded in data_any.txt and can be used for further analysis.
This implementation has been based on these repositories:
Please consider citing this work in your publications if it helps your research.
@article{zheng2023mrn,
title={MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition},
author={Zheng, Tianlun and Chen, Zhineng and Huang, BingChen and Zhang, Wei and Jiang, Yu-Gang},
journal={Proceedings of the IEEE/CVF International Conference on Computer Vision},
year={2023}
}
This project is released under the Apache 2.0 license.