Text detection custom training not working #1956

gastodler · 2023-07-18T11:54:48Z


Hi,

I'm trying to train a text detector for Arabic text.
I've created annotation files in the same format as ICDAR (a sample of my annotation is attached).
In addition, I customized the config files (my log is attached).

As you can see in the attached results sample, training starts with loss=1.0 and never moves from there. I attached only the first epoch, but the loss stays the same after 10, 20, and more epochs.

I'd be grateful for any help, because I can't find the cause of this bug.

annotation

{
"metainfo": {
"dataset_type": "TextDetDataset",
"task_name": "textdet",
"category": [
{
"id": 0,
"name": "text"
}
]
},
"data_list": [
{
"instances": [
{
"polygon": [
331,
674,
360,
674,
360,
691,
331,
691
],
"bbox": [
331,
674,
360,
690
],
"bbox_label": 0,
"ignore": "false"
}
],
"img_path": "/home/ubuntu/ocr/open_source_models/resources/textDetFiles/imgs/test/4000.jpg",
"height": 848,
"width": 464
}
]
}
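
One thing that may be worth checking in a sample like the one above: the "ignore" field is the JSON string "false", not the JSON boolean false. In Python a non-empty string is truthy, so if the loader coerces this value to a boolean, every instance could end up marked as ignored. That is an assumption about how the flag is consumed, not something confirmed from the MMOCR source. The sketch below is a minimal, hypothetical checker (the function name `find_annotation_issues` is made up for illustration) that flags non-boolean "ignore" values and bboxes that do not match the polygon extent.

```python
import json


def find_annotation_issues(ann: dict) -> list[str]:
    """Return readable issues found in a textdet-style annotation dict."""
    issues = []
    for i, item in enumerate(ann.get("data_list", [])):
        for j, inst in enumerate(item.get("instances", [])):
            where = f"data_list[{i}].instances[{j}]"

            # A JSON string such as "false" becomes a non-empty str in Python,
            # which is truthy. Only a real JSON boolean is accepted here.
            ignore = inst.get("ignore")
            if not isinstance(ignore, bool):
                issues.append(
                    f"{where}: 'ignore' is {ignore!r} "
                    f"({type(ignore).__name__}), expected a JSON boolean"
                )

            # A polygon is a flat [x1, y1, x2, y2, ...] list with >= 3 points.
            poly = inst.get("polygon", [])
            if len(poly) < 6 or len(poly) % 2 != 0:
                issues.append(f"{where}: polygon has {len(poly)} values")
                continue

            # Compare the stored bbox with the tight box around the polygon.
            xs, ys = poly[0::2], poly[1::2]
            tight = [min(xs), min(ys), max(xs), max(ys)]
            if inst.get("bbox") != tight:
                issues.append(
                    f"{where}: bbox {inst.get('bbox')} differs from "
                    f"polygon extent {tight}"
                )
    return issues


# The instance from the sample annotation, reduced to the relevant fields.
sample = json.loads("""
{
  "data_list": [
    {
      "instances": [
        {
          "polygon": [331, 674, 360, 674, 360, 691, 331, 691],
          "bbox": [331, 674, 360, 690],
          "bbox_label": 0,
          "ignore": "false"
        }
      ]
    }
  ]
}
""")

issues = find_annotation_issues(sample)
for issue in issues:
    print(issue)
```

On the sample this reports two items: the string-typed "ignore" flag, and a bbox whose bottom edge (690) is one pixel short of the polygon (691). To run it on a full file, load it with `json.load(open(path))` and pass the result to the function.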

log

System environment:
sys.platform: linux
Python: 3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0]
CUDA available: True
numpy_random_seed: 1617739328
GPU 0: Tesla V100-SXM2-16GB
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 11.5, V11.5.119
GCC: x86_64-linux-gnu-gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
PyTorch: 1.11.0+cu102
PyTorch compiling details: PyTorch built with:

GCC 7.3
C++ Version: 201402
Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 10.2
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
CuDNN 7.6.5
Magma 2.5.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.12.0+cu102
OpenCV: 4.7.0
MMEngine: 0.7.3

Runtime environment:
cudnn_benchmark: False
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
dist_cfg: {'backend': 'nccl'}
seed: None
Distributed launcher: none
Distributed training: False
GPU number: 1

2023/07/18 11:19:55 - mmengine - INFO - Config:
model = dict(
type='DBNet',
backbone=dict(
type='mmdet.ResNet',
depth=18,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=-1,
norm_cfg=dict(type='BN', requires_grad=True),
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18'),
norm_eval=False,
style='caffe'),
neck=dict(
type='FPNC', in_channels=[64, 128, 256, 512], lateral_channels=256),
det_head=dict(
type='DBHead',
in_channels=256,
module_loss=dict(type='DBModuleLoss'),
postprocessor=dict(type='DBPostprocessor', text_repr_type='quad')),
data_preprocessor=dict(
type='TextDetDataPreprocessor',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
bgr_to_rgb=True,
pad_size_divisor=32))
train_pipeline = [
dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
dict(
type='LoadOCRAnnotations',
with_polygon=True,
with_bbox=True,
with_label=True),
dict(
type='TorchVisionWrapper',
op='ColorJitter',
brightness=0.12549019607843137,
saturation=0.5),
dict(
type='ImgAugWrapper',
args=[['Fliplr', 0.5], {
'cls': 'Affine',
'rotate': [-10, 10]
}, ['Resize', [0.5, 3.0]]]),
dict(type='RandomCrop', min_side_ratio=0.1),
dict(type='Resize', scale=(640, 640), keep_ratio=True),
dict(type='Pad', size=(640, 640)),
dict(
type='PackTextDetInputs',
meta_keys=('img_path', 'ori_shape', 'img_shape'))
]
test_pipeline = [
dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
dict(type='Resize', scale=(1333, 736), keep_ratio=True),
dict(
type='LoadOCRAnnotations',
with_polygon=True,
with_bbox=True,
with_label=True),
dict(
type='PackTextDetInputs',
meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor'))
]
params = dict(
icdar2015_textdet_data_root='/home/ubuntu/ocr/mmocr/data/icdar2015',
arabic_det_root=
'/home/ubuntu/ocr/open_source_models/resources/textDetFiles',
arabic_flag=True,
arabic_train_ann='det_train.json',
arabic_test_ann='det_test_tiny.json',
icdar_train_ann='textdet_train.json',
icdar_test_ann='textdet_test.json')
arabic_flag = True
arabic_det_root = '/home/ubuntu/ocr/open_source_models/resources/textDetFiles'
icdar2015_textdet_data_root = '/home/ubuntu/ocr/mmocr/data/icdar2015'
arabic_train_ann = 'det_train.json'
arabic_test_ann = 'det_test_tiny.json'
icdar_train_ann = 'textdet_train.json'
icdar_test_ann = 'textdet_test.json'
data_root_path = '/home/ubuntu/ocr/open_source_models/resources/textDetFiles'
train_ann = 'det_train.json'
test_ann = 'det_test_tiny.json'
icdar2015_textdet_train = dict(
type='OCRDataset',
data_root='/home/ubuntu/ocr/open_source_models/resources/textDetFiles',
ann_file='det_train.json',
filter_cfg=dict(filter_empty_gt=True, min_size=32),
pipeline=[
dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
dict(
type='LoadOCRAnnotations',
with_polygon=True,
with_bbox=True,
with_label=True),
dict(
type='TorchVisionWrapper',
op='ColorJitter',
brightness=0.12549019607843137,
saturation=0.5),
dict(
type='ImgAugWrapper',
args=[['Fliplr', 0.5], {
'cls': 'Affine',
'rotate': [-10, 10]
}, ['Resize', [0.5, 3.0]]]),
dict(type='RandomCrop', min_side_ratio=0.1),
dict(type='Resize', scale=(640, 640), keep_ratio=True),
dict(type='Pad', size=(640, 640)),
dict(
type='PackTextDetInputs',
meta_keys=('img_path', 'ori_shape', 'img_shape'))
])
icdar2015_textdet_test = dict(
type='OCRDataset',
data_root='/home/ubuntu/ocr/open_source_models/resources/textDetFiles',
ann_file='det_test_tiny.json',
test_mode=True,
pipeline=[
dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
dict(type='Resize', scale=(1333, 736), keep_ratio=True),
dict(
type='LoadOCRAnnotations',
with_polygon=True,
with_bbox=True,
with_label=True),
dict(
type='PackTextDetInputs',
meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor'))
])
default_scope = 'mmocr'
env_cfg = dict(
cudnn_benchmark=False,
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
dist_cfg=dict(backend='nccl'))
randomness = dict(seed=None)
default_hooks = dict(
timer=dict(type='IterTimerHook'),
logger=dict(type='LoggerHook', interval=5),
param_scheduler=dict(type='ParamSchedulerHook'),
checkpoint=dict(type='CheckpointHook', interval=10, max_keep_ckpts=1),
sampler_seed=dict(type='DistSamplerSeedHook'),
sync_buffer=dict(type='SyncBuffersHook'),
visualization=dict(
type='VisualizationHook',
interval=1,
enable=False,
show=False,
draw_gt=False,
draw_pred=False))
log_level = 'INFO'
log_processor = dict(type='LogProcessor', window_size=10, by_epoch=True)
load_from = None
resume = False
val_evaluator = dict(type='HmeanIOUMetric')
test_evaluator = dict(type='HmeanIOUMetric')
vis_backends = [
dict(type='LocalVisBackend'),
dict(type='TensorboardVisBackend')
]
visualizer = dict(
type='TextDetLocalVisualizer',
name='visualizer',
vis_backends=[
dict(type='LocalVisBackend'),
dict(type='TensorboardVisBackend')
])
optim_wrapper = dict(
type='OptimWrapper',
optimizer=dict(type='SGD', lr=0.007, momentum=0.9, weight_decay=0.0001))
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=400, val_interval=1)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
param_scheduler = [dict(type='ConstantLR', factor=1.0)]
train_dataloader = dict(
batch_size=16,
num_workers=8,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
dataset=dict(
type='OCRDataset',
data_root='/home/ubuntu/ocr/open_source_models/resources/textDetFiles',
ann_file='det_train.json',
filter_cfg=dict(filter_empty_gt=True, min_size=32),
pipeline=[
dict(
type='LoadImageFromFile',
color_type='color_ignore_orientation'),
dict(
type='LoadOCRAnnotations',
with_polygon=True,
with_bbox=True,
with_label=True),
dict(
type='TorchVisionWrapper',
op='ColorJitter',
brightness=0.12549019607843137,
saturation=0.5),
dict(
type='ImgAugWrapper',
args=[['Fliplr', 0.5], {
'cls': 'Affine',
'rotate': [-10, 10]
}, ['Resize', [0.5, 3.0]]]),
dict(type='RandomCrop', min_side_ratio=0.1),
dict(type='Resize', scale=(640, 640), keep_ratio=True),
dict(type='Pad', size=(640, 640)),
dict(
type='PackTextDetInputs',
meta_keys=('img_path', 'ori_shape', 'img_shape'))
]))
val_dataloader = dict(
batch_size=1,
num_workers=4,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type='OCRDataset',
data_root='/home/ubuntu/ocr/open_source_models/resources/textDetFiles',
ann_file='det_test_tiny.json',
test_mode=True,
pipeline=[
dict(
type='LoadImageFromFile',
color_type='color_ignore_orientation'),
dict(type='Resize', scale=(1333, 736), keep_ratio=True),
dict(
type='LoadOCRAnnotations',
with_polygon=True,
with_bbox=True,
with_label=True),
dict(
type='PackTextDetInputs',
meta_keys=('img_path', 'ori_shape', 'img_shape',
'scale_factor'))
]))
test_dataloader = dict(
batch_size=1,
num_workers=4,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type='OCRDataset',
data_root='/home/ubuntu/ocr/open_source_models/resources/textDetFiles',
ann_file='det_test_tiny.json',
test_mode=True,
pipeline=[
dict(
type='LoadImageFromFile',
color_type='color_ignore_orientation'),
dict(type='Resize', scale=(1333, 736), keep_ratio=True),
dict(
type='LoadOCRAnnotations',
with_polygon=True,
with_bbox=True,
with_label=True),
dict(
type='PackTextDetInputs',
meta_keys=('img_path', 'ori_shape', 'img_shape',
'scale_factor'))
]))
auto_scale_lr = dict(base_batch_size=16)
launcher = 'none'
work_dir = '/home/ubuntu/ocr/open_source_models/mmocr_arabic_det_train/work_dirs'

results samples

2023/07/18 11:20:09 - mmengine - INFO - Epoch(train) [1][ 5/51] lr: 7.0000e-03 eta: 7:25:44 time: 1.3113 data_time: 0.6076 memory: 6727 loss: 1.0000 loss_prob: 0.0000 loss_thr: 0.0000 loss_db: 1.0000
2023/07/18 11:20:11 - mmengine - INFO - Epoch(train) [1][10/51] lr: 7.0000e-03 eta: 5:01:38 time: 0.8876 data_time: 0.3346 memory: 6727 loss: 1.0000 loss_prob: 0.0000 loss_thr: 0.0000 loss_db: 1.0000
2023/07/18 11:20:13 - mmengine - INFO - Epoch(train) [1][15/51] lr: 7.0000e-03 eta: 4:15:39 time: 0.4731 data_time: 0.0489 memory: 6727 loss: 1.0000 loss_prob: 0.0000 loss_thr: 0.0000 loss_db: 1.0000
2023/07/18 11:20:16 - mmengine - INFO - Epoch(train) [1][20/51] lr: 7.0000e-03 eta: 3:52:07 time: 0.4792 data_time: 0.0403 memory: 6727 loss: 1.0000 loss_prob: 0.0000 loss_thr: 0.0000 loss_db: 1.0000
2023/07/18 11:20:18 - mmengine - INFO - Epoch(train) [1][25/51] lr: 7.0000e-03 eta: 3:41:20 time: 0.5007 data_time: 0.0674 memory: 6727 loss: 1.0000 loss_prob: 0.0000 loss_thr: 0.0000 loss_db: 1.0000
2023/07/18 11:20:20 - mmengine - INFO - Epoch(train) [1][30/51] lr: 7.0000e-03 eta: 3:27:15 time: 0.4646 data_time: 0.0683 memory: 6727 loss: 1.0000 loss_prob: 0.0000 loss_thr: 0.0000 loss_db: 1.0000
2023/07/18 11:20:22 - mmengine - INFO - Epoch(train) [1][35/51] lr: 7.0000e-03 eta: 3:17:38 time: 0.4086 data_time: 0.0473 memory: 6727 loss: 1.0000 loss_prob: 0.0000 loss_thr: 0.0000 loss_db: 1.0000
2023/07/18 11:20:24 - mmengine - INFO - Epoch(train) [1][40/51] lr: 7.0000e-03 eta: 3:09:20 time: 0.4005 data_time: 0.0450 memory: 6727 loss: 1.0000 loss_prob: 0.0000 loss_thr: 0.0000 loss_db: 1.0000
2023/07/18 11:20:26 - mmengine - INFO - Epoch(train) [1][45/51] lr: 7.0000e-03 eta: 3:02:37 time: 0.3844 data_time: 0.0389 memory: 6727 loss: 1.0000 loss_prob: 0.0000 loss_thr: 0.0000 loss_db: 1.0000
2023/07/18 11:20:28 - mmengine - INFO - Epoch(train) [1][50/51] lr: 7.0000e-03 eta: 2:56:51 time: 0.3754 data_time: 0.0371 memory: 6727 loss: 1.0000 loss_prob: 0.0000 loss_thr: 0.0000 loss_db: 1.0000
2023/07/18 11:20:28 - mmengine - INFO - Exp name: dbnet_resnet18_fpnc_1200e_icdar2015_20230718_111954
2023/07/18 11:20:31 - mmengine - INFO - Evaluating hmean-iou...
2023/07/18 11:20:31 - mmengine - INFO - prediction score threshold: 0.30, recall: 1.0000, precision: 0.0000, hmean: 0.0000

2023/07/18 11:20:31 - mmengine - INFO - prediction score threshold: 0.40, recall: 1.0000, precision: 0.0000, hmean: 0.0000

2023/07/18 11:20:31 - mmengine - INFO - prediction score threshold: 0.50, recall: 1.0000, precision: 0.0000, hmean: 0.0000

2023/07/18 11:20:31 - mmengine - INFO - prediction score threshold: 0.60, recall: 1.0000, precision: 1.0000, hmean: 1.0000

2023/07/18 11:20:31 - mmengine - INFO - prediction score threshold: 0.70, recall: 1.0000, precision: 1.0000, hmean: 1.0000

2023/07/18 11:20:31 - mmengine - INFO - prediction score threshold: 0.80, recall: 1.0000, precision: 1.0000, hmean: 1.0000

2023/07/18 11:20:31 - mmengine - INFO - prediction score threshold: 0.90, recall: 1.0000, precision: 1.0000, hmean: 1.0000

2023/07/18 11:20:31 - mmengine - INFO - Epoch(val) [1][1/1] icdar/precision: 1.0000 icdar/recall: 1.0000 icdar/hmean: 1.0000 data_time: 0.2443 time: 2.4554
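
The pattern in this log (loss_prob and loss_thr at exactly 0, loss_db at exactly 1) is what a set of masked losses would produce if the training mask were empty, for example because every text instance is treated as ignored. The sketch below illustrates this with a generic masked Dice loss in plain Python. It assumes loss_db is a masked Dice-style term; the function `masked_dice_loss` and the toy values are illustrative and are not MMOCR's implementation.

```python
def masked_dice_loss(pred, gt, mask, eps=1e-6):
    """Generic masked Dice loss over flat lists of per-pixel values."""
    # Only pixels with mask == 1 contribute to the overlap and the sums.
    intersection = sum(p * g * m for p, g, m in zip(pred, gt, mask))
    total = (
        sum(p * m for p, m in zip(pred, mask))
        + sum(g * m for g, m in zip(gt, mask))
        + eps
    )
    return 1.0 - 2.0 * intersection / total


# Toy 4-pixel example: the prediction roughly matches the ground truth.
pred = [0.9, 0.8, 0.1, 0.2]
gt = [1.0, 1.0, 0.0, 0.0]

# Every pixel masked out, as would happen if all instances were ignored.
loss_all_ignored = masked_dice_loss(pred, gt, mask=[0.0, 0.0, 0.0, 0.0])

# No pixel masked out.
loss_none_ignored = masked_dice_loss(pred, gt, mask=[1.0, 1.0, 1.0, 1.0])

print(loss_all_ignored)   # 1.0 regardless of the prediction
print(loss_none_ignored)  # a small value, since pred is close to gt
```

With an all-zero mask the overlap is 0, so the loss is pinned at 1.0 and carries no gradient, which would match a loss that never moves. The validation numbers (perfect hmean at high thresholds after one epoch) would also be consistent with no ground-truth instances being counted, though that is an inference from the log rather than a confirmed diagnosis.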
