Hi,
I'm trying to train a text detector for Arabic text.
I've created annotation files in the same format as ICDAR (a sample of my annotation is attached).
In addition, I customized the config files (my log is attached).
As you can see in the attached results sample, my training starts with loss=1.0 and does not move from there (I attached only the first epoch, but it stays the same through 10, 20, and more epochs).
I'd be very grateful for any help, because I can't find the cause of this bug.
annotation
{
  "metainfo": {
    "dataset_type": "TextDetDataset",
    "task_name": "textdet",
    "category": [
      {
        "id": 0,
        "name": "text"
      }
    ]
  },
  "data_list": [
    {
      "instances": [
        {
          "polygon": [331, 674, 360, 674, 360, 691, 331, 691],
          "bbox": [331, 674, 360, 690],
          "bbox_label": 0,
          "ignore": "false"
        }
      ],
      "img_path": "/home/ubuntu/ocr/open_source_models/resources/textDetFiles/imgs/test/4000.jpg",
      "height": 848,
      "width": 464
    }
  ]
}
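One thing that may be worth checking in an annotation like the one above, offered as an assumption and not a confirmed diagnosis: `ignore` is stored as the string "false" instead of the JSON boolean false. In Python any non-empty string is truthy, so if the loading code only tests the value for truthiness (I have not verified how MMOCR handles this), every instance could be treated as ignored. The sketch below uses a hypothetical helper, `find_suspect_instances`, that is not part of MMOCR, and the image path in the sample is shortened for illustration.

```python
import json


def find_suspect_instances(ann):
    """Return (image index, instance index, message) for fields whose type looks off."""
    problems = []
    for i, item in enumerate(ann.get("data_list", [])):
        for j, inst in enumerate(item.get("instances", [])):
            ignore = inst.get("ignore")
            # A JSON boolean parses to a Python bool; "false" parses to a str.
            if not isinstance(ignore, bool):
                problems.append(
                    (i, j, f"'ignore' is {type(ignore).__name__} {ignore!r}, "
                           "expected a JSON boolean"))
            polygon = inst.get("polygon", [])
            # A polygon is a flat list of x, y pairs with at least 3 points.
            if len(polygon) < 6 or len(polygon) % 2 != 0:
                problems.append(
                    (i, j, f"polygon has {len(polygon)} values, "
                           "expected an even count of at least 6"))
    return problems


# Same instance as in the annotation sample (path shortened for illustration).
sample = json.loads("""
{"data_list": [{"instances": [{
    "polygon": [331, 674, 360, 674, 360, 691, 331, 691],
    "bbox": [331, 674, 360, 690],
    "bbox_label": 0,
    "ignore": "false"}],
  "img_path": "4000.jpg", "height": 848, "width": 464}]}
""")

for problem in find_suspect_instances(sample):
    print(problem)

# The string "false" is truthy, unlike the boolean False.
print(bool("false"), bool(False))
```

To run it on a real file, the hypothetical helper can be given `json.load(open(path))` for the training annotation file.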
log
System environment:
sys.platform: linux
Python: 3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0]
CUDA available: True
numpy_random_seed: 1617739328
GPU 0: Tesla V100-SXM2-16GB
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 11.5, V11.5.119
GCC: x86_64-linux-gnu-gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
PyTorch: 1.11.0+cu102
PyTorch compiling details: PyTorch built with:
GCC 7.3
C++ Version: 201402
Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 10.2
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
CuDNN 7.6.5
Magma 2.5.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.12.0+cu102
OpenCV: 4.7.0
MMEngine: 0.7.3
Runtime environment:
cudnn_benchmark: False
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
dist_cfg: {'backend': 'nccl'}
seed: None
Distributed launcher: none
Distributed training: False
GPU number: 1
2023/07/18 11:19:55 - mmengine - INFO - Config:
model = dict(
type='DBNet',
backbone=dict(
type='mmdet.ResNet',
depth=18,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=-1,
norm_cfg=dict(type='BN', requires_grad=True),
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18'),
norm_eval=False,
style='caffe'),
neck=dict(
type='FPNC', in_channels=[64, 128, 256, 512], lateral_channels=256),
det_head=dict(
type='DBHead',
in_channels=256,
module_loss=dict(type='DBModuleLoss'),
postprocessor=dict(type='DBPostprocessor', text_repr_type='quad')),
data_preprocessor=dict(
type='TextDetDataPreprocessor',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
bgr_to_rgb=True,
pad_size_divisor=32))
train_pipeline = [
dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
dict(
type='LoadOCRAnnotations',
with_polygon=True,
with_bbox=True,
with_label=True),
dict(
type='TorchVisionWrapper',
op='ColorJitter',
brightness=0.12549019607843137,
saturation=0.5),
dict(
type='ImgAugWrapper',
args=[['Fliplr', 0.5], {
'cls': 'Affine',
'rotate': [-10, 10]
}, ['Resize', [0.5, 3.0]]]),
dict(type='RandomCrop', min_side_ratio=0.1),
dict(type='Resize', scale=(640, 640), keep_ratio=True),
dict(type='Pad', size=(640, 640)),
dict(
type='PackTextDetInputs',
meta_keys=('img_path', 'ori_shape', 'img_shape'))
]
test_pipeline = [
dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
dict(type='Resize', scale=(1333, 736), keep_ratio=True),
dict(
type='LoadOCRAnnotations',
with_polygon=True,
with_bbox=True,
with_label=True),
dict(
type='PackTextDetInputs',
meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor'))
]
params = dict(
icdar2015_textdet_data_root='/home/ubuntu/ocr/mmocr/data/icdar2015',
arabic_det_root=
'/home/ubuntu/ocr/open_source_models/resources/textDetFiles',
arabic_flag=True,
arabic_train_ann='det_train.json',
arabic_test_ann='det_test_tiny.json',
icdar_train_ann='textdet_train.json',
icdar_test_ann='textdet_test.json')
arabic_flag = True
arabic_det_root = '/home/ubuntu/ocr/open_source_models/resources/textDetFiles'
icdar2015_textdet_data_root = '/home/ubuntu/ocr/mmocr/data/icdar2015'
arabic_train_ann = 'det_train.json'
arabic_test_ann = 'det_test_tiny.json'
icdar_train_ann = 'textdet_train.json'
icdar_test_ann = 'textdet_test.json'
data_root_path = '/home/ubuntu/ocr/open_source_models/resources/textDetFiles'
train_ann = 'det_train.json'
test_ann = 'det_test_tiny.json'
icdar2015_textdet_train = dict(
type='OCRDataset',
data_root='/home/ubuntu/ocr/open_source_models/resources/textDetFiles',
ann_file='det_train.json',
filter_cfg=dict(filter_empty_gt=True, min_size=32),
pipeline=[
dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
dict(
type='LoadOCRAnnotations',
with_polygon=True,
with_bbox=True,
with_label=True),
dict(
type='TorchVisionWrapper',
op='ColorJitter',
brightness=0.12549019607843137,
saturation=0.5),
dict(
type='ImgAugWrapper',
args=[['Fliplr', 0.5], {
'cls': 'Affine',
'rotate': [-10, 10]
}, ['Resize', [0.5, 3.0]]]),
dict(type='RandomCrop', min_side_ratio=0.1),
dict(type='Resize', scale=(640, 640), keep_ratio=True),
dict(type='Pad', size=(640, 640)),
dict(
type='PackTextDetInputs',
meta_keys=('img_path', 'ori_shape', 'img_shape'))
])
icdar2015_textdet_test = dict(
type='OCRDataset',
data_root='/home/ubuntu/ocr/open_source_models/resources/textDetFiles',
ann_file='det_test_tiny.json',
test_mode=True,
pipeline=[
dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
dict(type='Resize', scale=(1333, 736), keep_ratio=True),
dict(
type='LoadOCRAnnotations',
with_polygon=True,
with_bbox=True,
with_label=True),
dict(
type='PackTextDetInputs',
meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor'))
])
default_scope = 'mmocr'
env_cfg = dict(
cudnn_benchmark=False,
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
dist_cfg=dict(backend='nccl'))
randomness = dict(seed=None)
default_hooks = dict(
timer=dict(type='IterTimerHook'),
logger=dict(type='LoggerHook', interval=5),
param_scheduler=dict(type='ParamSchedulerHook'),
checkpoint=dict(type='CheckpointHook', interval=10, max_keep_ckpts=1),
sampler_seed=dict(type='DistSamplerSeedHook'),
sync_buffer=dict(type='SyncBuffersHook'),
visualization=dict(
type='VisualizationHook',
interval=1,
enable=False,
show=False,
draw_gt=False,
draw_pred=False))
log_level = 'INFO'
log_processor = dict(type='LogProcessor', window_size=10, by_epoch=True)
load_from = None
resume = False
val_evaluator = dict(type='HmeanIOUMetric')
test_evaluator = dict(type='HmeanIOUMetric')
vis_backends = [
dict(type='LocalVisBackend'),
dict(type='TensorboardVisBackend')
]
visualizer = dict(
type='TextDetLocalVisualizer',
name='visualizer',
vis_backends=[
dict(type='LocalVisBackend'),
dict(type='TensorboardVisBackend')
])
optim_wrapper = dict(
type='OptimWrapper',
optimizer=dict(type='SGD', lr=0.007, momentum=0.9, weight_decay=0.0001))
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=400, val_interval=1)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
param_scheduler = [dict(type='ConstantLR', factor=1.0)]
train_dataloader = dict(
batch_size=16,
num_workers=8,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
dataset=dict(
type='OCRDataset',
data_root='/home/ubuntu/ocr/open_source_models/resources/textDetFiles',
ann_file='det_train.json',
filter_cfg=dict(filter_empty_gt=True, min_size=32),
pipeline=[
dict(
type='LoadImageFromFile',
color_type='color_ignore_orientation'),
dict(
type='LoadOCRAnnotations',
with_polygon=True,
with_bbox=True,
with_label=True),
dict(
type='TorchVisionWrapper',
op='ColorJitter',
brightness=0.12549019607843137,
saturation=0.5),
dict(
type='ImgAugWrapper',
args=[['Fliplr', 0.5], {
'cls': 'Affine',
'rotate': [-10, 10]
}, ['Resize', [0.5, 3.0]]]),
dict(type='RandomCrop', min_side_ratio=0.1),
dict(type='Resize', scale=(640, 640), keep_ratio=True),
dict(type='Pad', size=(640, 640)),
dict(
type='PackTextDetInputs',
meta_keys=('img_path', 'ori_shape', 'img_shape'))
]))
val_dataloader = dict(
batch_size=1,
num_workers=4,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type='OCRDataset',
data_root='/home/ubuntu/ocr/open_source_models/resources/textDetFiles',
ann_file='det_test_tiny.json',
test_mode=True,
pipeline=[
dict(
type='LoadImageFromFile',
color_type='color_ignore_orientation'),
dict(type='Resize', scale=(1333, 736), keep_ratio=True),
dict(
type='LoadOCRAnnotations',
with_polygon=True,
with_bbox=True,
with_label=True),
dict(
type='PackTextDetInputs',
meta_keys=('img_path', 'ori_shape', 'img_shape',
'scale_factor'))
]))
test_dataloader = dict(
batch_size=1,
num_workers=4,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type='OCRDataset',
data_root='/home/ubuntu/ocr/open_source_models/resources/textDetFiles',
ann_file='det_test_tiny.json',
test_mode=True,
pipeline=[
dict(
type='LoadImageFromFile',
color_type='color_ignore_orientation'),
dict(type='Resize', scale=(1333, 736), keep_ratio=True),
dict(
type='LoadOCRAnnotations',
with_polygon=True,
with_bbox=True,
with_label=True),
dict(
type='PackTextDetInputs',
meta_keys=('img_path', 'ori_shape', 'img_shape',
'scale_factor'))
]))
auto_scale_lr = dict(base_batch_size=16)
launcher = 'none'
work_dir = '/home/ubuntu/ocr/open_source_models/mmocr_arabic_det_train/work_dirs'
results samples
2023/07/18 11:20:09 - mmengine - INFO - Epoch(train) [1][ 5/51] lr: 7.0000e-03 eta: 7:25:44 time: 1.3113 data_time: 0.6076 memory: 6727 loss: 1.0000 loss_prob: 0.0000 loss_thr: 0.0000 loss_db: 1.0000
2023/07/18 11:20:11 - mmengine - INFO - Epoch(train) [1][10/51] lr: 7.0000e-03 eta: 5:01:38 time: 0.8876 data_time: 0.3346 memory: 6727 loss: 1.0000 loss_prob: 0.0000 loss_thr: 0.0000 loss_db: 1.0000
2023/07/18 11:20:13 - mmengine - INFO - Epoch(train) [1][15/51] lr: 7.0000e-03 eta: 4:15:39 time: 0.4731 data_time: 0.0489 memory: 6727 loss: 1.0000 loss_prob: 0.0000 loss_thr: 0.0000 loss_db: 1.0000
2023/07/18 11:20:16 - mmengine - INFO - Epoch(train) [1][20/51] lr: 7.0000e-03 eta: 3:52:07 time: 0.4792 data_time: 0.0403 memory: 6727 loss: 1.0000 loss_prob: 0.0000 loss_thr: 0.0000 loss_db: 1.0000
2023/07/18 11:20:18 - mmengine - INFO - Epoch(train) [1][25/51] lr: 7.0000e-03 eta: 3:41:20 time: 0.5007 data_time: 0.0674 memory: 6727 loss: 1.0000 loss_prob: 0.0000 loss_thr: 0.0000 loss_db: 1.0000
2023/07/18 11:20:20 - mmengine - INFO - Epoch(train) [1][30/51] lr: 7.0000e-03 eta: 3:27:15 time: 0.4646 data_time: 0.0683 memory: 6727 loss: 1.0000 loss_prob: 0.0000 loss_thr: 0.0000 loss_db: 1.0000
2023/07/18 11:20:22 - mmengine - INFO - Epoch(train) [1][35/51] lr: 7.0000e-03 eta: 3:17:38 time: 0.4086 data_time: 0.0473 memory: 6727 loss: 1.0000 loss_prob: 0.0000 loss_thr: 0.0000 loss_db: 1.0000
2023/07/18 11:20:24 - mmengine - INFO - Epoch(train) [1][40/51] lr: 7.0000e-03 eta: 3:09:20 time: 0.4005 data_time: 0.0450 memory: 6727 loss: 1.0000 loss_prob: 0.0000 loss_thr: 0.0000 loss_db: 1.0000
2023/07/18 11:20:26 - mmengine - INFO - Epoch(train) [1][45/51] lr: 7.0000e-03 eta: 3:02:37 time: 0.3844 data_time: 0.0389 memory: 6727 loss: 1.0000 loss_prob: 0.0000 loss_thr: 0.0000 loss_db: 1.0000
2023/07/18 11:20:28 - mmengine - INFO - Epoch(train) [1][50/51] lr: 7.0000e-03 eta: 2:56:51 time: 0.3754 data_time: 0.0371 memory: 6727 loss: 1.0000 loss_prob: 0.0000 loss_thr: 0.0000 loss_db: 1.0000
2023/07/18 11:20:28 - mmengine - INFO - Exp name: dbnet_resnet18_fpnc_1200e_icdar2015_20230718_111954
2023/07/18 11:20:31 - mmengine - INFO - Evaluating hmean-iou...
2023/07/18 11:20:31 - mmengine - INFO - prediction score threshold: 0.30, recall: 1.0000, precision: 0.0000, hmean: 0.0000
2023/07/18 11:20:31 - mmengine - INFO - prediction score threshold: 0.40, recall: 1.0000, precision: 0.0000, hmean: 0.0000
2023/07/18 11:20:31 - mmengine - INFO - prediction score threshold: 0.50, recall: 1.0000, precision: 0.0000, hmean: 0.0000
2023/07/18 11:20:31 - mmengine - INFO - prediction score threshold: 0.60, recall: 1.0000, precision: 1.0000, hmean: 1.0000
2023/07/18 11:20:31 - mmengine - INFO - prediction score threshold: 0.70, recall: 1.0000, precision: 1.0000, hmean: 1.0000
2023/07/18 11:20:31 - mmengine - INFO - prediction score threshold: 0.80, recall: 1.0000, precision: 1.0000, hmean: 1.0000
2023/07/18 11:20:31 - mmengine - INFO - prediction score threshold: 0.90, recall: 1.0000, precision: 1.0000, hmean: 1.0000
2023/07/18 11:20:31 - mmengine - INFO - Epoch(val) [1][1/1] icdar/precision: 1.0000 icdar/recall: 1.0000 icdar/hmean: 1.0000 data_time: 0.2443 time: 2.4554
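For reference, a minimal sketch of how the exact pattern in the log (loss_db: 1.0000 with loss_prob and loss_thr both 0.0000) could arise if no ground-truth pixels take part in the loss, for example when every instance is ignored. The functions below are simplified stand-ins written for illustration, assuming a masked dice loss and a masked L1 loss; they are not MMOCR's DBModuleLoss code, and the numbers in `pred` are arbitrary.

```python
def masked_dice_loss(pred, gt, mask, eps=1e-6):
    """Simplified dice loss, counting only positions where mask == 1."""
    intersection = sum(p * g * m for p, g, m in zip(pred, gt, mask))
    total = (sum(p * m for p, m in zip(pred, mask))
             + sum(g * m for g, m in zip(gt, mask)) + eps)
    return 1.0 - 2.0 * intersection / total


def masked_l1_loss(pred, gt, mask, eps=1e-6):
    """Simplified L1 loss averaged over positions where mask == 1."""
    total_error = sum(abs(p - g) * m for p, g, m in zip(pred, gt, mask))
    return total_error / (sum(mask) + eps)


pred = [0.9, 0.2, 0.7, 0.4]  # arbitrary predicted probabilities

# Normal case: two text pixels, all pixels valid. The loss depends on pred.
print(masked_dice_loss(pred, [1, 0, 1, 0], [1, 1, 1, 1]))  # roughly 0.24

# Degenerate case: no ground-truth pixels and an all-zero mask.
empty = [0, 0, 0, 0]
print(masked_dice_loss(pred, empty, empty))  # 1.0, whatever pred is
print(masked_l1_loss(pred, empty, empty))    # 0.0, whatever pred is
```

In the degenerate case neither value depends on `pred`, so no gradient would reach the model and the loss would stay at 1.0 for every epoch, which is consistent with the log but does not prove this is the cause.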