This document takes configs/rec/crnn/crnn_icdar15.yaml as an example to describe the usage of parameters in detail.
Parameter | Description | Default | Optional Values | Remarks |
---|---|---|---|---|
mode | MindSpore running mode (static graph/dynamic graph) | 0 | 0 / 1 | 0: GRAPH_MODE (static graph); 1: PYNATIVE_MODE (dynamic graph) |
distribute | Whether to enable parallel training | True | True / False | \ |
device_id | Device id to use for standalone training | 7 | The ids of all devices in the server | Only valid when distribute=False (standalone training) and the environment variable 'DEVICE_ID' is NOT set. In standalone training, if neither this argument nor the environment variable 'DEVICE_ID' is set, device 0 is used by default. |
amp_level | Mixed precision mode | O0 | O0/O1/O2/O3 | 'O0' - no change. 'O1' - convert the cells and operations in the whitelist to float16 precision, and keep the rest in float32 precision. 'O2' - Keep the cells and operations in the blacklist with float32 precision, and convert the rest to float16 precision. 'O3' - Convert all networks to float16 precision. |
seed | Random seed | 42 | Integer | \ |
ckpt_save_policy | The policy for saving model weights | top_k | "top_k" or "latest_k" | "top_k" means to keep the top k checkpoints according to the metric score; "latest_k" means to keep the last k checkpoints. The value of k is set via ckpt_max_keep |
ckpt_max_keep | The maximum number of checkpoints to keep during training | 5 | Integer | \ |
log_interval | The interval of printing logs (unit: epoch) | 100 | Integer | \ |
val_while_train | Whether to run evaluation while training | True | True/False | If True, the eval dataset must also be configured |
val_start_epoch | From which epoch to run the evaluation | 1 | Integer | |
val_interval | Evaluation interval (unit: epoch) | 1 | Integer | |
drop_overflow_update | Whether to skip updating network parameters when the loss/gradient overflows | True | True/False | If True, network parameters will not be updated when overflow occurs |
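
The difference between the two ckpt_save_policy values can be illustrated with a small, framework-free sketch. The helper name and the (epoch, score) tuples are hypothetical and are not MindOCR's implementation; the sketch also assumes that a higher metric score is better.

```python
def select_checkpoints(history, policy="top_k", max_keep=5):
    """Return the (epoch, score) entries that would be kept.

    history: list of (epoch, metric_score) tuples in training order.
    Hypothetical helper for illustration; assumes higher score is better.
    """
    if policy == "top_k":
        # Keep the k best-scoring checkpoints.
        kept = sorted(history, key=lambda item: item[1], reverse=True)[:max_keep]
    elif policy == "latest_k":
        # Keep the k most recent checkpoints.
        kept = history[-max_keep:]
    else:
        raise ValueError(f"unknown ckpt_save_policy: {policy}")
    return sorted(kept)  # report in epoch order


history = [(1, 0.61), (2, 0.72), (3, 0.70), (4, 0.75), (5, 0.68)]
top = select_checkpoints(history, policy="top_k", max_keep=2)
latest = select_checkpoints(history, policy="latest_k", max_keep=2)
```

With max_keep=2, "top_k" keeps epochs 2 and 4 (the two best scores), while "latest_k" keeps epochs 4 and 5 regardless of score.
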
Because the same parameter may need to be reused in different configuration sections, you can customize some common parameters in this section for easy management.
In MindOCR, the network architecture of the model is divided into four modules: Transform, Backbone, Neck and Head. For details, please refer to the documentation. The following are the configuration instructions and examples for each module.
Parameter | Description | Default | Remarks |
---|---|---|---|
type | Network type | - | Currently supports rec/det; 'rec' means recognition task, 'det' means detection task |
pretrained | Specify pre-trained weight path or url | null | Supports local checkpoint path or url |
transform: | Transformation method configuration | null | |
name | Specify transformation method name | - | Currently supports STN_ON |
backbone: | Backbone network configuration | ||
name | Specify the backbone network class name or function name | - | Currently defined classes include rec_resnet34, rec_vgg7, SVTRNet and det_resnet18, det_resnet50, det_resnet152, det_mobilenet_v3. You can also define new classes; please refer to the documentation for how to define them. |
pretrained | Whether to load pre-trained backbone weights | False | Supports bool type or str type to be passed in. If it is True, the default weight will be downloaded and loaded through the url link defined in the backbone py file. If str is passed in, the local checkpoint path or url path can be specified for loading. |
neck: | Network Neck configuration | ||
name | Neck class name | - | Currently defined classes include RNNEncoder, DBFPN, EASTFPN and PSEFPN. You can also define new classes; please refer to the documentation for how to define them. |
hidden_size | RNN hidden layer unit number | - | \ |
head: | Network prediction head configuration | ||
name | Head class name | - | Currently supports CTCHead, AttentionHead, DBHead, EASTHead and PSEHead |
weight_init | Set weight initialization | 'normal' | \ |
bias_init | Set bias initialization | 'zeros' | \ |
out_channels | Set the number of classes | - | \ |
Note: The configurable parameters of the backbone/neck/head modules differ between networks. They are determined by the __init__ input parameters of the class specified by the name parameter of the module in the table above. For example, suppose the neck module is set to DBFPN. Since the DBFPN class __init__ takes an adaptive input parameter, adaptive can be configured under model.neck in the yaml file.
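
The relationship between name and the other keys of a module section can be sketched as follows. This is only an illustration of the idea: the DBFPN class below is a stand-in with made-up arguments, and REGISTRY, configurable_params and build_module are hypothetical names, not MindOCR APIs.

```python
import inspect


class DBFPN:
    """Stand-in class for illustration; not the real MindOCR DBFPN."""

    def __init__(self, in_channels, out_channels=256, adaptive=False):
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.adaptive = adaptive


REGISTRY = {"DBFPN": DBFPN}  # hypothetical name-to-class lookup


def configurable_params(name):
    """List the keys that could appear next to `name` in the yaml section."""
    signature = inspect.signature(REGISTRY[name].__init__)
    return [p for p in signature.parameters if p != "self"]


def build_module(config, **extra):
    """Instantiate the class named by config['name'] with the remaining keys."""
    kwargs = {k: v for k, v in config.items() if k != "name"}
    return REGISTRY[config["name"]](**kwargs, **extra)


# Dict form of a yaml section such as model.neck.
neck_config = {"name": "DBFPN", "out_channels": 256, "adaptive": True}
neck = build_module(neck_config, in_channels=[64, 128, 256, 512])
params = configurable_params("DBFPN")
```

Here every key other than name is forwarded as a keyword argument, so a key that the class __init__ does not accept would raise a TypeError.
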
Reference example: DBNet, CRNN
Please see the code in mindocr/postprocess
Parameter | Description | Default | Remarks |
---|---|---|---|
name | Post-processing class name | - | Currently supports DBPostprocess, EASTPostprocess, PSEPostprocess, RecCTCLabelDecode and RecAttnLabelDecode |
character_dict_path | Recognition dictionary path | None | If None, then use the default dictionary [0-9a-z] |
use_space_char | Set whether to add spaces to the dictionary | False | True/False |
Note: The configurable parameters differ between post-processing methods (specified by name) and are determined by the input parameters of the __init__ method of the post-processing class.
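
To make character_dict_path and use_space_char concrete, here is a minimal sketch of greedy CTC decoding over the default [0-9a-z] dictionary. It is not the RecCTCLabelDecode implementation; in particular, placing the blank label at the index right after the last character is an assumption of this sketch.

```python
import string


def build_charset(character_dict_path=None, use_space_char=False):
    """Return the character list; list index i maps to character i.

    Sketch assumption: the dictionary file holds one character per line.
    """
    if character_dict_path is None:
        chars = list(string.digits + string.ascii_lowercase)  # default [0-9a-z]
    else:
        with open(character_dict_path, encoding="utf-8") as f:
            chars = [line.rstrip("\n") for line in f if line.rstrip("\n")]
    if use_space_char:
        chars.append(" ")
    return chars


def ctc_greedy_decode(indices, chars):
    """Collapse repeated indices, then drop blanks.

    Sketch assumption: the blank index is len(chars).
    """
    blank = len(chars)
    decoded, previous = [], None
    for idx in indices:
        if idx != previous and idx != blank:
            decoded.append(chars[idx])
        previous = idx
    return "".join(decoded)


chars = build_charset()
text = ctc_greedy_decode([10, 10, 36, 11, 36, 11, 12], chars)
chars_with_space = build_charset(use_space_char=True)
```

Enabling use_space_char grows the dictionary by one entry, so the number of classes predicted by the head has to account for it.
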
Reference example: DBNet, PSENet
Please see the code in mindocr/metrics
Parameter | Description | Default | Remarks |
---|---|---|---|
name | Metric class name | - | Currently supports RecMetric, DetMetric |
main_indicator | Main indicator, used for comparison of optimal models | 'hmean' | 'acc' for recognition tasks, 'f-score' for detection tasks |
character_dict_path | Recognition dictionary path | None | If None, then use the default dictionary "0123456789abcdefghijklmnopqrstuvwxyz" |
ignore_space | Whether to filter spaces | True | True/False |
print_flag | Whether to print log | False | If True, print information such as the predictions and ground-truth labels |
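
The effect of ignore_space on a recognition accuracy score can be sketched as below. This is not the RecMetric implementation; lowercasing and dropping characters outside the dictionary are assumptions made for this illustration.

```python
DEFAULT_CHARS = "0123456789abcdefghijklmnopqrstuvwxyz"


def normalize(text, valid_chars=DEFAULT_CHARS, ignore_space=True):
    """Lowercase the text and drop characters outside the dictionary.

    Sketch assumption: spaces survive only when ignore_space is False.
    """
    if ignore_space:
        text = text.replace(" ", "")
    return "".join(ch for ch in text.lower() if ch in valid_chars or ch == " ")


def accuracy(preds, labels, ignore_space=True):
    """Fraction of samples whose normalized prediction equals the label."""
    matches = sum(
        normalize(p, ignore_space=ignore_space) == normalize(g, ignore_space=ignore_space)
        for p, g in zip(preds, labels)
    )
    return matches / max(len(labels), 1)


preds = ["hello world", "abc"]
labels = ["helloworld", "abd"]
acc_ignoring_space = accuracy(preds, labels, ignore_space=True)
acc_keeping_space = accuracy(preds, labels, ignore_space=False)
```

The first pair only matches when spaces are filtered, so the score drops from 0.5 to 0.0 when ignore_space is False.
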
Please see the code in mindocr/losses
Parameter | Description | Default | Remarks |
---|---|---|---|
name | Loss function name | - | Currently supports DBLoss, CTCLoss, AttentionLoss, PSEDiceLoss, EASTLoss and CrossEntropySmooth |
pred_seq_len | Length of the predicted text | 26 | Determined by network architecture |
max_label_len | The longest label length | 25 | Should be less than the length of the text predicted by the network (pred_seq_len) |
batch_size | Batch size of a single card | 32 | \ |
Note: For different loss functions (specified by name), the configurable parameters are different and determined by the input parameters of the selected loss function.
Please see the code in mindocr/scheduler
Parameter | Description | Default | Remarks |
---|---|---|---|
scheduler | Learning rate scheduler name | 'constant' | Currently supports 'constant', 'cosine_decay', 'step_decay', 'exponential_decay', 'polynomial_decay', 'multi_step_decay' |
min_lr | Minimum learning rate | 1e-6 | Lower lr bound for 'cosine_decay' schedulers. |
lr | Learning rate value | 0.01 | |
num_epochs | Number of total epochs | 200 | The number of total epochs for the entire training. |
warmup_epochs | The number of epochs in the learning rate warmup phase | 3 | For 'cosine_decay', 'warmup_epochs' indicates the number of epochs used to warm up the learning rate from 0 to lr. |
decay_epochs | The number of epochs in the learning rate decay phase | 10 | For 'cosine_decay' schedulers, decay LR to min_lr in decay_epochs. For the 'step_decay' scheduler, decay LR by a factor of decay_rate every decay_epochs. |
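
How lr, min_lr, warmup_epochs and decay_epochs interact for 'cosine_decay' can be sketched per epoch as follows. This is an approximation for illustration, not the code in mindocr/scheduler: it assumes a linear warmup from 0, a positive decay_epochs, and that the rate stays at min_lr once decay_epochs have elapsed.

```python
import math


def lr_at_epoch(epoch, lr=0.01, min_lr=1e-6, warmup_epochs=3, decay_epochs=10):
    """Per-epoch learning rate: linear warmup, then cosine decay to min_lr.

    Sketch assumptions: warmup is linear from 0, decay_epochs > 0, and the
    rate is held at min_lr after the decay phase ends.
    """
    if epoch < warmup_epochs:
        return lr * epoch / warmup_epochs
    progress = min((epoch - warmup_epochs) / decay_epochs, 1.0)
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * progress))


schedule = [lr_at_epoch(epoch) for epoch in range(15)]
```

With the defaults from the table, the rate rises from 0 to lr over the first 3 epochs, then falls to min_lr over the next 10. A real scheduler typically works per step rather than per epoch.
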
Please see the code in mindocr/optim
Parameter | Description | Default | Remarks |
---|---|---|---|
opt | Optimizer name | 'adam' | Currently supports 'sgd', 'nesterov', 'momentum', 'adam', 'adamw', 'lion', 'nadam', 'adan', 'rmsprop', 'adagrad', 'lamb'. |
filter_bias_and_bn | Whether to exclude bias and batch norm parameters from weight decay | True | If True, weight decay will not be applied to BN parameters or to the bias in Conv or Dense layers. |
momentum | Momentum | 0.9 | \ |
weight_decay | Weight decay rate | 0 | Weight decay can be a constant value or a Cell. It is a Cell only when dynamic weight decay is applied. Dynamic weight decay is similar to a dynamic learning rate: users define a weight decay schedule that takes only the global step as input, and during training the optimizer calls the WeightDecaySchedule instance to get the weight decay value for the current step. |
nesterov | Whether to use Nesterov Accelerated Gradient (NAG) algorithm to update the gradients. | False | True/False |
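
The effect of filter_bias_and_bn can be pictured as splitting the parameters into two groups before they reach the optimizer. The sketch below works on plain parameter names; recognizing bias and batch norm parameters by name substrings is an assumption of this example, and group_params is a hypothetical helper rather than the code in mindocr/optim.

```python
def group_params(param_names, weight_decay=1e-4, filter_bias_and_bn=True):
    """Split parameter names into a decayed and a non-decayed group.

    Sketch assumption: bias and batch-norm parameters are recognized by
    substrings in their names ('bias', 'bn', 'gamma', 'beta').
    """
    if not filter_bias_and_bn:
        return [{"params": list(param_names), "weight_decay": weight_decay}]
    skip_markers = ("bias", "bn", "gamma", "beta")
    no_decay = [n for n in param_names if any(m in n for m in skip_markers)]
    decay = [n for n in param_names if n not in no_decay]
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


names = ["conv1.weight", "conv1.bias", "bn1.gamma", "bn1.beta", "fc.weight"]
groups = group_params(names)
single_group = group_params(names, filter_bias_and_bn=False)
```

Only the convolution and dense weights keep the configured weight_decay; the second group is trained with a decay of 0.
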
Parameter | Description | Default | Remarks |
---|---|---|---|
type | Loss scaling method type | static | Currently supports static, dynamic |
loss_scale | Loss scaling value | 1.0 | \ |
scale_factor | When using the dynamic loss scaler, the coefficient used to dynamically adjust the loss_scale | 2.0 | At each training step, the loss scaling value is updated to loss_scale / scale_factor when overflow occurs. |
scale_window | When using the dynamic loss scaler, the number of consecutive overflow-free training steps after which loss_scale is multiplied by scale_factor | 1000 | If no overflow occurs for scale_window consecutive steps, the loss scaling value is updated to loss_scale * scale_factor |
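
The interplay of loss_scale, scale_factor and scale_window for the dynamic type can be sketched with a small state machine. This is an illustration of the rule described in the table, not MindSpore's loss scaler; the class name and the lower bound of 1.0 on the scale are assumptions of this sketch.

```python
class DynamicLossScaler:
    """Minimal dynamic loss scaling bookkeeping (illustrative only)."""

    def __init__(self, loss_scale=1024.0, scale_factor=2.0, scale_window=1000):
        self.loss_scale = loss_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.good_steps = 0  # consecutive steps without overflow

    def update(self, overflow):
        """Adjust the scale after one step; return True if the step is usable."""
        if overflow:
            # Shrink the scale; the floor of 1.0 is an assumption of this sketch.
            self.loss_scale = max(self.loss_scale / self.scale_factor, 1.0)
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps >= self.scale_window:
            # scale_window overflow-free steps in a row: grow the scale.
            self.loss_scale *= self.scale_factor
            self.good_steps = 0
        return True


scaler = DynamicLossScaler(loss_scale=8.0, scale_factor=2.0, scale_window=3)
applied_after_overflow = scaler.update(overflow=True)
scale_after_overflow = scaler.loss_scale
for _ in range(3):
    scaler.update(overflow=False)
scale_after_window = scaler.loss_scale
```

The False return value on overflow corresponds to skipping the parameter update, which is what drop_overflow_update=True asks for.
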
The configuration of the training process is placed under train, and the configuration of the evaluation phase is placed under eval. Note that during model training, if evaluation-while-training is enabled (val_while_train=True), an evaluation is run according to the configuration under eval after each epoch is trained. Outside of training, when only model evaluation is run, only the eval configuration is read.
Parameter | Description | Default | Remarks |
---|---|---|---|
ckpt_save_dir | Set model save path | ./tmp_rec | \ |
resume | Whether to resume training after an interruption. Set True/False, or specify the path of the ckpt to load for resuming | False | If True, load resume_train.ckpt under the ckpt_save_dir directory to continue training. You can also specify the ckpt file path to load and resume training. |
dataset_sink_mode | Whether the data is sunk directly to the processor for processing | - | If True, the data is sunk to the processor, and data can be returned only after the end of each epoch at the earliest |
gradient_accumulation_steps | Number of steps to accumulate the gradients | 1 | Each step is one forward computation; the backward correction (parameter update) is performed once gradient accumulation is complete. |
clip_grad | Whether to clip the gradient | False | If True, gradients are clipped to clip_norm |
clip_norm | The norm used for gradient clipping when clip_grad is True | 1 | \ |
ema | Whether to use EMA algorithm | False | \ |
ema_decay | EMA decay rate | 0.9999 | \ |
pred_cast_fp32 | Whether to cast the data type of logits to fp32 | False | \ |
dataset | Dataset configuration | | For details, please refer to the Data document |
type | Dataset class name | - | Currently supports LMDBDataset, RecDataset and DetDataset |
dataset_root | The root directory of the dataset | None | Optional |
data_dir | The subdirectory where the dataset is located | - | If dataset_root is not set, please set this to the full directory |
label_file | The label file path of the dataset | - | If dataset_root is not set, please set this to the full path, otherwise just set the subpath |
sample_ratio | Dataset sampling ratio | 1.0 | If value < 1.0, samples are randomly selected |
shuffle | Whether to shuffle the data order | True during training, otherwise False | True/False |
transform_pipeline | Data processing flow | None | For details, please see transforms |
output_columns | List of data attribute names that the data loader should output (passed to the network, loss calculation and post-processing) (type: list). The candidate data attribute names are determined by transform_pipeline. | None | If the value is None, all columns are output. Take crnn as an example, output_columns: ['image', 'text_seq'] |
net_input_column_index | In output_columns, the indices of the input items required by the network construct function | [0] | \ |
label_column_index | In output_columns, the indices of the input items required by the loss function | [1] | \ |
loader | Data Loading Settings | ||
shuffle | Whether to shuffle the data order for each epoch | True during training, otherwise False | True/False |
batch_size | Batch size of a single card | - | \ |
drop_remainder | Whether to drop the last batch of data when the total data cannot be divided by batch_size | True during training, otherwise False | \ |
max_rowsize | Specifies the maximum space allocated by shared memory when copying data between multiple processes | 64 | Default value: 64 |
num_workers | Specifies the number of concurrent processes/threads for batch operations | n_cpus / n_devices - 2 | This value should be greater than or equal to 2 |
Reference example: DBNet, CRNN
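
The roles of output_columns, net_input_column_index and label_column_index can be sketched as simple list indexing. The helper name and the placeholder strings standing in for tensors are hypothetical; this only illustrates how the indices refer to positions in output_columns.

```python
def split_batch(batch, net_input_column_index=(0,), label_column_index=(1,)):
    """Pick network inputs and loss labels out of one data loader batch.

    batch: sequence ordered like output_columns (hypothetical stand-in).
    """
    net_inputs = [batch[i] for i in net_input_column_index]
    labels = [batch[i] for i in label_column_index]
    return net_inputs, labels


output_columns = ["image", "text_seq"]  # crnn example from the table
batch = ["<image tensor>", "<text_seq tensor>"]  # placeholders, not real data
net_inputs, labels = split_batch(batch)
input_names = [output_columns[i] for i in (0,)]
label_names = [output_columns[i] for i in (1,)]
```

With the default indices [0] and [1], the image column feeds the network and the text_seq column feeds the loss function.
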
The parameters of eval are basically the same as those of train, with only a few additional parameters. For the rest, please refer to the parameter description of train above.
Parameter | Usage | Default | Remarks |
---|---|---|---|
ckpt_load_path | Set model loading path | - | \ |
num_columns_of_labels | Set the number of labels in the dataset output columns | None | If None, assumes that the columns after image (data[1:]) are labels. If not None, the num_columns_of_labels columns after image (data[1:1+num_columns_of_labels]) are labels, and the remaining columns are additional info like image_path. |
drop_remainder | Whether to discard the last batch of data when the total number of data cannot be divided by batch_size | True during training, otherwise False | It is recommended to set it to False when doing model evaluation. If the total is not divisible by batch_size, MindOCR will automatically select the largest batch size that divides it evenly |
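
The slicing rule for num_columns_of_labels can be written out directly. The function name and the sample column values are hypothetical; the slices follow the data[1:] and data[1:1+num_columns_of_labels] expressions given in the table.

```python
def split_eval_columns(data, num_columns_of_labels=None):
    """Split one batch into image, labels and additional info columns."""
    image = data[0]
    if num_columns_of_labels is None:
        # Every column after the image is treated as a label.
        labels, extra = data[1:], []
    else:
        labels = data[1:1 + num_columns_of_labels]
        extra = data[1 + num_columns_of_labels:]
    return image, labels, extra


data = ["image", "polys", "ignore_tags", "image_path"]  # made-up column values
all_labels = split_eval_columns(data)
two_labels = split_eval_columns(data, num_columns_of_labels=2)
```

With num_columns_of_labels=2, the last column is kept apart as additional info instead of being passed on as a label.
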