MindOCR supports inference for models trained with MindOCR as well as models converted from third-party toolboxes (PaddleOCR and MMOCR).
Inference of MindOCR models supports the MindSpore Lite backend.
graph LR;
A[MindOCR models] -- export --> B[MindIR] -- converter_lite --> C[MindSpore Lite MindIR];
Before inference, it is necessary to export the trained ckpt to a MindIR file. Please run tools/export.py:
# Export mindir of model `dbnet_resnet50` by downloading online ckpt
python tools/export.py --model_name_or_config dbnet_resnet50 --data_shape 736 1280
# Export mindir of model `dbnet_resnet50` by loading local ckpt
python tools/export.py --model_name_or_config dbnet_resnet50 --data_shape 736 1280 --local_ckpt_path /path/to/local_ckpt
# Export mindir of model whose architecture is defined by crnn_resnet34.yaml with local checkpoint
python tools/export.py --model_name_or_config configs/rec/crnn/crnn_resnet34.yaml --local_ckpt_path ~/.mindspore/models/crnn_resnet34-83f37f07.ckpt --data_shape 32 100
For more usage, run `python tools/export.py -h`.
Some models provide download links for exported MindIR files, as shown in the Model List. You can jump to the corresponding model introduction page to download them.
You need to use the converter_lite tool to convert the exported MindIR file offline so that it can be used for MindSpore Lite inference. For a tutorial on the converter_lite command, refer to Offline Conversion of Inference Models.
Assuming the input model is input.mindir and the output model after converter_lite conversion is output.mindir, the conversion command is as follows:
converter_lite \
--saveType=MINDIR \
--fmk=MINDIR \
--optimize=ascend_oriented \
--modelFile=input.mindir \
--outputFile=output \
--configFile=config.txt
Among them, config.txt can be used to set the shape and inference precision of the converted model.
- Static Shape
If the input name of the exported model is x and the input shape is (1,3,736,1280), then the config.txt is as follows:
[ascend_context]
input_format=NCHW
input_shape=x:[1,3,736,1280]
The generated output.mindir is a static-shape version; input images must be resized to this input_shape during inference to meet the input requirements (see the sketch below).
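As a minimal illustration (assuming OpenCV and NumPy are installed; the file name is a placeholder and this is not MindOCR's actual preprocessing pipeline), an image can be reshaped to the static input as follows:

import cv2
import numpy as np

# Hypothetical preprocessing sketch: force an image to the static shape (1, 3, 736, 1280)
img = cv2.imread("example.jpg")                  # HWC, BGR
img = cv2.resize(img, (1280, 736))               # cv2.resize expects (width, height)
img = img.astype(np.float32).transpose(2, 0, 1)  # HWC -> CHW
img = np.expand_dims(img, axis=0)                # CHW -> NCHW, i.e. (1, 3, 736, 1280)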
In some inference scenarios, such as detecting targets and then running a recognition network on them, the number and size of the targets are not fixed. If every inference is computed at the maximum batch size or maximum image size, computational resources are wasted.
Assume the exported model's input shape is (-1, 3, -1, -1), i.e. the NHW axes are dynamic. Optional values can then be set during model conversion so that the model adapts to input images of various sizes during inference.
converter_lite achieves this by setting the dynamic_dims parameter in [ascend_context] through --configFile. Please refer to Dynamic Shape Configuration for details. We will refer to it as Model Shape Scaling for short.
So there are two options for conversion, set by different config.txt files:
- Dynamic Image Size
N uses a fixed value and HW uses multiple optional values; the config.txt is as follows:
[ascend_context]
input_format=NCHW
input_shape=x:[1,3,-1,-1]
dynamic_dims=[736,1280],[768,1280],[896,1280],[1024,1280]
- Dynamic Batch Size
N uses multiple optional values and HW uses fixed values; the config.txt is as follows:
[ascend_context]
input_format=NCHW
input_shape=x:[-1,3,736,1280]
dynamic_dims=[1],[4],[8],[16],[32]
When converting a dynamic batch size/image size model, the optional NHW values can be set by the user based on empirical values or computed from the dataset (see the sketch below).
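For example, a rough Python sketch (the directory name, file pattern, and percentile choices are all hypothetical) that suggests HW candidates from a dataset's image sizes:

from pathlib import Path

import numpy as np
from PIL import Image

# Hypothetical helper: scan a dataset directory and propose candidate (H, W) values
# as percentiles of the observed image sizes, rounded up to a multiple of 32.
sizes = np.array([Image.open(p).size[::-1] for p in Path("dataset/images").glob("*.jpg")])
for q in (50, 75, 90, 100):
    h, w = np.percentile(sizes, q, axis=0)
    print(int(np.ceil(h / 32) * 32), int(np.ceil(w / 32) * 32))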
If your model needs to support both dynamic batch size and dynamic image size together, you can combine multiple models converted with different batch sizes, each using the same dynamic image size, as in the sketch below.
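As an illustrative sketch (the config and output file names are hypothetical), prepare one config.txt per batch size, each reusing the same dynamic_dims for HW, and run converter_lite once per config:

# config_bs1.txt sets input_shape=x:[1,3,-1,-1]; config_bs8.txt sets input_shape=x:[8,3,-1,-1];
# both keep the same dynamic_dims, e.g. [736,1280],[1024,1280]
converter_lite --saveType=MINDIR --fmk=MINDIR --optimize=ascend_oriented \
    --modelFile=input.mindir --outputFile=output_bs1 --configFile=config_bs1.txt
converter_lite --saveType=MINDIR --fmk=MINDIR --optimize=ascend_oriented \
    --modelFile=input.mindir --outputFile=output_bs8 --configFile=config_bs8.txt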
In order to simplify the model conversion process, we have developed an automatic tool that can complete the dynamic value selection and model conversion. For detailed tutorials, please refer to Model Shape Scaling.
Notes:
If the exported model is a static shape version, it cannot support dynamic image size and batch size conversion. It is necessary to ensure that the exported model is a dynamic shape version.
For the precision of model inference, it needs to be set in converter_lite when converting the model. Please refer to the Ascend Conversion Tool Description; the usage of the precision_mode parameter is described in the configuration file table, and you can choose enforce_fp16, enforce_fp32, preferred_fp32, enforce_origin, etc.
So you can add the precision_mode parameter to the [ascend_context] section of the above config.txt file to set the precision mode:
[ascend_context]
input_format=NCHW
input_shape=x:[1,3,736,1280]
precision_mode=enforce_fp32
If not set, it defaults to enforce_fp16.
The PaddleOCR models support two inference backends, ACL and MindSpore Lite, corresponding to the OM model and the MindSpore Lite MindIR model respectively.
graph LR;
in1[PaddleOCR trained model] -- export --> in2[PaddleOCR inference model] -- paddle2onnx --> ONNX;
ONNX -- converter_lite --> o2(MindSpore Lite MindIR);
ONNX -- atc --> o1(OM);
Two formats of Paddle models are involved here, the trained model and the inference model. The differences are as follows:
type | format | description |
---|---|---|
trained model | .pdparams, .pdopt, .states | PaddlePaddle trained model, which stores information such as the model structure, weights, and optimizer status |
inference model | inference.pdmodel, inference.pdiparams | PaddlePaddle inference model, which can be derived from its trained model and saves the network structure and weights |
After downloading and decompressing the model file, please distinguish between the trained model and the inference model according to the model format.
The PaddleOCR model download links provide two formats, trained model and inference model. If only a trained model is provided, it needs to be converted to the inference model format.
On the original PaddleOCR introduction page for each trained model, there is usually a conversion script sample that only needs the config file, model file, and save path of the trained model. An example is as follows:
# git clone https://github.com/PaddlePaddle/PaddleOCR.git
# cd PaddleOCR
python tools/export_model.py \
-c configs/det/det_r50_vd_db.yml \
-o Global.pretrained_model=./det_r50_vd_db_v2.0_train/best_accuracy \
Global.save_inference_dir=./det_db
Install the model conversion tool paddle2onnx: `pip install paddle2onnx==0.9.5`
For detailed usage tutorials, please refer to Paddle2ONNX model transformation and prediction.
Run the conversion command to generate the onnx model:
paddle2onnx \
--model_dir det_db \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--save_file det_db.onnx \
--opset_version 11 \
--input_shape_dict="{'x':[-1,3,-1,-1]}" \
--enable_onnx_checker True
The input_shape_dict parameter can generally be viewed by opening the inference model with Netron, or found in the code of tools/export_model.py above.
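If you prefer a programmatic check over Netron, the sketch below (assuming the onnx Python package is installed) prints the input names and shapes of the converted det_db.onnx, which should match the input_shape_dict above:

import onnx

# Illustrative check: print each input's name and shape; dynamic axes show up as -1
model = onnx.load("det_db.onnx")
for inp in model.graph.input:
    dims = [d.dim_value if d.dim_value > 0 else -1 for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)  # e.g. x [-1, 3, -1, -1]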
The converter_lite tool can be used to convert the ONNX model into a MindSpore Lite MindIR. For a detailed usage tutorial, please refer to Offline Conversion of Inference Models.
The conversion command is as follows:
converter_lite \
--saveType=MINDIR \
--fmk=ONNX \
--optimize=ascend_oriented \
--modelFile=det_db.onnx \
--outputFile=det_db_output \
--configFile=config.txt
The conversion process is the same as for MindOCR models, except that --fmk needs to specify ONNX as the input format, which will not be repeated here.
The ONNX model can be converted into an OM model by ATC tools.
Ascend Tensor Compiler (ATC) is a model conversion tool built upon the heterogeneous computing architecture CANN. It is designed to convert models of open-source frameworks into .om offline models supported by Ascend AI Processor. A detailed tutorial on the tool can be found in ATC Instructions.
The exported ONNX in the example has an input shape of (-1, 3, -1, -1).
- Static Shape
It can be converted to a static shape version by fixing the NHW values; the command is as follows:
atc --model=det_db.onnx \
--framework=5 \
--input_shape="x:1,3,736,1280" \
--input_format=ND \
--soc_version=Ascend310P3 \
--output=det_db_static \
--log=error
The generated file is a static shape version, and the input image during inference needs to be resized to this input_shape to meet the input requirements.
The ATC tool also supports Model Shape Scaling through the dynamic_dims parameter, so optional values can be set during model conversion to adapt to input images of various shapes during inference.
So there are two options for conversion, set by different command-line parameters:
- Dynamic Image Size
N uses a fixed value and HW uses multiple optional values; the command is as follows:
atc --model=det_db.onnx \
--framework=5 \
--input_shape="x:1,3,-1,-1" \
--input_format=ND \
--dynamic_dims="736,1280;768,1280;896,1280;1024,1280" \
--soc_version=Ascend310P3 \
--output=det_db_dynamic_hw \
--log=error
- Dynamic Batch Size
N uses multiple optional values and HW uses fixed values; the command is as follows:
atc --model=det_db.onnx \
--framework=5 \
--input_shape="x:-1,3,736,1280" \
--input_format=ND \
--dynamic_dims="1;4;8;16;32" \
--soc_version=Ascend310P3 \
--output=det_db_dynamic_bs \
--log=error
When converting a dynamic batch size/image size model, the optional NHW values can be set by the user based on empirical values or computed from the dataset.
If your model needs to support both dynamic batch size and dynamic image size together, you can combine multiple models converted with different batch sizes, each using the same dynamic image size, as in the sketch below.
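As an illustrative sketch (the output names and candidate values are hypothetical), run atc once per fixed batch size, reusing the same dynamic_dims for HW:

atc --model=det_db.onnx --framework=5 --input_format=ND \
    --input_shape="x:1,3,-1,-1" --dynamic_dims="736,1280;1024,1280" \
    --soc_version=Ascend310P3 --output=det_db_bs1_dynamic_hw --log=error
atc --model=det_db.onnx --framework=5 --input_format=ND \
    --input_shape="x:8,3,-1,-1" --dynamic_dims="736,1280;1024,1280" \
    --soc_version=Ascend310P3 --output=det_db_bs8_dynamic_hw --log=error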
In order to simplify the model conversion process, we have developed an automatic tool that can complete the dynamic value selection and model conversion. For detailed tutorials, please refer to Model Shape Scaling.
Notes:
If the exported model is a static shape version, it cannot support dynamic image size and batch size conversion. It is necessary to ensure that the exported model is a dynamic shape version.
For the precision of model inference, it needs to be set in atc when converting the model. Please refer to the Command-Line Options; optional values include force_fp16, force_fp32, allow_fp32_to_fp16, must_keep_origin_dtype, allow_mix_precision, etc.
So you can add the precision_mode parameter to the atc command line to set the precision:
atc --model=det_db.onnx \
--framework=5 \
--input_shape="x:1,3,736,1280" \
--input_format=ND \
--precision_mode=force_fp32 \
--soc_version=Ascend310P3 \
--output=det_db_static \
--log=error
If not set, it defaults to force_fp16.
MMOCR uses PyTorch, and its model files typically have a .pth suffix. You need to first export the model to ONNX format and then convert it to an OM/MindIR file supported by ACL/MindSpore Lite.
graph LR;
MMOCR_pth -- export --> ONNX;
ONNX -- converter_lite --> o2(MindSpore Lite MindIR);
ONNX -- atc --> o1(OM);
MMDeploy provides the command to export MMOCR models to ONNX. For detailed tutorials, please refer to How to convert model.
For the deploy_cfg parameter, you need to select a *_onnxruntime_dynamic.py file in the mmdeploy/configs/mmocr directory to export a dynamic-shape ONNX model.
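For reference, a hypothetical invocation of MMDeploy's conversion script (the config and checkpoint paths are illustrative; check the MMDeploy repository and the tutorial above for the exact files):

# run inside the mmdeploy repository; exports a dynamic-shape ONNX model for a DBNet text detector
python tools/deploy.py \
    configs/mmocr/text-detection/text-detection_onnxruntime_dynamic.py \
    /path/to/mmocr/configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py \
    /path/to/dbnet_checkpoint.pth \
    /path/to/demo_text_det.jpg \
    --work-dir ./work_dir \
    --device cpu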
Please refer to ONNX -> MindSpore Lite MindIR in the PaddleOCR section above.
Please refer to ONNX -> OM in the PaddleOCR section above.