Merge pull request #798 from wanghaoshuang/ctc_doc
 Refine document and scripts of CTC model.
wanghaoshuang authored Apr 13, 2018
2 parents 0d48900 + bd97b39 commit 609dc34
Showing 10 changed files with 397 additions and 196 deletions.
181 changes: 178 additions & 3 deletions fluid/ocr_recognition/README.md
@@ -1,4 +1,179 @@
# OCR Model

[toc]

This model, built with PaddlePaddle Fluid, is still under active development and is not the final version. We welcome feedback.

Running the example programs in this directory requires the latest develop version of PaddlePaddle. If your installed version of PaddlePaddle is older than this, please update it by following the instructions in the installation documentation.

# Optical Character Recognition

This document describes how to use the CRNN-CTC and CRNN-Attention models under PaddlePaddle Fluid to recognize the text in images.

## 1. CRNN-CTC

The task in this chapter is to recognize images that contain a single line of Chinese characters. A convolutional network first converts the image into a `features map`; the `im2sequence` op then converts the `features map` into a `sequence`, and a bidirectional GRU RNN produces a probability distribution over Chinese characters at each step. The loss function used for training is the CTC loss, and the final evaluation metric is the `instance error rate`.
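
To make the data flow concrete, below is a minimal sketch of such a network written against the `fluid.layers` API. It only illustrates the conv → `im2sequence` → bidirectional GRU → CTC-loss pipeline described above and is not the repository's `crnn_ctc_model.py`; the helper name `simple_crnn_ctc`, the layer sizes, and the `conv_out_height` argument are assumptions made for this example.

```
import paddle.fluid as fluid

def simple_crnn_ctc(images, label, num_classes, conv_out_height, hidden_size=128):
    # Convolutional feature extraction: image -> feature map.
    conv = fluid.layers.conv2d(
        input=images, num_filters=64, filter_size=3, padding=1, act='relu')
    pool = fluid.layers.pool2d(
        input=conv, pool_size=2, pool_type='max', pool_stride=2)
    # Slice the feature map column by column into a sequence of feature vectors.
    seq = fluid.layers.im2sequence(
        input=pool, filter_size=[conv_out_height, 1], stride=[1, 1])
    # Bidirectional GRU; dynamic_gru expects an input of size 3 * hidden_size.
    fc_fwd = fluid.layers.fc(input=seq, size=hidden_size * 3)
    fc_bwd = fluid.layers.fc(input=seq, size=hidden_size * 3)
    gru_fwd = fluid.layers.dynamic_gru(input=fc_fwd, size=hidden_size)
    gru_bwd = fluid.layers.dynamic_gru(input=fc_bwd, size=hidden_size, is_reverse=True)
    # Per-step scores over all characters plus one extra CTC "blank" class.
    logits = fluid.layers.fc(input=[gru_fwd, gru_bwd], size=num_classes + 1)
    # CTC loss; `blank` is the index reserved for the blank label.
    cost = fluid.layers.warpctc(
        input=logits, label=label, blank=num_classes, norm_by_times=True)
    return fluid.layers.reduce_sum(cost)
```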

The files in this directory serve the following purposes:

- **ctc_reader.py :** Downloads, reads, and preprocesses the data. Provides `train()` and `test()`, which return data iterators over the training set and the test set respectively.
- **crnn_ctc_model.py :** Defines the training, inference, and evaluation networks.
- **ctc_train.py :** Trains the model; run `python ctc_train.py --help` for usage.
- **inference.py :** Loads a trained model and runs prediction on new data; run `python inference.py --help` for usage.
- **eval.py :** Evaluates the model on a specified dataset; run `python eval.py --help` for usage.
- **utility.py :** Common utilities, including argument configuration and tensor construction.


### 1.1 Data

Downloading the data and performing simple preprocessing are both implemented in `ctc_reader.py`.

#### 1.1.1 Data format

The training and test data we use are shown in `Figure 1`: each image contains a single line of Chinese characters of variable length, and every image has already been pre-cropped by a text detection algorithm.

<p align="center">
<img src="images/demo.jpg" width="620" hspace='10'/> <br/>
<strong>Figure 1</strong>
</p>

In the training set, the label of each image is a sequence made up of several numbers, where each number is the index of a character in the dictionary. The label corresponding to `Figure 1` is shown below:
```
3835,8371,7191,2369,6876,4162,1938,168,1517,4590,3793
```
In the label above, `3835` is the index of the character '两', and `4590` is the index of the Chinese comma character.


#### 1.1.2 Data preparation

**A. Training set**

Put all the images used for training into a single folder, referred to here as `train_images`. Then use a list file to record the information for each image, namely the image size, the image file name, and the corresponding label. We will call this list file `train_list`; its format is as follows:

```
185 48 00508_0215.jpg 7740,5332,2369,3201,4162
48 48 00197_1893.jpg 6569
338 48 00007_0219.jpg 4590,4788,3015,1994,3402,999,4553
150 48 00107_4517.jpg 5936,3382,1437,3382
...
157 48 00387_0622.jpg 2397,1707,5919,1278
```

<center>File: train_list</center>

Each line of the file above describes one image and is split by spaces into four columns: the first two columns are the image's width and height, the third column is the image file name, and the fourth column is the image's sequence label.
The final layout of the files should look roughly like this:

```
|-train_data
|- train_list
|- train_images
|- 00508_0215.jpg
|- 00197_1893.jpg
|- 00007_0219.jpg
| ...
```
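
To make the format concrete, here is a short sketch of how one line of `train_list` can be parsed in Python. It mirrors the space-separated layout described above, but it is only an illustration, not code taken from `ctc_reader.py`:

```
# Parse one line of train_list into its four fields.
line = "185 48 00508_0215.jpg 7740,5332,2369,3201,4162"
width, height, img_name, label_str = line.split(" ")
label = [int(tok) for tok in label_str.split(",")]
print(int(width), int(height), img_name, label)
# 185 48 00508_0215.jpg [7740, 5332, 2369, 3201, 4162]
```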

At training time, the prepared `train_images` directory and `train_list` file are passed in through the `--train_images` and `--train_list` options respectively.


>**Note:** If neither `--train_images` nor `--train_list` is set, or both are set to None, ctc_reader.py automatically downloads the [sample data](http://cloud.dlnel.org/filepub/?uuid=df937251-3c0b-480d-9a7b-0080dfeee65c) and caches it under `$HOME/.cache/paddle/dataset/ctc_data/data/`.

**B. Test and evaluation sets**

The test and evaluation sets are prepared in the same way as the training set.
During training, the test set paths are set with the `--test_images` and `--test_list` options of ctc_train.py.
During evaluation, the evaluation set paths are set with the `--input_images_dir` and `--input_images_list` options of eval.py.

**C. Data to predict**

Prediction supports three forms of input:

Option 1: Set both `--input_images_dir` and `--input_images_list`. The list file has the same format as the training list, except that the last column may contain an arbitrary placeholder character or string, for example:

```
185 48 00508_0215.jpg s
48 48 00197_1893.jpg s
338 48 00007_0219.jpg s
...
```

Option 2: Set only `--input_images_list`. Each line of the list file then contains just the full path of an image, as shown below:

```
data/test_images/00000.jpg
data/test_images/00001.jpg
data/test_images/00003.jpg
```

Option 3: Read a single image path from stdin and run one inference on it.
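
The dispatch between these three modes can be pictured with the following hedged sketch. It is a simplified illustration of the behavior described above, not the actual logic of inference.py or ctc_reader.py (`iter_image_paths` is a name invented for this example; `raw_input` matches the Python 2 style used elsewhere in this repository):

```
import os

def iter_image_paths(input_images_dir=None, input_images_list=None):
    if input_images_list is not None:
        for line in open(input_images_list):
            if input_images_dir is not None:
                # Mode 1: "width height image_name placeholder"
                img_name = line.split(" ")[2]
                yield os.path.join(input_images_dir, img_name)
            else:
                # Mode 2: one full image path per line
                yield line.strip()
    else:
        # Mode 3: read image paths interactively from stdin
        while True:
            yield raw_input("Please input the path of image: ")
```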

### 1.2 Training

Train on a single GPU with the default dataset:

```
env CUDA_VISIBLE_DEVICES=0 python ctc_train.py
```

Train on multiple GPUs with the default dataset:

```
env CUDA_VISIBLE_DEVICES=0,1,2,3 python ctc_train.py --parallel=True
```

Run `python ctc_train.py --help` to see more usage information and detailed descriptions of the arguments.

Figure 2 shows the convergence curve for training with the default arguments and the default dataset, where the horizontal axis is the number of training passes and the vertical axis is the sequence_error on the test set.

<p align="center">
<img src="images/train.jpg" width="620" hspace='10'/> <br/>
<strong>Figure 2</strong>
</p>
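
For reference, the `sequence_error` plotted above is an instance-level error rate: a test image counts as an error unless its predicted index sequence matches the label exactly. A minimal sketch of this definition (our reading of the metric, not code from eval.py) is:

```
def sequence_error(predictions, labels):
    # Fraction of instances whose prediction is not an exact match.
    wrong = sum(1 for p, l in zip(predictions, labels) if list(p) != list(l))
    return wrong / float(len(labels))

print(sequence_error([[1, 2], [3, 4]], [[1, 2], [3, 5]]))  # 0.5
```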



### 1.3 Evaluation

Run the evaluation script on a specified dataset with the following command:

```
env CUDA_VISIBLE_DEVICES=0 python eval.py \
--model_path="./models/model_0" \
--input_images_dir="./eval_data/images/" \
--input_images_list="./eval_data/eval_list"
```

Run `python eval.py --help` for detailed descriptions of the arguments.


### 1.4 Inference

Read an image path from standard input and run prediction on it:

```
env CUDA_VISIBLE_DEVICES=0 python inference.py \
--model_path="models/model_00044_15000"
```

Running the command above produces output like the following:

```
----------- Configuration Arguments -----------
use_gpu: True
input_images_dir: None
input_images_list: None
model_path: /home/work/models/fluid/ocr_recognition/models/model_00052_15000
------------------------------------------------
Init model from: /home/work/models/fluid/ocr_recognition/models/model_00052_15000.
Please input the path of image: /home/work/models/fluid/ocr_recognition/data/test_images/00001_0060.jpg
result: [3298 2371 4233 6514 2378 3298 2363]
Please input the path of image: /home/work/models/fluid/ocr_recognition/data/test_images/00001_0429.jpg
result: [2067 2067 8187 8477 5027 7191 2431 1462]
```

Read image paths in batch from a file and run prediction on them:

```
env CUDA_VISIBLE_DEVICES=0 python inference.py \
--model_path="models/model_00044_15000" \
--input_images_list="data/test.list"
```
2 changes: 1 addition & 1 deletion fluid/ocr_recognition/crnn_ctc_model.py
@@ -143,7 +143,7 @@ def ctc_train_net(images, label, args, num_classes):
    gradient_clip = None
    if args.parallel:
        places = fluid.layers.get_places()
-        pd = fluid.layers.ParallelDo(places)
+        pd = fluid.layers.ParallelDo(places, use_nccl=True)
        with pd.do():
            images_ = pd.read_input(images)
            label_ = pd.read_input(label)
74 changes: 60 additions & 14 deletions fluid/ocr_recognition/ctc_reader.py
@@ -30,10 +30,10 @@ def train_reader(self, img_root_dir, img_label_list, batchsize):
        Reader interface for training.
        :param img_root_dir: The root path of the image for training.
-        :type file_list: str
+        :type img_root_dir: str
        :param img_label_list: The path of the <image_name, label> file for training.
-        :type file_list: str
+        :type img_label_list: str
        '''

@@ -91,10 +91,10 @@ def test_reader(self, img_root_dir, img_label_list):
        Reader interface for inference.
        :param img_root_dir: The root path of the images for training.
-        :type file_list: str
+        :type img_root_dir: str
        :param img_label_list: The path of the <image_name, label> file for testing.
-        :type file_list: list
+        :type img_label_list: str
        '''

        def reader():
@@ -111,6 +111,42 @@ def reader():

        return reader

+    def infer_reader(self, img_root_dir=None, img_label_list=None):
+        '''A reader interface for inference.
+        :param img_root_dir: The root path of the images for inference.
+        :type img_root_dir: str
+        :param img_label_list: The path of the <image_name, label> file for
+        inference. It should be the path of an <image_path> file if img_root_dir
+        is None. If img_label_list is None, image paths are read from stdin.
+        :type img_label_list: str
+        '''
+
+        def reader():
+            if img_label_list is not None:
+                for line in open(img_label_list):
+                    if img_root_dir is not None:
+                        # h, w, img_name, labels
+                        img_name = line.split(' ')[2]
+                        img_path = os.path.join(img_root_dir, img_name)
+                    else:
+                        img_path = line.strip("\t\n\r")
+                    img = Image.open(img_path).convert('L')
+                    img = np.array(img) - 127.5
+                    img = img[np.newaxis, ...]
+                    # No ground-truth label is available at inference time, so
+                    # yield a placeholder label, as in the stdin branch below.
+                    yield img, [[0]]
+            else:
+                while True:
+                    img_path = raw_input("Please input the path of image: ")
+                    img = Image.open(img_path).convert('L')
+                    img = np.array(img) - 127.5
+                    img = img[np.newaxis, ...]
+                    yield img, [[0]]
+
+        return reader


def num_classes():
    '''Get classes number of this dataset.

@@ -124,21 +160,31 @@ def data_shape():
    return DATA_SHAPE


-def train(batch_size):
+def train(batch_size, train_images_dir=None, train_list_file=None):
    generator = DataGenerator()
-    data_dir = download_data()
-    return generator.train_reader(
-        path.join(data_dir, TRAIN_DATA_DIR_NAME),
-        path.join(data_dir, TRAIN_LIST_FILE_NAME), batch_size)
+    if train_images_dir is None:
+        data_dir = download_data()
+        train_images_dir = path.join(data_dir, TRAIN_DATA_DIR_NAME)
+    if train_list_file is None:
+        train_list_file = path.join(data_dir, TRAIN_LIST_FILE_NAME)
+    return generator.train_reader(train_images_dir, train_list_file, batch_size)


+def test(batch_size=1, test_images_dir=None, test_list_file=None):
+    generator = DataGenerator()
+    if test_images_dir is None:
+        data_dir = download_data()
+        test_images_dir = path.join(data_dir, TEST_DATA_DIR_NAME)
+    if test_list_file is None:
+        test_list_file = path.join(data_dir, TEST_LIST_FILE_NAME)
+    return paddle.batch(
+        generator.test_reader(test_images_dir, test_list_file), batch_size)
+
+
-def test(batch_size=1):
+def inference(infer_images_dir=None, infer_list_file=None):
    generator = DataGenerator()
-    data_dir = download_data()
    return paddle.batch(
-        generator.test_reader(
-            path.join(data_dir, TRAIN_DATA_DIR_NAME),
-            path.join(data_dir, TRAIN_LIST_FILE_NAME)), batch_size)
+        generator.infer_reader(infer_images_dir, infer_list_file), 1)


def download_data():
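
To make the refactored reader entry points above easier to read, here is a hedged usage sketch; the paths are placeholders and the snippet is not part of the repository. With the directory and list arguments left as None, train() and test() fall back to downloading the sample dataset.

```
import ctc_reader

train_reader = ctc_reader.train(
    batch_size=32,
    train_images_dir="train_data/train_images",
    train_list_file="train_data/train_list")
test_reader = ctc_reader.test(
    batch_size=1,
    test_images_dir="test_data/test_images",
    test_list_file="test_data/test_list")
infer_reader = ctc_reader.inference(
    infer_images_dir=None, infer_list_file="data/test.list")

# Each reader is a generator factory; iterate it to obtain (image, label) data.
for batch in test_reader():
    break
```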
