Merge pull request #798 from wanghaoshuang/ctc_doc

Refine document and scripts of CTC model.
PaddlePaddle · Apr 13, 2018 · 609dc34 · 609dc34
2 parents 0d48900 + bd97b39
commit 609dc34
Showing 10 changed files with 397 additions and 196 deletions.
diff --git a/fluid/ocr_recognition/README.md b/fluid/ocr_recognition/README.md
@@ -1,4 +1,179 @@
-# OCR Model
+
+[toc]
 
-This model built with paddle fluid is still under active development and is not
-the final version. We welcome feedbacks.
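+Running the example programs in this directory requires the latest develop version of PaddlePaddle. If your installed version is older, update it by following the instructions in the installation guide.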
+
+# Optical Character Recognition
+
+This document describes how to recognize text in images using the CRNN-CTC and CRNN-Attention models under PaddlePaddle Fluid.
+
+## 1. CRNN-CTC
+
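+The task in this chapter is to recognize images that contain a single line of Chinese characters. Convolution layers first turn the image into `feature maps`; the `im2sequence op` then converts the `feature maps` into a `sequence`; finally a bidirectional GRU RNN produces a probability distribution over Chinese characters at each step. Training uses CTC loss, and the final evaluation metric is the `instance error rate`.
+
+The files in this directory serve the following purposes:
+
+- **ctc_reader.py :** Downloads, reads, and preprocesses data. Provides `train()` and `test()`, which return data iterators over the training set and the test set respectively.
+- **crnn_ctc_model.py :** Defines the training, inference, and evaluation networks.
+- **ctc_train.py :** Trains the model. Run `python ctc_train.py --help` for usage.
+- **inference.py :** Loads a trained model file and runs inference on new data. Run `python inference.py --help` for usage.
+- **eval.py :** Evaluates the model on a specified dataset. Run `python eval.py --help` for usage.
+- **utility.py :** Common helper functions, including argument configuration and tensor construction.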
+
+
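+### 1.1 Data
+
+Data download and basic preprocessing are both implemented in `ctc_reader.py`.
+
+#### 1.1.1 Data format
+
+The training and test data look like `Figure 1`: each image contains a single line of Chinese text of variable length, and every image has already been cropped by a text detection algorithm.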
+
+<p align="center">
+<img src="images/demo.jpg" width="620" hspace='10'/> <br/>
+<strong>Figure 1</strong>
+</p>
+
+In the training set, the label of each image is a sequence of numbers. Each number is the index of one character in the dictionary. The label for `Figure 1` is:
+```
+3835,8371,7191,2369,6876,4162,1938,168,1517,4590,3793
+```
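+In the label above, `3835` is the index of the character ‘两’ and `4590` is the index of the Chinese comma.
+
+
+#### 1.1.2 Data preparation
+
+**A. Training set**
+
+Put all training images into one folder, referred to here as `train_images`. Then describe each image in a list file, giving its size, file name, and label. This list file is referred to here as `train_list`, and its format is: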
+
+```
+185 48 00508_0215.jpg 7740,5332,2369,3201,4162
+48 48 00197_1893.jpg 6569
+338 48 00007_0219.jpg 4590,4788,3015,1994,3402,999,4553
+150 48 00107_4517.jpg 5936,3382,1437,3382
+...
+157 48 00387_0622.jpg 2397,1707,5919,1278
+```
+
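+<center>File train_list</center>
+
+Each line in the file above describes one image and is split by spaces into four columns: the first two are the image width and height, the third is the image file name, and the fourth is the image's sequence label.
+The resulting directory layout should look similar to this: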
+
+```
+|-train_data
+    |- train_list
+    |- train_images
+        |- 00508_0215.jpg
+        |- 00197_1893.jpg
+        |- 00007_0219.jpg
+        | ...
+```
+
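+During training, the options `--train_images` and `--train_list` point to the prepared `train_images` folder and `train_list` file respectively.
+
+
+>**Note:** If `--train_images` and `--train_list` are both unset or set to None, ctc_reader.py automatically downloads the [sample data](http://cloud.dlnel.org/filepub/?uuid=df937251-3c0b-480d-9a7b-0080dfeee65c) and caches it under `$HOME/.cache/paddle/dataset/ctc_data/data/`.
+
+
+**B. Test set and evaluation set**
+
+The test set and evaluation set are prepared in the same way as the training set.
+During training, the test set path is set through the `--test_images` and `--test_list` options of ctc_train.py.
+During evaluation, the evaluation set path is set through the `--input_images_dir` and `--input_images_list` options of eval.py.
+
+**C. Data for inference**
+
+Inference accepts input in three forms:
+
+Form 1: set both `--input_images_dir` and `--input_images_list`. This is similar to the training set, except that the last column of the list file may hold any placeholder character or string, as shown below: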
+
+```
+185 48 00508_0215.jpg s
+48 48 00197_1893.jpg s
+338 48 00007_0219.jpg s
+...
+```
+
+Form 2: set only `--input_images_list`. The list file then needs only the full path of each image, as shown below:
+
+```
+data/test_images/00000.jpg
+data/test_images/00001.jpg
+data/test_images/00003.jpg
+```
+
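+Form 3: read the path of one image from stdin and run inference on it once.
+
+### 1.2 Training
+
+Train with the default data on a single GPU:
+
+```
+env CUDA_VISIBLE_DEVICES=0 python ctc_train.py
+```
+
+Train with the default data on multiple GPUs:
+
+```
+env CUDA_VISIBLE_DEVICES=0,1,2,3 python ctc_train.py --parallel=True
+```
+
+Run `python ctc_train.py --help` for more usage options and detailed parameter descriptions.
+
+Figure 2 shows the convergence curve when training with the default parameters and the default dataset. The horizontal axis is the number of training passes and the vertical axis is the sequence_error on the test set.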
+
+<p align="center">
+<img src="images/train.jpg" width="620" hspace='10'/> <br/>
+<strong>Figure 2</strong>
+</p>
+
+
+
+### 1.3 Evaluation
+
+Run the evaluation script with the following command to evaluate the model on a specified dataset:
+
+```
+env CUDA_VISIBLE_DEVICES=0 python eval.py \
+    --model_path="./models/model_0" \
+    --input_images_dir="./eval_data/images/" \
+    --input_images_list="./eval_data/eval_list"
+```
+
+Run `python eval.py --help` for detailed parameter descriptions.
+
+
+### 1.4 Inference
+
+Read the path of one image from standard input and run inference on it:
+
+```
+env CUDA_VISIBLE_DEVICES=0 python inference.py \
+    --model_path="models/model_00044_15000"
+```
+
+Running the command above produces output like the following:
+
+```
+-----------  Configuration Arguments -----------
+use_gpu: True
+input_images_dir: None
+input_images_list: None
+model_path: /home/work/models/fluid/ocr_recognition/models/model_00052_15000
+------------------------------------------------
+Init model from: /home/work/models/fluid/ocr_recognition/models/model_00052_15000.
+Please input the path of image: /home/work/models/fluid/ocr_recognition/data/test_images/00001_0060.jpg
+result: [3298 2371 4233 6514 2378 3298 2363]
+Please input the path of image: /home/work/models/fluid/ocr_recognition/data/test_images/00001_0429.jpg
+result: [2067 2067 8187 8477 5027 7191 2431 1462]
+```
+
+Read image paths in batch from a file and run inference on them:
+
+```
+env CUDA_VISIBLE_DEVICES=0 python inference.py \
+    --model_path="models/model_00044_15000" \
+    --input_images_list="data/test.list"
+```
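
The list-file format described in the README (width, height, image name, comma-separated label indices) can be read with a few lines of Python. The following is a minimal sketch only: `ListEntry` and `parse_list_line` are hypothetical names that do not appear in `ctc_reader.py`, and mapping a non-numeric placeholder label to `None` is an assumption made here for illustration.

```python
from collections import namedtuple

# Hypothetical record type for one line of a list file; not part of ctc_reader.py.
ListEntry = namedtuple("ListEntry", ["width", "height", "name", "label"])


def parse_list_line(line):
    """Parse one 'width height image_name label' line of a list file."""
    width, height, name, label = line.strip().split(" ", 3)
    try:
        # Training and evaluation lists carry comma-separated dictionary indices.
        indices = [int(tok) for tok in label.split(",")]
    except ValueError:
        # Inference lists may carry an arbitrary placeholder in the last column.
        indices = None
    return ListEntry(int(width), int(height), name, indices)


entry = parse_list_line("185 48 00508_0215.jpg 7740,5332,2369,3201,4162\n")
print(entry.name, entry.label)
```

Splitting with a maximum of three splits keeps the whole label column together even if a placeholder string happens to contain spaces.
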
diff --git a/fluid/ocr_recognition/crnn_ctc_model.py b/fluid/ocr_recognition/crnn_ctc_model.py
@@ -143,7 +143,7 @@ def ctc_train_net(images, label, args, num_classes):
     gradient_clip = None
     if args.parallel:
         places = fluid.layers.get_places()
-        pd = fluid.layers.ParallelDo(places)
+        pd = fluid.layers.ParallelDo(places, use_nccl=True)
         with pd.do():
             images_ = pd.read_input(images)
             label_ = pd.read_input(label)
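
The `infer_reader` added to `ctc_reader.py` in this commit resolves image paths in two ways, depending on whether an image root directory is given. The sketch below isolates just that path logic so it can be tried without PaddlePaddle or image files. `resolve_image_paths` is a hypothetical helper, not a function in the repository, and skipping blank lines is an assumption added here.

```python
import os


def resolve_image_paths(list_lines, img_root_dir=None):
    """Yield one image path per list line.

    With img_root_dir set, each line is 'width height image_name label' and the
    image name (third column) is joined onto the root directory. Without it,
    each line is taken to be a complete image path.
    """
    for line in list_lines:
        line = line.strip()
        if not line:
            continue  # assumption: ignore blank lines
        if img_root_dir is not None:
            img_name = line.split(" ")[2]
            yield os.path.join(img_root_dir, img_name)
        else:
            yield line


with_root = list(resolve_image_paths(["185 48 00508_0215.jpg s\n"], "imgs"))
full_paths = list(resolve_image_paths(["data/test_images/00000.jpg\n"]))
print(with_root, full_paths)
```
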

diff --git a/fluid/ocr_recognition/ctc_reader.py b/fluid/ocr_recognition/ctc_reader.py
@@ -30,10 +30,10 @@ def train_reader(self, img_root_dir, img_label_list, batchsize):
         Reader interface for training.
 
         :param img_root_dir: The root path of the image for training.
-        :type file_list: str
+        :type img_root_dir: str
 
         :param img_label_list: The path of the <image_name, label> file for training.
-        :type file_list: str
+        :type img_label_list: str
 
         '''
 
@@ -91,10 +91,10 @@ def test_reader(self, img_root_dir, img_label_list):
         Reader interface for inference.
 
         :param img_root_dir: The root path of the images for training.
-        :type file_list: str
+        :type img_root_dir: str
 
         :param img_label_list: The path of the <image_name, label> file for testing.
-        :type file_list: list
+        :type img_label_list: str
         '''
 
         def reader():
@@ -111,6 +111,42 @@ def reader():
 
         return reader
 
+    def infer_reader(self, img_root_dir=None, img_label_list=None):
+        '''A reader interface for inference.
+
+        :param img_root_dir: The root path of the images for inference.
+        :type img_root_dir: str
+
+        :param img_label_list: The path of the <image_name, label> file for
+        inference. It should be the path of an <image_path> file if
+        img_root_dir is None. If img_label_list is None, image paths are read
+        from stdin.
+        :type img_label_list: str
+        '''
+
+        def reader():
+            if img_label_list is not None:
+                for line in open(img_label_list):
+                    if img_root_dir is not None:
+                        # w, h, img_name, labels
+                        img_name = line.split(' ')[2]
+                        img_path = os.path.join(img_root_dir, img_name)
+                    else:
+                        img_path = line.strip("\t\n\r")
+                    img = Image.open(img_path).convert('L')
+                    img = np.array(img) - 127.5
+                    img = img[np.newaxis, ...]
+                    yield img, [[0]]  # placeholder label; `label` was undefined here
+            else:
+                while True:
+                    img_path = raw_input("Please input the path of image: ")
+                    img = Image.open(img_path).convert('L')
+                    img = np.array(img) - 127.5
+                    img = img[np.newaxis, ...]
+                    yield img, [[0]]
+
+        return reader
+
 
 def num_classes():
     '''Get classes number of this dataset.
@@ -124,21 +160,31 @@ def data_shape():
     return DATA_SHAPE
 
 
-def train(batch_size):
+def train(batch_size, train_images_dir=None, train_list_file=None):
     generator = DataGenerator()
-    data_dir = download_data()
-    return generator.train_reader(
-        path.join(data_dir, TRAIN_DATA_DIR_NAME),
-        path.join(data_dir, TRAIN_LIST_FILE_NAME), batch_size)
+    if train_images_dir is None:
+        data_dir = download_data()
+        train_images_dir = path.join(data_dir, TRAIN_DATA_DIR_NAME)
+    if train_list_file is None:
+        train_list_file = path.join(data_dir, TRAIN_LIST_FILE_NAME)
+    return generator.train_reader(train_images_dir, train_list_file, batch_size)
+
+
+def test(batch_size=1, test_images_dir=None, test_list_file=None):
+    generator = DataGenerator()
+    if test_images_dir is None:
+        data_dir = download_data()
+        test_images_dir = path.join(data_dir, TEST_DATA_DIR_NAME)
+    if test_list_file is None:
+        test_list_file = path.join(data_dir, TEST_LIST_FILE_NAME)
+    return paddle.batch(
+        generator.test_reader(test_images_dir, test_list_file), batch_size)
 
 
-def test(batch_size=1):
+def inference(infer_images_dir=None, infer_list_file=None):
     generator = DataGenerator()
-    data_dir = download_data()
     return paddle.batch(
-        generator.test_reader(
-            path.join(data_dir, TRAIN_DATA_DIR_NAME),
-            path.join(data_dir, TRAIN_LIST_FILE_NAME)), batch_size)
+        generator.infer_reader(infer_images_dir, infer_list_file), 1)
 
 
 def download_data():