Refine document and scripts of CTC model. #798

Merged: 8 commits, Apr 13, 2018

Conversation

wanghaoshuang
Contributor

fix #764

  1. Add documentation.
  2. Add arguments for saving and initializing the model.
  3. Refine inference.py and eval.py.
  4. Make ctc_reader.py support custom data.

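**--test_list:** The list file holding the test set image information. If set to None, ctc_reader automatically downloads and uses the default dataset. Change this option to test on your own data. Defaults to None.

**--num_classes:** The size of the character set. If set to None, the character set size provided by ctc_reader is used. Change this option to train on your own data. Defaults to None.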
Collaborator

I don't think every argument needs to be explained here; the document would have to change whenever the arguments do. Instead, write better help strings for the arguments in train.py and tell users to run `python train.py --help` to see the usage.

Contributor Author

Done.
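
To illustrate the suggestion above, here is a minimal sketch of how help strings alone can document the arguments, so that `python train.py --help` replaces a hand-maintained list. The `add_arg` helper below is an assumption modeled on the `add_arg(name, type, default, help)` calls quoted later in this PR; the repository's real helper is not shown here, and the help texts for `batch_size` and `test_list` are illustrative.

```python
import argparse
import functools


def add_arg(parser, name, arg_type, default, help_text):
    """Register --<name>; the help string also reports the default value."""
    parser.add_argument(
        "--" + name,
        type=arg_type,
        default=default,
        help="%s Default: %s." % (help_text, default))


def build_parser():
    parser = argparse.ArgumentParser(description="Train the CTC model.")
    arg = functools.partial(add_arg, parser)
    # Hypothetical help texts; only the argument names come from this PR.
    arg("batch_size", int, 32, "Minibatch size.")
    arg("learning_rate", float, 1.0e-3, "Learning rate.")
    arg("test_list", str, None, "The list file of test images.")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(["--batch_size", "16"])
print(args.batch_size, args.learning_rate, args.test_list)
```

With this shape, adding or renaming an argument updates the `--help` output automatically, which is the maintenance benefit the reviewer is asking for.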


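**--input_images_list:** Path to the list file holding information about the images to predict. If set to None, the default data provided by ctc_reader is used. Defaults to None.

**--device DEVICE:** Device ID. Set to -1 to run on CPU; set to 0 to run on GPU. Defaults to 0.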
Collaborator

Same as above.

Contributor Author

Done.


**--device DEVICE:** Device ID. Set to -1 to run on CPU; set to 0 to run on GPU. Defaults to 0.

The prediction results are printed to standard output.
Collaborator

@qingqing01 qingqing01 Apr 3, 2018

For OCR inference, it would be best to take one image as input, output one piece of text, and display the result.

Contributor Author

@wanghaoshuang wanghaoshuang Apr 8, 2018

Outputting text directly is a bit difficult, because the user would also have to provide a dictionary.
For now, the user can enter an image path and immediately get a sequence of indexes as the result.

python inference.py --model_path models/model_00044_15000
-----------  Configuration Arguments -----------
device: 0
input_images_dir: None
input_images_list: None
model_path: models/model_00044_15000
------------------------------------------------
Init model from: models/model_00044_15000.
Please input the path of image: data/test_images/00008_4700.jpg
result: [6514 5919 3415  173]
Please input the path of image:
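
The dictionary step mentioned above can be sketched as follows. This is a minimal illustration, assuming the user supplies a mapping from index to character; `indexes_to_text` and `toy_dictionary` are hypothetical names, and the toy entries are not the real character set used by the model.

```python
def indexes_to_text(indexes, dictionary):
    """Map each predicted index to its character; unknown indexes become '?'."""
    return "".join(dictionary.get(i, "?") for i in indexes)


# Hypothetical toy dictionary. A real one would be built from the character
# set that was used to label the training data.
toy_dictionary = {0: "a", 1: "b", 2: "c"}

print(indexes_to_text([2, 0, 1], toy_dictionary))  # prints "cab"
print(indexes_to_text([2, 7], toy_dictionary))     # prints "c?"
```

Keeping the conversion outside the inference script means users with a custom dataset only need to provide their own dictionary.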


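**--input_images_list:** Path to the list file holding information about the images to evaluate. If set to None, the default data provided by ctc_reader is used. Defaults to None.

**--device DEVICE:** Device ID. Set to -1 to run on CPU; set to 0 to run on GPU. Defaults to 0.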
Collaborator

Same as above. Could Evaluation be placed before Inference?

Contributor Author

Done.

1. Remove illustration of arguments.
2. Make inference support more input formats.
@wanghaoshuang wanghaoshuang requested review from abhinavarora and removed request for abhinavarora April 8, 2018 05:13

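This model built with paddle fluid is still under active development and is not
the final version. We welcome feedback.
Running the example programs in this directory requires PaddlePaddle v0.11.0. If your installed PaddlePaddle version is lower than this, please update it by following the instructions in the installation documentation.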
Collaborator

This requirement needs updating. Ask users to use the latest develop version; we can change it again once things stabilize.

Contributor Author

Done. Thx.

<p align="center">
<img src="images/train.jpg" width="620" hspace='10'/> <br/>
<strong>Figure 2</strong>
</p>
Collaborator

Contributor Author

Fixed. Thx.

Collaborator

  1. Along with the figure, explain what it shows and state what the seq error is. Is there a seq error for training? If so, plot both curves.
  2. Should the train and test cost curves be shown as well?




### 1.3 Evaluate
Collaborator

This is a Chinese document; please use Chinese for the headings as well.

Contributor Author

Done. Thx.

--model_path models/model_00044_15000
```

Read image paths from a list file and run inference:
Collaborator

Use Chinese consistently for the language.

Contributor Author

Done. Thx.

env CUDA_VISIBLE_DEVICES=0 python inference.py \
--model_path=models/model_00044_15000 \
--input_images_list="data/test.list"
```
Collaborator

Please give an example of the prediction output.

Contributor Author

Done. Thx.


```
env CUDA_VISIBLE_DEVICES=0 python ctc_train.py \
--device=0 \
Collaborator

Could the code drop `device` and use `use_gpu` instead?

Contributor Author

Done. Thx.
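
One detail to watch when switching to a boolean `use_gpu` argument: with argparse, `type=bool` treats any non-empty string, including `"False"`, as true. A minimal sketch of an explicit converter follows. `str2bool` is a hypothetical helper and the default of `True` is an assumption, not necessarily what the scripts in this PR do.

```python
import argparse


def str2bool(value):
    """Convert common true/false spellings to bool for argparse."""
    text = value.strip().lower()
    if text in ("true", "t", "yes", "y", "1"):
        return True
    if text in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


parser = argparse.ArgumentParser()
# Assumed default: run on GPU unless the user passes --use_gpu=False.
parser.add_argument("--use_gpu", type=str2bool, default=True,
                    help="Whether to run on GPU. Default: True.")

args = parser.parse_args(["--use_gpu=False"])
print(args.use_gpu)  # prints "False"
```

The same converter would also suit flags such as `--parallel=False` shown in the training command.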

--device=0 \
--parallel=False \
--batch_size=32
```
Collaborator

If these arguments are all left at their defaults, how about writing it as below? That makes it easy for users to copy and paste.

env CUDA_VISIBLE_DEVICES=0 python ctc_train.py

Contributor Author

Done. Thx.

add_arg('learning_rate', float, 1.0e-3, "Learning rate.")
add_arg('l2', float, 0.0004, "L2 regularizer.")
add_arg('max_clip', float, 10.0, "Max clip threshold.")
add_arg('min_clip', float, -10.0, "Min clip threshold.")
Collaborator

Are max_clip/min_clip actually used? I think we should keep the number of arguments as small as possible.

Contributor Author

Removed.

add_arg('test_list', str, None, "The list file of test images. "
"None means using the default test_list file of reader.")
add_arg('num_classes', int, None, "The number of classes. "
"None means using the default num_classes from reader.")
Collaborator

I think user-configurable arguments should be the ones used frequently in experiments. There are too many arguments here.

Contributor Author

I think it is fine. Some arguments are not adjusted often, but they are still indispensable.

1. Remove unused arguments.
2. Refine doc.
3. Change 'device' to 'use_gpu'.
Collaborator

@qingqing01 qingqing01 left a comment

I approve the PR, but some comments need to be addressed in the next PR.

@@ -1,4 +1,179 @@
# OCR Model

[toc]
Collaborator


# Optical Character Recognition

This document describes how to use the CRNN-CTC and CRNN-Attention models under PaddlePaddle fluid to recognize text in images.
Collaborator

fluid -> Fluid


## 1. CRNN-CTC

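The task in this chapter is to recognize images containing a single line of Chinese characters. Convolution first converts the image into a `features map`, the `im2sequence op` then converts the `features map` into a `sequence`, and a `bidirectional GRU RNN` produces the probability distribution over Chinese characters at each step. Training uses CTC loss as the loss function, and the final evaluation metric is `instance error rate`.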
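Collaborator

Change `features map` to Chinese, i.e. 特征图 (feature map).
sequence -> 序列 (sequence).

"a bidirectional GRU RNN produces the probability distribution over Chinese characters at each step"

The actual model does not produce a probability distribution.

Suggested wording: sequence features are learned through a bidirectional GRU.

Where CTC first appears, the Chinese term should be given.

instance error rate: this needs a clear explanation. It could be written as 样本级别的错误率 (sample-level error rate).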
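
For readers unfamiliar with the two terms discussed here, the sketch below shows best-path (greedy) CTC decoding, which collapses consecutive repeated labels and then drops blanks, together with a sample-level error rate, where a sample counts as wrong if its whole predicted sequence differs from the label. It assumes per-step labels are already available and that the blank index is passed in; the PR does not state which decoder or blank index the model uses, and both function names are hypothetical.

```python
def ctc_greedy_decode(frame_labels, blank):
    """Collapse consecutive repeats, then drop blank labels."""
    decoded = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


def instance_error_rate(predictions, references):
    """Fraction of samples whose predicted sequence differs from its label."""
    wrong = sum(1 for p, r in zip(predictions, references) if p != r)
    return wrong / float(len(references))


# Blank index 0 is an assumption made for this example only.
print(ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0], blank=0))  # prints "[3, 3, 5]"
print(instance_error_rate([[3, 3, 5], [1, 2]],
                          [[3, 3, 5], [1, 4]]))              # prints "0.5"
```

Note that a blank between two identical labels keeps both of them, which is why the decoded example contains `3` twice.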

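- **ctc_reader.py:** Downloads, reads, and processes the data. Provides the methods `train()` and `test()`, which yield data iterators for the training set and the test set respectively.
- **crnn_ctc_model.py:** Defines the training network, the inference network, and the evaluation network.
- **ctc_train.py:** Trains the model. Run `python train.py --help` for usage.
- **inference.py:** Loads a trained model file and runs prediction on new data. Run `python inference.py --help` for usage.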
Collaborator

inference.py -> infer.py

<strong>Figure 1</strong>
</p>

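In the training set, the label for each image is a sequence of numbers. Each number in the sequence is the index of a character in the dictionary. The label for `Figure 1` is shown below: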
Collaborator

"The label for each image is a sequence of numbers. Each number in the sequence is the index of a character in the dictionary."

Suggested wording: the label for each image is the indexes of its Chinese characters in the dictionary.

Init model from: /home/work/models/fluid/ocr_recognition/models/model_00052_15000.
Please input the path of image: /home/work/models/fluid/ocr_recognition/data/test_images/00001_0060.jpg
result: [3298 2371 4233 6514 2378 3298 2363]
Please input the path of image: /home/work/models/fluid/ocr_recognition/data/test_images/00001_0429.jpg
Collaborator

/home/work/models/fluid/ocr_recognition/data/test_images/00001_0429.jpg
A path like this in the documentation is not user-friendly.

Could the image from Figure 1 be used here? Could the output be replaced with the Chinese characters looked up in the dictionary?

<p align="center">
<img src="images/train.jpg" width="620" hspace='10'/> <br/>
<strong>Figure 2</strong>
</p>
Collaborator

  1. Along with the figure, explain what it shows and state what the seq error is. Is there a seq error for training? If so, plot both curves.
  2. Should the train and test cost curves be shown as well?

@wanghaoshuang wanghaoshuang merged commit 609dc34 into PaddlePaddle:develop Apr 13, 2018
Development

Successfully merging this pull request may close these issues.

Could the OCR documentation under fluid be improved?
2 participants