
Fluid benchmark & book validation #6208

Closed · 13 tasks
dzhwinter opened this issue Dec 4, 2017 · 7 comments

dzhwinter (Contributor) commented Dec 4, 2017

In the 0.11.0 release, we will publish the book chapters rewritten with Fluid; there are some tasks that need to be done first.

Task List 1: compare results with the Paddle V2 book

We need to validate that the Fluid book chapters converge to approximately the same results as the V2 book chapters.

Note that we have three different implementations of understand_sentiment; only the LSTM one is tested in this chapter.

  • book.06 understand_sentiment lstm CPU loss validation @ranqiu92

  • book.06 understand_sentiment lstm GPU loss validation @ranqiu92

  • book.07 label semantic roles CPU loss validation @chengduoZH
    We do not have a GPU implementation of label semantic roles.

  • book.08 machine translation CPU loss validation @jacquesqiao @ChunweiYan

  • book.08 machine translation GPU loss validation @jacquesqiao @ChunweiYan

Task List 1: how to do it

We have benchmark scripts and a docker image, so these tasks should be quick; report a bug if you find any issue (operator implementation, convergence result).
Because we are still fine-tuning performance, if you find an order-of-magnitude gap in performance, please file an issue without hesitation.

The scripts are under this directory; find the one matching the chapter name:
https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/fluid/tests/book

Docker image for the old (V2) book:
paddlepaddle/book:latest-gpu

Docker image for the new (Fluid) book:
dzhwinter/benchmark:latest
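For the loss validation above, a minimal sketch of the kind of convergence check that can be run after collecting per-pass losses from a Fluid chapter and its V2 counterpart; the helper name, the tolerance, and the example numbers are illustrative, not part of the official scripts:

```python
import numpy as np

def validate_convergence(fluid_losses, v2_losses, rtol=0.05):
    """Check that a Fluid chapter converges to roughly the same loss as its V2 version.

    fluid_losses / v2_losses: per-pass average losses collected from the two
    book scripts. The 5% relative tolerance is an illustrative threshold,
    not an official target.
    """
    final_fluid, final_v2 = fluid_losses[-1], v2_losses[-1]
    assert np.isclose(final_fluid, final_v2, rtol=rtol), (
        "convergence gap: fluid=%.4f vs v2=%.4f" % (final_fluid, final_v2))
    print("loss validation passed: fluid=%.4f, v2=%.4f" % (final_fluid, final_v2))

# Example with made-up numbers; in practice the two lists come from running
# the matching chapter scripts inside the two docker images above.
validate_convergence([0.69, 0.52, 0.41], [0.70, 0.50, 0.40])
```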

dzhwinter (Contributor, Author) commented Dec 4, 2017

Task List 2: compare results and performance with TensorFlow

We select some typical tasks to validate our performance and results against TensorFlow.
Note that performance validation should be done with the latest Paddle built in Release mode, using the same CUDA/cuDNN version on both sides.

  • book.02 mnist GPU performance validation with TensorFlow @pkuyym

  • book.03 image classification CPU performance validation with TensorFlow @jacquesqiao @kuke

  • book.03 image classification GPU performance validation with TensorFlow @jacquesqiao @kuke

  • book.06 understand sentiment CPU performance validation with TensorFlow @pkuyym

  • book.06 understand sentiment GPU performance validation with TensorFlow @pkuyym

Task List 2: how to do it

Scripts:
https://github.com/dzhwinter/benchmark

Both CPU tests can be done in the docker image dzhwinter/benchmark:latest, but you need to install tensorflow-gpu yourself.
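For the performance side, a minimal sketch of measuring training speed as instances/second in the same way for both frameworks; `run_one_batch` is a hypothetical callable supplied by the caller (a Fluid executor step or a TensorFlow session.run step), and the warm-up and batch counts are illustrative:

```python
import time

def measure_throughput(run_one_batch, batch_size, num_batches=100, warmup=10):
    """Return instances/second for a training loop.

    run_one_batch is a hypothetical callable that runs one training step;
    plug in either the Fluid executor call or the TensorFlow session.run call.
    """
    for _ in range(warmup):
        run_one_batch()          # exclude warm-up / graph construction cost
    start = time.time()
    for _ in range(num_batches):
        run_one_batch()
    elapsed = time.time() - start
    return num_batches * batch_size / elapsed

# Example: print("%.1f instances/sec" % measure_throughput(step_fn, batch_size=128))
```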

dzhwinter (Contributor, Author) commented Dec 4, 2017

Recommended docker command:
nvidia-docker run -it --name mnist_gpu --security-opt seccomp=unconfined -v $PWD/benchmark:/benchmark -v /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu dzhwinter/benchmark:latest /bin/bash

dzhwinter added this to the Release 0.11.0 milestone on Dec 4, 2017
chengduoZH (Contributor) commented Dec 7, 2017

crf_decoding is needed by "label semantic roles", but it does not support GPU in either v2 or fluid.

book.07 label semantic roles GPU loss validation

reyoung (Collaborator) commented Dec 11, 2017

This feature will be delayed to 0.11.1.

dzhwinter (Contributor, Author) commented Dec 13, 2017

We released Fluid 0.11.0 two days ago. The next stage is introducing our new design to users. We need more solid metrics to compare with other frameworks and to find potential flaws in Fluid.
We maintain a repo that collects the common models, to make the benchmark easier to reproduce. Note that this repo and docker image will be transferred to PaddlePaddle in the future.

git clone https://github.com/dzhwinter/benchmark
docker pull dzhwinter/benchmark:latest
nvidia-docker run -it --name mnist_gpu --security-opt seccomp=unconfined -v $PWD/benchmark:/benchmark -v /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu dzhwinter/benchmark:latest /bin/bash

https://github.com/dzhwinter/benchmark/blob/master/HowToDoBenchmark.md

Follow the guide step by step and you will get the same environment.

To make the benchmark more convincing and more general, there are some decisions we need to make.

  1. We selected the most popular models: mnist, VGG-19, Resnet, Stacked-LSTM. Are there any more general benchmark models we should include?

  2. Do we need an op-wise comparison framework to support per-op precision checking?
    Namely, write an op test with TensorFlow on one side and Fluid on the other, to help us debug big models (see the sketch at the end of this comment).

  3. We need extreme models to test Fluid's limits.

  4. We need public and popular datasets to validate our performance.

  5. Select some typical RNN cases.

https://www.tensorflow.org/performance/benchmarks
https://github.com/pytorch/benchmark

TensorFlow and PyTorch focus on huge image-related models, namely CNN models. I think their test cases are not general enough, but they can still serve as a point of comparison.
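For question 2 above, a minimal sketch of what a per-op comparison could look like, taking softmax as an example; it assumes the TensorFlow 1.x session API and the paddle.v2.fluid layer/executor API of that era, and the op choice, shapes, and tolerance are illustrative and may need adapting to the installed versions:

```python
import numpy as np
import tensorflow as tf
import paddle.v2.fluid as fluid

# Feed the same random input to both frameworks.
x_np = np.random.random((8, 10)).astype("float32")

# TensorFlow side (1.x session API).
x_tf = tf.placeholder(tf.float32, shape=[8, 10])
y_tf = tf.nn.softmax(x_tf)
with tf.Session() as sess:
    tf_out = sess.run(y_tf, feed_dict={x_tf: x_np})

# Fluid side; depending on the Fluid version the numpy array may need to be
# wrapped in a LoDTensor before feeding.
x_fl = fluid.layers.data(name="x", shape=[10], dtype="float32")
y_fl = fluid.layers.softmax(x_fl)
exe = fluid.Executor(fluid.CPUPlace())
fluid_out = exe.run(fluid.default_main_program(),
                    feed={"x": x_np},
                    fetch_list=[y_fl])[0]

# Per-op precision check; the tolerance is an illustrative choice.
assert np.allclose(tf_out, fluid_out, atol=1e-5), "softmax outputs diverge"
print("softmax matches between TensorFlow and Fluid")
```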

dzhwinter (Contributor, Author) commented Dec 13, 2017

Following up on the questions raised above, all five points were discussed in a meeting. The meeting notes are as follows:

  1. Models selected for the benchmark:

| model | dataset | reason |
| --- | --- | --- |
| mnist | mnist | the easiest case |
| VGG-19 | flowers / imagenet | extremely big CNN model |
| Resnet50 | cifar100 / imagenet | the most widely used image model |
| Stacked-LSTM | imdb | the easiest RNN model |
| seq2seq | machine translation dataset | popular attention model |

We picked the classic image models VGG and Resnet (CNNs) and the classic NLP models stacked-lstm and seq2seq (RNNs).
These models must be benchmarked after every release, and the metrics published together with comparison curves against TensorFlow.

The benchmark image is the paddlepaddle/paddle:VERSION image of that release.
The benchmark code currently lives at https://github.com/dzhwinter/benchmark and will be migrated to the paddlepaddle organization.

  2. Metrics collected by the benchmark:

|  | test accuracy | instances/second | GPU memory size* |
| --- | --- | --- | --- |
| batch size |  |  |  |
| Fluid |  |  |  |
| TensorFlow |  |  |  |

The benchmark metrics must be aligned with TensorFlow: test accuracy is measured after the same number of passes, instances/second measures training speed, and GPU memory size may differ across frameworks, so it is only a reference.
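Since GPU memory size may differ across frameworks and is only a reference metric, a minimal sketch of one way to sample it from Python by querying nvidia-smi during training; the query flags are standard nvidia-smi options, and sampling once per pass is an illustrative choice:

```python
import subprocess

def gpu_memory_used_mib(gpu_index=0):
    """Return the memory currently used on one GPU, in MiB, via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"])
    # One line per GPU, e.g. b"1234\n5678\n"; pick the requested device.
    return int(out.decode().strip().splitlines()[gpu_index])

# Example: sample once per pass and report the peak next to instances/second.
```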

  3. Provide a test framework for extreme models
  • For framework performance:
    vnet: image segmentation for medical images; heavy use of conv and conv_transpose; high memory usage; a pain point for product teams.
    Resnet1000: high memory usage and heavy computation; tests compute speed and memory optimization.
  • For framework expressiveness:
    Tree-based RNN: hard to implement; consider it for the next stage.
    Other typical variable-length RNN models, to showcase our advantage.
  4. Dataset selection for the benchmark @pkuyym
    Two aspects:
  • External: use public datasets with plenty of published metrics, e.g. imagenet and cifar100 for images, plus the corresponding RNN datasets.
  • Internal: cooperate with typical product teams on their data and solve their pain points. Internal data is large, so also track multi-GPU and multi-machine progress.
    The company recently released public data; consider cooperating: http://ai.baidu.com/broad
  5. Highlight application
    Variable-length RNNs on video data. Rationale: there are few benchmarks in the video domain; it continues Paddle's RNN speed advantage; LoDTensor makes RNNs simple to express; and the video domain is large enough.

  6. Op test framework
    A framework that aligns individual ops with TensorFlow: Fluid on one side, TensorFlow on the other, feed the same data, and match the op results exactly. The main goals are debugging models and being more convincing to product teams.

NOTES:
TensorFlow is slow to construct a model for the first time, where Fluid has an advantage; this can be a point of comparison (see the sketch below).
We also need to compare against TensorFlow eager execution.
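On the first note above, a minimal sketch of how the first-batch cost (model construction plus the first run) could be reported separately from steady-state speed; `build_and_run_first_batch` and `run_one_batch` are hypothetical hooks supplied for whichever framework is being measured:

```python
import time

def profile_startup_vs_steady(build_and_run_first_batch, run_one_batch,
                              steady_batches=50):
    """Report first-batch latency and steady-state latency separately.

    Both arguments are hypothetical callables supplied by the caller: one
    builds the model and runs the first batch, the other runs one batch on
    the already-built model.
    """
    start = time.time()
    build_and_run_first_batch()
    first_batch_time = time.time() - start

    start = time.time()
    for _ in range(steady_batches):
        run_one_batch()
    steady_per_batch = (time.time() - start) / steady_batches

    print("first batch: %.3fs, steady state: %.3fs/batch"
          % (first_batch_time, steady_per_batch))
```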

pkuyym (Contributor) commented Dec 13, 2017

dzhwinter changed the title from "chapters convergence validation" to "Fluid benchmark & book validation" on Dec 13, 2017