
Fluid benchmark & book validation #6208

Closed · 13 tasks
dzhwinter opened this issue Dec 4, 2017 · 7 comments

dzhwinter (Contributor) commented Dec 4, 2017

In the 0.11.0 release, we will publish the book chapters rewritten with Fluid; there are some tasks that need to be done first.

Task List 1: compare results with the Paddle V2 book

We need to validate that the Fluid book chapters converge to approximately the same results as the V2 book chapters.

Note that we have three different implementations of understand_sentiment; only the LSTM one is tested in this chapter.

  • book.06 understand_sentiment lstm CPU loss validation @ranqiu92

  • book.06 understand_sentiment lstm GPU loss validation @ranqiu92

  • book.07 label semantic roles CPU loss validation @chengduoZH
    We do not have a GPU implementation of label semantic roles.

  • book.08 machine translation CPU loss validation @jacquesqiao @ChunweiYan

  • book.08 machine translation GPU loss validation @jacquesqiao @ChunweiYan

Task List 1: how to do it

We have benchmark scripts and a docker image, so these tasks should be quick; report a bug if you find any issue (operator implementation, convergence result).
Because we are still fine-tuning performance, if you find an order-of-magnitude gap in performance, please file an issue without hesitation.

The scripts are under this directory; find the one matching the chapter name:
https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/fluid/tests/book

Docker image for the old (V2) book:
paddlepaddle/book:latest-gpu

Docker image for the new (Fluid) book:
dzhwinter/benchmark:latest
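For the loss validation above, a minimal sketch of the kind of convergence check that can be run after collecting per-pass losses from a Fluid chapter and its V2 counterpart; the helper name, the tolerance, and the example numbers are illustrative, not part of the official scripts:

```python
import numpy as np

def validate_convergence(fluid_losses, v2_losses, rtol=0.05):
    """Check that a Fluid chapter converges to roughly the same loss as its V2 version.

    fluid_losses / v2_losses: per-pass average losses collected from the two
    book scripts. The 5% relative tolerance is an illustrative threshold,
    not an official target.
    """
    final_fluid, final_v2 = fluid_losses[-1], v2_losses[-1]
    assert np.isclose(final_fluid, final_v2, rtol=rtol), (
        "convergence gap: fluid=%.4f vs v2=%.4f" % (final_fluid, final_v2))
    print("loss validation passed: fluid=%.4f, v2=%.4f" % (final_fluid, final_v2))

# Example with made-up numbers; in practice the two lists come from running
# the matching chapter scripts inside the two docker images above.
validate_convergence([0.69, 0.52, 0.41], [0.70, 0.50, 0.40])
```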

dzhwinter (Contributor, Author) commented Dec 4, 2017

Task List 2: compare results and performance with TensorFlow

We select some typical tasks to validate our performance and results against TensorFlow.
Note that performance validation should be done with the latest Paddle built in Release mode, using the same CUDA/cuDNN version on both sides.

  • book.02 mnist GPU performance validation with TensorFlow @pkuyym

  • book.03 image classification CPU performance validation with TensorFlow @jacquesqiao @kuke

  • book.03 image classification GPU performance validation with TensorFlow @jacquesqiao @kuke

  • book.06 understand sentiment CPU performance validation with TensorFlow @pkuyym

  • book.06 understand sentiment GPU performance validation with TensorFlow @pkuyym

Task List 2: how to do it

Scripts:
https://github.com/dzhwinter/benchmark

Both CPU tests can be done in the docker image dzhwinter/benchmark:latest, but you need to install tensorflow-gpu yourself.
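For the performance side, a minimal sketch of measuring training speed as instances/second in the same way for both frameworks; `run_one_batch` is a hypothetical callable supplied by the caller (a Fluid executor step or a TensorFlow session.run step), and the warm-up and batch counts are illustrative:

```python
import time

def measure_throughput(run_one_batch, batch_size, num_batches=100, warmup=10):
    """Return instances/second for a training loop.

    run_one_batch is a hypothetical callable that runs one training step;
    plug in either the Fluid executor call or the TensorFlow session.run call.
    """
    for _ in range(warmup):
        run_one_batch()          # exclude warm-up / graph construction cost
    start = time.time()
    for _ in range(num_batches):
        run_one_batch()
    elapsed = time.time() - start
    return num_batches * batch_size / elapsed

# Example: print("%.1f instances/sec" % measure_throughput(step_fn, batch_size=128))
```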

dzhwinter (Contributor, Author) commented Dec 4, 2017

Recommended docker command:
nvidia-docker run -it --name mnist_gpu --security-opt seccomp=unconfined -v $PWD/benchmark:/benchmark -v /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu dzhwinter/benchmark:latest /bin/bash

dzhwinter added this to the Release 0.11.0 milestone on Dec 4, 2017
chengduoZH (Contributor) commented Dec 7, 2017

crf_decoding is needed by "label semantic roles", but it does not support GPU in either v2 or fluid.

book.07 label semantic roles GPU loss validation

reyoung (Collaborator) commented Dec 11, 2017

This feature will be delayed to 0.11.1.

dzhwinter (Contributor, Author) commented Dec 13, 2017

We released Fluid 0.11.0 two days ago. The next stage is introducing our new design to users. We need more solid metrics to compare with other frameworks and to find potential flaws in Fluid.
We maintain a repo that collects the common models, to make the benchmark easier to reproduce. Note that this repo and docker image will be transferred to PaddlePaddle in the future.

git clone https://github.com/dzhwinter/benchmark
docker pull dzhwinter/benchmark:latest
nvidia-docker run -it --name mnist_gpu --security-opt seccomp=unconfined -v $PWD/benchmark:/benchmark -v /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu dzhwinter/benchmark:latest /bin/bash

https://github.com/dzhwinter/benchmark/blob/master/HowToDoBenchmark.md

Follow the guide step by step and you will get the same environment.

To make the benchmark more convincing and more general, there are some decisions we need to make.

  1. We selected the most popular models: mnist, VGG-19, Resnet, Stacked-LSTM. Are there any more general benchmark models we should include?

  2. Do we need an op-wise comparison framework to support per-op precision checking?
    Namely, write an op test with TensorFlow on one side and Fluid on the other, to help us debug big models (see the sketch at the end of this comment).

  3. We need extreme models to test Fluid's limits.

  4. We need public and popular datasets to validate our performance.

  5. Select some typical RNN cases.

https://www.tensorflow.org/performance/benchmarks
https://github.com/pytorch/benchmark

TensorFlow and PyTorch focus on huge image-related models, namely CNN models. I think their test cases are not general enough, but they can still serve as a point of comparison.
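For question 2 above, a minimal sketch of what a per-op comparison could look like, taking softmax as an example; it assumes the TensorFlow 1.x session API and the paddle.v2.fluid layer/executor API of that era, and the op choice, shapes, and tolerance are illustrative and may need adapting to the installed versions:

```python
import numpy as np
import tensorflow as tf
import paddle.v2.fluid as fluid

# Feed the same random input to both frameworks.
x_np = np.random.random((8, 10)).astype("float32")

# TensorFlow side (1.x session API).
x_tf = tf.placeholder(tf.float32, shape=[8, 10])
y_tf = tf.nn.softmax(x_tf)
with tf.Session() as sess:
    tf_out = sess.run(y_tf, feed_dict={x_tf: x_np})

# Fluid side; depending on the Fluid version the numpy array may need to be
# wrapped in a LoDTensor before feeding.
x_fl = fluid.layers.data(name="x", shape=[10], dtype="float32")
y_fl = fluid.layers.softmax(x_fl)
exe = fluid.Executor(fluid.CPUPlace())
fluid_out = exe.run(fluid.default_main_program(),
                    feed={"x": x_np},
                    fetch_list=[y_fl])[0]

# Per-op precision check; the tolerance is an illustrative choice.
assert np.allclose(tf_out, fluid_out, atol=1e-5), "softmax outputs diverge"
print("softmax matches between TensorFlow and Fluid")
```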

dzhwinter (Contributor, Author) commented Dec 13, 2017

Following up on the questions raised above, all five points were discussed in a meeting. The meeting notes are as follows:

  1. Models selected for the benchmark:

| model | dataset | reason |
| --- | --- | --- |
| mnist | mnist | the easiest case |
| VGG-19 | flowers / imagenet | extremely big CNN model |
| Resnet50 | cifar100 / imagenet | the most widely used image model |
| Stacked-LSTM | imdb | the easiest RNN model |
| seq2seq | machine translation dataset | popular attention model |

We picked the classic image models VGG and Resnet (CNNs) and the classic NLP models stacked-lstm and seq2seq (RNNs).
These models must be benchmarked after every release, and the metrics published together with comparison curves against TensorFlow.

The benchmark image is the paddlepaddle/paddle:VERSION image of that release.
The benchmark code currently lives at https://github.com/dzhwinter/benchmark and will be migrated to the paddlepaddle organization.

  2. Metrics collected by the benchmark:

|  | test accuracy | instances/second | GPU memory size* |
| --- | --- | --- | --- |
| batch size |  |  |  |
| Fluid |  |  |  |
| TensorFlow |  |  |  |

The benchmark metrics must be aligned with TensorFlow: test accuracy is measured after the same number of passes, instances/second measures training speed, and GPU memory size may differ across frameworks, so it is only a reference.
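Since GPU memory size may differ across frameworks and is only a reference metric, a minimal sketch of one way to sample it from Python by querying nvidia-smi during training; the query flags are standard nvidia-smi options, and sampling once per pass is an illustrative choice:

```python
import subprocess

def gpu_memory_used_mib(gpu_index=0):
    """Return the memory currently used on one GPU, in MiB, via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"])
    # One line per GPU, e.g. b"1234\n5678\n"; pick the requested device.
    return int(out.decode().strip().splitlines()[gpu_index])

# Example: sample once per pass and report the peak next to instances/second.
```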

  3. Provide a test framework for extreme models
  • For framework performance:
    vnet: image segmentation for medical images; heavy use of conv and conv_transpose; high memory usage; a pain point for product teams.
    Resnet1000: high memory usage and heavy computation; tests compute speed and memory optimization.
  • For framework expressiveness:
    Tree-based RNN: hard to implement; consider it for the next stage.
    Other typical variable-length RNN models, to showcase our advantage.
  4. Dataset selection for the benchmark @pkuyym
    Two aspects:
  • External: use public datasets with plenty of published metrics, e.g. imagenet and cifar100 for images, plus the corresponding RNN datasets.
  • Internal: cooperate with typical product teams on their data and solve their pain points. Internal data is large, so also track multi-GPU and multi-machine progress.
    The company recently released public data; consider cooperating: http://ai.baidu.com/broad
  5. Highlight application
    Variable-length RNNs on video data. Rationale: there are few benchmarks in the video domain; it continues Paddle's RNN speed advantage; LoDTensor makes RNNs simple to express; and the video domain is large enough.

  6. Op test framework
    A framework that aligns individual ops with TensorFlow: Fluid on one side, TensorFlow on the other, feed the same data, and match the op results exactly. The main goals are debugging models and being more convincing to product teams.

NOTES:
TensorFlow is slow to construct a model for the first time, where Fluid has an advantage; this can be a point of comparison (see the sketch below).
We also need to compare against TensorFlow eager execution.
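On the first note above, a minimal sketch of how the first-batch cost (model construction plus the first run) could be reported separately from steady-state speed; `build_and_run_first_batch` and `run_one_batch` are hypothetical hooks supplied for whichever framework is being measured:

```python
import time

def profile_startup_vs_steady(build_and_run_first_batch, run_one_batch,
                              steady_batches=50):
    """Report first-batch latency and steady-state latency separately.

    Both arguments are hypothetical callables supplied by the caller: one
    builds the model and runs the first batch, the other runs one batch on
    the already-built model.
    """
    start = time.time()
    build_and_run_first_batch()
    first_batch_time = time.time() - start

    start = time.time()
    for _ in range(steady_batches):
        run_one_batch()
    steady_per_batch = (time.time() - start) / steady_batches

    print("first batch: %.3fs, steady state: %.3fs/batch"
          % (first_batch_time, steady_per_batch))
```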

pkuyym (Contributor) commented Dec 13, 2017

dzhwinter changed the title from "chapters convergence validation" to "Fluid benchmark & book validation" on Dec 13, 2017