Inference 2018 5
- MKL dynamic link: @tangjian will start looking into it next week
- parallelDo performance is worse than v2; do we need to close the gap?
- Do we need to push NLP to go live? @tangjian
- v2 performance now matches the earlier level, but fluid has not caught up yet.
- Anakin has started reproducing our data; production uses v3, v4, 5117 and similar CPUs, while we benchmarked on v2 CPUs.
- [WIP] The open-source sequence labeling task shows slow inference speed; under investigation. API side: @焦振宇
- [Merged] add initial memory flag in MB for infer
- [Merged] Infer multi-threads API Demo and UT
- [WIP] fix unknown use_mkldnn flag
- [WIP] scope thread safe
- Merged, MKLDNN layout: Support for pool operator
- Merged, MKLDNN layout: Support for convolution operator
- Merged, MKLDNN layout: Support for batch norm operator
- ResNet50 training and inference performance on 6148, to be aligned with the Intel team (QA). @chengsi will first get a short-term machine
- Add inference examples to the repo and verify them on CI: https://github.com/PaddlePaddle/Paddle/issues/10990#issuecomment-393034634
- Package and deploy contrib/inference_api
- 6148 machine
- Not enough QA manpower
- Train CPU with multi-thread @luotao, refer to ParallelDo
- TODO Inference with multi-thread @tangjian (done; see the threading sketch after this list)
- 6148 machine ready
- The MKLDNN 7.5 milestone can be set as matching V2 performance
- Try ParallelDo for multi-threaded CPU training
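For the multi-thread inference items above, here is a minimal C++ sketch of the one-predictor-per-thread pattern that the demo and the scope-thread-safety work enable. The `Predictor` class is a simplified stand-in for illustration, not the actual contrib inference API:

```cpp
#include <iostream>
#include <thread>
#include <vector>

// Simplified stand-in for a predictor; the real interface lives in
// paddle/contrib/inference. This sketch only shows the threading pattern.
class Predictor {
 public:
  // A trivial "model" that sums its input.
  bool Run(const std::vector<float>& input, std::vector<float>* output) {
    float sum = 0.f;
    for (float v : input) sum += v;
    output->assign(1, sum);
    return true;
  }
};

int main() {
  const int num_threads = 4;
  std::vector<std::thread> workers;
  for (int i = 0; i < num_threads; ++i) {
    // One predictor instance per thread: no shared mutable state, so no
    // locking is needed as long as the global scope is thread safe.
    workers.emplace_back([i] {
      Predictor predictor;
      std::vector<float> input(784, 1.0f), output;
      predictor.Run(input, &output);
      std::cout << "thread " << i << " -> " << output[0] << "\n";
    });
  }
  for (auto& w : workers) w.join();
  return 0;
}
```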
- Inference 7.5 goals
- High-level API and documentation
- Native implementation
- Anakin integration
- Initial subgraph integration with TensorRT
- Two CPU models reaching launch quality
- Subgraph initially complete
- Target is MLP, striving for ResNet
- 4 image inference demos
- High-level API and documentation
Inference engine && op overall optimization
- High-level API 60%
- DONE finalize the high-level interface
- DONE native implementation
- DOING Anakin
- Subgraph 30%
- DOING framework
- DOING TRT support
- MKLDNN 30%
- CPU core model optimization 70%
- DOING OCR CPU
- DOING sentiment classification CPU
- Documentation && CI 40%
- DONE initial documentation for the old interface
- TODO documentation for the new interface
- [Merged] Blas optimized elementwise_add forward and backward passes (10% speedup on elementwise_add op of OCR CRNN_CTC model): https://github.com/PaddlePaddle/Paddle/pull/10913 @intel-team
- [Merged] Top K algorithm parallel version (see the row-parallel sketch after this list): https://github.com/PaddlePaddle/Paddle/pull/10941 @intel-team
- [Doing] ResNet50 benchmark on fluid (MKLML version first) @intel-team
- [Review] Withdraw MKLDNN Mul operator https://github.com/PaddlePaddle/Paddle/pull/10703 @intel-team
- [Merged] speedup vInvSqrt vLog1p vTanh with mklml in V2: https://github.com/PaddlePaddle/Paddle/pull/10934 @tangjian
- [Merged] fix inference_lib_dist deps: https://github.com/PaddlePaddle/Paddle/pull/10988 @tangjian
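Since top-k is independent per row, the parallel version can partition rows across threads. Below is an illustrative C++ sketch of that idea, using `std::partial_sort` per row; it is not the actual kernel from the PR above:

```cpp
#include <algorithm>
#include <functional>
#include <iostream>
#include <thread>
#include <vector>

// Top-k over a batch: each row is independent, so rows are partitioned
// across threads and each is handled with std::partial_sort.
void TopK(const std::vector<std::vector<float>>& rows, size_t k,
          std::vector<std::vector<float>>* out, int num_threads) {
  out->assign(rows.size(), std::vector<float>());
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      // Threads write disjoint rows, so no locking is needed.
      for (size_t r = t; r < rows.size(); r += num_threads) {
        std::vector<float> row = rows[r];
        size_t kk = std::min(k, row.size());
        std::partial_sort(row.begin(), row.begin() + kk, row.end(),
                          std::greater<float>());
        row.resize(kk);
        (*out)[r] = std::move(row);
      }
    });
  }
  for (auto& w : workers) w.join();
}

int main() {
  std::vector<std::vector<float>> batch = {{3, 1, 4, 1, 5}, {9, 2, 6, 5, 3}};
  std::vector<std::vector<float>> top;
  TopK(batch, /*k=*/2, &top, /*num_threads=*/2);
  for (const auto& row : top) {
    for (float v : row) std::cout << v << ' ';
    std::cout << '\n';
  }
  return 0;
}
```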
@panxin
@chunwei
- doing, hooking the new high-level API up to Anakin (Anakin has no externally available library; verifying with manual copies for now)
- doing, new high-level API fixes + documentation
- open, fc converter
- merged, tensorrt engine op (converter pattern sketched after this list)
- merged, mul converter
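For the converter items above, an illustrative sketch of the converter-registry pattern: each fluid op type maps to a function that emits the matching TensorRT layer, and the engine op runs the converted subgraph. All names here are hypothetical, not the actual Paddle or TensorRT classes:

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Each fluid op type maps to a function that would emit the corresponding
// TensorRT layer; unsupported ops stay outside the TRT subgraph.
struct OpDesc {
  std::string type;
};

using Converter = std::function<void(const OpDesc&)>;

std::map<std::string, Converter>& Registry() {
  static std::map<std::string, Converter> registry;
  return registry;
}

void ConvertSubgraph(const std::vector<OpDesc>& ops) {
  for (const auto& op : ops) {
    auto it = Registry().find(op.type);
    if (it == Registry().end()) {
      std::cout << op.type << ": no converter, left in fluid\n";
      continue;
    }
    it->second(op);
  }
}

int main() {
  // Register a converter for "mul", the first op covered by a converter PR.
  Registry()["mul"] = [](const OpDesc& op) {
    std::cout << "emitting a TRT layer for " << op.type << "\n";
  };
  ConvertSubgraph({{"mul"}, {"relu"}});
  return 0;
}
```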
- OCR: linking the mklml dynamic library makes other tasks core dump @luotao @visualization-team
- [Verified effective] MKLDNN and MKLML need iomp.so, so online services must link iomp as well. But many online services already use gomp; we should recommend that users link iomp directly, like this: @tangjian @yangjingyuan-OCR
  target_link_libraries(${TARGET_NAME} "-L${MKLML_LIB_DIR} -liomp5 -Wl,--as-needed")
- [In progress] The mklml dynamic library still conflicts with the static MKL libraries of other tasks. Trying to compile with -lmkl_core -lmkl_intel_lp64 -lmkl_sequential from the full MKL package, and testing the v1 case. @luotao @yangjingyuan-OCR @lixuan-OCR
- [In communication] Asking ying.hu@intel.com of the Intel MKL team for single-threaded and multi-threaded static MKLML libraries
- Sentiment classification @tensor
- Performance report being produced
- Print the Git Commit Id on TeamCity @yanxu @luotao https://github.com/PaddlePaddle/Paddle/pull/10991
- Officially release the high-level API (stable interface, documentation basically ready) @chunwei @panxin
- Get the subgraph running end to end with manual configuration @chunwei
- Produce performance reports for CPU core models and fix online bugs (jointly with the Intel team) @luotao @tensor
- Provide the release note for the high-level API next week @chunwei
- how should we sync status: by going through OmniPlan item by item?
- MKLDNN bi-weekly time
- when can @guochaorong or someone else take over the CI deployment?
- Is the name of `make inference_lib_dist` suitable? The C++-side trainer also uses this command: Add cpp trainer lib and demo
- how about the application status of the 6148 CPU?
- issues:
- what `save_inference_model` should do: https://github.com/PaddlePaddle/Paddle/issues/10803
- op clip in `save_inference_model`: https://github.com/PaddlePaddle/Paddle/issues/10811
- global mkldnn flag: https://github.com/PaddlePaddle/Paddle/issues/10765
- Confirm the bi-weekly meeting with the Poland team; the first meeting is tentatively set for 1 pm next Wednesday.
- MKLDNN schedule: the Poland team said by email that they will provide it this weekend (their manager Marcin is away at a conference).
- We are merging the latest code internally, supporting MKL-DNN data layouts, elementwise_add, and SUM, to check how we are doing with ResNet training/inference and to identify the next bottlenecks.
- The CRNN-CTC model shall also benefit from those upgrades. From the op point of view, we will look at the top_k layer and optimize the algorithm by parallelizing it (as it is a sorting problem, MKL-DNN isn't a good place to implement it).
- Merged add mkldnn to paddle lib @qiaolongfei
- Merged Reuse of pooling mkldnn primitives @intel team
- Merged Update activations for MKL-DNN @intel team
- Merged enable MKLDNN inference test @tangjian
- framework
- Merged inference/analysis/data flow graph @yanchunwei
- PR refine/data flow graph @yanchunwei
- subgraph test
- PR feature/mul converter @yanchunwei
- Merged Move contrib to paddle/ @yanchunwei
- PR feature/inference api demo impl @yanchunwei
- Merged add version and cmakecache in inference_lib @luotao
- Merged change CMAKE_INSTALL_PREFIX in inference_lib_dist to FLUID_INSTALL_DIR @luotao
- auto build and deploy fluid.tgz on TeamCity (cuda8.0_cudnn5_avx_mkl) @luotao @yanxu
Known issues:
- the package contains a redundant path: paddle/build/fluid_install_dir
- the GIT COMMIT ID is not printed in CI
- deployment of other builds
- Merged Add Inference doc for fluid @weixing, to be shown on the official website
- Unclear how to align with the Intel team on MKLDNN deliverables/commitments; @wangyi to confirm
- `save_inference_model` should accept empty targets and by default output everything (except backward)
- Refactor inference documentation and deploy on the official website
- tracking the status of different branches
- tracking the status of deployment seven models
- the plan of MKLDNN, more clear milestones
- refine the GitHub/projects
- determine the date of the internal weekly meeting
- have a GitHub wiki for tracking the common questions and bugs
- Release the Inference Lib with a document containing version details such as the following information so that users can reproduce it
- commit id
- the compilation commands and flags
- a performance report (QA provides)
- High-level API to hide the underlying details (including concepts and third-party symbols); a sketch of this pattern follows this list
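As mentioned in the last item, one common way to hide underlying details is a thin abstract interface plus a factory, so the public header pulls in no framework or third-party symbols. A minimal hypothetical sketch; the names are illustrative, not the released API:

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Thin abstract interface: users include only this, so no framework or
// third-party symbols (MKLDNN, TensorRT, ...) leak into user code.
class InferencePredictor {
 public:
  virtual bool Run(const std::vector<float>& input,
                   std::vector<float>* output) = 0;
  virtual ~InferencePredictor() = default;
};

namespace {
// Concrete implementation; in a real build this would live in a .cc file
// and wrap the actual engine.
class DummyPredictor : public InferencePredictor {
 public:
  bool Run(const std::vector<float>& input,
           std::vector<float>* output) override {
    output->assign(1, static_cast<float>(input.size()));  // placeholder
    return true;
  }
};
}  // namespace

// The factory is the only entry point exposed to users.
std::unique_ptr<InferencePredictor> CreatePredictor(
    const std::string& /*model_dir*/) {
  return std::make_unique<DummyPredictor>();
}

int main() {
  auto predictor = CreatePredictor("./mobilenet");
  std::vector<float> out;
  predictor->Run({1.f, 2.f, 3.f}, &out);
  std::cout << out[0] << "\n";
  return 0;
}
```

Because the concrete engine types never appear in the header, the backend can change without breaking users of the API.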
Set up a documentation mechanism; initially shanyi can help migrate the existing docs to paddle/doc, and in parallel get the deployment flow to the official site working.
Every Wednesday 6:30~7:30 pm, internal Inference discussion, to increase the information bandwidth of remote collaboration.
The weekly meeting's purpose is to review the past week's work, discuss and share information, and set the rough goals and direction for next week.
It follows this process:
- collect items that need discussion under Need Discussion
- after the discussion, summarize the minutes into Weekly Status
- Clarify the target performance for launch
- Determine model priorities
- For each model, agree on a complete performance-testing method and the corresponding test data, so we can reproduce the business line's metrics ourselves
- Analyze bottlenecks and optimize step by step
- Optimize the framework: repeated kernel creation can be cached, which is quite visible with cudnn and mkldnn, though the exact gain is uncertain (see the caching sketch at the end)
- High-level API: after syncing with the image team, they will contribute their interface implementations on top of our current high-level API, with a very thin wrapper above; going forward, our releases must take responsibility for performance
- MKLDNN results must be measured single-threaded
- MKLML gives a clear performance gain over OpenBLAS; consider replacing openblas first to get the first round of CPU wins
- For ops that MKLDNN does not yet support but that bottleneck specific models, we can optimize them ourselves in an MKLDNN-like way (sys probably does the same)
- Finish subgraph test verification as early as possible
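For the kernel-caching item above, a minimal sketch of a keyed primitive cache; the key format and the `Primitive` type are assumptions for illustration, not Paddle's actual cache:

```cpp
#include <iostream>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// A primitive stands for an expensive-to-create kernel object (a cudnn or
// mkldnn descriptor, for example); here it just logs its creation.
struct Primitive {
  explicit Primitive(const std::string& key) {
    std::cout << "creating primitive for " << key << "\n";
  }
};

class PrimitiveCache {
 public:
  // Returns the cached primitive for this key, creating it at most once.
  std::shared_ptr<Primitive> GetOrCreate(const std::string& key) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = cache_.find(key);
    if (it != cache_.end()) return it->second;
    auto prim = std::make_shared<Primitive>(key);
    cache_[key] = prim;
    return prim;
  }

 private:
  std::mutex mu_;
  std::unordered_map<std::string, std::shared_ptr<Primitive>> cache_;
};

int main() {
  PrimitiveCache cache;
  // Same op + shape twice: the second call hits the cache, skipping setup.
  cache.GetOrCreate("conv2d/3x224x224");
  cache.GetOrCreate("conv2d/3x224x224");
  cache.GetOrCreate("conv2d/3x112x112");
  return 0;
}
```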