2018 04 11
- [WIP] Refactor all the build-related shell scripts: centralize parameter control in one script file and run different build jobs using options.
- [WIP] Write scripts to control docker container startup options.
-
MKLDNN code
-
Code quality
- Unify Fluid C++ style: https://github.com/PaddlePaddle/Paddle/pull/9685
- Try to fix cpplint errors of fluid/recordio:
-
20 pull requests to clean up code: https://github.com/pulls?utf8=✓&q=is%3Apr+author%3Awangkuiyi
- Code Clean up: https://github.com/PaddlePaddle/Paddle/pull/9663
- Removal of NetOp, CondOp, backward.cc.
- Refactor test_batch_norm.py and test_layer_norm.py
- Code Cleanup
- Fix CPPLint errors in Fluid Operators https://github.com/PaddlePaddle/Paddle/issues/9755
- Fix CPPLint issues in tuple.h https://github.com/PaddlePaddle/Paddle/pull/9670
- [WIP] Fix CPPLint issues in CSP operators https://github.com/PaddlePaddle/Paddle/pull/9753
- Fix comparison warning in lod_reset_op.h https://github.com/PaddlePaddle/Paddle/pull/9754
- Fix CPPLint issues in some operators https://github.com/PaddlePaddle/Paddle/pull/9776
- Fix CPPLint issues in spp_op, sum_op, topk_op, transpose_op, unpool_Op and warpctc_op https://github.com/PaddlePaddle/Paddle/pull/9779
- Fix CPPLint errors in operators https://github.com/PaddlePaddle/Paddle/pull/9826
- Fix CPPLint errors in operators https://github.com/PaddlePaddle/Paddle/pull/9828
- PR Reviews
- https://github.com/PaddlePaddle/Paddle/pull/9837#pullrequestreview-111351187
- https://github.com/PaddlePaddle/Paddle/pull/9829#pullrequestreview-111087051
- https://github.com/PaddlePaddle/Paddle/pull/9795#pullrequestreview-110560107
- https://github.com/PaddlePaddle/Paddle/pull/9788#pullrequestreview-110360386
- https://github.com/PaddlePaddle/Paddle/pull/9786#pullrequestreview-110361070
- https://github.com/PaddlePaddle/Paddle/pull/9783#pullrequestreview-110357406
- https://github.com/PaddlePaddle/Paddle/pull/9717#pullrequestreview-110270818
- https://github.com/PaddlePaddle/Paddle/pull/9715#pullrequestreview-110265964
-
PR
- AWS dist train tool https://github.com/PaddlePaddle/Paddle/pull/9638
-
Review
- dist train doc https://github.com/PaddlePaddle/Paddle/pull/9789
- multi gpu distributed training: https://github.com/PaddlePaddle/Paddle/pull/9746
- fix python packaging bug: https://github.com/PaddlePaddle/Paddle/pull/9807
- update grpc: https://github.com/PaddlePaddle/Paddle/pull/9863
- fix transpiler bug: https://github.com/PaddlePaddle/Paddle/pull/9741
- release doc update: https://github.com/PaddlePaddle/Paddle/pull/9729
- port bind for dist train unit test: https://github.com/PaddlePaddle/Paddle/pull/9595
- multi stream thread pool: https://github.com/PaddlePaddle/Paddle/pull/9578
- enhancements [Doing]:
- inference:
- fuse batch norm: https://github.com/PaddlePaddle/Paddle/pull/9792, 9%~13% speedup on resnet
- add remove_var, remove_op on C++ end and Python end:
- discuss the design of sync_with_cpp with @wuyi @qiaolongfei: https://github.com/PaddlePaddle/Paddle/pull/9607#discussion_r179066586
- TensorRT plan: https://github.com/PaddlePaddle/Paddle/issues/9572
- compiler:
- change WITH_FLUID to WITH_FLUID_ONLY: https://github.com/PaddlePaddle/Paddle/pull/9427
- fix compiler error of profiler_test in ONLY_CPU mode: https://github.com/PaddlePaddle/Paddle/pull/9531
- fix compiler error on `tensor_py.h`: https://github.com/PaddlePaddle/Paddle/pull/9724
- remove unused nccl.cmake: https://github.com/PaddlePaddle/Paddle/pull/9833
- code review:
- MKLDNN:
- inference:
- image_classification
- doc
- bug
Vgg16 imagenet on V100 GPU, total time for one batch (ms):

batch size | 1 | 2 | 4 | 8 | 16 |
---|---|---|---|---|---|
float32 | 14.64 | 10.24 | 23.54 | 28.41 | 53.62 |
float16 | 3.94 | 4.62 | 6.21 | 9.39 | 15.82 |
Speedup | 3.72 | 2.22 | 3.79 | 3.03 | 3.39 |

Vgg16 imagenet on V100 GPU, time spent on conv op (ms):

batch size | 1 | 2 | 4 | 8 | 16 |
---|---|---|---|---|---|
float32 | 12.0 | 6.96 | 18.6 | 21.4 | 41.3 |
float16 | 1.81 | 2.11 | 2.95 | 4.57 | 8.0 |
Speedup | 6.63 | 3.30 | 6.31 | 4.68 | 5.16 |

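
The Speedup rows in the two tables above appear to be the float32 time divided by the float16 time for each batch size. A minimal sketch that recomputes them from the table values follows; the variable and function names are illustrative assumptions and do not come from the benchmark repo.

```python
# Timings copied from the two tables above (ms per batch, Vgg16 on a V100 GPU).
# All names below are hypothetical and only serve this illustration.
BATCH_SIZES = [1, 2, 4, 8, 16]
TOTAL_MS = {
    "float32": [14.64, 10.24, 23.54, 28.41, 53.62],
    "float16": [3.94, 4.62, 6.21, 9.39, 15.82],
}
CONV_MS = {
    "float32": [12.0, 6.96, 18.6, 21.4, 41.3],
    "float16": [1.81, 2.11, 2.95, 4.57, 8.0],
}


def speedups(timings):
    """Return float32 time / float16 time per batch size, rounded to 2 decimals."""
    return [
        round(f32 / f16, 2)
        for f32, f16 in zip(timings["float32"], timings["float16"])
    ]


if __name__ == "__main__":
    rows = zip(BATCH_SIZES, speedups(TOTAL_MS), speedups(CONV_MS))
    for batch_size, total, conv in rows:
        print(f"batch {batch_size:>2}: total x{total:.2f}, conv x{conv:.2f}")
```

Under that assumption the recomputed ratios match the Speedup rows in both tables.
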
- float16 support
- benchmark float16 inference on imagenet: https://github.com/kexinzhao/Paddle_benchmark/blob/master/float16_benchmark.md
- enable tensor core for GEMM: https://github.com/PaddlePaddle/Paddle/pull/9622
- enable tensor core for conv op: https://github.com/PaddlePaddle/Paddle/pull/9623
- add float16 support to softmax op: https://github.com/PaddlePaddle/Paddle/pull/9686
- add float16 support to activation ops: https://github.com/PaddlePaddle/Paddle/pull/9769
- fix cuda 7.5 compile error: https://github.com/PaddlePaddle/Paddle/pull/9811
- add float16 support to save op and add float16 example code: https://github.com/PaddlePaddle/Paddle/pull/9864
-
Fluid support Abacus
-
Fluid implementation:
- Dist transpiler support prefetch https://github.com/PaddlePaddle/Paddle/pull/9714
- add insert_op for block https://github.com/PaddlePaddle/Paddle/pull/9765
-
Code cleanup & optimization
- fix missing core.so on mac https://github.com/PaddlePaddle/Paddle/pull/9725
- change mklml download url to bce https://github.com/PaddlePaddle/Paddle/pull/9652
documents:
PR:
-
Add title for kernel_hint_design.md & kernel_selection.md
-
Fix api docs display error for fluid Initializer
-
Fix display errors for images and tables in .md file:
-
Fix some dead links for fluid documents
-
Add content for manually building documentation
issue:
-
All dead links in fluid documentation
-
Error occurs when building apis or documentation
- Add a ParallelExecutor scheduling optimization; 4-device speed on resnext improves by 14%: https://github.com/PaddlePaddle/Paddle/pull/9548
- Add data feed for ParallelExecutor https://github.com/PaddlePaddle/Paddle/pull/9637
- Explore distributed training code. https://github.com/PaddlePaddle/Paddle/pull/9735
- cleanup https://github.com/PaddlePaddle/Paddle/pull/9678 https://github.com/PaddlePaddle/Paddle/pull/9699 https://github.com/PaddlePaddle/Paddle/pull/9750
- ParallelExecutor
- Parallel testing during training by ParallelExecutor. https://github.com/PaddlePaddle/Paddle/pull/9738
- Support data type int64 in NCCL. https://github.com/PaddlePaddle/Paddle/pull/9818
- Improve test_parallel_executor. https://github.com/PaddlePaddle/Paddle/pull/9849
- Image:
- PriorBox GPU kernel: https://github.com/PaddlePaddle/Paddle/pull/9553
- Enable ParallelExecutor in SSD-MobileNet and Refine code. https://github.com/PaddlePaddle/models/pull/832
- SE-ResNeXt with ParallelExecutor: https://github.com/PaddlePaddle/models/pull/816
- Doc for SSD: https://github.com/PaddlePaddle/models/pull/801
- Others:
- Code cleanup in the profiler code. https://github.com/PaddlePaddle/Paddle/pull/9782
- https://github.com/PaddlePaddle/models/pull/821
- Inference Framework
- [Merged] Remove the use of ARCHIVE_START/END
- Test the speedup from merging the batch norm op computation on resnet50: roughly 9%~13% performance gain
- Update the documentation of inference
- Distributed transformer:
- Docstring checker style:
- Fix debugger bugs:
-
Fix average_accumulate_op for parallel_executor
-
Fix loss of LoD when splitting tensors in parallel_executor.
-
Implement OCR CTC parallel training by parallel_executor.
-
Refine document and scripts of CTC model
-
Review:
- lookup remote table
- support prefetch interface on gRPC server, https://github.com/PaddlePaddle/Paddle/pull/9593
- init Table value Op, https://github.com/PaddlePaddle/Paddle/pull/9787
- doc
- translate k8s dist train doc, https://github.com/PaddlePaddle/Paddle/pull/9789
- review
- WIP
- async update on distributed training.
- NMT:
- Decouple the program desc with batch_size in Transformer(Merged).
- Refine the inference to output special tokens optionally in Transformer(Merged).
- Remove the pad token in Transformer(Merged).
- Transformer experiments related.
- Add plot script for Transformer
- Add evaluation tools for Transformer
- Add CI for ONNX converter
https://github.com/PaddlePaddle/paddle-onnx/pull/15
https://github.com/PaddlePaddle/paddle-onnx/pull/13
https://github.com/PaddlePaddle/paddle-onnx/pull/8
- Modify readers to fit the parallel executor:
- [WIP] Test double buffer performance on transformer model:
- single GPU: 80.8 --> 67.9
- Reviews:
- metrics: https://github.com/PaddlePaddle/Paddle/pull/9791
- updates on parallel executor:
- [Memory] reuse relu/sigmoid operator input variable, to save memory cost
- refactor metrics, add auc, detection map, evaluators
- migrate from benchmark to main repo
- migration from benchmark repo to paddle
- PR
- feature/Add Broadcast and Gather op handle
- Add all gather op and all reduce op
- Refine SE-ResNeXt model and use ParallelExecutor.
- Crash training, if the number of samples is less than the count of devices.
- Move reduceSum to elementwise_op_function.h
- Review
- Enable ParallelExecutor in SSD-MobileNet and Refine code.
- Refine SE-ResNeXt model and use ParallelExecutor.
- Simplify DataStructure in SSAGraph
- remove net op and cond_op
- Speed/sequence expand
- fix python package have no version.py
- Add float16 support to activation ops
- Fix cpplint errors with paddle/fluid/memory
- fix test_conv2d_op when compile without cuda
- Fix CPPLint issues in tuple.h
- [WIP] Learning the NLP model word2vec and NMT
- [WIP] Learning the basic logic of operator and layer
- [WIP] Trying to complete the implementation of Yaming's PR
- Turn on WITH_FLUID_ONLY on TeamCity CI; build time improved by 3 min.
- Begin to analyze multiple node communication bottleneck.
- Reviews
- https://github.com/PaddlePaddle/Paddle/pull/9638#pullrequestreview-110544500
- https://github.com/PaddlePaddle/Paddle/pull/9854
- https://github.com/PaddlePaddle/Paddle/pull/9663
- https://github.com/PaddlePaddle/Paddle/pull/9848#pullrequestreview-111332997
- https://github.com/PaddlePaddle/Paddle/pull/9802#pullrequestreview-110634050
- https://github.com/PaddlePaddle/edl/pull/14#pullrequestreview-110969257
- https://github.com/PaddlePaddle/edl/pull/18#pullrequestreview-110639654
DeepASR:
- Some minor fixes
- Add the error rate measuring script
Fluid2ONNX converter:
- Add script for accuracy validation
- Enable loading parameters and converting the fit_a_line model
- Fix problems in travis-ci
- [WIP] Support conversion of recognize_digits_conv model
Code Review:
- https://github.com/PaddlePaddle/models/pull/791
- https://github.com/PaddlePaddle/models/pull/834
- https://github.com/PaddlePaddle/paddle-onnx/pull/8
[WIP] Coordinate 1.0.0 release.
- PRs
- Use ESLint to format the JavaScript and Vue files: https://github.com/PaddlePaddle/VisualDL/pull/365
- Include ESLint pre-commit: https://github.com/PaddlePaddle/VisualDL/pull/368
- Manually update the code to match ESLint style: https://github.com/PaddlePaddle/VisualDL/pull/369
- Fix smooth not working bug: https://github.com/PaddlePaddle/VisualDL/pull/375
- Remove unnecessary debounce import: https://github.com/PaddlePaddle/VisualDL/pull/376
- Add text demo logs in scratch_log: https://github.com/PaddlePaddle/VisualDL/pull/381
- Add license agreement to Python files: https://github.com/PaddlePaddle/VisualDL/pull/383
- Code refactoring/cleanup:
- Issue wrt running pre-built docker: https://github.com/PaddlePaddle/Paddle/issues/9856
- Working on fixing sentiment analysis book chapter bug: https://github.com/PaddlePaddle/Paddle/issues/9886
PR review:
-
VisualDL Release
- Added SVG-to-PNG download functionality so that the downloaded image is WYSIWYG, with CSS styles intact
- Tuned styles for interactive graph
- Fixed the backend logic: we previously used a static image generated by GraphViz for the graph, but now generate the graph SVG dynamically
- (WIP): add back slider as the graph can be large
-
PRs
- https://github.com/PaddlePaddle/VisualDL/pull/391
- https://github.com/PaddlePaddle/VisualDL/pull/389
- https://github.com/PaddlePaddle/VisualDL/pull/388
- https://github.com/PaddlePaddle/VisualDL/pull/387
- https://github.com/PaddlePaddle/VisualDL/pull/384
- https://github.com/PaddlePaddle/VisualDL/pull/380
- https://github.com/PaddlePaddle/VisualDL/pull/378
- https://github.com/PaddlePaddle/VisualDL/pull/370
- https://github.com/PaddlePaddle/VisualDL/pull/367
-
PRs reviewed:
- https://github.com/PaddlePaddle/VisualDL/pull/385
- https://github.com/PaddlePaddle/VisualDL/pull/382
- https://github.com/PaddlePaddle/VisualDL/pull/377
- https://github.com/PaddlePaddle/VisualDL/pull/366
- https://github.com/PaddlePaddle/VisualDL/pull/363
- https://github.com/PaddlePaddle/VisualDL/pull/360
- https://github.com/PaddlePaddle/VisualDL/pull/383
- https://github.com/PaddlePaddle/VisualDL/pull/381
- https://github.com/PaddlePaddle/VisualDL/pull/376
- https://github.com/PaddlePaddle/VisualDL/pull/375
- https://github.com/PaddlePaddle/VisualDL/pull/369
- https://github.com/PaddlePaddle/VisualDL/pull/368
- https://github.com/PaddlePaddle/VisualDL/pull/365
- Complete audio data integration https://github.com/PaddlePaddle/VisualDL/pull/366
- Fix and add details of Image API https://github.com/PaddlePaddle/VisualDL/pull/377
- Fix audio demo by packaging wav file into proper location https://github.com/PaddlePaddle/VisualDL/pull/382
- Add and fix all unit tests https://github.com/PaddlePaddle/VisualDL/pull/385
- Working on ONNX fit_a_line testing and review, model validation: https://github.com/PaddlePaddle/paddle-onnx/pull/9, https://github.com/PaddlePaddle/paddle-onnx/pull/11
- PaddlePaddle.org needs assessment and subsequent issues: https://github.com/PaddlePaddle/PaddlePaddle.org/issues/455, https://github.com/PaddlePaddle/PaddlePaddle.org/issues/456, https://github.com/PaddlePaddle/PaddlePaddle.org/issues/457, https://github.com/PaddlePaddle/PaddlePaddle.org/issues/458, https://github.com/PaddlePaddle/PaddlePaddle.org/issues/459, https://github.com/PaddlePaddle/PaddlePaddle.org/issues/460, https://github.com/PaddlePaddle/PaddlePaddle.org/issues/461, https://github.com/PaddlePaddle/PaddlePaddle.org/issues/462, https://github.com/PaddlePaddle/PaddlePaddle.org/issues/463
- Started working on design updates to PaddlePaddle.org
- Assisting Jeff on VDL current_thread import issues
- Began a draft of a product marketing strategy doc
- Communicated with the Linux Foundation folks on outreach
- PTO last week
- Work with Sid on creating docker for benchmarks
- Look into what simplifying the book examples using an imperative style would look like