2018 04 25
Tao Luo edited this page Dec 9, 2019
- Add issue to describe the goal of merging all the build related scripts (https://github.com/PaddlePaddle/Paddle/issues/10073)
- Add "README" file to describe how to use the new build scripts (https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/scripts/README.md)
- Build system: https://github.com/PaddlePaddle/Paddle/pull/10038#issuecomment-384045187
- Fluid language embedding: https://github.com/PaddlePaddle/Paddle/issues/10152#issuecomment-384084262
- reader.BlockingQueue: https://github.com/PaddlePaddle/Paddle/pull/10206#pullrequestreview-115331948
- Build script polish: https://github.com/PaddlePaddle/Paddle/issues/10073#issuecomment-384465044
- Paddle Fluid API design with @cs2be , @abhinavarora and @varunarora
- Fluid API proposal: https://github.com/PaddlePaddle/Paddle/issues/10152 (coauthor with Thuan)
- Word to Vec under the new Fluid API: https://github.com/PaddlePaddle/Paddle/issues/10214
- Fluid data pipeline interface: https://github.com/PaddlePaddle/Paddle/issues/10102
- Related discussion: https://github.com/PaddlePaddle/Paddle/issues/10177
- Reviews
- Code Clean Up:
- Clean up unused code in operator class (#10035)
- Remove duplicated ShareLoD in gru_op and sequence_conv_op (#10149)
- Paddle Fluid API design
- Fluid API bug and dead-link fixes:
- Change paddle.v2.dataset to paddle.dataset for V2 docs: https://github.com/PaddlePaddle/Paddle/pull/10222
- Add dataset for Fluid API documentation: https://github.com/PaddlePaddle/Paddle/pull/10172
- [WIP] Fix most of the dead links in Fluid documentation: https://github.com/PaddlePaddle/Paddle/pull/10097
- Update some new APIs: https://github.com/PaddlePaddle/Paddle/pull/10051
- PR review:
Directory | Develop | 15-Mar |
---|---|---|
fluid/framework | 29 | 227 |
fluid/framework/details | 0 | 5 |
fluid/inference | 0 | 15 |
fluid/inference/tensorrt | 14 | N/A |
fluid/memory | 0 | 2 |
fluid/operators | 0 | 303 |
fluid/operators/reader | 0 | 8 |
fluid/operators/concurrency | 0 | N/A |
fluid/operators/math | 328 | 369 |
fluid/operators/detail | 0 | 29 |
fluid/operators/nccl | 2 | 2 |
fluid/platform | 0 | 155 |
fluid/pybind | 0 | 41 |
fluid/recordio | 0 | 18 |
fluid/string | 0 | 7 |
- Code Cleanup
- Fix more CPPLint errors https://github.com/PaddlePaddle/Paddle/pull/10218
- Fix CPPLint errors with framework/executor https://github.com/PaddlePaddle/Paddle/pull/10212
- Fix CPPLint errors with framework/op_desc https://github.com/PaddlePaddle/Paddle/pull/10181
- Fix CPPLint issues in framework/data_transform framework/prune.cc https://github.com/PaddlePaddle/Paddle/pull/10178
- Fix CPPLint issues in init.cc, init.h and library_type.h https://github.com/PaddlePaddle/Paddle/pull/10148
- Fix CPPLint issues in framework/data_type.h and framework/feed_fetch_type.h https://github.com/PaddlePaddle/Paddle/pull/10146
- Fix CPPLint issues in tensor_util_test https://github.com/PaddlePaddle/Paddle/pull/10111
- Fix CPPLint errors in framework/details https://github.com/PaddlePaddle/Paddle/pull/10104
- Fix CPPLint issues in fluid/inference https://github.com/PaddlePaddle/Paddle/pull/10075
- Fix CPPLint issues with select_op https://github.com/PaddlePaddle/Paddle/pull/10072
- Fix more CPPLint errors https://github.com/PaddlePaddle/Paddle/pull/10069
- Fix CPPLint issues in some tests in fluid/framework https://github.com/PaddlePaddle/Paddle/pull/10068
- Fluid API V4
- Participate in discussions with Helin and Thuan on V4 API design
- https://github.com/PaddlePaddle/Paddle/issues/10152#issuecomment-384030871
- Recognize Digits example with new API https://github.com/PaddlePaddle/Paddle/issues/10215
- PR Reviews
- https://github.com/PaddlePaddle/Paddle/pull/10226#pullrequestreview-115452448
- https://github.com/PaddlePaddle/Paddle/pull/10211#pullrequestreview-115363430
- https://github.com/PaddlePaddle/Paddle/pull/10179#pullrequestreview-114956631
- https://github.com/PaddlePaddle/Paddle/pull/10172#pullrequestreview-114750765
- https://github.com/PaddlePaddle/Paddle/pull/10105#pullrequestreview-114131741
- https://github.com/PaddlePaddle/Paddle/pull/10097#pullrequestreview-114092369
- https://github.com/PaddlePaddle/Paddle/pull/10090#pullrequestreview-114179699
- https://github.com/PaddlePaddle/Paddle/pull/10070#pullrequestreview-113812928
- https://github.com/PaddlePaddle/Paddle/pull/10051#pullrequestreview-113831122
- Worked on the AWS training issue (Yanxu is helping with this): https://github.com/PaddlePaddle/Paddle/issues/10106
- AWS tool to quickly switch an instance on/off for external users: https://github.com/putcn/aws_instance_switch
- Improved the AWS tool doc: https://github.com/PaddlePaddle/Paddle/pull/10182
- float16 inference:
- Add float16 inference transpiler, fix a bug in prune method, and add image classification float16 inference example: https://github.com/PaddlePaddle/Paddle/pull/10109
- Add float16 inference design doc: https://github.com/PaddlePaddle/Paddle/pull/10210
- [WIP] float16 inference experiment report
- PR review:
- Added kernel to beam_search_op :
- Adding kernel to beam_search_decode_op
- Debug Se-resnext layer-by-layer
- non-deterministic elementwise-grad-op: https://github.com/PaddlePaddle/Paddle/issues/10122
- non-deterministic conv2d-grad, batch-norm-grad
- Research and outline the technical difficulties of full ONNX support
- Debug the Transformer model crash in the cuda-8-cudnn5 Docker image
- Follow up on the new API design
- inference:
- tensorrt design doc: https://github.com/PaddlePaddle/Paddle/issues/10028
- refine tensorrt cmake and dockerfile: https://github.com/PaddlePaddle/Paddle/pull/10134
- tensorrt convert init: https://github.com/PaddlePaddle/Paddle/pull/10144
- fix a bug in test_batch_norm_op.py: https://github.com/PaddlePaddle/Paddle/pull/10094
- fix a cpu bug in parallel_executor.py: https://github.com/PaddlePaddle/Paddle/pull/10141
- code review:
- fea/init tensorrt engine: https://github.com/PaddlePaddle/Paddle/pull/10003#pullrequestreview-113556181
- [merge] multiplication operator for MKLDNN: https://github.com/PaddlePaddle/Paddle/pull/9949
- MKLDNN implementation of batch normalization: https://github.com/PaddlePaddle/Paddle/pull/9904
- dist train accuracy/perf data updates: https://docs.google.com/spreadsheets/d/1D5Xc_TfGfMV5aKh4ZJS_b4js3Mnn06H1Po0iuECZLr4/edit#gid=0
- Multi GPU dist train:
- [WIP] some NCCL2 dist prototype: https://github.com/typhoonzero/nccl_rdma_demo
- Reviews and discussions of async dist training
- Get familiar with the op development process, the profiler, and the timeline
- Optimize iou_similarity_op cuda kernel:
- dist train accuracy https://github.com/seiriosPlus/fluid_benchmark/tree/master/image_classification
- MPI-enabled: https://github.com/seiriosPlus/mpi_enabled
- Add synchronous TensorCopy:
- fix Clang compile errors:
- BlockingQueue for readers
- Reviews:
- Fix a critical bug in dynloader
- We use `dlsym` to extract function pointers from shared libraries (the `dynload` namespace). We cast each pointer to a type that exactly matches the arguments at the call site, not to the actual function type declared in the header.
- For example, if we pass `(int, int)` to a function of type `void(*)(int64_t, int64_t)`, we cast the function symbol to `void(*)(int, int)` rather than `void(*)(int64_t, int64_t)`. This causes bugs on any platform where `sizeof(int) != sizeof(int64_t)`.
- https://github.com/PaddlePaddle/Paddle/pull/10191
- https://github.com/PaddlePaddle/Paddle/pull/10189
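The type mismatch described above can be sketched in a few lines. This is a minimal illustration, not the code from the PRs: `lib_add`, `LookupSymbol`, `TypeFromArgs`, `TypeFromHeader`, and `CallDeclared` are hypothetical names, and `LookupSymbol` stands in for a real `dlsym` call so that the sketch needs no shared library.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Stand-in for a function exported by a shared library whose header declares
// 64-bit parameters. (Hypothetical; not a real Paddle or CUDA symbol.)
int64_t lib_add(int64_t a, int64_t b) { return a + b; }

// Hypothetical stand-in for dlsym(handle, "lib_add"): returns an untyped pointer.
inline void* LookupSymbol() { return reinterpret_cast<void*>(&lib_add); }

// Risky pattern: the function-pointer type is built from the argument types at
// the call site, so passing (int, int) yields int64_t (*)(int, int).
template <typename... Args>
using TypeFromArgs = int64_t (*)(Args...);

// Safer pattern: take the function-pointer type from the header declaration.
using TypeFromHeader = decltype(&lib_add);

// Calls through the header-declared type; the compiler converts each argument
// to int64_t before the call, whatever types the caller passed.
template <typename... Args>
int64_t CallDeclared(Args... args) {
  void* sym = LookupSymbol();
  assert(sym != nullptr);
  auto fn = reinterpret_cast<TypeFromHeader>(sym);
  return fn(args...);
}
```

In this sketch, `CallDeclared(2, 3)` passes two `int` arguments but still calls through `int64_t (*)(int64_t, int64_t)`, whereas `TypeFromArgs<int, int>` names a different pointer type that does not match the declaration.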
- Found a critical bug in the GPU memory allocator and memcpy
- We found that we cannot synchronize the stream if we invoke `cudaMemcpyAsync` on CPU memory that was allocated by `malloc` rather than `cudaMallocHost`. It is suggested to use `cudaMallocHost` to allocate CPU memory when the memory is used for CPU <--> GPU communication.
- When we changed `malloc` to `cudaMallocHost`, we found that many memory copies are not synchronized. This is a critical bug for Paddle and a key reason our training process is unstable.
- Currently, we have added a `cudaMemcpySync` API to avoid the bug when feeding/fetching data. Resolving this bug thoroughly will take a week or longer.
- Add a demo for parallel executor + reader to train and test a program
- I met three problems while training the Transformer model:
https://github.com/PaddlePaddle/Paddle/pull/10220
- memory
- training speed
- GPU memory
- Performance of framework:
- Debug backward of OCR attention model
- Add bias for gru_unit_op and fix activation function
- Fix edit distance: https://github.com/PaddlePaddle/Paddle/pull/10090
- OCR inference
- Add init interface for customize devices.
- Fix OCR CTC model:
- Image:
- Debug SE-ResNeXt with ParallelExe:
- example: https://github.com/qingqing01/PaddleDemo/tree/master/fluid/se_resnext
- https://github.com/PaddlePaddle/Paddle/issues/10204
- Ran some experiments, but the results need one week.
- Memory usage setting in inference for OCR
- Set `fraction_of_gpu_memory_to_use=0` in the arguments, but performance drops significantly.
- Work discussion: http://wiki.baidu.com/pages/viewpage.action?pageId=486606749
- SSD enhance: https://github.com/PaddlePaddle/models/pull/869
- Code review:
- Roi Pooling: https://github.com/PaddlePaddle/Paddle/pull/10169
- Add init interface for customize devices. https://github.com/PaddlePaddle/Paddle/pull/10167
- Fix elementwise_gradient bug. https://github.com/PaddlePaddle/Paddle/pull/10150
- https://github.com/PaddlePaddle/models/pull/878
- COCO dataset and Doc: https://github.com/PaddlePaddle/models/pull/844
- SE-ResNeXt: https://github.com/PaddlePaddle/models/pull/825#pullrequestreview-115425183
- PR
- Feature/insert reduce_op to parallel exe
- Fix elementwise_gradient bug
- Enable delay op feature
- Feature/add reduce op handle
- Fix scope of gather and broadcast, and code clean
- benchmark se-resnet50 with @qingqing and @panxin
- Review
- Inference Framework
- Add flush of program desc to update the proto information
- Build the docker image paddle_manylinux_devel:cuda8.0_cudnn7 and build the latest inference library for the image colleagues
- Analyze why the runtime with `fraction_of_gpu_memory_to_use=0` is 3~4x that of the default setting (0.92)
- Review
- Add init interface for customize devices, https://github.com/PaddlePaddle/Paddle/pull/10167
- init tensorrt engine, https://github.com/PaddlePaddle/Paddle/pull/10003
- Mobile
- Refine reader: https://github.com/guoshengCS/transformer-nist/blob/refined_data_reader/transformer/data_util.py
- Refine argument naming: https://github.com/PaddlePaddle/Paddle/pull/10223
- Tuning Transformer
- Speed up inference: 40+m -> 10+m
- Fluid support for async training
- project: https://github.com/PaddlePaddle/Paddle/projects/61
- task list:https://github.com/PaddlePaddle/Paddle/issues/9941
- Fluid support for async training
- VariableResponse support deserialize var into local scope https://github.com/PaddlePaddle/Paddle/pull/10060
- Refine listen and serve op https://github.com/PaddlePaddle/Paddle/pull/10080
- Split optimization ops on pserver into independent blocks https://github.com/PaddlePaddle/Paddle/pull/10123
- [WIP] listen_and_serv_op support async update https://github.com/PaddlePaddle/Paddle/pull/10042
- Run test on text_classification of async training
- Code cleanup and improvement
- fix build activation_op.cc on mac https://github.com/PaddlePaddle/Paddle/pull/10116
- do more benchmark about async training
- lookup remote table
- lookup table with nonexistent key, https://github.com/PaddlePaddle/Paddle/pull/10164
- confirm text classification model acc with distributed training, https://docs.google.com/spreadsheets/d/1D5Xc_TfGfMV5aKh4ZJS_b4js3Mnn06H1Po0iuECZLr4/edit#gid=1478737887
- review
- [Speed] change Scope string hashed variable index to number hashed
- upgrade to cuda9 cudnn 7
- Model CE
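- TeamCity: Model CE setup is complete, with 1 master and 2 agents
- Fixed issues with incorrect GPU memory statistics, sampling granularity, Model CE access, etc.
- Four models have been added (NLP Transformer, image OCR, image_classification, object_detection); their stability is being monitored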
- Fluid2onnx convertor:
- Add unit test framework for operators' conversion
- Resolve name conflicts, enhance operators, and add dropout, elem_mul, sigmoid ops, etc.
- Add vgg16 & resnet50 to supported models
- Add mobilenet & se_resnext to supported models
- [WIP] Add Inception_v4 config in models/fluid/image_classification
- Enable the parallel training of mobilenet
- Merge design doc for onnx convertor
- NMT:
- Transformer code clean and data utility.
- Transformer experiments related.
- inference
- CE
- doc
- WIP: create an online VisualDL demo server to give users first hand experience:
- VisualDL improvement:
- Reviewed PRs:
- https://github.com/PaddlePaddle/VisualDL/pull/428
- https://github.com/PaddlePaddle/VisualDL/pull/425
- https://github.com/PaddlePaddle/VisualDL/pull/424
- https://github.com/PaddlePaddle/VisualDL/pull/422
- https://github.com/PaddlePaddle/VisualDL/pull/420
- https://github.com/PaddlePaddle/VisualDL/pull/416
- https://github.com/PaddlePaddle/VisualDL/pull/413
- Paddle
- Imperative Design
- Paddle API v4 proposal (https://github.com/PaddlePaddle/Paddle/issues/10152)
- Paddle V4 API - Recognize Digits (https://github.com/PaddlePaddle/Paddle/issues/10215)
- Reviews
- Code cleanup:
- PR: https://github.com/PaddlePaddle/Paddle/pull/10105
- PR: https://github.com/PaddlePaddle/Paddle/pull/10211
- Review: https://github.com/PaddlePaddle/Paddle/pull/10104
- Review: https://github.com/PaddlePaddle/Paddle/pull/10075
- Review: https://github.com/PaddlePaddle/Paddle/pull/10148
- Review: https://github.com/PaddlePaddle/Paddle/pull/10212
- Review: https://github.com/PaddlePaddle/Paddle/pull/10218
- Imperative Fluid (With Helin and team):
- ONNX review: https://github.com/PaddlePaddle/paddle-onnx/pull/30
- Working with Sharan on sentiment analysis model benchmark (using paddle)
- VisualDL
- Update VisualDL documentation structure on PPO. Add new documentation to the website: https://github.com/PaddlePaddle/VisualDL/pull/416
- Update embedding search experience: https://github.com/PaddlePaddle/VisualDL/pull/420
- Only allow one embedding record per run: https://github.com/PaddlePaddle/VisualDL/pull/422
- Update embedding API documentation. Create in-house dimension reduction functions: https://github.com/PaddlePaddle/VisualDL/pull/424
- PaddlePaddle.org
- Fix the issue where the doc tool can't generate documentation: https://github.com/PaddlePaddle/PaddlePaddle.org/pull/470
- Update VisualDL doc generating setting to consume the new layout: https://github.com/PaddlePaddle/PaddlePaddle.org/pull/471/files
- Reviews and issues
- Use VisualDL on Paddle Demo
- Image classify Demo https://github.com/PaddlePaddle/VisualDL/pull/425
- Step by step tutorial document
- PRs