[Hybrid Performance] Sharding stage 1 PP/VP overlap #54312
Conversation
Your PR was submitted successfully. Thank you for your contribution to this open source project!
Force-pushed from 5c7d9d9 to 950f07f
LGTM
Force-pushed from 728b43e to c7c5e98
LGTM
* sharding pp overlap
* bug fix
* update
* rename function
* update code logic
* part-2 cherry from: add timer to pp (#53831) + sharding pp overlap (#54312) (#54360)
* add timer to pp (#53831)
* [Hybrid Performance] Sharding stage 1 PP/VP overlap (#54312)
* part-2 cherry from: [hybrid performance] Early grad fusion. (#54403) (#54443)
* part-2 cherry from: [new_frl] modify multiply_grad_node create (#54389)
* modify multiply_grad_node create
* modify build conflict
* add place choose
* ci segment fault
* clear branch
* part-2 cherry from: add assert overlap (#54559)
* part-2 cherry from: pipeline model: remove self.data (#54387)
* polish (×6)
* part-2 cherry from: support sharding stage1 (#54069)
* support sharding stage1
* fix unittest
* format
* pass sharded sharding params_and_grads to inner_opt apply_optimize
* change sharding gradient allreduce to reduce
* support save state_dict adaptively and support sharding with mp
* fix sharding test
* test set_state_dict
* add more unit tests
* fix global norm of mp case
* polish
* hack to calculate global norm in order to remove diff in calculating global norm values in HybridParallelClipGrad compared to dp
* remove print
* tiny fix
* part-2 fix PR (#54389)
* tiny fix
* part-2 cherry from: Align VPP global norm clip with PP (#54820)
* part-2 cherry from: Make FLAGS_force_align_vpp_grad_sum_order default to false (#54937)
* make FLAGS_force_align_vpp_grad_sum_order default to false
* polish code
* part-2 cherry from: support add(x_float32, y_bfloat16) or add(x_float32, y_float16) (#54611)
* support add(x_float32, y_bfloat16) or add(x_float32, y_float16)
* polish
* tiny fix
* part-2 cherry from: fix bug of pp (#54831)
* fix in_place ut failure caused by PR #54389
* fix comments (×3)
* part-2 cherry from: add perf test api to fleet (#54856)
* part-2 cherry from: [Distributed] Opt nccl connection by lazy initialization (#55005)
* part-2 cherry from: refine dygraph_sharding_optimizer.py by sorting parameters (#55059)
* refine dygraph_sharding_optimizer.py by sorting parameters
* Update dygraph_sharding_optimizer.py: make FLAGS_sharding_sort_parameters=1 by default
* part-2 cherry from: Fix hybrid_parallel_sharding_model.py ut (#55269)
* fix hybrid_parallel_sharding_model.py
* Update hybrid_parallel_sharding_model.py
* part-2 cherry from: add fleet test tools. (#55701)
* part-2 cherry from: add device synchronize for p2p (#55461)
* part-2 cherry from: new_frl_shard_reduce (#55353)
* new_frl_shard_reduce
* add mp guard
* add test
* part-2 cherry from: add paddle.async_save to reduce time cost by checkpoint saving (#55115)
* add paddle.async_save to reduce time cost by checkpoint saving
* adapt save_for_auto_inference to paddle.async_save
* modify UT (×2)
* fix on cpu-only version
* revert commit on save_auto_inference
* fix threading
* remove duplicate in topology.py and dygraph_sharding_optimizer.py
* part-2 tiny fix cherry-pick
* fix ci
* fix test_parallel_dygraph_sharding_parallel
* rename perf_test to collective_test, collective_xx_test to collective_xx_perf
* fix comments: remove useless logic
* reform fleet.collective_perf API and add doc strings
* fix PR-CI-Coverage
* tiny fix

---------

Co-authored-by: Yuang Liu <liuyuang@baidu.com>
Co-authored-by: xiaoguoguo626807 <100397923+xiaoguoguo626807@users.noreply.github.com>
Co-authored-by: Haohongxiang <86215757+haohongxiang@users.noreply.github.com>
Co-authored-by: liuzhenhai93 <liuzhenhai93@outlook.com>
Co-authored-by: pangengzheng <117730991+pangengzheng@users.noreply.github.com>
Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com>
Co-authored-by: ShenLiang <1422485404@qq.com>
Co-authored-by: Tian <121000916+SylarTiaNII@users.noreply.github.com>
Co-authored-by: wuhuachaocoding <77733235+wuhuachaocoding@users.noreply.github.com>
Co-authored-by: SylarTiaNII <15840554235@163.com>
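As a side note, the two FLAGS mentioned in the cherry-pick list above are environment-style flags; the sketch below sets them explicitly to the defaults stated in the log (#54937, #55059). Setting them through `os.environ` before importing paddle is an assumption about how they are consumed, not something this PR documents.

```python
# Sketch only: defaults per the commit log above are
# FLAGS_force_align_vpp_grad_sum_order=false (#54937) and
# FLAGS_sharding_sort_parameters=1 (#55059).
import os

os.environ["FLAGS_force_align_vpp_grad_sum_order"] = "0"
os.environ["FLAGS_sharding_sort_parameters"] = "1"

import paddle  # flags must be set before paddle reads them at startup
```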
PR types
Performance optimization
PR changes
Others
Description
Overlap sharding stage 1 gradient communication with pipeline parallel (PP/VP) computation.
PCard-70444
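For context, here is a minimal sketch of the hybrid setup used in the comparison below (pp=4, sharding stage 1 with degree 2), using fleet's standard `DistributedStrategy` hybrid configs. The `pp_configs` option name `sharding_comm_overlap` is an assumption based on this PR's title; verify it against the merged code.

```python
# Minimal sketch, not the exact test script: hybrid parallel setup matching
# the loss-compare run below (pp=4, sharding=2).
import paddle
from paddle.distributed import fleet

strategy = fleet.DistributedStrategy()
strategy.hybrid_configs = {
    "dp_degree": 1,
    "mp_degree": 1,
    "pp_degree": 4,        # pipeline parallel degree from the comparison
    "sharding_degree": 2,  # sharding stage 1 degree from the comparison
    # Assumed option name based on this PR's title: overlap sharding stage 1
    # gradient communication with PP/VP computation.
    "pp_configs": {"sharding_comm_overlap": True},
}
fleet.init(is_collective=True, strategy=strategy)

# The model and optimizer are then wrapped as usual:
# model = fleet.distributed_model(model)
# optimizer = fleet.distributed_optimizer(optimizer)
```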
Loss compare

pp=4, sharding=2, bfloat16 O2 with main_grad.
mean diff: 1.1900897999998605e-05
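The reported number reads as a mean difference over the per-step losses of the baseline and overlap runs. A hypothetical sketch of such a comparison follows; the file names and the use of absolute differences are assumptions for illustration, not the PR's actual test harness.

```python
# Illustrative only: compare per-step loss curves from two runs
# (baseline vs. sharding stage 1 PP overlap).
import numpy as np

baseline_loss = np.load("loss_baseline.npy")   # hypothetical dump: run without overlap
overlap_loss = np.load("loss_pp_overlap.npy")  # hypothetical dump: run with overlap

mean_diff = float(np.mean(np.abs(baseline_loss - overlap_loss)))
print(f"mean diff: {mean_diff:.16e}")
```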