[Hybrid Performance] Sharding stage 1 PP/VP overlap #54312
Conversation
Your PR was submitted successfully. Thank you for your contribution to this open source project!
Force-pushed from 5c7d9d9 to 950f07f
LGTM
Force-pushed from 728b43e to c7c5e98
LGTM
* sharding pp overlap
* bug fix
* update
* rename function
* update code logic
* part-2 cherry from: add timer to pp (#53831) + sharding pp overlap (#54312) (#54360)
* add timer to pp (#53831)
* [Hybrid Performance] Sharding stage 1 PP/VP overlap (#54312)
* part-2 cherry from: [hybrid performance] Early grad fusion. (#54403) (#54443)
* part-2 cherry from: [new_frl] modify multiply_grad_node create (#54389)
* modify multiply_grad_node create
* modify build conflict
* add place choose
* ci segment fault
* clear branch
* part-2 cherry from: add assert overlap (#54559)
* part-2 cherry from: pipeline model: remove self.data (#54387)
* polish (×6)
* part-2 cherry from: support sharding stage1 (#54069)
* support sharding stage1
* fix unittest
* format
* pass sharded sharding params_and_grads to inner_opt apply_optimize
* change sharding gradient allreduce to reduce
* support save state_dict adaptively and support sharding with mp
* fix sharding test
* test set_state_dict
* add more unit tests
* fix global norm of mp case
* polish
* hack to calculate global norm in order to remove diff in calculating global norm values in HybridParallelClipGrad compared to dp
* remove print
* tiny fix
* part-2 fix PR (#54389)
* tiny fix
* part-2 cherry from: Align VPP global norm clip with PP (#54820)
* part-2 cherry from: Make FLAGS_force_align_vpp_grad_sum_order default to false (#54937)
* make FLAGS_force_align_vpp_grad_sum_order default to false
* polish code
* part-2 cherry from: support add(x_float32, y_bfloat16) or add(x_float32, y_float16) (#54611)
* support add(x_float32, y_bfloat16) or add(x_float32, y_float16)
* polish
* tiny fix
* part-2 cherry from: fix bug of pp (#54831)
* fix in_place ut failure caused by PR #54389
* fix comments (×3)
* part-2 cherry from: add perf test api to fleet (#54856)
* part-2 cherry from: [Distributed] Opt nccl connection by lazy initialization (#55005)
* part-2 cherry from: refine dygraph_sharding_optimizer.py by sorting parameters (#55059)
* refine dygraph_sharding_optimizer.py by sorting parameters
* Update dygraph_sharding_optimizer.py: make FLAGS_sharding_sort_parameters=1 by default
* part-2 cherry from: Fix hybrid_parallel_sharding_model.py ut (#55269)
* fix hybrid_parallel_sharding_model.py
* Update hybrid_parallel_sharding_model.py
* part-2 cherry from: add fleet test tools. (#55701)
* part-2 cherry from: add device synchronize for p2p (#55461)
* part-2 cherry from: new_frl_shard_reduce (#55353)
* new_frl_shard_reduce
* add mp guard
* add test
* part-2 cherry from: add paddle.async_save to reduce time cost by checkpoint saving (#55115)
* add paddle.async_save to reduce time cost by checkpoint saving
* adapt save_for_auto_inference to paddle.async_save
* modify UT (×2)
* fix on cpu-only version
* revert commit on save_auto_inference
* fix threading
* remove duplicate in topology.py and dygraph_sharding_optimizer.py
* part-2 tiny fix cherry-pick
* fix ci
* fix test_parallel_dygraph_sharding_parallel
* rename perf_test to collective_test, collective_xx_test to collective_xx_perf
* fix comments: remove useless logic
* reform fleet.collective_perf API and add doc strings
* fix PR-CI-Coverage
* tiny fix

---------

Co-authored-by: Yuang Liu <liuyuang@baidu.com>
Co-authored-by: xiaoguoguo626807 <100397923+xiaoguoguo626807@users.noreply.github.com>
Co-authored-by: Haohongxiang <86215757+haohongxiang@users.noreply.github.com>
Co-authored-by: liuzhenhai93 <liuzhenhai93@outlook.com>
Co-authored-by: pangengzheng <117730991+pangengzheng@users.noreply.github.com>
Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com>
Co-authored-by: ShenLiang <1422485404@qq.com>
Co-authored-by: Tian <121000916+SylarTiaNII@users.noreply.github.com>
Co-authored-by: wuhuachaocoding <77733235+wuhuachaocoding@users.noreply.github.com>
Co-authored-by: SylarTiaNII <15840554235@163.com>
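As a side note, the two FLAGS mentioned in the cherry-pick list above are environment-style flags; the sketch below sets them explicitly to the defaults stated in the log (#54937, #55059). Setting them through `os.environ` before importing paddle is an assumption about how they are consumed, not something this PR documents.

```python
# Sketch only: defaults per the commit log above are
# FLAGS_force_align_vpp_grad_sum_order=false (#54937) and
# FLAGS_sharding_sort_parameters=1 (#55059).
import os

os.environ["FLAGS_force_align_vpp_grad_sum_order"] = "0"
os.environ["FLAGS_sharding_sort_parameters"] = "1"

import paddle  # flags must be set before paddle reads them at startup
```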
PR types
Performance optimization
PR changes
Others
Description
Overlap sharding stage 1 gradient communication with pipeline parallel (PP/VP) computation.
PCard-70444
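For context, here is a minimal sketch of the hybrid setup used in the comparison below (pp=4, sharding stage 1 with degree 2), using fleet's standard `DistributedStrategy` hybrid configs. The `pp_configs` option name `sharding_comm_overlap` is an assumption based on this PR's title; verify it against the merged code.

```python
# Minimal sketch, not the exact test script: hybrid parallel setup matching
# the loss-compare run below (pp=4, sharding=2).
import paddle
from paddle.distributed import fleet

strategy = fleet.DistributedStrategy()
strategy.hybrid_configs = {
    "dp_degree": 1,
    "mp_degree": 1,
    "pp_degree": 4,        # pipeline parallel degree from the comparison
    "sharding_degree": 2,  # sharding stage 1 degree from the comparison
    # Assumed option name based on this PR's title: overlap sharding stage 1
    # gradient communication with PP/VP computation.
    "pp_configs": {"sharding_comm_overlap": True},
}
fleet.init(is_collective=True, strategy=strategy)

# The model and optimizer are then wrapped as usual:
# model = fleet.distributed_model(model)
# optimizer = fleet.distributed_optimizer(optimizer)
```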
Loss compare

pp=4, sharding=2, bfloat16 O2 with main_grad.
mean diff: 1.1900897999998605e-05
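The reported number reads as a mean difference over the per-step losses of the baseline and overlap runs. A hypothetical sketch of such a comparison follows; the file names and the use of absolute differences are assumptions for illustration, not the PR's actual test harness.

```python
# Illustrative only: compare per-step loss curves from two runs
# (baseline vs. sharding stage 1 PP overlap).
import numpy as np

baseline_loss = np.load("loss_baseline.npy")   # hypothetical dump: run without overlap
overlap_loss = np.load("loss_pp_overlap.npy")  # hypothetical dump: run with overlap

mean_diff = float(np.mean(np.abs(baseline_loss - overlap_loss)))
print(f"mean diff: {mean_diff:.16e}")
```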