[Hybrid Performance] Sharding stage 1 PP/VP overlap #54312

Merged
sneaxiy merged 5 commits into PaddlePaddle:develop from sharding_pp_overlap on Jun 5, 2023

Conversation

Contributor

@FeixLiu FeixLiu commented Jun 2, 2023

PR types

Performance optimization

PR changes

Others

Description

Sharding stage 1 PP/VP overlap

PCard-70444

Loss comparison
Config: pp=4, sharding=2, bfloat16 O2 with main_grad.
[image: loss comparison]
Mean diff: 1.1900897999998605e-05
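
For reference, a minimal sketch of how a mean diff like the one above could be computed from two logged loss series. The PR does not say whether its figure is a signed or an absolute mean, so this assumes the mean absolute per-step difference. The function name `mean_loss_diff` and the sample loss values are hypothetical and are not taken from the PR.

```python
from statistics import fmean


def mean_loss_diff(baseline, candidate):
    """Mean absolute per-step difference between two equally long loss series.

    Assumption: "mean diff" means mean(|baseline[i] - candidate[i]|).
    """
    if len(baseline) != len(candidate):
        raise ValueError("loss series must have the same length")
    if not baseline:
        raise ValueError("loss series must not be empty")
    return fmean(abs(b - c) for b, c in zip(baseline, candidate))


# Hypothetical per-step losses, made up for illustration only.
baseline_losses = [10.5, 9.25, 8.0, 7.5]   # e.g. run without overlap
overlap_losses = [10.5, 9.0, 8.25, 7.5]    # e.g. run with overlap enabled

diff = mean_loss_diff(baseline_losses, overlap_losses)
print(f"mean diff: {diff}")
```

A small mean diff relative to the loss scale suggests the two runs track each other; it does not by itself prove bitwise equivalence.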

paddle-bot bot commented Jun 2, 2023

Your PR has been submitted successfully. Thank you for contributing to the open-source project!
Please wait for the results of the automated CI tests. See the Paddle CI Manual for details.

@FeixLiu FeixLiu changed the title from "Sharding stage 1 PP/VP overlap" to "[Hybrid Performance] Sharding stage 1 PP/VP overlap" on Jun 2, 2023
@FeixLiu FeixLiu force-pushed the sharding_pp_overlap branch from 5c7d9d9 to 950f07f on June 2, 2023 09:19
@FeixLiu FeixLiu requested review from haohongxiang and sneaxiy June 2, 2023 09:29
haohongxiang previously approved these changes Jun 2, 2023
Contributor

@haohongxiang haohongxiang left a comment

LGTM

@FeixLiu FeixLiu force-pushed the sharding_pp_overlap branch from 728b43e to c7c5e98 on June 5, 2023 00:23
Contributor

@haohongxiang haohongxiang left a comment

LGTM

@sneaxiy sneaxiy merged commit 82dd6b1 into PaddlePaddle:develop Jun 5, 2023
@FeixLiu FeixLiu deleted the sharding_pp_overlap branch June 7, 2023 07:25
FeixLiu added a commit to FeixLiu/Paddle that referenced this pull request Jun 7, 2023
* sharding pp overlap

* bug fix

* update

* rename function

* update code logic
FeixLiu added a commit that referenced this pull request Jun 8, 2023
* add timer to pp (#53831)

* [Hybrid Performance] Sharding stage 1 PP/VP overlap (#54312)
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Nov 14, 2023
…p overlap (PaddlePaddle#54312) (PaddlePaddle#54360)

* add timer to pp (PaddlePaddle#53831)

* [Hybrid Performance] Sharding stage 1 PP/VP overlap (PaddlePaddle#54312)
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Nov 23, 2023
…p overlap (PaddlePaddle#54312) (PaddlePaddle#54360)

* add timer to pp (PaddlePaddle#53831)

* [Hybrid Performance] Sharding stage 1 PP/VP overlap (PaddlePaddle#54312)
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Nov 25, 2023
…p overlap (PaddlePaddle#54312) (PaddlePaddle#54360)

* add timer to pp (PaddlePaddle#53831)

* [Hybrid Performance] Sharding stage 1 PP/VP overlap (PaddlePaddle#54312)
zhiqiu pushed a commit that referenced this pull request Nov 28, 2023
* part-2 cherry from: add timer to pp (#53831) + sharding pp overlap (#54312) (#54360)

* add timer to pp (#53831)

* [Hybrid Performance] Sharding stage 1 PP/VP overlap (#54312)

* part-2 cherry from: [hybrid performance] Early grad fusion. (#54403) (#54443)

* part-2 cherry from: 【new_frl】modify multiply_grad_node create (#54389)

* modify multiply_grad_node create

* modify build conflict

* add place choose

* ci segment fault

* clear branch

* part-2 cherry from: add assert overlap (#54559)

* part-2 cherry from: remove self.data from pipeline model (#54387)

* polish

* polish

* polish

* polish

* polish

* polish

* part-2 cherry from : support sharding stage1 (#54069)

* support sharding stage1

* fix unittest

* format

* pass sharded sharding params_and_grads to inner_opt apply_optimize

* change sharding gradient allreduce to reduce

* support save state_dict adaptively and support sharding with mp

* fix sharding test

* test set_state_dict

* add more unit test

* fix global norm of mp case

* polish

* hack to calculate global norm in order to remove diff in calculating global norm values in HybridParallelClipGrad compared to dp

* remove print

* tiny fix

* part-2 fix PR (#54389)

* tinyfix

* part-2 cherry from: Align VPP global norm clip with PP (#54820)

* part-2 cherry from: Make FLAGS_force_align_vpp_grad_sum_order default to false (#54937)

* make FLAGS_force_align_vpp_grad_sum_order default to false

* polish code

* part-2 cherry from: support add(x_float32, y_bfloat16) or add(x_float32, y_float16) (#54611)

* support add(x_float32, y_bfloat16) or add(x_float32, y_float16)

* polish

* tinyfix

* part-2 cherry from: fix bug of pp (#54831)

* fix in_place ut failed caused by PR #54389

* fix comments

* fix comments

* fix comments

* part-2 cherry from: add perf test api to fleet (#54856)

* part-2 cherry from: [Distributed] Opt nccl connection by lazy initialization (#55005)

* part-2 cherry from: refine dygraph_sharding_optimizer.py by sorting parameters (#55059)

* refine dygraph_sharding_optimizer.py by sorting parameters

* Update dygraph_sharding_optimizer.py

Make FLAGS_sharding_sort_parameters=1 by default.

* part-2 cherry from: Fix hybrid_parallel_sharding_model.py ut (#55269)

* fix hybrid_parallel_sharding_model.py

* Update hybrid_parallel_sharding_model.py

* part-2 cherry from: add fleet test tools. (#55701)

* part-2 cherry from: add device synchronize for p2p (#55461)

* part-2 cherry from: new_frl_shard_reduce (#55353)

* new_frl_shard_reduce

* add mp guard

* add test

* part-2 cherry from: add paddle.async_save to reduce time cost by checkpoint saving (#55115)

* add paddle.async_save to reduce time cost by checkpoint saving

* adapt save_for_auto_inference to paddle.async_save

* modify UT

* modify UT

* fix on cpu only version

* revert commit on save_auto_inference

* fix threading

* remove duplicate in topology.py and dygraph_sharding_optimizer.py

* part-2 tinyfix cherry-pick

* fix ci

* fix test_parallel_dygraph_sharding_parallel

* rename perf_test to collective_test, collective_xx_test to collective_xx_perf

* fix comments: remove useless logic

* reform fleet.collective_perf API && add doc strings

* fix PR-CI-Coverage

* tiny fix

---------

Co-authored-by: Yuang Liu <liuyuang@baidu.com>
Co-authored-by: xiaoguoguo626807 <100397923+xiaoguoguo626807@users.noreply.github.com>
Co-authored-by: Haohongxiang <86215757+haohongxiang@users.noreply.github.com>
Co-authored-by: liuzhenhai93 <liuzhenhai93@outlook.com>
Co-authored-by: pangengzheng <117730991+pangengzheng@users.noreply.github.com>
Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com>
Co-authored-by: ShenLiang <1422485404@qq.com>
Co-authored-by: Tian <121000916+SylarTiaNII@users.noreply.github.com>
Co-authored-by: wuhuachaocoding <77733235+wuhuachaocoding@users.noreply.github.com>
Co-authored-by: SylarTiaNII <15840554235@163.com>
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Nov 28, 2023
…p overlap (PaddlePaddle#54312) (PaddlePaddle#54360)

* add timer to pp (PaddlePaddle#53831)

* [Hybrid Performance] Sharding stage 1 PP/VP overlap (PaddlePaddle#54312)
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Nov 28, 2023
…ng pp overlap (PaddlePaddle#54312) (PaddlePaddle#54360)

* add timer to pp (PaddlePaddle#53831)

* [Hybrid Performance] Sharding stage 1 PP/VP overlap (PaddlePaddle#54312)
3 participants