[Reshard] Implement replicated to split with same placement #55552

Merged (4 commits) on Jul 26, 2023

@LiYuRio (Contributor) commented Jul 19, 2023

PR types

New features

PR changes

Others

Description

Pcard-73145

Supports the Replicated-to-Shard state transition, with the following requirements:

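  • The input and output process_mesh are one-dimensional;
  • The input and output do not cross meshes (the process_mesh is unchanged);
  • The split is even: each sharded tensor dimension is divisible by the number of processes in the corresponding group.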
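As a rough illustration, the three requirements above could be folded into a single predicate. This is a minimal C++ sketch, not the code from this PR: `DistAttrSketch` and `IsReplicatedToShardSuitable` are hypothetical stand-ins for `TensorDistAttr` and the reshard function's suitability check, and the sketch additionally assumes that one mesh dimension may shard at most one tensor dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for TensorDistAttr: only the fields
// needed to express the preconditions listed above.
struct DistAttrSketch {
  std::vector<int64_t> mesh_shape;    // shape of the process_mesh, e.g. {4}
  std::vector<int64_t> process_ids;   // processes in the mesh, e.g. {0, 1, 2, 3}
  std::vector<int64_t> dims_mapping;  // -1: replicated; k: split along mesh dim k
};

// True if no tensor dimension is split (every entry is -1).
bool IsFullyReplicated(const DistAttrSketch& attr) {
  for (int64_t m : attr.dims_mapping) {
    if (m != -1) return false;
  }
  return true;
}

// Hypothetical predicate for the Replicated -> Shard transition.
bool IsReplicatedToShardSuitable(const std::vector<int64_t>& global_shape,
                                 const DistAttrSketch& in,
                                 const DistAttrSketch& out) {
  // Requirement 1: both process meshes are one-dimensional.
  if (in.mesh_shape.size() != 1 || out.mesh_shape.size() != 1) return false;
  // Requirement 2: no cross-mesh reshard; the process_mesh is unchanged.
  if (in.mesh_shape != out.mesh_shape || in.process_ids != out.process_ids) {
    return false;
  }
  // Input must be fully Replicated and output split on at least one dim.
  if (in.dims_mapping.size() != global_shape.size() ||
      out.dims_mapping.size() != global_shape.size()) {
    return false;
  }
  if (!IsFullyReplicated(in) || IsFullyReplicated(out)) return false;
  // Requirement 3: even split. Assumed extra rule: each mesh dim shards
  // at most one tensor dim.
  std::vector<bool> mesh_dim_used(out.mesh_shape.size(), false);
  for (std::size_t i = 0; i < global_shape.size(); ++i) {
    const int64_t m = out.dims_mapping[i];
    if (m == -1) continue;
    if (m < 0 || m >= static_cast<int64_t>(out.mesh_shape.size())) return false;
    const std::size_t k = static_cast<std::size_t>(m);
    if (mesh_dim_used[k]) return false;
    mesh_dim_used[k] = true;
    if (global_shape[i] % out.mesh_shape[k] != 0) return false;
  }
  return true;
}
```

For the 4-card case described next, a [4, 8] tensor replicated on mesh [0, 1, 2, 3] passes for both out_dims_mapping = [0, -1] and [-1, 0], while a [6, 8] tensor split on dimension 0 fails the even-split rule.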

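Take 4 cards as an example. The input and output both use the one-dimensional process_mesh [0, 1, 2, 3]. The input is a two-dimensional tensor in the Replicated state, with in_tensor_shape = [4, 8] and in_dims_mapping = [-1, -1]; the output is a two-dimensional tensor in the Shard state.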

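  1. Split dimension 0 of the input along dimension 0 of the process_mesh: out_dims_mapping = [0, -1], and each process ends up with a physical tensor of shape [1, 8].
  2. Split dimension 1 of the input along dimension 0 of the process_mesh: out_dims_mapping = [-1, 0], and each process ends up with a physical tensor of shape [4, 2].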
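Because the input is already Replicated, every process holds the full [4, 8] tensor, so one way to realize both cases above is for each process to slice out its own shard locally, with no communication. The C++ sketch below assumes that approach, a row-major float buffer, and a 2-D tensor on a 1-D mesh; `LocalShape` and `SliceForRank` are hypothetical helpers, not Paddle APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Physical shape on one process: a dim mapped to mesh dim k is divided by
// mesh_shape[k]; a replicated dim (-1) keeps its global size.
std::vector<int64_t> LocalShape(const std::vector<int64_t>& global_shape,
                                const std::vector<int64_t>& dims_mapping,
                                const std::vector<int64_t>& mesh_shape) {
  std::vector<int64_t> local = global_shape;
  for (std::size_t i = 0; i < global_shape.size(); ++i) {
    const int64_t m = dims_mapping[i];
    if (m == -1) continue;
    assert(global_shape[i] % mesh_shape[m] == 0);  // even split only
    local[i] = global_shape[i] / mesh_shape[m];
  }
  return local;
}

// Hypothetical helper: copy this process's shard out of a replicated,
// row-major 2-D tensor on a 1-D mesh of `num_procs` processes.
// `mesh_index` is the position of the process in the mesh (0..num_procs-1).
std::vector<float> SliceForRank(const std::vector<float>& full,
                                const std::vector<int64_t>& global_shape,
                                const std::vector<int64_t>& dims_mapping,
                                int64_t num_procs, int64_t mesh_index) {
  assert(global_shape.size() == 2 && dims_mapping.size() == 2);
  const std::vector<int64_t> local =
      LocalShape(global_shape, dims_mapping, {num_procs});
  // Start offset of this shard along each dim (0 if the dim is replicated).
  const int64_t row0 = dims_mapping[0] == 0 ? mesh_index * local[0] : 0;
  const int64_t col0 = dims_mapping[1] == 0 ? mesh_index * local[1] : 0;
  std::vector<float> shard;
  shard.reserve(static_cast<std::size_t>(local[0] * local[1]));
  for (int64_t r = 0; r < local[0]; ++r) {
    for (int64_t c = 0; c < local[1]; ++c) {
      shard.push_back(full[static_cast<std::size_t>(
          (row0 + r) * global_shape[1] + (col0 + c))]);
    }
  }
  return shard;
}
```

For process 2 with out_dims_mapping = [0, -1], the shard is row 2 (shape [1, 8]); for process 1 with [-1, 0], it is columns 2 and 3 of every row (shape [4, 2]), matching the shapes listed above.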

paddle-bot bot commented Jul 19, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@LiYuRio changed the title [Reshard] Implement broadcast to split with same placement [Reshard] Implement replicated to split with same placement Jul 19, 2023
@LiYuRio force-pushed the dev_reshard branch 8 times, most recently from d2ad29f to ae6fc4b on July 20, 2023 08:07
@LiYuRio requested a review from chenwhql July 20, 2023 08:11
@LiYuRio force-pushed the dev_reshard branch 2 times, most recently from bc01ab6 to fd19cbb on July 20, 2023 08:31
ReshardFunction() = default;
virtual ~ReshardFunction() = default;

virtual bool Check(const DistTensor& in,
A contributor commented:

Minor point: could the semantics of this interface be more specific? Try to make the code as self-explanatory as possible. For example, what is being checked here? Is it checking validity?

@LiYuRio (Contributor Author) replied Jul 24, 2023:

Renamed it to IsSuitable; it checks whether the current state-transition function is suitable.
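One plausible use of a method named IsSuitable (assumed here, not taken from this PR) is dispatch: ask each registered reshard function whether it can handle a given input/output pair and use the first match. The sketch models only that selection step; `LayoutSketch`, `ReshardFunctionSketch`, `ReplicatedToShardSketch`, and `ChooseReshardFunction` are hypothetical names, and the real interface takes a `DistTensor` and a `TensorDistAttr` and also declares `Eval`, which is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a tensor's distribution attribute.
struct LayoutSketch {
  std::vector<int64_t> dims_mapping;  // -1: replicated on that tensor dim
};

// Interface sketch: IsSuitable() answers "can this function perform the
// in -> out transition?". Eval() from the real interface is omitted.
class ReshardFunctionSketch {
 public:
  virtual ~ReshardFunctionSketch() = default;
  virtual bool IsSuitable(const LayoutSketch& in,
                          const LayoutSketch& out) const = 0;
  virtual std::string Name() const = 0;
};

bool AllReplicated(const LayoutSketch& layout) {
  for (int64_t m : layout.dims_mapping) {
    if (m != -1) return false;
  }
  return true;
}

class ReplicatedToShardSketch : public ReshardFunctionSketch {
 public:
  bool IsSuitable(const LayoutSketch& in,
                  const LayoutSketch& out) const override {
    return in.dims_mapping.size() == out.dims_mapping.size() &&
           AllReplicated(in) && !AllReplicated(out);
  }
  std::string Name() const override { return "ReplicatedToShard"; }
};

// Returns the first registered function whose IsSuitable() accepts the
// transition, or nullptr if none does.
const ReshardFunctionSketch* ChooseReshardFunction(
    const std::vector<std::unique_ptr<ReshardFunctionSketch>>& registry,
    const LayoutSketch& in, const LayoutSketch& out) {
  for (const auto& fn : registry) {
    if (fn->IsSuitable(in, out)) return fn.get();
  }
  return nullptr;
}
```

With this naming, a call site reads as a question ("is this function suitable?") rather than an unspecified "check", which is the self-explanatory quality the review asked for.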

@LiYuRio force-pushed the dev_reshard branch 4 times, most recently from 55bb8b6 to 1a4debb on July 24, 2023 12:38
@LiYuRio force-pushed the dev_reshard branch 2 times, most recently from ae347a6 to 6c53507 on July 25, 2023 05:14
virtual bool Check(const DistTensor& in,
const std::shared_ptr<TensorDistAttr>& out_dist_attr) = 0;

virtual std::shared_ptr<DistTensor> Eval(
A member commented:

What does "Eval" mean?

@chenwhql (Contributor) left a comment:

LGTM

@ForFishes (Member) left a comment:

LGTM

@LiYuRio merged commit 9f3b5f1 into PaddlePaddle:develop Jul 26, 2023
@LiYuRio deleted the dev_reshard branch July 26, 2023 04:58
wz1qqx pushed a commit to wz1qqx/Paddle that referenced this pull request Jul 31, 2023
…ddle#55552)

* Implement replicated to split reshard function

* fix link error in clang

* refine split functor

* simplify reshard code
jinjidejinmuyan pushed a commit to jinjidejinmuyan/Paddle that referenced this pull request Aug 30, 2023
…ddle#55552)

* Implement replicated to split reshard function

* fix link error in clang

* refine split functor

* simplify reshard code

3 participants