[Reshard] Implement reshard from s to r with same process_mesh #56039
Your PR was submitted successfully. Thank you for contributing to the open-source project!
cfa951a to f16495d
@@ -109,6 +114,21 @@ std::string GetMasterEndpoint() {
  return master_endpoint;
}

std::string GenUniqueCommKey(const std::vector<int64_t>& process_ids) {
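Does this only support a one-dimensional process mesh? If a higher-dimensional mesh and a lower-dimensional mesh use the same process id numbering, will the key be the same?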
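This function is only responsible for turning the given vector of process_ids into a unique comm_key. There are two cases:
- If the input and output process_mesh are the same, the collective communication operation can be called directly; passing in the flattened process_ids is enough.
- If the input and output process_mesh differ, communication groups must be created group by group before calling the function, depending on the specific situation; the groups created in that case are usually point-to-point communication groups.
Whether the mesh is high-dimensional or low-dimensional, as long as the processes participating in the communication are the same, the key is the same.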
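To make the reply above concrete, here is a minimal sketch of how such a key could be derived. It is not the PR's implementation: `MakeCommKeySketch`, `FlattenMesh`, and the `"ReshardGroup"` prefix are hypothetical names chosen for illustration, and the sketch assumes the ids are passed in a consistent order (it does not sort them).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not the PR's actual code: joins the flattened
// process ids into one string, so the same ordered list of ids always
// yields the same key.
std::string MakeCommKeySketch(const std::vector<int64_t>& process_ids) {
  std::string key = "ReshardGroup";  // prefix chosen for illustration only
  for (int64_t id : process_ids) {
    key += "/" + std::to_string(id);
  }
  return key;
}

// Hypothetical helper: flattens a 2-D mesh in row-major order into the
// id list that would be passed to the key function.
std::vector<int64_t> FlattenMesh(
    const std::vector<std::vector<int64_t>>& mesh) {
  std::vector<int64_t> flat;
  for (const auto& row : mesh) {
    flat.insert(flat.end(), row.begin(), row.end());
  }
  return flat;
}
```

Under these assumptions, a 2x2 mesh `{{0, 1}, {2, 3}}` and a 1-D mesh `{0, 1, 2, 3}` flatten to the same id list and therefore map to the same key, which matches the behaviour described in the reply.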
paddle/phi/core/distributed/auto_parallel/reshard_all_gather_functor.cc
LGTM
LGTM
PR types
New features
PR changes
Others
Description
Pcard-73145
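Supports the state transition from Shard to Replicate, with the following requirements:
The input and output process_mesh are one-dimensional;
The input and output do not cross meshes (the process_mesh is unchanged);
The input shard state is an even split: the sharded tensor dimension is divisible by the number of processes in the corresponding group.
Taking 4 cards as an example: the input and output both use the one-dimensional process_mesh [0, 1, 2, 3]; the output is a 2-D tensor in the Replicate state with out_dims_mapping = [-1, -1]; the input is a 2-D tensor in the Shard state with in_tensor_shape = [4, 8].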
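The 4-card example can be emulated on a single host to show what the Shard to Replicate conversion produces. This is only a sketch under stated assumptions: the description does not say which tensor axis is sharded, so axis 0 is assumed here, and `Matrix`, `ShardAxis0`, and `AllGatherAxis0` are hypothetical names that stand in for the real distributed tensor and all-gather call.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Row-major 2-D buffer used only for this illustration.
struct Matrix {
  int64_t rows;
  int64_t cols;
  std::vector<float> data;
};

// Splits `full` evenly along axis 0 into `nranks` local shards.
// Throws if the sharded dimension is not divisible by the rank count,
// mirroring the even-split requirement listed above.
std::vector<Matrix> ShardAxis0(const Matrix& full, int64_t nranks) {
  if (nranks <= 0 || full.rows % nranks != 0) {
    throw std::invalid_argument("rows must be divisible by nranks");
  }
  const int64_t local_rows = full.rows / nranks;
  const int64_t local_size = local_rows * full.cols;
  std::vector<Matrix> shards;
  for (int64_t r = 0; r < nranks; ++r) {
    const auto begin = full.data.begin() + r * local_size;
    shards.push_back(
        {local_rows, full.cols, std::vector<float>(begin, begin + local_size)});
  }
  return shards;
}

// Emulates what every rank holds after an all-gather: the local shards
// concatenated in rank order along axis 0.
Matrix AllGatherAxis0(const std::vector<Matrix>& shards) {
  Matrix out{0, shards.empty() ? 0 : shards.front().cols, {}};
  for (const Matrix& s : shards) {
    out.rows += s.rows;
    out.data.insert(out.data.end(), s.data.begin(), s.data.end());
  }
  return out;
}
```

With a [4, 8] tensor and 4 ranks, each rank would hold a [1, 8] shard under this assumption, and gathering the shards in rank order restores the full [4, 8] tensor on every rank.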
TODO: