[auto parallel] Add expand_v2 spmd rules #59432
Your PR was submitted successfully. Thank you for contributing to this open-source project!
…uqi/spmd_expand
Following the squeeze example, configure the yaml and add distributed test cases.
SpmdInfo ExpandInferSpmd(const DistMetaTensor& x,
                         const std::vector<int64_t>& shape);
Keep this consistent with the signature in ops.yaml and use IntArray.
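My understanding is that the spmd_rules interface should not need to change to IntArray? The code generation script (https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/phi/api/yaml/generator/dist_api_gen.py#L851-L852) checks whether the ops.yaml signature contains IntArray. If it does, the generated api.cc and backward.cc automatically call GetData() on each const IntArray& argument to obtain the corresponding vector<int64_t>, which matches the spmd_rules interface, as the screenshot shows.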
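The adaptation described in this comment can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in dist_api_gen.py: the helper name `build_spmd_call_args` and the `(name, cpp_type)` parameter layout are assumptions made for the example.

```python
def build_spmd_call_args(params):
    """Build the argument list for a generated call to an SPMD rule.

    `params` is a list of (name, cpp_type) pairs taken from an op signature.
    Hypothetical helper: the real generator is structured differently.
    """
    args = []
    for name, cpp_type in params:
        if cpp_type == 'const IntArray&':
            # The SPMD rule takes std::vector<int64_t>, so unwrap the IntArray.
            args.append(f'{name}.GetData()')
        else:
            args.append(name)
    return ', '.join(args)


call_args = build_spmd_call_args(
    [('x', 'const Tensor&'), ('shape', 'const IntArray&')]
)
print(f'ExpandInferSpmd({call_args});')
```

With this kind of adaptation the rule itself can keep a `std::vector<int64_t>` parameter while ops.yaml declares `IntArray`.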
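This needs to be unified across ops.yaml, dist_api_gen.py, and the shape parameter of InferSPMD: it should be either std::vector<int64_t> or IntArray everywhere. If it is changed to IntArray, the shape parameter of ReshapeInferSPMD needs to change accordingly, so that dist_api_gen.py does not end up with two implementations.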
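> This needs to be unified across ops.yaml, dist_api_gen.py, and the shape parameter of InferSPMD: it should be either std::vector<int64_t> or IntArray everywhere. If it is changed to IntArray, the shape parameter of ReshapeInferSPMD needs to change accordingly, so that dist_api_gen.py does not end up with two implementations.

Then should I leave this unchanged for now and keep std::vector<int64_t>? This PR would first complete the expand SPMD rules, and the unification would be done in a follow-up PR?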
SpmdInfo ExpandInferSpmdReverse(const DistMetaTensor& x,
                                const DistMetaTensor& out,
                                const std::vector<int64_t>& shape);
Same as above.
602392f to 2c35242
@@ -558,6 +558,12 @@ def is_reshape_kernel(self):
            and 'grad' not in self.kernel['func'][0]
        )

    def is_expand_kernel(self):
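I suggest merging this with is_reshape_kernel into a single function, need_calculate_local_shape, backed by a whitelist ['reshape', 'expand']. Only kernels on the whitelist need special handling, and any kernel that later needs a local_shape computation can simply be added to the whitelist.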
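A minimal sketch of the suggested merge might look like the following. It assumes that is_reshape_kernel checks for 'reshape' in the kernel function name (only its 'grad' condition is visible in the diff). The `KernelInfo` class and the module-level constant name are hypothetical stand-ins for the generator class in dist_api_gen.py.

```python
# Kernels whose generated code must compute a local shape (the reviewer's whitelist).
LOCAL_SHAPE_KERNEL_WHITELIST = ['reshape', 'expand']


class KernelInfo:
    """Hypothetical stand-in for the API generator class in dist_api_gen.py."""

    def __init__(self, kernel):
        # Mirrors the dict accessed in the diff: {'func': [kernel_name, ...]}
        self.kernel = kernel

    def need_calculate_local_shape(self):
        func_name = self.kernel['func'][0]
        # Assumption: substring matching, as is_reshape_kernel appears to use.
        return 'grad' not in func_name and any(
            name in func_name for name in LOCAL_SHAPE_KERNEL_WHITELIST
        )


print(KernelInfo({'func': ['expand']}).need_calculate_local_shape())
print(KernelInfo({'func': ['expand_grad']}).need_calculate_local_shape())
```

Substring matching also accepts names that merely contain a whitelisted word, so an exact comparison may be the safer choice if such kernels should be excluded.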
Got it, I will update this PR accordingly.
Sorry to inform you that 33acb63's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
Since you haven't replied for more than a year, we have closed this issue/pr.
PR types
New features
PR changes
Others
Description
add expand_v2 spmd rules
Unit test failures:
test_expand_shard_0: the backward result is incorrect.
test_expand_shard_on_0: the backward result is incorrect.
test_expand_shard_on_2: the error is raised as soon as the forward expand runs.