[AutoParallel] Adapt static spmd rules for dynamic graph #56367
Conversation
Your PR has been submitted successfully. Thank you for contributing to this open-source project!
```cpp
          })
      .def("infer_backward",
           [](const phi::distributed::SpmdRule &self,
              const std::vector<DistTensorSpec> &input_specs,
```
infer_backward needs the information of both the input and output tensors for inference; please refer to the new API:
https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/distributed/auto_parallel/spmd_rules/common.h#L62
Done, changed the pybind infer_backward API to this format.
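For reference, a minimal sketch of what the adjusted binding could look like, assuming the lambda simply forwards both spec lists into the context-based call (the body here is illustrative, not the PR's exact code):

```cpp
// Sketch only: infer_backward now also receives the output specs,
// matching the (inputs, outputs, attrs) form of the linked API.
.def("infer_backward",
     [](const phi::distributed::SpmdRule &self,
        const std::vector<DistTensorSpec> &input_specs,
        const std::vector<DistTensorSpec> &output_specs,
        const paddle::framework::AttributeMap &attrs) {
       // Convert each spec into a DistMetaTensor, pack everything into
       // an InferSpmdContext, then dispatch to the rule's InferBackward.
     })
```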
```cpp
@@ -340,6 +343,44 @@ void BindAutoParallel(py::module *m) {
      .def("infer_forward", &SPMDRuleBase::InferForward)
      .def("infer_backward", &SPMDRuleBase::InferBackward);

  py::class_<phi::distributed::SpmdRule>(*m, "SpmdRule")
      .def("infer_forward",
           [](const phi::distributed::SpmdRule &self,
```
DistTensorSpec seems redundant now. Would it be better to expose the InferSpmdContext and MetaTensor APIs to Python, so that static mode builds the input ctx directly?
Yes, this can be decided according to the needs of static-mode semi-auto. This PR tries to change the original test framework as little as possible.
```cpp
  y_dist_tensor_spec.set_dims_mapping({-1, 0});
  infered_dist_attrs = matmul_rule->InferForward(
      {x_dist_tensor_spec, y_dist_tensor_spec}, attrs);
  x_dist_attr.set_dims_mapping({-1, -1});
```
Would it be better to provide an API that builds a MetaTensor from "shape" and "dist_attr" directly? Or to build an InferSpmdContext from "shape", "dist_attr", and the attributes directly?
Not recommended: MetaTensor is a thin wrapper and does not own the underlying object's lifetime. If such a constructor is required, I would rather derive a DistMetaTensor for this purpose.
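A minimal sketch of that idea, assuming a value-holding subclass (the member layout is an assumption, not the PR's exact class):

```cpp
// Sketch: a MetaTensor-like type that stores the shape and dist_attr by
// value, so it owns the lifetime of everything an spmd rule reads.
class DistMetaTensor : public phi::MetaTensor {
 public:
  DistMetaTensor(const phi::DDim& dims,
                 const phi::distributed::TensorDistAttr& dist_attr)
      : dims_(dims), dist_attr_(dist_attr) {}

  phi::DDim dims() const override { return dims_; }
  const phi::distributed::TensorDistAttr& dist_attr() const {
    return dist_attr_;
  }

 private:
  phi::DDim dims_;                              // stored by value
  phi::distributed::TensorDistAttr dist_attr_;  // stored by value
};
```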
```cpp
}

// TODO(chenweihang): support other attr types later as needed
PD_SPECIALIZE_InferSpmdFnCallHelper_FOR_ATTRIBUTE(bool);
```
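For context, a hedged sketch of the recursion pattern such a macro presumably generates, mirroring phi's kernel call-helper style (AttrAt and the helper structure are assumptions here):

```cpp
// Sketch: each specialization peels one attribute of the given type off
// the context and recurses with attr_idx advanced by one.
#define PD_SPECIALIZE_InferSpmdFnCallHelper_FOR_ATTRIBUTE(attr_type)        \
  template <typename... Tail>                                               \
  struct InferSpmdFnCallHelper<attr_type, Tail...> {                        \
    template <int in_idx, int attr_idx, typename... PreviousArgs>           \
    static SpmdInfo Call(const InferSpmdContext& ctx,                       \
                         PreviousArgs&... pargs) {                          \
      attr_type arg = ctx.AttrAt<attr_type>(attr_idx); /* assumed */        \
      return InferSpmdFnCallHelper<Tail...>::template Call<in_idx,          \
                                                           attr_idx + 1>(   \
          ctx, pargs..., arg);                                              \
    }                                                                       \
  }
```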
Will this method cover complex attribute types like std::vector<int64_t> and other std::vector<T> specializations?
Yes, we can support these types later; matmul does not cover these types for now.
```cpp
  static SpmdInfo Call(const InferSpmdContext& ctx, PreviousArgs&... pargs) {
    static_assert(attr_idx == 0,
                  "InferSpmd's Input should appear before Attributes.");
    const DistMetaTensor& arg = ctx.InputAt(in_idx);
```
Should ctx maintain the input tensor list and output tensor list separately? For variadic input/output ops, this may be a problem:
- variadic input, single output (concat, add_n): can be adapted by assuming the last tensor is the output
- single input, variadic output (split, unstack): can be adapted by assuming the first tensor is the input
- variadic input and variadic output (none yet, but in the future?): cannot be adapted
There is no need to distinguish input and output lists here.
For vector inputs and outputs, the rule function will faithfully reflect the type in its signature, so no extra merging is needed.
For example:
- concat op
```cpp
SpmdInfo ConcatSpmdInferForward(const std::vector<const DistMetaTensor*>& x,
                                const DistMetaTensor& out,
                                const Scalar& axis_scalar);
```
- split op
```cpp
SpmdInfo SplitSpmdInferBackward(const DistMetaTensor& x,
                                const std::vector<const DistMetaTensor*>& out,
                                const IntArray& sections,
                                const Scalar& axis);
```
```cpp
  auto out_shape = output_specs[0].shape();

SpmdInfo MatmulSpmdInferBackward(const DistMetaTensor& x,
                                 const DistMetaTensor& y,
                                 const DistMetaTensor& out,
```
For variadic ops like split and concat, should the variadic slot use a vector?
PHI API for concat:
```cpp
PADDLE_API Tensor concat(const std::vector<Tensor>& x, const Scalar& axis);
```
spmd rule for concat:
```cpp
SpmdInfo ConcatSpmdInferBackward(const std::vector<DistMetaTensor>& x,
                                 const DistMetaTensor& out,
                                 const Scalar& axis);
```
And for the ReplicatedSpmd rule, which is the fallback rule for all ops that have no specific rule:
```cpp
SpmdInfo ReplicatedSpmdInferBackward(const std::vector<DistMetaTensor>& x,
                                     const std::vector<DistMetaTensor>& out);
```
Same as above.
For the ReplicatedSpmd rule, we can use the general format:
```cpp
SpmdInfo ReplicatedSpmdInferBackward(
    const std::vector<const DistMetaTensor*>& x,
    const std::vector<const DistMetaTensor*>& out,
    const std::vector<phi::Attribute>& attrs);
```
We can also unify its format into `SpmdInfo (*)(const InferSpmdContext&)`.
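A minimal sketch of that unified form (InputsSize() is an assumed accessor; InputAt() appears in the diff above):

```cpp
// Sketch: one function-pointer type for every rule, so the registry and
// the dynamic-graph dispatcher only ever deal with a single signature.
using InferSpmdFn =
    SpmdInfo (*)(const phi::distributed::InferSpmdContext&);

SpmdInfo ReplicatedSpmdInferBackward(
    const phi::distributed::InferSpmdContext& ctx) {
  std::vector<const phi::distributed::DistMetaTensor*> tensors;
  for (size_t i = 0; i < ctx.InputsSize(); ++i) {  // assumed accessor
    tensors.push_back(&ctx.InputAt(i));
  }
  // ... mark every dims_mapping as -1 (replicated) and return the
  //     resulting dist_attrs for inputs and outputs ...
  return SpmdInfo();
}
```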
```python
# After all spmd rules are replaced by their phi implementations, we can
# restore the API name to `get_spmd_rule`
self.rule = core.get_phi_spmd_rule("matmul")
```
This comment could be moved onto the pybind interface.
Thanks, will adjust in the next PR.
LGTM
LGTM
LGTM
…e#56367)
* move matmul spmd rules into phi
* add basic infer spmd utils
* add spmd factory
* fix compile error
* add unittest
* refine infer spmd test and utils
* debug infer spmd test
* adapt python test
* polish details
* change to vector attr arg
* revert needless change
* update matmul spmd rule test
* remove original rule
* polish details
* fix macro error
* add comment
* pass backward test
* fix compile error
* add cmake rule for spmd_rules_test
* add dist meta tensor
* update pybind impl
* add macro for rules
PR types
New features
PR changes
Others
Description
Pcard-73145
[AutoParallel] Adapt static infer spmd
This PR tries to adapt the existing sharding-inference (spmd) rules for dynamic-mode semi-auto parallel, which requires some local changes to the existing design. The design notes follow.

The core functions of the existing spmd-rule base class are as follows:
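The signatures below are reconstructed from the linked common.h and the pybind diff above (a hedged sketch, not copied verbatim from this PR):

```cpp
// Existing base-class interface: DistTensorSpec lists in, a pair of
// (input dist_attrs, output dist_attrs) vectors out.
virtual std::pair<std::vector<TensorDistAttr>, std::vector<TensorDistAttr>>
InferForward(const std::vector<DistTensorSpec>& input_specs,
             const paddle::framework::AttributeMap& attrs);

virtual std::pair<std::vector<TensorDistAttr>, std::vector<TensorDistAttr>>
InferBackward(const std::vector<DistTensorSpec>& input_specs,
              const std::vector<DistTensorSpec>& output_specs,
              const paddle::framework::AttributeMap& attrs);
```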
- The rules are placed under phi/infermeta/spmd_rules rather than phi/core/distributed/auto_parallel: operator-specific implementations should in principle not live under the core directory, and since spmd derivation is a kind of tensor meta information, placing it under infermeta is also reasonable.
- The return value of an spmd inference function still uses std::pair<std::vector<TensorDistAttr>, std::vector<TensorDistAttr>>, consistent with the original design. Given dynamic graph's scheduling-performance requirements, this is probably not the final form: constructing and destructing STL containers has a sizable impact on API scheduling performance, and the result may eventually be written directly into DistTensor's dist_attr_ member. This will be decided after observing the scheduling-performance impact.
- The input arguments of an spmd inference function are normalized through an InferSpmdContext (see the sketch after this list). The original form `const std::vector<DistTensorSpec>& input_specs, const paddle::framework::AttributeMap& attrs` meets current needs, but if arguments mixing Tensor and vector<Tensor> appear later, ranges would have to be introduced to tell them apart; normalizing with a context absorbs such future changes without requiring every function signature to change.
- The input tensors of an spmd inference function need an extra container to hold them; small_vector is used instead of std::vector to save some heap construction/destruction overhead.
- The input attributes of an spmd inference function must use a vector rather than a map, because attributes carry no names when passed through the dynamic-graph execution flow. A vector also works for static graph's map-style inputs; if needed, the many arg-mapping functions already built for phi can be reused here.
- Following the naming convention in CodeStyle, the naming uses Spmd rather than SPMD.
- The original SPMD rules are moved rather than copied, keeping a single copy of the code. The utils functions are used in several places, so a copy is made for now; the original implementations can be removed once migration finishes, and this PR does not do the global replacement.
- The form of the original Python-side unit tests is kept unchanged as much as possible, so the pybind layer does argument handling to match the different argument forms; this can be adjusted later according to static-mode semi-auto's needs.
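As referenced in the InferSpmdContext note above, a minimal usage sketch of the intended dynamic-graph flow; EmplaceBackInput/EmplaceBackAttr follow InferMetaContext's naming, and the factory call is inferred from the "add spmd factory" commit, so all names here are assumptions:

```cpp
// Sketch: dynamic graph builds the context once per call, then invokes
// the rule registered for the op.
phi::distributed::InferSpmdContext ctx;
ctx.EmplaceBackInput(phi::distributed::DistMetaTensor(x_dims, x_dist_attr));
ctx.EmplaceBackInput(phi::distributed::DistMetaTensor(y_dims, y_dist_attr));
ctx.EmplaceBackAttr(false);  // trans_x
ctx.EmplaceBackAttr(false);  // trans_y

// The rule unpacks the context through the call-helper machinery above and
// returns the pair of input/output TensorDistAttr vectors (SpmdInfo).
auto& rule =
    phi::distributed::SpmdRuleFactory::Instance().GetSpmdRule("matmul");
phi::distributed::SpmdInfo spmd_info = rule.InferForward(ctx);
```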