[AutoParallel] Fix PHI API inplace output code generation. #59133
Conversation
Your PR has been submitted successfully. Thank you for your contribution to the open-source project!
… fix_inplace_api
paddle/phi/api/lib/data_transform.cc
Outdated
if (ReshardIsNeeded(dist_tensor->dist_attr(), dist_attr[i])) {
  if (need_reshard) {
Is there actually a difference between these two? ReshardIsNeeded and need_reshard look like nearly identical names.
ReshardIsNeeded means a reshard is required when the input dist_tensor->dist_attr() and the input dist_attr[i] are inconsistent. need_reshard is an input parameter: at the PHI API layer it records whether the current API has SPMD rules. If it does, the Output does not need another reshard, because the Output DistAttr derived by InferSPMD is correct and the Output local tensor produced by the kernel already has the correct shape; only the Output's dist_attr needs to be set.
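For illustration, here is a minimal self-contained sketch of how the two checks compose; the types and the function signature are toy stand-ins, not the real PHI classes:

#include <iostream>
#include <string>
#include <vector>

// Toy stand-in for phi::distributed::TensorDistAttr.
struct DistAttr {
  std::string placement;  // e.g. "shard" or "replicated"
};

// A reshard is needed when the current and target dist_attrs disagree.
bool ReshardIsNeeded(const DistAttr& current, const DistAttr& target) {
  return current.placement != target.placement;
}

// need_reshard is decided one level up, in the generated PHI API code:
// true  -> the API has no SPMD rules, so the inplace output must really
//          be resharded back to its original dist_attr;
// false -> the API has SPMD rules, the kernel already produced a local
//          tensor of the correct shape, so only the dist_attr is set.
void SetInplaceOutputCorrectDistAttr(std::vector<DistAttr>& outputs,
                                     const std::vector<DistAttr>& target,
                                     bool need_reshard) {
  for (size_t i = 0; i < outputs.size(); ++i) {
    if (ReshardIsNeeded(outputs[i], target[i])) {
      if (need_reshard) {
        std::cout << "reshard output " << i << " to "
                  << target[i].placement << "\n";
      }
      outputs[i] = target[i];  // in this toy, both paths end here
    }
  }
}

int main() {
  std::vector<DistAttr> outs{{"replicated"}};
  SetInplaceOutputCorrectDistAttr(outs, {{"shard"}}, /*need_reshard=*/true);
  std::cout << outs[0].placement << "\n";  // prints "shard"
}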
I'll open a follow-up PR later to rename it.
Fixed, thx~
VLOG(6) << "SetInplaceOutputCorrectDistAttr input "
        << tensors[i].name() << " set its dist_attr from "
        << dist_tensor->dist_attr() << " to " << dist_attr[i];
dist_tensor->unsafe_set_dist_attr(dist_attr[i]);
In the inplace case, if the output's dist_attr is simply discarded, can the correctness of the result still be guaranteed?
Only APIs with SPMD rules reach this branch. Since the inplace output shares its dist_tensor with the input, we cannot assign the spmd_info result to the output earlier, otherwise resharding the input would go wrong. Setting the output's dist_attr is therefore deferred to the end, which plays a role similar to SetReplicatedDistAttrForOutput.
to use_general_spmd_rule.
… fix_inplace_api
LGTM
auto dist_t = std::make_shared<phi::distributed::DistTensor>(phi::DDim(),
                                                             dist_attr);
auto dist_t = std::make_shared<phi::distributed::DistTensor>(
    phi::DDim(), paddle::get<0>(dist_attr));
In principle this should go through the PADDLE_GET family of macros, or be wrapped in a try/catch; suggest fixing it in the next PR.
Sure, I'll fix that together in my next PR~
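For background, paddle::get throws on a type mismatch, and the suggestion is to route such accesses through the PADDLE_GET macros or an explicit try/catch so the failure is reported cleanly. A stand-alone illustration of the pattern, using std::variant in place of paddle's variant type:

#include <iostream>
#include <variant>

int main() {
  std::variant<int, double> v = 3.14;
  try {
    // Mirrors paddle::get<0>(dist_attr): throws if the active
    // alternative is not the one requested.
    int x = std::get<int>(v);
    std::cout << x << "\n";
  } catch (const std::bad_variant_access&) {
    std::cerr << "variant does not hold the requested type\n";
  }
  return 0;
}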
PR types
Bug fixes
PR changes
Others
Description
Pcard-73145
Corrects the handling of inplace outputs in the PHI API. Currently, operators without sharding inference (SPMD) rules reshard their inputs to the replicated state by default, so the resulting outputs are replicated as well. For an inplace output, this can leave the input modified into an unexpected replicated state. For example, in adamw_ the parameters are marked as shard at initialization (tensor-parallel sharding of a Linear layer's weights); after the first iteration the weights degrade to replicated and cannot be restored to their previous state. For this case, the fallback rule must reshard the inplace output back to its initial dist_attr. For inplace outputs that do have sharding inference rules, the SetKernelDistOutput function must not set the output's distributed attributes, because input and output share the same dist_tensor; the correct dist_attr is set at the very end, and the output does not need to be resharded.

Example code (adamw_, default fallback sharding rule):
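The original description attaches the generated code for this case; it is not reproduced in this text, so the following is only a hand-written toy sketch of the fallback behavior (all names and types are illustrative, not the generated PHI API source):

#include <iostream>
#include <string>

struct DistAttr { std::string placement; };
struct DistTensor { DistAttr dist_attr; };  // inplace in/out share this

// Toy fallback path for an API without SPMD rules, e.g. adamw_.
void adamw_fallback(DistTensor& param) {
  DistAttr origin = param.dist_attr;  // e.g. {"shard"}
  param.dist_attr = {"replicated"};   // fallback reshards inputs
  // ... kernel runs on the replicated local tensor ...
  // Fix in this PR: reshard the inplace output back to its original
  // dist_attr instead of leaving it replicated (need_reshard == true).
  param.dist_attr = origin;
}

int main() {
  DistTensor weight{{"shard"}};
  adamw_fallback(weight);
  std::cout << weight.dist_attr.placement << "\n";  // prints "shard"
}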
Example code (add_, with a sharding inference rule):
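Again in place of the originally attached generated code, a toy sketch of the SPMD-rule path (illustrative names only):

#include <iostream>
#include <string>

struct DistAttr { std::string placement; };
struct DistTensor { DistAttr dist_attr; };

// Toy SPMD-rule path for an inplace API, e.g. add_.
void add_with_spmd_rule(DistTensor& x) {
  // InferSPMD derives the output dist_attr from the rule.
  DistAttr inferred = x.dist_attr;  // elementwise: same as the input
  // SetKernelDistOutput must NOT set the output dist_attr here:
  // x serves as both input and output, so writing it early would make
  // the subsequent input reshard compare against the wrong dist_attr.
  // ... inputs resharded, kernel runs, local output shape is correct ...
  // Only at the very end is the output dist_attr set (no reshard):
  x.dist_attr = inferred;
}

int main() {
  DistTensor x{{"shard"}};
  add_with_spmd_rule(x);
  std::cout << x.dist_attr.placement << "\n";  // prints "shard"
}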